Added story about embeddings.

This commit is contained in:
retoor 2025-01-20 13:46:48 +01:00
parent cde66da983
commit d364d72e51

@ -43,6 +43,10 @@ It's also possible to give `api_key` as parameter to the initiation of the `rage
## For free!
But there is a small catch. It's very easy to replace OpenAI with a locally hosted LLM like Ollama. Ollama installs within minutes using a one-liner, and figuring out how to swap the URL takes maybe 20 minutes. So, with a good hour plus the time to download your favorite Ollama model, you have a free chatbot in a few hours, most of which is waiting. I recommend models above 3b, or even above 7b. My personal experience with Ollama LLMs is that llama models, qwen (3b+) and gemma2 work best. Gemma2 is made by Google, is only 2b and about 4GB, and is probably the most value for the least resources. You can try it out with the `python -m ragent.demo_olama` command. Just kiddin'. You really have to do this small thing yourself. I don't have the right hardware to run a decent LLM, so I just didn't implement it. Don't be cheap AND lazy. It's worth it.
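Below is a minimal sketch of what that URL swap could look like, assuming you talk to Ollama through the standard `openai` Python client (Ollama exposes an OpenAI-compatible endpoint on port 11434). The model name is just whatever you pulled with `ollama pull`, and none of this is wired into ragent itself:

```python
# Sketch: point the regular OpenAI client at a local Ollama server.
# Only the base_url and the model name change; the api_key just has
# to be a non-empty string because Ollama ignores it.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # dummy value, not checked by Ollama
)

response = client.chat.completions.create(
    model="gemma2:2b",  # or any other model you pulled with `ollama pull`
    messages=[{"role": "user", "content": "Hello from a local LLM!"}],
)
print(response.choices[0].message.content)
```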
Getting document embeddings to work will cost you some time, since the `VectoreStore` class will not work with Ollama in any way. An Ollama version needs its own embedding database, like chromadb, which has to be filled with the documents, and you have to build support for every file type (pdf, doc, xlsx, etc.) yourself. The art is chunking documents the right way and reasonably consistently. For importing books into a local LLM I converted all files to TXT first, so I could always use the same embedding method with chromadb. Also interesting are the different results you get with different chunking methods, for example paragraph chunking, line chunking and page chunking. If your chunks are big, your LLM becomes slow(er). I had the best results with paragraph chunking, but it depends on your content, I guess. Line chunking would be very performant at query time. When it comes to adding data, the type of chunking doesn't matter for performance / duration AFAIK.
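For the embedding side, here is a rough sketch of the TXT-plus-paragraph-chunking approach described above, using chromadb and Ollama's embedding endpoint. The model name `nomic-embed-text` and the file name `book.txt` are placeholders, and none of this exists in ragent; it's just one way to fill a local vector database:

```python
# Sketch: paragraph-chunk a TXT file and store it in a local chromadb
# collection, using Ollama's /api/embeddings endpoint for the vectors.
import requests
import chromadb


def embed(text: str) -> list[float]:
    # Ollama's embedding endpoint; the model must be pulled first,
    # e.g. `ollama pull nomic-embed-text`.
    resp = requests.post(
        "http://localhost:11434/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": text},
    )
    resp.raise_for_status()
    return resp.json()["embedding"]


def paragraph_chunks(path: str) -> list[str]:
    # Paragraph chunking: split on blank lines, drop empty chunks.
    with open(path, encoding="utf-8") as f:
        text = f.read()
    return [p.strip() for p in text.split("\n\n") if p.strip()]


client = chromadb.PersistentClient(path="./embeddings")
collection = client.get_or_create_collection("books")

chunks = paragraph_chunks("book.txt")  # placeholder file name
collection.add(
    ids=[f"book-{i}" for i in range(len(chunks))],
    documents=chunks,
    embeddings=[embed(chunk) for chunk in chunks],
)

# At query time: embed the question the same way and feed the closest
# chunks into the prompt of your local model.
results = collection.query(
    query_embeddings=[embed("What is this book about?")],
    n_results=3,
)
print(results["documents"][0])
```

Swapping `paragraph_chunks` for a line- or page-based splitter is enough to compare the chunking methods mentioned above.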
While I do not officially support Ollama, I will help if you need it. You can reach me at retoor@molodetz.nl.
## Costs if you use Open AI
You can chat with the bots the whole day for just ten cents or so. See here how much it cost to test it extensively.