I and other LibraryThing developers have done a lot of this work in the process of making Talpa.ai, so here's my quick take:
1. Fine-tuning is a poor way to add knowledge to an LLM, especially at scale. It's mostly useful for controlling how LLM "thinking" is presented, for example ensuring clean, standardized output. It can also help reduce how many input tokens you need and speed up results. This is our experience; yours may be different. But it's at least a common view. (See https://www.reddit.com/r/LocalLLaMA/comments/16q13lm/this_research_may_explain_why_finetuning_doesnt/ .) There's a toy sketch of what that kind of training data can look like below.

2. RAG is more likely to get you the results you want. It's good for validation and for cases where the model has no clue about the facts involved. So, for example, if you want to use proprietary content to answer a query, you can use a vectorized search to find relevant passages, then feed them to an LLM (which is all RAG is) and see what happens. You can fine-tune the model you use for RAG to ensure the output is clean and standardized. RAG can be cheap, but it tends to involve making very long prompts, so if you're using a commercial service, you'll want to think about the cost of input tokens. Although cheaper than output tokens, they add up fast! (A bare-bones sketch of the retrieve-then-generate loop is below as well.)

Anyway, RAG is probably what you want, but the way people throw around RAG now, you'd think it was some fantastic new idea that transcends the limitations of LLMs. It's really not. RAG is just giving an LLM some of what you want it to think about and hoping it thinks it through well. You still need to feed it the right data, and just because you give it something to think about doesn't mean it will reason about it correctly. If LLMs are "unlimited, free stupid people," with RAG they are in effect "unlimited, free stupid people in possession of the text I found." You can find a deeper critique of RAG by Gary Marcus here: https://garymarcus.substack.com/p/no-rag-is-probably-not-going-to-rescue

I'm eager to hear how things go! I would, of course, be grateful for any feedback on Talpa ( https://www.talpa.ai ), which is in active development, with a new version due any day now. It also uses a third technique, which probably has a name: using LLMs not for their knowledge or for RAG, but to parse user queries in such a way that they can be answered by library data systems, not by the LLM itself. LLMs can parse language incorrectly, but language is their greatest strength and, unlike facts and interpretations, parsing seldom involves hallucinations. Then we use real, authoritative library and book data, which has no hallucination problem. The last sketch below gestures at that approach.
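To make point 1 concrete, here's a toy example of "fine-tuning for format, not knowledge." It assumes OpenAI-style chat-format JSONL as the training-data shape; the records and field names are made up for illustration, and you'd adapt this to whatever fine-tuning pipeline you actually use. The point is that the formatting rules live in the examples, so you don't have to restate them in every prompt.

# Each training example pairs a messy citation with the standardized JSON
# we want back; after fine-tuning, the model returns that shape without
# long formatting instructions in the prompt.
import json

examples = [
    {"messages": [
        {"role": "user", "content": "Moby-Dick, Herman Melville, 1851"},
        {"role": "assistant",
         "content": '{"title": "Moby-Dick", "author": "Melville, Herman", "year": 1851}'},
    ]},
    {"messages": [
        {"role": "user", "content": "Pride and Prejudice by Jane Austen (1813)"},
        {"role": "assistant",
         "content": '{"title": "Pride and Prejudice", "author": "Austen, Jane", "year": 1813}'},
    ]},
]

# Write one JSON object per line -- the usual upload format for fine-tuning.
with open("train.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")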
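And here's a bare-bones version of the RAG loop from point 2. It's a sketch, not production code: it assumes the OpenAI Python SDK (1.x) with an API key in the environment, the model names are just placeholders, and a real system would keep the embeddings in a proper vector index rather than a NumPy array.

# Minimal RAG sketch: embed documents, retrieve the closest matches for a
# query, then hand only those passages to the LLM for the final answer.
import numpy as np
from openai import OpenAI

client = OpenAI()

documents = [
    "MARC field 245 carries the title statement of a bibliographic record.",
    "LibraryThing's Talpa answers natural-language questions about books.",
    "Retrieval-augmented generation feeds retrieved text to a language model.",
]

def embed(texts):
    """Return one embedding vector per input string."""
    response = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([item.embedding for item in response.data])

doc_vectors = embed(documents)  # in practice, store these in a vector index

def answer(question, k=2):
    query_vector = embed([question])[0]
    # Cosine similarity, then keep the k closest passages.
    scores = doc_vectors @ query_vector / (
        np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(query_vector)
    )
    context = "\n".join(documents[i] for i in np.argsort(scores)[::-1][:k])
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return completion.choices[0].message.content

print(answer("What does MARC field 245 contain?"))

The interesting decisions are all in retrieval: what you index, how you chunk it, and how many passages you pass along. That last choice is also where the input-token bill comes from.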
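Finally, a gesture at the third technique: the LLM only turns the user's question into structured search parameters, and the answer comes from real library data. The parameter names and the search_catalog() stub here are hypothetical, for illustration only; this is not Talpa's actual implementation.

# The model extracts search parameters; it is never asked for facts.
import json
from openai import OpenAI

client = OpenAI()

def parse_query(question):
    """Ask the model to translate a question into search parameters."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": (
                "Extract search parameters from the question. Reply with JSON only, "
                'using the keys "author", "subject", "year_from", and "year_to"; '
                "use null for anything not mentioned.")},
            {"role": "user", "content": question},
        ],
    )
    return json.loads(completion.choices[0].message.content)

def search_catalog(params):
    # Stand-in for your ILS, discovery layer, or book database -- the part
    # of the pipeline that has no hallucination problem.
    return []

params = parse_query("Novels about whaling written in the nineteenth century")
print(params)  # e.g. {'author': None, 'subject': 'whaling', 'year_from': 1800, 'year_to': 1900}
results = search_catalog(params)  # the real data answers, not the LLM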
Best,
Tim

On Mon, Feb 26, 2024 at 4:07 PM Eric Lease Morgan
<00000107b9c961ae-dmarc-requ...@lists.clir.org> wrote:

> Who out here in Code4Lib Land is practicing with either one or both of
> the following things: 1) fine-tuning large-language models, or 2)
> retrieval-augmented generation (RAG)? If there is somebody out there,
> then I'd love to chat.
>
> When it comes to generative AI -- things like ChatGPT -- one of the
> first things we librarians say is, "I don't know how I can trust those
> results because I don't know from whence the content originated." Thus,
> if we were to create our own model, then we could trust the results.
> Right? Well, almost. The things behind ChatGPT are "large language
> models", and the creation of such things is very expensive. They
> require more content than we have, more computing horsepower than we
> are willing to buy, and more computing expertise than we are willing to
> hire.
>
> On the other hand, there is a process called "fine-tuning", where one's
> own content is used to supplement an existing large-language model, and
> in the end the model knows about one's own content. I plan to
> experiment with this process; I plan to fine-tune an existing
> large-language model and experiment with its use.
>
> Another approach to generative AI is called RAG -- retrieval-augmented
> generation. In this scenario, one's content is first indexed using any
> number of different techniques. Next, given a query, the index is
> searched for matching documents. Third, the matching documents are
> given as input to the large-language model, and the model uses the
> documents to structure the result -- a simple sentence, a paragraph, a
> few paragraphs, an outline, or some sort of structured data (CSV, JSON,
> etc.). In any case, only the content given to the model is used for
> analysis, and the model's primary purpose is to structure the result.
> Compared to fine-tuning, RAG is computationally dirt cheap. Like
> fine-tuning, I plan to experiment with RAG.
>
> To the best of my recollection, I have not seen very much discussion on
> this list about the technological aspects of fine-tuning or RAG. If you
> are working with these technologies, then I'd love to hear from you.
> Let's share war stories.
>
> --
> Eric Morgan <emor...@nd.edu>
> Navari Family Center for Digital Scholarship
> University of Notre Dame

--
Check out my library at https://www.librarything.com/profile/timspalding