Eric - it sounds like we may be at about the same point: I want to start 
working in the area of fine-tuning, specifically focusing on ChatGPT-generated 
data management plans (DMPs) that would then be revised by experts and used as 
a fine-tuning corpus for (hopefully) improving the draft DMP language provided 
by ChatGPT. This is part of a broader experiment with DMP-generation prompts 
derived from machine-readable DMP content.

Thanks,
Karl

Karl Benedict
Director of Research Data Services/ Director of IT
College of University Libraries and Learning Sciences
University of New Mexico

Office: Centennial Science and Engineering Library, Room L173

Make an Appointment: 
https://outlook.office365.com/owa/calendar/karlbened...@unmm.onmicrosoft.com/bookings/

On 26 Feb 2024, at 14:05, Eric Lease Morgan wrote:

> Who out here in Code4Lib Land is practicing with either one or both of the 
> following things: 1) fine-tuning large-language models, or 2) 
> retrieval-augmented generation (RAG)? If there is somebody out there, then 
> I'd love to chat.
>
> When it comes to generative AI -- things like ChatGPT -- one of the first 
> things we librarians say is, "I don't know how I can trust those results 
> because I don't know where the content originated." Thus, if we were to 
> create our own model, then we could trust the results. Right? Well, almost. 
> The things behind ChatGPT are "large language models", and the creation of 
> such things is very expensive. They require more content than we have, more 
> computing horsepower than we are willing to buy, and more computing expertise 
> than we are willing to hire. On the other hand, there is a process called 
> "fine-tuning", where one's own content is used to supplement an existing 
> large-language model, and in the end the model knows about one's own content. 
> I plan to experiment with this process; I plan to fine-tune an existing 
> large-language model and experiment with its use.
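>
> To make that concrete, below is a minimal sketch of the sort of experiment I 
> have in mind, assuming Python with the Hugging Face transformers and datasets 
> libraries; the base model and file names are placeholders:
>
>   from datasets import load_dataset
>   from transformers import (AutoModelForCausalLM, AutoTokenizer,
>                             DataCollatorForLanguageModeling, Trainer,
>                             TrainingArguments)
>
>   # a small base model, just for experimentation
>   model_name = "distilgpt2"
>   tokenizer = AutoTokenizer.from_pretrained(model_name)
>   tokenizer.pad_token = tokenizer.eos_token
>   model = AutoModelForCausalLM.from_pretrained(model_name)
>
>   # one's own content, one document per line in a plain-text file
>   dataset = load_dataset("text", data_files={"train": "our_content.txt"})
>
>   def tokenize(batch):
>       return tokenizer(batch["text"], truncation=True, max_length=512)
>
>   tokenized = dataset["train"].map(tokenize, batched=True,
>                                    remove_columns=["text"])
>
>   # continue training (fine-tune) the base model on the local content
>   trainer = Trainer(
>       model=model,
>       args=TrainingArguments(output_dir="finetuned", num_train_epochs=1,
>                              per_device_train_batch_size=2),
>       train_dataset=tokenized,
>       data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
>   )
>   trainer.train()
>   trainer.save_model("finetuned")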
>
> Another approach to generative AI is called RAG -- retrieval-augmented 
> generation. In this scenario, one's content is first indexed using any number 
> of different techniques. Next, given a query, the index is searched for 
> matching documents. Third, the matching documents are given as input to the 
> large-language model, and the model uses the documents to structure the 
> result -- a simple sentence, a paragraph, a few paragraphs, an outline, or 
> some sort of structured data (CSV, JSON, etc.). In any case, only the content 
> given to the model is used for analysis, and the model's primary purpose is 
> to structure the result. Compared to fine-tuning, RAG is computationally dirt 
> cheap. Like fine-tuning, I plan to experiment with RAG.
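>
> A minimal sketch of that pipeline, assuming Python with scikit-learn for the 
> indexing and searching steps; the documents, query, and final call to the 
> model are placeholders:
>
>   from sklearn.feature_extraction.text import TfidfVectorizer
>   from sklearn.metrics.pairwise import cosine_similarity
>
>   # one's own content: a small list of stand-in documents
>   documents = ["First local document about data curation.",
>                "Second local document about metadata.",
>                "Third local document about preservation."]
>
>   # 1) index the content
>   vectorizer = TfidfVectorizer()
>   doc_vectors = vectorizer.fit_transform(documents)
>
>   # 2) given a query, search the index for matching documents
>   query = "What do we say about metadata?"
>   scores = cosine_similarity(vectorizer.transform([query]), doc_vectors)[0]
>   top_documents = [documents[i] for i in scores.argsort()[::-1][:2]]
>
>   # 3) hand the matching documents to the model, which structures the result
>   prompt = ("Using only the documents below, answer the question.\n\n"
>             + "\n\n".join(top_documents)
>             + "\n\nQuestion: " + query)
>   # the prompt would then be sent to whatever large-language model is at
>   # hand (local or hosted); that call is omitted here
>   print(prompt)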
>
> To the best of my recollection, I have not seen very much discussion on this 
> list about the technological aspects of fine-tuning or RAG. If you are 
> working with these technologies, then I'd love to hear from you. Let's share 
> war stories.
>
> --
> Eric Morgan <emor...@nd.edu>
> Navari Family Center for Digital Scholarship
> University of Notre Dame
