Dear Jenkins Developers, Contributors, and Users,

My name is Chirag Gupta, and I am honored to be a Google Summer of Code 
2025 contributor working on a project titled “Jenkins Domain specific LLM 
based on actual Jenkins usage using ci.jenkins.io data.” 
<https://summerofcode.withgoogle.com/programs/2025/projects/oTNbvlrM>

The core motivation behind this project is to develop a specialized Large 
Language Model (LLM) tailored for Jenkins. By fine-tuning existing 
open-source models on data from ci.jenkins.io, the aim is to create a 
tool that helps diagnose Jenkins failures, reduces troubleshooting time, 
and ultimately helps teams, including the Jenkins infrastructure team, 
troubleshoot more effectively. This project is for the community, and its 
success will depend greatly on the community's collective expertise.

The primary focus will be to deliver:

   1. A functional fine-tuned LLM/Agent capable of assisting with common 
      Jenkins infra failure diagnosis.
   2. The pipeline used to train the model, so that further models can be 
      trained as newer, better base models are released and more datasets 
      can be added over time.
   

The project will span the GSoC 2025 timeline, from May to September.

Some technical aspects:

   1. Data Source: Leveraging the invaluable, real-world usage data from 
      ci.jenkins.io.
   2. Model Selection: Evaluating and fine-tuning robust base models. 
      Current candidates include Microsoft's Phi-4 and Qwen3 (14B & 8B) as 
      mid-sized models, alongside smaller models like Phi-4-mini-instruct 
      and Qwen3 (4B & 1.7B).
   3. Data Curation & Preparation: Implementing a thorough strategy for 
      cleaning, structuring, and preparing the ci.jenkins.io dataset, 
      addressing challenges like log noise and token inflation.
   4. Evaluation Strategy: Employing a combination of targeted benchmarks 
      (e.g., IFEval for instruction following, and SimpleQA for factual 
      accuracy, adapted for Jenkins), a custom diagnostic benchmark, and 
      qualitative human evaluation.
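
As a rough illustration of the log noise and token inflation point above, 
a cleaning pass could look something like the sketch below. The regular 
expressions, the timestamp format, and the function name clean_log are 
assumptions made for illustration only; they do not describe the project's 
actual pipeline, and real ci.jenkins.io logs may need different patterns.

```python
import re

# Hypothetical patterns (assumptions): real ci.jenkins.io logs may differ.
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")
TIMESTAMP = re.compile(
    r"^\[?\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z\]?\s*"
)


def clean_log(raw: str) -> str:
    """Strip ANSI codes and leading timestamps, drop blank lines,
    and collapse runs of identical consecutive lines."""
    cleaned = []
    previous = None
    repeats = 0
    for line in raw.splitlines():
        line = ANSI_ESCAPE.sub("", line)
        line = TIMESTAMP.sub("", line).rstrip()
        if not line:
            continue
        if line == previous:
            # Count the duplicate instead of keeping it.
            repeats += 1
            continue
        if repeats:
            cleaned.append(f"[previous line repeated {repeats} more time(s)]")
            repeats = 0
        cleaned.append(line)
        previous = line
    if repeats:
        cleaned.append(f"[previous line repeated {repeats} more time(s)]")
    return "\n".join(cleaned)
```

Removing timestamps before comparing lines is what lets repeated progress 
messages collapse into a single entry, which is where most of the token 
savings would be expected to come from.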
   

Seeking Your Expertise and Feedback – A Community Effort:

To ensure this project truly benefits the Jenkins community, your input is 
crucial. In the spirit of open source, this project is being built for the 
community, and ideally, with the community. Your collective experience can 
significantly shape its development and help us create a tool that 
effectively reduces the time we all spend deciphering build failures.

I would be incredibly grateful for any suggestions, insights, or feedback 
you might have, particularly if you have experience with:

   - Jenkins infrastructure and common failure patterns.
   - Analyzing ci.jenkins.io logs.
   - Specific types of failures that are consistently challenging to 
     diagnose.
   - Existing tools or techniques for log analysis that have proven useful.
   - Potential data sources or specific aspects of ci.jenkins.io data that 
     would be most valuable for training.
   

Any advice, common pain points, or even specific examples of tricky build 
failures would be immensely helpful in guiding the development and 
fine-tuning of this LLM.

Thank you for your time and consideration. I look forward to your valuable 
contributions and to building a tool that can benefit the entire Jenkins 
ecosystem.

Best regards,

Chirag Gupta

GSoC 2025 Contributor@Jenkins


