Respected Sir,

  1.  This is with reference to the previous mails.

  2.  Sir, apart from strong fundamentals in vector DBs, Python fundamentals, 
the Beam docs, and writing a sink, is there any other important topic to be 
covered as part of the project prerequisites?

  3.  Sir, I also wanted to ask which other Python topics, beyond the core 
language, should be covered as part of the project prerequisites.

  4.  Sir, the GSOC 2025 organization list and project list have now been 
released. As I am interested in this project and you are the potential mentor 
for it, could you please tell me which mode of communication would be better: 
Slack or the mailing lists? I am asking because I will likely need help 
several times while I am understanding the project and codebase, as it is a 
new concept and environment for me, and a continuous stream of emails may not 
be ideal. Whichever you prefer, sir, we can follow.

Best Regards,
Thanking you,
Siddharth Salian

From: SIDDHARTH SALIAN <[email protected]>
Date: Friday, 21 February 2025 at 12:59 AM
To: [email protected] <[email protected]>
Subject: Re: Regarding the GSOC 2025 Project
Hello Sir,
Thank you for the email. I have understood.

Thanks,
Siddharth Salian

From: Danny McCormick via user <[email protected]>
Date: Thursday, 20 February 2025 at 9:51 PM
To: [email protected] <[email protected]>
Cc: Danny McCormick <[email protected]>
Subject: Re: Regarding the GSOC 2025 Project
> Sir, as you mentioned in the mail, Python is a must for this project. I 
> just wanted to ask about the Java and Golang SDKs: I know it is an AI/ML 
> pipeline based project, but if you could tell me whether they are needed, 
> it would add to my clarity.

I would expect this project to pretty much exclusively be in Python. The only 
exception is if some vector DB or feature store only offers a Go or Java client 
(but this seems unlikely).

> Sir, I also wanted to ask: since Retrieval Augmented Generation (RAG) is 
> closely related to this project, is RAG still limited to capturing 
> historical data, or is it capable of capturing the latest data too?

I'm not sure I understand the question, but I can try to give an overview of 
how I think Beam and RAG work together. Basically, I think Beam can be used to:


  1.  Ingest data -> generate embeddings -> write to a vector DB. This can 
include very recent data; it just depends on how you configure your source 
(e.g. you could ingest data continuously with Pub/Sub or Kafka).
  2.  Ingest incoming query -> enrich with embedding data from a vector DB -> 
perform inference with the additional relevant context -> write result 
somewhere.

So I think this can handle reasonably tight data freshness requirements.
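
As a rough illustration of the two flows above, here is a minimal sketch in 
plain Python. It deliberately does not use the Beam API: `embed`, 
`InMemoryVectorStore`, `ingest`, and `enrich_query` are hypothetical names 
made up for this example, the "embedding" is just a word count, and the store 
is an in-memory list standing in for a real vector DB. It only shows that data 
written after the first ingest is immediately available to later queries.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts (hypothetical stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector DB client."""

    def __init__(self):
        self.rows = []

    def upsert(self, doc_id, vector, text):
        # Replace any existing row with the same id, then append the new one.
        self.rows = [r for r in self.rows if r[0] != doc_id]
        self.rows.append((doc_id, vector, text))

    def search(self, vector, k=1):
        ranked = sorted(self.rows, key=lambda r: cosine(vector, r[1]), reverse=True)
        return [text for _, _, text in ranked[:k]]


def ingest(store, docs):
    # Flow 1: ingest data -> generate embeddings -> write to the vector DB.
    for doc_id, text in docs:
        store.upsert(doc_id, embed(text), text)


def enrich_query(store, query, k=1):
    # Flow 2 (first half): attach the most similar stored documents to the
    # query. Inference and writing the result are left out of this sketch.
    return {"query": query, "context": store.search(embed(query), k)}


store = InMemoryVectorStore()
ingest(store, [("a", "beam supports streaming sources"),
               ("b", "vector databases store embeddings")])
# Fresh data arriving later is searchable as soon as it is written.
ingest(store, [("c", "kafka delivers new events continuously")])
enriched = enrich_query(store, "how are embeddings stored")
```

In an actual Beam pipeline, each of these steps would be a transform, and the 
freshness of the retrieved context would depend on how the source is 
configured, as described above.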

On Tue, Feb 18, 2025 at 11:01 AM SIDDHARTH SALIAN 
<[email protected]> wrote:
Respected Sir,


  1.  Thank you for the email. With reference to the previous mail, I have 
understood all the points, and I shall also go through the I/O page in the 
documentation as well as vector DBs and feature stores.

  2.  Sir, as you mentioned in the mail, Python is a must for this project. I 
just wanted to ask about the Java and Golang SDKs: I know it is an AI/ML 
pipeline based project, but if you could tell me whether they are needed, it 
would add to my clarity.

  3.  Sir, I also wanted to ask: since Retrieval Augmented Generation (RAG) is 
closely related to this project, is RAG still limited to capturing historical 
data, or is it capable of capturing the latest data too?


Best regards,
Thanking you,
Siddharth Salian

From: Danny McCormick via user <[email protected]>
Date: Tuesday, 18 February 2025 at 8:36 PM
To: [email protected] <[email protected]>
Cc: Danny McCormick <[email protected]>
Subject: Re: Regarding the GSOC 2025 Project
Hey Siddharth, thanks for reaching out. I'm glad you're interested in the 
project. In general, I would expect there to be more details about projects 
once we know which ones have been accepted.

> Sir, if you could tell me the prerequisite knowledge for this project (such 
> as the major programming languages used), it would bring more clarity to 
> me.

I would expect it to be primarily done in Python, though it depends on which 
connectors are available for each vector DB/feature store. Other than that, 
the main things you'd want to learn about are Beam itself, especially how to 
write a sink (the IO standards, 
https://beam.apache.org/documentation/io/io-standards, can help here), and 
also, at a high level, how vector DBs and feature stores work.
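
To give a feel for what writing a sink tends to involve, here is a minimal 
sketch of one common pattern: buffering elements and flushing them to the 
store in batches. It is an assumption about what such a sink might look like, 
not the design of this project. `FakeVectorDBClient`, `upsert_batch`, and 
`BatchingWriteFn` are hypothetical names. The class mirrors the `start_bundle` 
/ `process` / `finish_bundle` lifecycle methods of a Beam `DoFn`, but is kept 
as a plain Python class so the sketch runs without Beam installed.

```python
class FakeVectorDBClient:
    """Hypothetical client; a real one would make network calls."""

    def __init__(self):
        self.written = []
        self.calls = 0

    def upsert_batch(self, rows):
        self.calls += 1
        self.written.extend(rows)


class BatchingWriteFn:
    """Buffers elements and writes them in batches.

    In a real pipeline this would subclass apache_beam.DoFn; the method
    names below follow that lifecycle.
    """

    def __init__(self, client, batch_size=2):
        self._client = client
        self._batch_size = batch_size
        self._buffer = []

    def start_bundle(self):
        # Start each bundle with an empty buffer.
        self._buffer = []

    def process(self, element):
        self._buffer.append(element)
        if len(self._buffer) >= self._batch_size:
            self._flush()

    def finish_bundle(self):
        # Write whatever is left so no element is dropped.
        self._flush()

    def _flush(self):
        if self._buffer:
            self._client.upsert_batch(list(self._buffer))
            self._buffer = []


client = FakeVectorDBClient()
fn = BatchingWriteFn(client, batch_size=2)
fn.start_bundle()
for row in [{"id": 1}, {"id": 2}, {"id": 3}]:
    fn.process(row)
fn.finish_bundle()
```

With a batch size of 2, the three rows above are written in two calls: one 
full batch during `process` and the remainder in `finish_bundle`.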

Thanks,
Danny



On Thu, Feb 13, 2025 at 10:55 PM SIDDHARTH SALIAN 
<[email protected]> wrote:
Hello Sir,


  1.  I am writing this email with reference to the GSOC 2025 mail: 
https://lists.apache.org/thread/o3mwncq0k4c58c630n49l7bvhq74o2wj

  2.  I'm Siddharth Salian, an undergraduate student, and I have just joined 
the Apache Beam community. After going through the GSOC 2025 idea list and the 
project descriptions, I found this project, 
https://issues.apache.org/jira/browse/GSOC-279, interesting. I would like to 
contribute to it in GSOC 2025, since AI/ML is my area of interest. Since you 
are the mentor, I am letting you know, sir.

  3.  Sir, if you could tell me the prerequisite knowledge for this project 
(such as the major programming languages used), it would bring more clarity to 
me.

  4.  Sir, I also wanted to ask whether there is any other project that you 
are considering for GSOC 2025; I would like to contribute to it as well.


Best Regards,

Thanking You
Siddharth Salian

