Thanks! The first link is old; here is a more recent one: 1) https://python.langchain.com/docs/integrations/providers/spark/#spark-sql-individual-tools
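For anyone landing on this thread later, the basic usage looks roughly like the sketch below. This is untested; the endpoint URL, model name, and data path are placeholders for your own deployment, and note the agent constructor lives in the langchain-experimental package:

    from pyspark.sql import SparkSession
    from langchain_openai import ChatOpenAI
    from langchain_experimental.agents import create_spark_dataframe_agent

    spark = SparkSession.builder.getOrCreate()
    df = spark.read.parquet("s3://bucket/training-data/")  # placeholder path

    # Any LangChain chat model should work here; pointing ChatOpenAI at a
    # self-hosted, OpenAI-compatible endpoint is an assumption on my part.
    llm = ChatOpenAI(base_url="http://llm.internal:8000/v1", model="my-model")

    # The agent lets the LLM inspect and query the DataFrame on your behalf.
    agent = create_spark_dataframe_agent(llm=llm, df=df, verbose=True)
    agent.run("How many rows have a null label column?")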
Russell

On Fri, Jan 3, 2025 at 8:50 AM Gurunandan <gurunandan....@gmail.com> wrote:
> Hi Mayur,
> Please evaluate LangChain's Spark DataFrame Agent for your use case.
>
> Documentation:
> 1) https://python.langchain.com/v0.1/docs/integrations/toolkits/spark/
> 2) https://python.langchain.com/docs/integrations/tools/spark_sql/
>
> regards,
> Guru
>
> On Fri, Jan 3, 2025 at 6:38 PM Mayur Dattatray Bhosale <ma...@sarvam.ai> wrote:
> >
> > Hi team,
> >
> > We are planning to use Spark for pre-processing the ML training data, given that the data is 500+ TB.
> >
> > One of the steps in the pre-processing requires us to use an LLM (our own deployment of a model). I wanted to understand the right way to architect this. These are the options I can think of:
> >
> > - Split this into multiple applications at the LLM step, and use a workflow manager to feed the output of application 1 to the LLM and the output of the LLM to application 2.
> > - Split this into multiple stages by writing orchestration code that feeds the output of the pre-LLM processing stages to the externally hosted LLM, and vice versa.
> >
> > I wanted to know whether there is an easier way to do this within Spark, or whether there are any plans to make such functionality a first-class citizen of Spark in the future. Please also suggest any better alternatives.
> >
> > Thanks,
> > Mayur
>
> ---------------------------------------------------------------------
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
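On Mayur's original question of keeping the whole pipeline inside one Spark application: a common pattern is to call the hosted model's HTTP endpoint from a pandas UDF, so no workflow manager or application split is needed. A minimal sketch, assuming a simple JSON endpoint (the URL, request/response fields, and column names are all placeholders):

    import pandas as pd
    import requests
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import pandas_udf

    spark = SparkSession.builder.getOrCreate()
    LLM_URL = "http://llm.internal:8000/generate"  # placeholder endpoint

    @pandas_udf("string")
    def annotate(texts: pd.Series) -> pd.Series:
        # Each executor sends its batch of rows to the hosted LLM.
        results = []
        for text in texts:
            resp = requests.post(LLM_URL, json={"prompt": text}, timeout=60)
            resp.raise_for_status()
            results.append(resp.json()["completion"])  # assumed response field
        return pd.Series(results)

    df = spark.read.parquet("s3://bucket/pre-llm/")  # placeholder path
    df = df.withColumn("llm_output", annotate(df["text"]))
    df.write.parquet("s3://bucket/post-llm/")

mapInPandas gives more control over batch sizes if the endpoint prefers larger requests, and some rate limiting on the executors is usually needed, but either way the LLM step stays a normal Spark transformation instead of a separate application.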