Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread José Müller
single-node engines.

+ @Hyukjin Kwon

> On Mon, Feb 10, 2025 at 8:40 PM José Müller wrote:
>
>> Hi Mich,
>>
>> All you said is well understood, but I believe you are missing the point:
>> the proposal is not to break Spark's ways of processing, but to use s

Re: [PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread José Müller
> I believe is currently integrated as pyspark.pandas?
>
> HTH
>
> Dr Mich Talebzadeh,
> Architect | Data Science | Financial Crime | Forensic Analysis | GDPR
>
> View my LinkedIn profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

[PROPOSAL] Unified PySpark-Pandas API to Bridge Data Engineering and ML Workflows

2025-02-10 Thread José Müller
ut can run using Pandas <https://flypipe.github.io/flypipe/html/release/4.1.0/notebooks/tutorial/multiple-node-types.html#4.-pandas_on_spark-nodes-as-pandas>, but it still requires ML teams to manage separate pipelines for Spark dependencies.

I’d love to hear thoughts from the community on this idea, and *if there's a better approach to solving this issue*.

Thanks,
José Müller