As discussed on a thread over the weekend, we agreed among us, including
Matei, on a shift towards more stable, version-independent APIs.
Spark Connect, IMO, is a key enabler of this shift: it allows users and
developers to build applications and libraries that are more resilient to
changes in Spark's internals than RDD-based code is. Moreover, maintaining
backward compatibility for existing RDD-based applications and
libraries is crucial during this transition window, so the timeframe is
another factor for consideration.

HTH

Mich Talebzadeh,
Architect | Data Science | Financial Crime | Forensic Analysis | GDPR

   view my LinkedIn profile
<https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

On Tue, 21 Jan 2025 at 22:40, Holden Karau <holden.ka...@gmail.com> wrote:

> Interesting. So, given that one of the features of Spark Connect should be
> simpler migrations, we should (in my mind) only declare it stable once we've
> gone through two releases in which the previous client and its code can talk
> to the new server.
>
> Twitter: https://twitter.com/holdenkarau
> Fight Health Insurance: https://www.fighthealthinsurance.com/
> Books (Learning Spark, High Performance Spark, etc.):
> https://amzn.to/2MaRAG9
> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
> Pronouns: she/her
>
>
> On Tue, Jan 21, 2025 at 12:31 PM Dongjoon Hyun <dongj...@apache.org>
> wrote:
>
>> It seems that there is misinformation about the stability of Spark
>> Connect in Spark 4. I would like to reduce the information gap on our dev
>> mailing list.
>>
>> Some people frequently claim that `Spark Connect` is stable because it uses
>> Protobuf. Yes, we have standardized the interface layer. However, may I ask
>> whether that implies the stability of the implementation?
>>
>> Since Apache Spark is an open-source community, the stability of the
>> implementation is visible in our public CI. There, the PySpark Connect
>> client has been technically broken most of the time.
>>
>> 1.
>> https://github.com/apache/spark/actions/workflows/build_python_connect.yml
>> (Spark Connect Python-only in master)
>>
>> In addition, the Spark 3.5 client seems to face further difficulty
>> talking to the Spark 4 server.
>>
>> 2.
>> https://github.com/apache/spark/actions/workflows/build_python_connect35.yml
>> (Spark Connect Python-only:master-server, 35-client)
>>
>> 3. What about the stability and feature parity across different
>> languages? Do they work well with Apache Spark 4? Is there any way for
>> the Apache Spark community to assess this?
>>
>> Given (1), (2), and (3), how can we make sure that `Spark Connect` is
>> stable or ready in Spark 4? From my perspective, it is still under active
>> development, with no clear end in sight.
>>
>> The bottom line is that `Spark Connect` needs more community love before
>> it can be claimed as stable in Apache Spark 4. I'm looking forward to
>> seeing a healthy Spark Connect CI in Spark 4. Until then, let's clarify
>> what is stable in `Spark Connect` and what is not yet.
>>
>> Best Regards,
>> Dongjoon.
>>
>> PS.
>> This is a separate thread from the previous flakiness issues.
>> https://lists.apache.org/thread/r5dzdr3w4ly0dr99k24mqvld06r4mzmq
>> ([FYI] Known `Spark Connect` Test Suite Flakiness)
>>
>