Hi All,

While I think it is great that we are trying to address this issue in Connect, I have concerns about the current proposal (see the comments in the doc). I would like to discuss this in more detail before proceeding. Given that this is an official vote, I will cast a -1 for now.

Cheers,
Herman

On Tue, Feb 17, 2026 at 2:39 PM Devin Petersohn via dev <[email protected]> wrote:

> +1 (non-binding). We've encountered the patterns described here repeatedly
> in user workflows, and this proposal will be a big step forward in the
> Spark Connect user experience.
>
> On Tue, Feb 17, 2026 at 12:07 PM Mich Talebzadeh <[email protected]> wrote:
>
>> +1 from me
>>
>> Dr Mich Talebzadeh,
>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>> Analytics
>>
>> view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> On Tue, 17 Feb 2026 at 17:54, Holden Karau <[email protected]> wrote:
>>
>>> +1, this fixes a key performance regression between regular Spark and
>>> Spark connect. In talking with some users they ended up having to implement
>>> their own caching to work around the death by 1k RPC issue called out here.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9 <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>> On Tue, Feb 17, 2026 at 8:28 AM vaquar khan <[email protected]> wrote:
>>>
>>>> Hi Spark devs,
>>>>
>>>> I would like to call for a vote on the SPIP: Asynchronous Metadata
>>>> Resolution & Lazy Prefetching for Spark Connect (Phase 1: Client-Side
>>>> Plan-ID Caching).
>>>>
>>>> *Summary*:
>>>> This proposal addresses the critical "Death by 1000 RPCs" performance
>>>> regression in Spark Connect. Currently, interactive workloads suffer from
>>>> blocking network latency during metadata resolution. The proposal
>>>> introduces a Client-Side Plan-ID Cache to eliminate redundant RPCs for
>>>> deterministic plan structures (e.g., select, withColumn), significantly
>>>> improving interactive performance.
>>>>
>>>> *Scope*:
>>>> Based on the discussion feedback (special thanks to Herman, Erik,
>>>> Ruifeng, and Holden), this SPIP has been narrowed to Phase 1 only, focusing
>>>> strictly on the caching infrastructure and excluding the broader
>>>> asynchronous API changes for now.
>>>>
>>>> *Links*:
>>>>
>>>> *SPIP Doc*:
>>>> https://docs.google.com/document/d/1xTvL5YWnHu1jfXvjlKk2KeSv8JJC08dsD7mdbjjo9YE/edit?usp=sharing
>>>>
>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-55163
>>>>
>>>> *Discussion Thread*:
>>>> https://lists.apache.org/thread/wxj8mtopvm8bt959l58drzd4p90p6vn1
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because...
>>>>
>>>> Regards,
>>>> Vaquar Khan
>>>> *Linkedin* - https://www.linkedin.com/in/vaquar-khan-b695577/
>>>> *Book* -
>>>> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>>> *GitBook* -
>>>> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>>> *Stack* - https://stackoverflow.com/users/4812170/vaquar-khan
>>>> *github* - https://github.com/vaquarkhan
