Re: [VOTE] SPIP: Asynchronous Metadata Resolution & Lazy Prefetching (Phase 1)

Devin Petersohn via dev Tue, 17 Feb 2026 10:40:16 -0800

+1 (non-binding). We've encountered the patterns described here repeatedly
in user workflows, and this proposal will be a big step forward in the
Spark Connect user experience.


On Tue, Feb 17, 2026 at 12:07 PM Mich Talebzadeh <[email protected]>
wrote:

> +1 from me
>
> Dr Mich Talebzadeh,
> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
> Analytics
>
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>
> On Tue, 17 Feb 2026 at 17:54, Holden Karau <[email protected]> wrote:
>
>> +1, this fixes a key performance regression between regular Spark and
>> Spark Connect. Some users I talked with ended up having to implement
>> their own caching to work around the "death by 1000 RPCs" issue called
>> out here.
>>
>> Twitter: https://twitter.com/holdenkarau
>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>> <https://www.fighthealthinsurance.com/?q=hk_email>
>> Books (Learning Spark, High Performance Spark, etc.):
>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>> Pronouns: she/her
>>
>>
>> On Tue, Feb 17, 2026 at 8:28 AM vaquar khan <[email protected]>
>> wrote:
>>
>>> Hi Spark devs,
>>>
>>> I would like to call for a vote on the SPIP: Asynchronous Metadata
>>> Resolution & Lazy Prefetching for Spark Connect (Phase 1: Client-Side
>>> Plan-ID Caching).
>>>
>>> *Summary*:
>>> This proposal addresses the critical "Death by 1000 RPCs" performance
>>> regression in Spark Connect. Currently, interactive workloads suffer from
>>> blocking network latency during metadata resolution. The proposal
>>> introduces a Client-Side Plan-ID Cache to eliminate redundant RPCs for
>>> deterministic plan structures (e.g., select, withColumn), significantly
>>> improving interactive performance.
>>>
>>> *Scope*:
>>> Based on the discussion feedback (special thanks to Herman, Erik,
>>> Ruifeng, and Holden), this SPIP has been narrowed to Phase 1 only, focusing
>>> strictly on the caching infrastructure and excluding the broader
>>> asynchronous API changes for now.
>>>
>>> *Links*:
>>>
>>> *SPIP Doc*:
>>> https://docs.google.com/document/d/1xTvL5YWnHu1jfXvjlKk2KeSv8JJC08dsD7mdbjjo9YE/edit?usp=sharing
>>>
>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-55163
>>>
>>> *Discussion Thread*:
>>> https://lists.apache.org/thread/wxj8mtopvm8bt959l58drzd4p90p6vn1
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because...
>>>
>>>
>>> Regards,
>>> Vaquar Khan
>>> *LinkedIn*: https://www.linkedin.com/in/vaquar-khan-b695577/
>>> *Book*:
>>> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>> *GitBook*:
>>> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>> *Stack Overflow*: https://stackoverflow.com/users/4812170/vaquar-khan
>>> *GitHub*: https://github.com/vaquarkhan
>>>
>>
