Re: [VOTE] SPIP: Asynchronous Metadata Resolution & Lazy Prefetching (Phase 1)

Herman van Hovell via dev Tue, 17 Feb 2026 11:17:52 -0800

Hi All,

While I think it is great that we are trying to address this issue in
Connect, I have concerns about the current proposal (see the comments in
the doc). I would like to discuss this more in detail before proceeding.
Given that this is an official vote, I will cast a -1 for now.


Cheers,
Herman

On Tue, Feb 17, 2026 at 2:39 PM Devin Petersohn via dev <
[email protected]> wrote:

> +1 (non-binding). We've encountered the patterns described here repeatedly
> in user workflows, and this proposal will be a big step forward in the
> Spark Connect user experience.
>
> On Tue, Feb 17, 2026 at 12:07 PM Mich Talebzadeh <
> [email protected]> wrote:
>
>> +1 from me
>>
>> Dr Mich Talebzadeh,
>> Data Scientist | Distributed Systems (Spark) | Financial Forensics &
>> Metadata Analytics | Transaction Reconstruction | Audit & Evidence-Based
>> Analytics
>>
>>    view my Linkedin profile
>> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
>>
>> On Tue, 17 Feb 2026 at 17:54, Holden Karau <[email protected]>
>> wrote:
>>
>>> +1, this fixes a key performance regression between regular Spark and
>>> Spark Connect. Some users I talked with ended up having to implement
>>> their own caching to work around the "death by 1k RPCs" issue called out here.
>>>
>>> Twitter: https://twitter.com/holdenkarau
>>> Fight Health Insurance: https://www.fighthealthinsurance.com/
>>> <https://www.fighthealthinsurance.com/?q=hk_email>
>>> Books (Learning Spark, High Performance Spark, etc.):
>>> https://amzn.to/2MaRAG9  <https://amzn.to/2MaRAG9>
>>> YouTube Live Streams: https://www.youtube.com/user/holdenkarau
>>> Pronouns: she/her
>>>
>>>
>>> On Tue, Feb 17, 2026 at 8:28 AM vaquar khan <[email protected]>
>>> wrote:
>>>
>>>> Hi Spark devs,
>>>>
>>>> I would like to call for a vote on the SPIP: Asynchronous Metadata
>>>> Resolution & Lazy Prefetching for Spark Connect (Phase 1: Client-Side
>>>> Plan-ID Caching).
>>>>
>>>> *Summary*:
>>>> This proposal addresses the critical "Death by 1000 RPCs" performance
>>>> regression in Spark Connect. Currently, interactive workloads suffer from
>>>> blocking network latency during metadata resolution. The proposal
>>>> introduces a Client-Side Plan-ID Cache to eliminate redundant RPCs for
>>>> deterministic plan structures (e.g., select, withColumn), significantly
>>>> improving interactive performance.
>>>>
>>>> *Scope*:
>>>> Based on the discussion feedback (special thanks to Herman, Erik,
>>>> Ruifeng, and Holden), this SPIP has been narrowed to Phase 1 only, focusing
>>>> strictly on the caching infrastructure and excluding the broader
>>>> asynchronous API changes for now.
>>>>
>>>> *Links*:
>>>>
>>>> *SPIP Doc*:
>>>> https://docs.google.com/document/d/1xTvL5YWnHu1jfXvjlKk2KeSv8JJC08dsD7mdbjjo9YE/edit?usp=sharing
>>>>
>>>> *JIRA*: https://issues.apache.org/jira/browse/SPARK-55163
>>>>
>>>> *Discussion Thread*:
>>>> https://lists.apache.org/thread/wxj8mtopvm8bt959l58drzd4p90p6vn1
>>>>
>>>> Please vote on the SPIP for the next 72 hours:
>>>>
>>>> [ ] +1: Accept the proposal as an official SPIP
>>>> [ ] +0
>>>> [ ] -1: I don’t think this is a good idea because...
>>>>
>>>>
>>>> Regards,
>>>> Vaquar Khan
>>>> *Linkedin *-https://www.linkedin.com/in/vaquar-khan-b695577/
>>>> *Book *-
>>>> https://us.amazon.com/stores/Vaquar-Khan/author/B0DMJCG9W6?ref=ap_rdr&shoppingPortalEnabled=true
>>>> *GitBook*-
>>>> https://vaquarkhan.github.io/microservices-recipes-a-free-gitbook/
>>>> *Stack *-https://stackoverflow.com/users/4812170/vaquar-khan
>>>> *github*-https://github.com/vaquarkhan
>>>>
>>>
