Hi all,

Thank you for your interest in the SPIP documentation and for all your
comments so far.
I plan to hold a vote on the SPIP next week.

Thanks,
Kousuke

On Tue, Jun 9, 2026 at 2:18 Parth Chandra <[email protected]> wrote:

> Thanks Kousuke. Let's proceed independently. I'll get something ready
> while we wait for the SPIP review.
>
>
> Parth
>
> On Sat, Jun 6, 2026 at 9:38 AM Kousuke Saruta <[email protected]>
> wrote:
>
>> Hi Parth,
>>
>> Thank you for the thoughtful response. I think the incremental approach
>> (your path 1) might be feasible. Our proposals are complementary and
>> independent. They address different problems and can proceed in parallel
>> without blocking each other.
>> Your DirectTokenProvider unblocks non-Kerberos credential providers in
>> the existing HadoopDelegationTokenManager mechanism. This solves the
>> immediate gate problem for environments where pod-level identity is
>> unavailable or insufficient.
>> My SPIP introduces per-user/per-session identity propagation with a
>> separate manager and RPC, targeting the case where executors need
>> credentials derived from a user identity they cannot obtain themselves.
>> Neither depends on the other landing first. They share no code paths
>> (different manager, different RPC, and different SPI).
>>
>> Regarding the 2022 review feedback (sorry, I didn't know about that), the
>> constraints have shifted since then. Per-user identity propagation and
>> Spark Connect multi-tenancy require expressing per-session identity, but
>> Spark's current use of UGI is process-wide, so per-session scoping would
>> require fundamental changes to how Spark interacts with UGI. The rejection
>> of UGI in Appendix C of my SPIP doc reflects these new requirements.
>> Regarding binary payloads in ServiceCredential, for the initial
>> implementation, Base64 encoding within Map[String, String] is sufficient
>> for the S3A use case. Since the SPI is annotated @DeveloperApi, we can add
>> a byte[] field or richer payload type in a future release if concrete
>> integrations require binary credentials. I'd prefer to keep the initial
>> surface small and evolve based on real demand.
>>
>> Best,
>> Kousuke
>>
>> On Sat, Jun 6, 2026 at 2:40 Parth Chandra <[email protected]> wrote:
>>
>>>   Subject: Re: [DISCUSS] SPIP: OIDC Credential Propagation
>>>
>>>   Hi Kousuke,
>>>
>>>   Thanks for putting this together. As the author of the
>>> original SPARK-38954 [3] and PR #37558 [4], I'm glad to see this problem
>>> getting formal attention — it's been a real gap for cloud-native Spark
>>> deployments.
>>>
>>>   I think we're aligned on the problem but differ on scope. Your
>>> proposal addresses identity-aware credential propagation (per-user
>>> authorization, audit trails, Spark Connect multi-tenancy). That's a
>>> compelling long-term direction. The problem I was trying to solve in PR
>>> #37558 [4] is narrower: enable non-Kerberos credential providers to
>>> participate in the existing distribution mechanism, which is already
>>> provider-agnostic (as the Kafka provider demonstrates) but gated on
>>> Kerberos activation.
>>>
>>>   After the review feedback on PR #37558 [4] — specifically the
>>> direction that we should use a single auth-agnostic manager and UGI as the
>>> container — I've been working on a minimal approach SPARK-57252 [1][2]: a
>>> DirectTokenProvider sub-trait of the existing
>>> HadoopDelegationTokenProvider, with routing logic inside the existing
>>> HadoopDelegationTokenManager to call direct providers without doAs(). This
>>> requires ~80 lines of changes to existing code, no new manager, no new RPC
>>> message, and no new credential store. It follows the review feedback from
>>> PR #37558 [4] exactly.
>>>
>>>   I see two paths forward and am happy with either:
>>>
>>>   1. *Incremental*: The minimal DirectTokenProvider change SPARK-57252
>>> [1][2] lands first, unblocking the immediate use case (driver-mediated
>>> credential refresh without Kerberos). Your UserCredentialManager and
>>> identity-aware architecture can then build on top (or alongside) when
>>> the broader scope (Spark Connect, per-user identity, multi-cloud) is ready.
>>> The two aren't mutually exclusive.
>>>   2. *Unified*: If the community prefers to solve the full identity
>>> propagation problem in one shot, I'd be glad to collaborate on your
>>> proposal. In that case I'd suggest we address the relationship to the 2022
>>> review feedback explicitly — specifically the preference for a single
>>> manager and UGI as a container. Your Appendix C rejects that direction; it
>>> would strengthen the proposal to explain why the constraints have changed
>>> (Spark Connect multi-tenancy, per-user identity requirements that UGI
>>> cannot express).
>>>
>>>   One technical observation: your proposal's
>>> CredentialProvider.resolve() returns a ServiceCredential with Map[String,
>>> String] properties. For the S3A case this works well (access key, secret
>>> key, session token are strings). But some credential systems return binary
>>> payloads (signed SAML assertions, serialized protobuf tokens). Worth
>>> considering whether Map[String, byte[]] or an opaque byte[] field alongside
>>> the properties map would future-proof the SPI.
>>>
>>>   Happy to discuss further.
>>>
>>>   Best,
>>>   Parth
>>>
>>>   [1] https://issues.apache.org/jira/browse/SPARK-57252
>>>   [2]
>>> https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.21tncge82jbl
>>>   [3] https://issues.apache.org/jira/browse/SPARK-38954
>>>   [4] https://github.com/apache/spark/pull/37558
>>>
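
The Base64-in-Map[String, String] approach discussed in this thread for
binary credential payloads can be illustrated with a minimal sketch. The
ServiceCredential case class below is a hypothetical stand-in with a single
properties field (the real SPI's shape is not shown in this thread), and the
helper names withBinary and binary, as well as the property keys, are
assumptions made for illustration only. It is written with top-level
definitions, so it assumes a Scala 3 file or a Scala script.

```scala
import java.util.Base64

// Hypothetical stand-in for the ServiceCredential mentioned in the thread.
final case class ServiceCredential(properties: Map[String, String])

// Store a binary payload under `key` as a Base64-encoded string.
def withBinary(cred: ServiceCredential, key: String, payload: Array[Byte]): ServiceCredential =
  cred.copy(properties = cred.properties + (key -> Base64.getEncoder.encodeToString(payload)))

// Read a binary payload back. Returns None if the key is absent;
// throws IllegalArgumentException if the stored value is not valid Base64.
def binary(cred: ServiceCredential, key: String): Option[Array[Byte]] =
  cred.properties.get(key).map(s => Base64.getDecoder.decode(s))

// Usage: a string property and a binary token live in the same map.
val token = Array[Byte](0, 1, 2, -1, 127)
val cred = withBinary(ServiceCredential(Map("accessKey" -> "example")), "token.bin", token)
```

The trade-off is that callers must know which keys hold Base64 data; a
dedicated binary field would make that explicit in the type instead.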