Hi all,

Thank you for your interest in the SPIP document and for all your comments so far. I plan to hold a vote on the SPIP next week.

Thanks,
Kousuke

On Tue, Jun 9, 2026 at 2:18 Parth Chandra <[email protected]> wrote:

> Thanks Kousuke. Let's proceed independently. I'll get something ready
> while we wait for the SPIP review.
>
> Parth
>
> On Sat, Jun 6, 2026 at 9:38 AM Kousuke Saruta <[email protected]>
> wrote:
>
>> Hi Parth,
>>
>> Thank you for the thoughtful response. I think the incremental approach
>> (your path 1) might be feasible. Our proposals are complementary and
>> independent: they address different problems and can proceed in parallel
>> without blocking each other.
>>
>> Your DirectTokenProvider unblocks non-Kerberos credential providers in
>> the existing HadoopDelegationTokenManager mechanism. This solves the
>> immediate gating problem for environments where pod-level identity is
>> unavailable or insufficient.
>>
>> My SPIP introduces per-user/per-session identity propagation with a
>> separate manager and RPC, targeting the case where executors need
>> credentials derived from a user identity they cannot obtain themselves.
>>
>> Neither depends on the other landing first. They share no code paths
>> (different manager, different RPC, and different SPI).
>>
>> Regarding the 2022 review feedback (sorry, I didn't know about that), the
>> constraints have shifted since then. Per-user identity propagation and
>> Spark Connect multi-tenancy require expressing per-session identity, but
>> Spark's current use of UGI is process-wide, so per-session scoping would
>> require fundamental changes to how Spark interacts with UGI. The
>> rejection of UGI in Appendix C of my SPIP doc reflects these new
>> requirements.
>>
>> Regarding binary payloads in ServiceCredential: for the initial
>> implementation, Base64 encoding within Map[String, String] is sufficient
>> for the S3A use case. Since the SPI is annotated @DeveloperApi, we can add
>> a byte[] field or a richer payload type in a future release if concrete
>> integrations require binary credentials. I'd prefer to keep the initial
>> surface small and evolve it based on real demand.
>>
>> Best,
>> Kousuke
>>
>> On Sat, Jun 6, 2026 at 2:40 Parth Chandra <[email protected]> wrote:
>>
>>> Subject: Re: [DISCUSS] SPIP: OIDC Credential Propagation
>>>
>>> Hi Kousuke,
>>>
>>> Thanks for putting this together. As the author of the original
>>> SPARK-38954 [3] and PR #37558 [4], I'm glad to see this problem getting
>>> formal attention. It's been a real gap for cloud-native Spark
>>> deployments.
>>>
>>> I think we're aligned on the problem but differ on scope. Your
>>> proposal addresses identity-aware credential propagation (per-user
>>> authorization, audit trails, Spark Connect multi-tenancy). That's a
>>> compelling long-term direction. The problem I was trying to solve in PR
>>> #37558 [4] is narrower: enable non-Kerberos credential providers to
>>> participate in the existing distribution mechanism, which is already
>>> provider-agnostic (as the Kafka provider demonstrates) but gated on
>>> Kerberos activation.
>>>
>>> After the review feedback on PR #37558 [4] (specifically the direction
>>> that we should use a single auth-agnostic manager and UGI as the
>>> container), I've been working on a minimal approach, SPARK-57252 [1][2]:
>>> a DirectTokenProvider sub-trait of the existing
>>> HadoopDelegationTokenProvider, with routing logic inside the existing
>>> HadoopDelegationTokenManager to call direct providers without doAs().
>>> This requires ~80 lines of changes to existing code, no new manager, no
>>> new RPC message, and no new credential store. It follows the review
>>> feedback from PR #37558 [4] exactly.
>>>
>>> I see two paths forward and am happy with either:
>>>
>>> 1. *Incremental*: The minimal DirectTokenProvider change, SPARK-57252
>>> [1][2], lands first, unblocking the immediate use case (driver-mediated
>>> credential refresh without Kerberos). Your UserCredentialManager and
>>> identity-aware architecture can then build on top of it, or alongside
>>> it, when the broader scope (Spark Connect, per-user identity,
>>> multi-cloud) is ready. The two aren't mutually exclusive.
>>> 2. *Unified*: If the community prefers to solve the full identity
>>> propagation problem in one shot, I'd be glad to collaborate on your
>>> proposal. In that case I'd suggest we address the relationship to the
>>> 2022 review feedback explicitly, specifically the preference for a
>>> single manager and UGI as a container. Your Appendix C rejects that
>>> direction; it would strengthen the proposal to explain why the
>>> constraints have changed (Spark Connect multi-tenancy, per-user identity
>>> requirements that UGI cannot express).
>>>
>>> One technical observation: your proposal's
>>> CredentialProvider.resolve() returns a ServiceCredential with
>>> Map[String, String] properties. For the S3A case this works well (access
>>> key, secret key, and session token are strings). But some credential
>>> systems return binary payloads (signed SAML assertions, serialized
>>> protobuf tokens). It is worth considering whether Map[String, byte[]] or
>>> an opaque byte[] field alongside the properties map would future-proof
>>> the SPI.
>>>
>>> Happy to discuss further.
>>>
>>> Best,
>>> Parth
>>>
>>> [1] https://issues.apache.org/jira/browse/SPARK-57252
>>> [2] https://docs.google.com/document/d/1PPqAoJAj48MdjMJNc7DlytXi745z-imFpVaFDnt18Xg/edit?tab=t.0#heading=h.21tncge82jbl
>>> [3] https://issues.apache.org/jira/browse/SPARK-38954
>>> [4] https://github.com/apache/spark/pull/37558
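
For readers following the binary-payload point in the thread above, here is a rough sketch of how a binary credential could be carried inside a string-only properties map through Base64 encoding. It is an illustration under stated assumptions, not code from the SPIP or from Spark: the class name BinaryCredentialCodec, the ".b64" key suffix, and both method names are hypothetical. Only java.util.Base64 and the collection types are real JDK APIs.

```java
import java.util.Base64;
import java.util.Map;

// Hypothetical helper (not part of the SPIP or Spark): stores a binary
// credential in a Map<String, String> by Base64-encoding it.
final class BinaryCredentialCodec {
    // Hypothetical convention: keys holding Base64 data end with this suffix,
    // so consumers can tell them apart from plain string properties.
    static final String B64_SUFFIX = ".b64";

    private BinaryCredentialCodec() {}

    // Encodes the payload with the standard Base64 alphabet and stores it
    // under key + ".b64".
    static void putBinary(Map<String, String> props, String key, byte[] payload) {
        props.put(key + B64_SUFFIX, Base64.getEncoder().encodeToString(payload));
    }

    // Looks up key + ".b64" and decodes it back to the original bytes.
    // Throws IllegalArgumentException if the key is absent or the stored
    // value is not valid Base64.
    static byte[] getBinary(Map<String, String> props, String key) {
        String encoded = props.get(key + B64_SUFFIX);
        if (encoded == null) {
            throw new IllegalArgumentException("no binary property: " + key);
        }
        return Base64.getDecoder().decode(encoded);
    }
}
```

A provider would call putBinary(props, "samlAssertion", bytes) next to its ordinary string entries, and the consumer would call getBinary(props, "samlAssertion") to recover the bytes. The trade-off matches the one discussed in the thread: the SPI surface stays string-only, at the cost of roughly one third more payload size and a key-naming convention that both sides must agree on.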
