jtuglu1 commented on PR #19236: URL: https://github.com/apache/druid/pull/19236#issuecomment-4238743315
> > Maybe a hybrid approach would work? We could introduce `scopeForUser` in core and run it at submit time. In your custom extension, rather than applying vended credentials at scope/submit time, you could use `scopeForUser` to embed the user's own credentials in the input source. We could add a `PasswordProvider` field to `IcebergInputSource` to support that. Then you could use them at runtime in the task to acquire vended credentials.
>
> I think this makes sense. My only concern is being able to reliably "touch up" these inputSource fields in arbitrary specs (both on the overlord and on the broker) to apply `scopeForUser()`. For example, you can have an MSQ query that queries multiple Iceberg tables and joins against a Druid table.
>
> In general, having the identity at the task level also opens up the ability for Druid to do auth with other resources that tasks might read from (e.g. Kafka), so I generally want to lean in the direction of having identity/credentials exposable at the task level (not just seeing the end credentials for whatever is needed).

One more thing to add: if possible, I'd like to avoid putting the burden of "injecting" this user identity on the caller. IMO, if we can always do it at the overlord level (and put the burden on the task type implementor to actually do something with a valid, provided identity), that should be more maintainable. This matters especially in a world with multiple input sources; IMO, code related to input sources (e.g. Kafka, Iceberg, Delta) should not be involved in anything but a supervisor thread and the task process itself.
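A rough sketch of the overlord-level, submit-time idea, for discussion only. Every type here (`UserIdentity`, `InputSource`, `UserScopedInputSource`, `IcebergLikeSource`, `DruidTableSource`, `scopeAll`) is hypothetical and is not Druid's actual API; the `userToken` field merely stands in for the proposed `PasswordProvider` on `IcebergInputSource`. The sketch also assumes that the spec's input sources have already been collected into a flat list. Locating them inside arbitrary specs is exactly the open concern raised above.

```java
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types are Druid's real classes or interfaces.
public class IdentityInjectionSketch {

    /** Hypothetical stand-in for the authenticated caller's identity. */
    public record UserIdentity(String name, String token) {}

    /** Hypothetical stand-in for an input source in a submitted spec. */
    public interface InputSource {}

    /** Opt-in hook: only input sources that need the caller's identity implement this. */
    public interface UserScopedInputSource extends InputSource {
        InputSource scopeForUser(UserIdentity identity);
    }

    /** A Druid-table-like source that needs no external credentials. */
    public record DruidTableSource(String datasource) implements InputSource {}

    /**
     * Iceberg-like source. `userToken` stands in for the proposed PasswordProvider field;
     * at runtime the task (not the overlord) would exchange it for vended credentials.
     */
    public record IcebergLikeSource(String table, String userToken) implements UserScopedInputSource {
        @Override
        public InputSource scopeForUser(UserIdentity identity) {
            // Return a scoped copy; the submitted spec object itself is never mutated.
            return new IcebergLikeSource(table, identity.token());
        }
    }

    /**
     * Overlord-side pass run once at submit time, so callers never inject identity themselves.
     * Sources that do not opt in are passed through unchanged.
     */
    public static List<InputSource> scopeAll(List<InputSource> sources, UserIdentity identity) {
        Objects.requireNonNull(identity, "identity");
        return sources.stream()
            .map(s -> s instanceof UserScopedInputSource scoped ? scoped.scopeForUser(identity) : s)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        UserIdentity alice = new UserIdentity("alice", "alice-token");
        // Mirrors the MSQ case above: two Iceberg tables joined against a Druid table.
        List<InputSource> spec = List.of(
            new IcebergLikeSource("db.orders", null),
            new IcebergLikeSource("db.customers", null),
            new DruidTableSource("wikipedia"));
        scopeAll(spec, alice).forEach(System.out::println);
    }
}
```

The design choice being illustrated is that the overlord applies one generic pass and each task type decides what to do with the identity it receives. Input-source-specific code only runs when a source opts in, which keeps callers out of the picture.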
