mbutrovich commented on PR #4335: URL: https://github.com/apache/datafusion-comet/pull/4335#issuecomment-4479810022
Wanted to flag that the conversation we had on this PR shaped a few changes I just added in #4309. None of these are Comet adopting #4335 wholesale, but each one came from a real concern you raised, and I want the trail to be visible. **REST vended creds reach the SPI.** `CometScanRule` used to run `filterStorageProperties` aggressively before stamping `catalogProperties` into the native scan op, so `credentials.uri`, OAuth tokens, and any vendor-custom keys never crossed JNI. I dropped the pre-filter and moved the storage-prefix narrowing to the native `iceberg_scan.rs::load_file_io` chokepoint where iceberg-rust's `FileIOBuilder.with_prop` actually requires it. The vendor SPI now sees the unfiltered FileIO bag through `initialize(Map)`. Closes the same gap your `allFileIOProperties` field opened, without a parallel field on `CometIcebergNativeScanMetadata`. **Multi-catalog isolation under shared FQCN.** Added `default initialize(Map<String,String> catalogProperties)` to `CometS3CredentialProvider` and changed `CometS3CredentialDispatcher` to key its instance cache by `(FQCN, dispatchKey)`. On the Iceberg path `dispatchKey` is the Spark V2 catalog name (extracted via `IcebergReflection.deriveCatalogName`, which calls `Table.name()` and intersects against `CatalogManager.listCatalogs(None)` so a stray dot can't fake a catalog name); on the Parquet path it's the bucket. Two catalogs sharing one provider FQCN now get isolated provider instances with their own `initialize` maps. There is an end-to-end test in `CometS3CredentialBridgeSuite` that configures `iso_a` and `iso_b` with the same vendor class and asserts the dispatcher created two distinct instances. **Reference REST vended-creds provider.** Added `IcebergRESTVendedS3Provider` under `spark/src/test`, ~30 lines wrapping Iceberg's `VendedCredentialsProvider`. Caching and refresh-near-expiry come from the AWS SDK's `CachedSupplier` inside `VendedCredentialsProvider`, so neither Comet nor the vendor reinvents either. Lives in test scope to keep `iceberg-aws` and AWS SDK v2 off Comet's runtime classpath; a follow-up can promote it to a runtime artifact behind an optional dep. **What I deliberately did not pull over.** No Rust-side TTL cache, no driver-side broadcast cache for the credential provider, no plan walker that throws on multi-catalog plans. Caching policy stays with the vendor (`CachedSupplier` for Iceberg-REST vendors; whatever cache fits for custom STS), distribution stays with the vendor too (read from `initialize`'s `catalogProperties`, or run a Spark broadcast inside the vendor's class), and `expirationEpochMillis` on the SPI return is metadata pass-through to `object_store`/opendal, which already have their own credential caches. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
