mbutrovich commented on PR #4335:
URL: 
https://github.com/apache/datafusion-comet/pull/4335#issuecomment-4479810022

   Wanted to flag that the conversation we had on this PR shaped a few changes 
I just added in #4309. None of these are Comet adopting #4335 wholesale, but 
each one came from a real concern you raised, and I want the trail to be 
visible.
   
   **REST vended creds reach the SPI.** `CometScanRule` used to run 
`filterStorageProperties` aggressively before stamping `catalogProperties` into 
the native scan op, so `credentials.uri`, OAuth tokens, and any vendor-custom 
keys never crossed JNI. I dropped the pre-filter and moved the storage-prefix 
narrowing to the native `iceberg_scan.rs::load_file_io` chokepoint where 
iceberg-rust's `FileIOBuilder.with_prop` actually requires it. The vendor SPI 
now sees the unfiltered FileIO bag through `initialize(Map)`. Closes the same 
gap your `allFileIOProperties` field opened, without a parallel field on 
`CometIcebergNativeScanMetadata`.
   
   **Multi-catalog isolation under shared FQCN.** Added `default 
initialize(Map<String,String> catalogProperties)` to 
`CometS3CredentialProvider` and changed `CometS3CredentialDispatcher` to key 
its instance cache by `(FQCN, dispatchKey)`. On the Iceberg path `dispatchKey` 
is the Spark V2 catalog name (extracted via 
`IcebergReflection.deriveCatalogName`, which calls `Table.name()` and 
intersects against `CatalogManager.listCatalogs(None)` so a stray dot can't 
fake a catalog name); on the Parquet path it's the bucket. Two catalogs sharing 
one provider FQCN now get isolated provider instances with their own 
`initialize` maps. There is an end-to-end test in 
`CometS3CredentialBridgeSuite` that configures `iso_a` and `iso_b` with the 
same vendor class and asserts the dispatcher created two distinct instances.
   
   **Reference REST vended-creds provider.** Added 
`IcebergRESTVendedS3Provider` under `spark/src/test`, ~30 lines wrapping 
Iceberg's `VendedCredentialsProvider`. Caching and refresh-near-expiry come 
from the AWS SDK's `CachedSupplier` inside `VendedCredentialsProvider`, so 
neither Comet nor the vendor reinvents either. Lives in test scope to keep 
`iceberg-aws` and AWS SDK v2 off Comet's runtime classpath; a follow-up can 
promote it to a runtime artifact behind an optional dep.
   
   **What I deliberately did not pull over.** No Rust-side TTL cache, no 
driver-side broadcast cache for the credential provider, no plan walker that 
throws on multi-catalog plans. Caching policy stays with the vendor 
(`CachedSupplier` for Iceberg-REST vendors; whatever cache fits for custom 
STS), distribution stays with the vendor too (read from `initialize`'s 
`catalogProperties`, or run a Spark broadcast inside the vendor's class), and 
`expirationEpochMillis` on the SPI return is metadata pass-through to 
`object_store`/opendal, which already have their own credential caches.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to