yuqi1129 commented on PR #10480: URL: https://github.com/apache/gravitino/pull/10480#issuecomment-4118733256
> Thanks @yuqi1129 for the thorough review. Here is the status of each concern. > > ### Concern 1: `ClassLoaderKey` missing backend-URI dimensions — Fixed > Extended the isolation key with `metastore.uris`, `jdbc-url`, and `fs.defaultFS`. Also added `authorization-provider` (determines which authorization plugin JARs are loaded). > > The full set of built-in default isolation keys (catalog property names) is now: > > Category Property keys > Classpath `package` ([Catalog.PROPERTY_PACKAGE](https://github.com/apache/gravitino/blob/main/api/src/main/java/org/apache/gravitino/Catalog.java?rgh-link-date=2026-03-24T07%3A12%3A45Z#L132)), `authorization-provider` > Kerberos identity `authentication.type`, `authentication.kerberos.principal`, `authentication.kerberos.keytab-uri` > Backend URIs `metastore.uris`, `jdbc-url`, `fs.defaultFS` > These defaults cannot be removed. Operators can add more via a new server config: > > ```ini > gravitino.catalog.classloader.isolation.extra-properties = custom.backend.endpoint > ``` > > `ClassLoaderKey` stores isolation properties as a generic `Map<String, String>`, decoupled from specific property names — only `CatalogManager.buildClassLoaderKey` needs to know which properties matter. This makes the pool infrastructure key-agnostic and extensible without modifying pool classes. > > ### Concern 2: `FileSystem.closeAll()` can disconnect live catalogs — Resolved by Concern 1 fix > With backend URIs now in the key, catalogs pointing at different HDFS clusters get separate ClassLoaders. `doFinalCleanup` only runs when `refCount` reaches 0, so `FileSystem.closeAll()` cannot affect live catalogs sharing the same ClassLoader. > > ### Concern 3: ThreadLocal cross-contamination — Not fully resolved, inherent trade-off > This is a genuine limitation of ClassLoader sharing. With per-catalog ClassLoaders, ThreadLocal values are naturally isolated because each ClassLoader loads its own copy of the class. With a shared ClassLoader, catalogs on the same thread can see each other's ThreadLocal state. > > The Concern 1 fix reduces the blast radius — catalogs sharing a ClassLoader now have identical backend configurations, so leaked ThreadLocal state is less likely to cause incorrect behavior (e.g., talking to the wrong metastore). But it does not eliminate the problem. If a library sets a ThreadLocal with per-catalog state (e.g., Hive `SessionState`, Hadoop `SecurityContext`), cross-contamination is still possible between catalogs sharing the same ClassLoader on the same thread. > > **What this PR does:** > > * Reduces exposure by isolating catalogs with different backends into separate ClassLoaders (Concern 1 fix) > * Provides `extra-properties` config as an operational escape hatch — if a specific ThreadLocal issue surfaces, operators can add the relevant property key to force separation without a code release > > **What this PR does not do:** > > * It does not guarantee ThreadLocal isolation between catalogs sharing a ClassLoader. This is an inherent trade-off of sharing and cannot be fully solved at the key level. > > **Open question for the community:** Is this trade-off acceptable given the Metaspace savings, or should we consider additional safeguards (e.g., a per-catalog opt-out property to force a dedicated ClassLoader)? Happy to discuss. > > ### Concern 4: AWS SDK v2 static state — Not addressed in this PR > Low severity, not related to the key design. Can be addressed as a follow-up if this one merged. > > ### Suggestion 3: Tests for different-backend isolation — Added > New tests: > > * `testDifferentMetastoreUrisCreateDifferentEntries` > * `testSameMetastoreUrisShareEntry` > * `testDifferentJdbcUrlsCreateDifferentEntries` > * `testDifferentDefaultFsCreateDifferentEntries` > * `testKeyWithAuthorizationProvider` > > Total: 19 unit tests + 3 integration tests. I will take time to review it again. Thanks for your quick response. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
