LuciferYang commented on PR #10480:
URL: https://github.com/apache/gravitino/pull/10480#issuecomment-4115928330

   Thanks @yuqi1129 for the thorough review. Here is the status of each concern.
   
   ### Concern 1: `ClassLoaderKey` missing backend-URI dimensions — Fixed
   
   Extended the isolation key with `metastore.uris`, `jdbc-url`, and 
`fs.defaultFS`. Also added `authorization-provider` (determines which 
authorization plugin JARs are loaded).
   
   The full set of built-in default isolation keys (catalog property names) is 
now:
   
   | Category | Property keys |
   |---|---|
   | Classpath | `package` 
([Catalog.PROPERTY_PACKAGE](https://github.com/apache/gravitino/blob/main/api/src/main/java/org/apache/gravitino/Catalog.java#L132)),
 `authorization-provider` |
   | Kerberos identity | `authentication.type`, 
`authentication.kerberos.principal`, `authentication.kerberos.keytab-uri` |
   | Backend URIs | `metastore.uris`, `jdbc-url`, `fs.defaultFS` |
   
   These defaults cannot be removed. Operators can add more via a new server 
config:
   
   ```properties
   gravitino.catalog.classloader.isolation.extra-properties = 
custom.backend.endpoint
   ```
   
   `ClassLoaderKey` stores isolation properties as a generic `Map<String, 
String>`, decoupled from specific property names — only 
`CatalogManager.buildClassLoaderKey` needs to know which properties matter. 
This makes the pool infrastructure key-agnostic and extensible without 
modifying pool classes.
   
   ### Concern 2: `FileSystem.closeAll()` can disconnect live catalogs — 
Resolved by Concern 1 fix
   
   With backend URIs now in the key, catalogs pointing at different HDFS 
clusters get separate ClassLoaders. `doFinalCleanup` only runs when `refCount` 
reaches 0, so `FileSystem.closeAll()` cannot affect live catalogs sharing the 
same ClassLoader.
   
   ### Concern 3: ThreadLocal cross-contamination — Not fully resolved, 
inherent trade-off
   
   This is a genuine limitation of ClassLoader sharing. With per-catalog 
ClassLoaders, ThreadLocal values are naturally isolated because each 
ClassLoader loads its own copy of the class. With a shared ClassLoader, 
catalogs on the same thread can see each other's ThreadLocal state.
   
   The Concern 1 fix reduces the blast radius — catalogs sharing a ClassLoader 
now have identical backend configurations, so leaked ThreadLocal state is less 
likely to cause incorrect behavior (e.g., talking to the wrong metastore). But 
it does not eliminate the problem. If a library sets a ThreadLocal with 
per-catalog state (e.g., Hive `SessionState`, Hadoop `SecurityContext`), 
cross-contamination is still possible between catalogs sharing the same 
ClassLoader on the same thread.
   
   **What this PR does:**
   - Reduces exposure by isolating catalogs with different backends into 
separate ClassLoaders (Concern 1 fix)
   - Provides `extra-properties` config as an operational escape hatch — if a 
specific ThreadLocal issue surfaces, operators can add the relevant property 
key to force separation without a code release
   
   **What this PR does not do:**
   - It does not guarantee ThreadLocal isolation between catalogs sharing a 
ClassLoader. This is an inherent trade-off of sharing and cannot be fully 
solved at the key level.
   
   **Open question for the community:** Is this trade-off acceptable given the 
Metaspace savings, or should we consider additional safeguards (e.g., a 
per-catalog opt-out property to force a dedicated ClassLoader)? Happy to 
discuss.
   
   ### Concern 4: AWS SDK v2 static state — Not addressed in this PR
   
   Low severity, not related to the key design. Can be addressed as a follow-up.
   
   ### Suggestion 3: Tests for different-backend isolation — Added
   
   New tests:
   - `testDifferentMetastoreUrisCreateDifferentEntries`
   - `testSameMetastoreUrisShareEntry`
   - `testDifferentJdbcUrlsCreateDifferentEntries`
   - `testDifferentDefaultFsCreateDifferentEntries`
   - `testKeyWithAuthorizationProvider`
   
   Total: 19 unit tests + 3 integration tests.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to