Re: [PR] [#9757][followup] improvement(ClickHouse, CI): Support loading ClickHouse database with cluster mode and optimize CI [gravitino]

via GitHub Sat, 28 Feb 2026 00:48:14 -0800


Copilot commented on code in PR #10088:
URL: https://github.com/apache/gravitino/pull/10088#discussion_r2867255073



##########
catalogs-contrib/catalog-jdbc-clickhouse/src/main/java/org/apache/gravitino/catalog/clickhouse/operations/ClickHouseDatabaseOperations.java:
##########
@@ -130,6 +136,45 @@ protected void dropDatabase(String databaseName, boolean 
cascade) {
     }
   }
 
+  @Override
+  public JdbcSchema load(String databaseName) throws NoSuchSchemaException {
+    if (!exist(databaseName)) {
+      throw new NoSuchSchemaException("Database %s could not be found", 
databaseName);
+    }
+
+    Map<String, String> schemaProperties = new HashMap<>();
+    schemaProperties.put(ClusterConstants.ON_CLUSTER, String.valueOf(false));
+
+    try (Connection connection = getConnection();
+        Statement statement = connection.createStatement();
+        ResultSet resultSet =
+            statement.executeQuery(
+                String.format("SHOW CREATE DATABASE `%s`", 
databaseName.replace("`", "``")))) {
+      if (resultSet.next()) {
+        String createSql = resultSet.getString(1);
+        Matcher matcher = 
ON_CLUSTER_PATTERN.matcher(StringUtils.trimToEmpty(createSql));
+        if (matcher.find()) {
+          schemaProperties.put(ClusterConstants.ON_CLUSTER, 
String.valueOf(true));
+          schemaProperties.put(ClusterConstants.CLUSTER_NAME, 
matcher.group(1));
+        } else {
+          String inferredClusterName = detectClusterName(connection, 
databaseName);
+          if (StringUtils.isNotBlank(inferredClusterName)) {
+            schemaProperties.put(ClusterConstants.ON_CLUSTER, 
String.valueOf(true));
+            schemaProperties.put(ClusterConstants.CLUSTER_NAME, 
inferredClusterName);
+          }
+        }
+      }
+    } catch (SQLException se) {
+      throw this.exceptionMapper.toGravitinoException(se);
+    }
+
+    return JdbcSchema.builder()
+        .withName(databaseName)
+        .withProperties(ImmutableMap.copyOf(schemaProperties))
+        .withAuditInfo(AuditInfo.EMPTY)
+        .build();

Review Comment:
   `load()` now runs `SHOW CREATE DATABASE` but still returns a `JdbcSchema` 
without `comment`. Since ClickHouse supports schema comments (and Gravitino 
encodes the identifier in the stored comment), omitting the comment prevents 
`JdbcCatalogOperations.loadSchema()` from restoring the identifier into 
properties and will keep emitting warnings about missing Gravitino id. Consider 
parsing the COMMENT clause from `createSql` and populating 
`JdbcSchema.withComment(...)` (including the embedded Gravitino id) so 
loadSchema round-trips comment/identifier correctly.



##########
.github/workflows/backend-integration-test.yml:
##########
@@ -46,8 +49,38 @@ jobs:
               - gradle.properties
               - gradlew
               - setting.gradle.kts

Review Comment:
   The paths-filter includes `setting.gradle.kts`, but the repo file is 
`settings.gradle.kts`. As written, changes to the Gradle settings file won’t 
trigger this workflow’s `source_changes` filter, so Backend IT can be skipped 
unexpectedly. Update the filter entry to `settings.gradle.kts` (and consider 
aligning other workflows that have the same typo).
   ```suggestion
                 - settings.gradle.kts
   ```



##########
catalogs-contrib/catalog-jdbc-clickhouse/src/main/java/org/apache/gravitino/catalog/clickhouse/operations/ClickHouseDatabaseOperations.java:
##########
@@ -130,6 +136,45 @@ protected void dropDatabase(String databaseName, boolean 
cascade) {
     }
   }
 
+  @Override
+  public JdbcSchema load(String databaseName) throws NoSuchSchemaException {
+    if (!exist(databaseName)) {
+      throw new NoSuchSchemaException("Database %s could not be found", 
databaseName);
+    }
+
+    Map<String, String> schemaProperties = new HashMap<>();
+    schemaProperties.put(ClusterConstants.ON_CLUSTER, String.valueOf(false));
+
+    try (Connection connection = getConnection();
+        Statement statement = connection.createStatement();
+        ResultSet resultSet =
+            statement.executeQuery(
+                String.format("SHOW CREATE DATABASE `%s`", 
databaseName.replace("`", "``")))) {
+      if (resultSet.next()) {
+        String createSql = resultSet.getString(1);
+        Matcher matcher = 
ON_CLUSTER_PATTERN.matcher(StringUtils.trimToEmpty(createSql));
+        if (matcher.find()) {
+          schemaProperties.put(ClusterConstants.ON_CLUSTER, 
String.valueOf(true));
+          schemaProperties.put(ClusterConstants.CLUSTER_NAME, 
matcher.group(1));
+        } else {
+          String inferredClusterName = detectClusterName(connection, 
databaseName);
+          if (StringUtils.isNotBlank(inferredClusterName)) {
+            schemaProperties.put(ClusterConstants.ON_CLUSTER, 
String.valueOf(true));
+            schemaProperties.put(ClusterConstants.CLUSTER_NAME, 
inferredClusterName);
+          }
+        }

Review Comment:
   The new cluster inference branch (`detectClusterName(...)` when `SHOW CREATE 
DATABASE` does not contain `ON CLUSTER`) doesn’t appear to be covered by tests 
that assert the inferred cluster name is actually returned (current IT covers 
the regex path via `ON CLUSTER` and the non-cluster/null inference case). 
Please add a test that creates a database across the cluster without an `ON 
CLUSTER` clause (or otherwise triggers inference) and verifies 
`loadSchema(...).properties()` includes `on_cluster=true` and the expected 
`cluster_name`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [#9757][followup] improvement(ClickHouse, CI): Support loading ClickHouse database with cluster mode and optimize CI [gravitino]

Reply via email to