peterhunter99001-cyber opened a new issue, #10486:
URL: https://github.com/apache/gravitino/issues/10486

   ### Version
   
   main branch
   
   ### Describe what's wrong
   
   
   **Body:**
   
   ```markdown
   ## Summary
   
   When connecting ClickHouse's `DataLakeCatalog` engine (with `catalog_type = 'rest'`)
   to Gravitino's Iceberg REST auxiliary service (`/iceberg/v1/`), the connection always
   fails with a `NoSuchCatalogException`. This makes Gravitino incompatible with
   ClickHouse's REST catalog client, while other engines (Trino, StarRocks, Spark)
   work fine against the same endpoint.
   
   ## Environment
   
   | Component   | Version |
   |-------------|---------|
   | Gravitino   | 1.2.0   |
   | ClickHouse  | 26.2    |
   | Iceberg REST backend | SQLite (JDBC catalog backend) |
   | Object Storage | MinIO / RustFS (S3-compatible, path-style) |
   
   ## Steps to Reproduce
   
   **1. Gravitino configuration (iceberg-rest auxiliary service):**
   
   ```properties
   gravitino.iceberg-rest.catalog-backend = jdbc
   gravitino.iceberg-rest.uri = jdbc:sqlite:/catalog/iceberg.db
   gravitino.iceberg-rest.warehouse = s3a://iceberg-warehouse/
   gravitino.iceberg-rest.io-impl = org.apache.iceberg.aws.s3.S3FileIO
   ```
   
   **2. ClickHouse SQL to create the DataLakeCatalog database:**
   
   ```sql
   SET allow_experimental_database_iceberg = 1;
   
   CREATE DATABASE gravitino_catalog
   ENGINE = DataLakeCatalog('http://gravitino:9001/iceberg/', '', '')
   SETTINGS
       catalog_type     = 'rest',
       storage_endpoint = 'http://minio:9000/iceberg-warehouse',
       warehouse        = 'lakehouse';
   ```
   
   **3. Attempt to use the catalog:**
   
   ```sql
   USE gravitino_catalog;
   SHOW TABLES;
   ```
   
   ## Actual Error
   
   ```
   Code: 1060. DB::Exception: Failed to get config from REST catalog:
   NoSuchCatalogException: Catalog 'lakehouse' does not exist
   ```
   
   ClickHouse's REST client sends the following HTTP request on startup:
   
   ```
   GET /iceberg/v1/config?warehouse=lakehouse
   ```
   
   Gravitino's Iceberg REST service interprets the `warehouse` query parameter as
   a **catalog name** within its internal metalake registry. It looks for a catalog
   named `lakehouse`, which does not exist, so the request fails before any table
   operations can be performed.
   
   ## Root Cause Analysis
   
   The Apache Iceberg REST Catalog specification describes the optional `warehouse`
   query parameter of `GET /v1/config` as:
   
   > *"Warehouse location or identifier to request from the service"*
   
   The spec does not define the **semantic interpretation** of this parameter, leaving it
   to the server implementation. This has led to divergent behaviors across implementations:
   
   | Implementation | `warehouse` parameter meaning |
   |---|---|
   | **Nessie** | Logical warehouse name (pre-registered named storage location) |
   | **Lakekeeper** | Logical warehouse name (pre-registered) |
   | **Gravitino** | Treated as a Gravitino catalog name (internal registry lookup) |
   | **AWS Glue REST** | Account ID + S3 table bucket name |
   | **Databricks Unity** | Catalog name |
   
   **ClickHouse's `DataLakeCatalog` C++ client** requires `warehouse` as a mandatory
   routing key and always includes it in `GET /v1/config?warehouse=<value>`, following
   the Nessie/Lakekeeper convention. Gravitino's behavior of looking up the value
   as a catalog name in its metalake registry is incompatible with this expectation.
   
   **Why Trino and StarRocks work:** Both use the Apache Iceberg Java SDK's REST
   catalog client, which treats `warehouse` as **optional**: when not configured,
   the parameter is simply omitted from the request. Gravitino then returns its
   default configuration without triggering the catalog lookup. This is accidental
   compatibility, not intentional support.
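
To make the difference in request shape concrete, here is a minimal sketch that builds the `GET /v1/config` URL with and without a `warehouse` value. The helper name `config_url` is hypothetical and the base URI is the one from the reproduction steps; this only illustrates the two request forms and is not the code of either client.

```python
from typing import Optional
from urllib.parse import urlencode, urljoin


def config_url(base_uri: str, warehouse: Optional[str] = None) -> str:
    """Build the config endpoint URL (hypothetical helper, for illustration only)."""
    # Ensure a trailing slash so urljoin appends to the path instead of replacing it.
    base = base_uri if base_uri.endswith("/") else base_uri + "/"
    url = urljoin(base, "v1/config")
    if warehouse is None:
        # Request shape when no warehouse is configured: no query string at all.
        return url
    # Request shape when a warehouse value is always sent.
    return url + "?" + urlencode({"warehouse": warehouse})


print(config_url("http://gravitino:9001/iceberg/"))
print(config_url("http://gravitino:9001/iceberg/", "lakehouse"))
```

Only the second form carries the value that Gravitino then tries to resolve as a catalog name.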
   
   ## Expected Behavior
   
   One of the following would resolve this incompatibility:
   
   **Option A (Recommended): When the `warehouse` parameter is unrecognized, fall back
   to the default configuration instead of throwing an exception.**
   
   ```
   GET /iceberg/v1/config?warehouse=<any_value>
   → If <any_value> does not match a known catalog name,
     return the default catalog configuration (HTTP 200)
     instead of raising NoSuchCatalogException (HTTP 404/400)
   ```
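
The fallback in Option A could look roughly like the sketch below. It is written in Python for brevity and makes no claim about Gravitino's actual classes: `DEFAULT_CONFIG`, `KNOWN_CATALOGS`, and `resolve_config` are hypothetical names, and the default value is simply the warehouse path from the configuration above.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-ins for the server's state; not Gravitino's real data model.
DEFAULT_CONFIG: Dict[str, str] = {"warehouse": "s3a://iceberg-warehouse/"}
KNOWN_CATALOGS: Dict[str, Dict[str, str]] = {}


def resolve_config(warehouse: Optional[str]) -> Tuple[int, Dict[str, str]]:
    """Return (HTTP status, config), falling back to the default on a miss."""
    if warehouse and warehouse in KNOWN_CATALOGS:
        return 200, KNOWN_CATALOGS[warehouse]
    # Option A: a missing or unrecognized value yields the default configuration
    # instead of raising NoSuchCatalogException.
    return 200, DEFAULT_CONFIG


status, config = resolve_config("lakehouse")
print(status, config)
```

With this shape, a client that always sends `warehouse` and a client that omits it receive the same default response whenever the value matches no catalog.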
   
   **Option B: Document a supported `warehouse` value that ClickHouse users can 
   configure to successfully connect.**
   
   For example, if there is a specific string (e.g., the configured warehouse path,
   or an empty string) that Gravitino accepts without triggering an internal lookup,
   documenting this would allow ClickHouse users to work around the issue.
   
   **Option C: Add support for a named warehouse identifier in the Iceberg REST 
   auxiliary service** — similar to Nessie's multi-warehouse model — so that 
   `warehouse=<logical_name>` routes to the correct storage configuration.
   
   ## Impact
   
   - ClickHouse `DataLakeCatalog` with `catalog_type = 'rest'` **cannot connect to
     Gravitino's Iceberg REST service** at all
   - Users cannot use Gravitino as a shared Iceberg catalog for multi-engine
     environments that include ClickHouse
   - The workaround is to use ClickHouse's `Iceberg` table engine directly
     (bypassing Gravitino entirely), which prevents true multi-engine metadata sharing
   
   ## References
   
   - [ClickHouse DataLakeCatalog: REST Catalog docs](https://clickhouse.com/docs/use-cases/data-lake/rest-catalog)
   - [ClickHouse DataLakeCatalog: Lakekeeper docs](https://clickhouse.com/docs/use-cases/data-lake/lakekeeper-catalog) (shows `warehouse = 'demo'` as logical name)
   - [Apache Iceberg REST Catalog spec: GET /v1/config](https://github.com/apache/iceberg/blob/main/open-api/rest-catalog-open-api.yaml)
   - [Nessie warehouse semantics](https://projectnessie.org/guides/iceberg-rest/#warehouses--storage-locations)
   ```
   
   ### Error message and/or stacktrace
   
   Code: 1060. DB::Exception: Failed to get config from REST catalog:
   NoSuchCatalogException: Catalog 'lakehouse' does not exist
   
   ### How to reproduce
   
   Gravitino     1.2.0 
   
   ### Additional context
   
   Gravitino     1.2.0 

