This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch branch-1.2
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/branch-1.2 by this push:
new f140d47ffb [Cherry-pick to branch-1.2] [#10357] docs(table-maintenance-service): improve optimizer docs and architecture workflow (#10356) (#10363)
f140d47ffb is described below
commit f140d47ffba7b23361fe59fa8820797e1bd208db
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Mar 11 14:07:25 2026 +0800
[Cherry-pick to branch-1.2] [#10357] docs(table-maintenance-service): improve optimizer docs and architecture workflow (#10356) (#10363)
**Cherry-pick Information:**
- Original commit: 3532ed965df04db419ce057e2aa904b2d087c3d7
- Target branch: `branch-1.2`
- Status: ✅ Clean cherry-pick (no conflicts)
Co-authored-by: FANNG <[email protected]>
---
.../optimizer-architecture-workflow.png | Bin 0 -> 220473 bytes
.../optimizer-cli-reference.md | 7 +++
.../optimizer-configuration.md | 3 +-
.../optimizer-quick-start.md | 15 +++---
.../optimizer-troubleshooting.md | 57 ++++++++++++++++++++-
docs/table-maintenance-service/optimizer.md | 29 +++++++++++
6 files changed, 103 insertions(+), 8 deletions(-)
diff --git a/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png b/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png
new file mode 100644
index 0000000000..8b60984c6e
Binary files /dev/null and b/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png differ
diff --git a/docs/table-maintenance-service/optimizer-cli-reference.md b/docs/table-maintenance-service/optimizer-cli-reference.md
index 6df4e44696..7d5213b38f 100644
--- a/docs/table-maintenance-service/optimizer-cli-reference.md
+++ b/docs/table-maintenance-service/optimizer-cli-reference.md
@@ -150,6 +150,10 @@ Rule format is `scope:metricName:aggregation:comparison`:
- `aggregation`: `max|min|avg|latest`
- `comparison`: `lt|le|gt|ge|eq|ne`
+When metrics are produced by `submit-update-stats-job --update-mode metrics`, metric names are
+often `custom-*` (for example `custom-data-file-mse`). Use `list-table-metrics` first and
+configure rules with the exact metric names returned by your environment.
+
### Submit built-in update stats jobs
Submit built-in Iceberg update stats/metrics Spark jobs directly.
@@ -168,6 +172,9 @@ Notes:
- `--identifiers` supports `catalog.schema.table` or `schema.table` (when default catalog is configured).
- `--update-mode` supports `stats|metrics|all` (default `all`).
- For `stats` or `all`, `--updater-options` must include `gravitino_uri` and `metalake`.
+- If `--updater-options` includes external JDBC metrics settings
+  (`gravitino.optimizer.jdbcMetrics.*`), ensure the JDBC driver JAR is available on the Spark
+  runtime classpath (for example via `spark.jars` in `--spark-conf`).
- `--spark-conf` and `--updater-options` are flat JSON maps.
### List table metrics
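The rule format documented above (`scope:metricName:aggregation:comparison`) can be sanity-checked before a rule goes into configuration. A minimal illustrative Python sketch; the `validate_rule` helper is hypothetical and not part of the optimizer CLI, but the aggregation and comparison sets match the reference above:

```python
# Illustrative only: validate a monitor rule string of the form
# scope:metricName:aggregation:comparison before putting it in config.
AGGREGATIONS = {"max", "min", "avg", "latest"}
COMPARISONS = {"lt", "le", "gt", "ge", "eq", "ne"}

def validate_rule(rule: str) -> bool:
    parts = rule.split(":")
    if len(parts) != 4:
        return False
    scope, metric, agg, cmp_op = parts
    return bool(scope) and bool(metric) and agg in AGGREGATIONS and cmp_op in COMPARISONS

# Example using a custom metric name as returned by list-table-metrics.
print(validate_rule("table:custom-data-file-mse:avg:gt"))  # True
print(validate_rule("table:custom-data-file-mse:sum:gt"))  # False: `sum` is not a supported aggregation
```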
diff --git a/docs/table-maintenance-service/optimizer-configuration.md b/docs/table-maintenance-service/optimizer-configuration.md
index bb67584452..5ee0ac9d1f 100644
--- a/docs/table-maintenance-service/optimizer-configuration.md
+++ b/docs/table-maintenance-service/optimizer-configuration.md
@@ -25,7 +25,8 @@ gravitino.job.statusPullIntervalInMs=300000
gravitino.jobExecutor.local.sparkHome=/path/to/spark
```
-For local demo environments, you can reduce `gravitino.job.statusPullIntervalInMs` to get faster status updates.
+For local demo environments, you can reduce `gravitino.job.statusPullIntervalInMs` (for example
+`10000`) to get faster status updates. Restart Gravitino after changing this value.
## Built-in update stats `jobConf`
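Putting the keys shown above together, a local-demo override might look like the following fragment; the values are illustrative and the Spark path is a placeholder:

```properties
# Faster status polling for local demos (restart Gravitino after changing)
gravitino.job.statusPullIntervalInMs=10000
gravitino.jobExecutor.local.sparkHome=/path/to/spark
```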
diff --git a/docs/table-maintenance-service/optimizer-quick-start.md b/docs/table-maintenance-service/optimizer-quick-start.md
index 182402dbcd..1be5fe793d 100644
--- a/docs/table-maintenance-service/optimizer-quick-start.md
+++ b/docs/table-maintenance-service/optimizer-quick-start.md
@@ -10,6 +10,8 @@ license: This software is licensed under the Apache License version 2.
- Prepare a running Gravitino server.
- Ensure target metalake exists (examples use `test`).
- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- For faster status feedback during verification, set `gravitino.job.statusPullIntervalInMs`
+  to a smaller value (for example `10000`) and restart Gravitino.
- If your Iceberg REST backend is in-memory, avoid restarting it during this quick start because
  a restart resets metadata and data files.
@@ -17,7 +19,8 @@ For full config details, see [Optimizer Configuration](./optimizer-configuration
## Success criteria
-- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- Update-stats job finishes and table statistics/metrics include `custom-data-file-mse` and
+  `custom-delete-file-number`.
- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
@@ -98,12 +101,12 @@ Use Spark SQL to create enough small files so compaction has visible effect:
```bash
${SPARK_HOME}/bin/spark-sql \
--conf spark.hadoop.fs.defaultFS=file:/// \
- --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
- --conf spark.sql.catalog.rest_demo.type=rest \
- --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
- -e "CREATE NAMESPACE IF NOT EXISTS rest_demo.db; \
+ --conf spark.sql.catalog.rest_catalog=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.rest_catalog.type=rest \
+ --conf spark.sql.catalog.rest_catalog.uri=http://localhost:9001/iceberg \
+ -e "CREATE NAMESPACE IF NOT EXISTS rest_catalog.db; \
SET spark.sql.files.maxRecordsPerFile=1000; \
- INSERT INTO rest_demo.db.t1 \
+ INSERT INTO rest_catalog.db.t1 \
SELECT id, concat('name_', CAST(id AS STRING)) FROM range(0, 100000);"
```
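As a rough check on why the insert above produces compaction-worthy input: `spark.sql.files.maxRecordsPerFile=1000` caps each output file at 1000 records, so 100000 inserted rows yield at least 100 small data files (more with higher write parallelism). A quick illustrative computation:

```python
# Lower bound on data files produced by the quick-start insert above.
rows = 100_000
max_records_per_file = 1_000  # spark.sql.files.maxRecordsPerFile
min_files = -(-rows // max_records_per_file)  # ceiling division
print(min_files)  # 100
```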
diff --git a/docs/table-maintenance-service/optimizer-troubleshooting.md b/docs/table-maintenance-service/optimizer-troubleshooting.md
index bb9879da9d..7901e69d71 100644
--- a/docs/table-maintenance-service/optimizer-troubleshooting.md
+++ b/docs/table-maintenance-service/optimizer-troubleshooting.md
@@ -31,13 +31,27 @@ Check `gravitino.job.statusPullIntervalInMs` and local staging logs under:
`/tmp/gravitino/jobs/staging/<metalake>/<job-template-name>/<job-id>/error.log`.
+For local verification, reduce `gravitino.job.statusPullIntervalInMs` (for example `10000`) and
+restart Gravitino so REST status can refresh faster.
+
## `No identifiers matched strategy name ...`
`--strategy-name` must be the policy name (for example `iceberg_compaction_default`), not the policy type (`system_iceberg_compaction`) and not the strategy type (`iceberg-data-compaction`).
## Dry-run returns no `DRY-RUN` or `SUBMIT` lines
-This usually means trigger conditions are not met. For compaction, verify `custom-data-file-mse` and `custom-delete-file-number` in table statistics are large enough to satisfy policy rules.
+This usually means trigger conditions are not met. For compaction, verify
+`custom-data-file-mse` and `custom-delete-file-number` in table statistics/metrics are large
+enough to satisfy policy rules.
+
+## `monitor-metrics` returns `evaluation=false` unexpectedly
+
+Check both rule names and metric samples:
+
+1. Query current metrics first with `list-table-metrics` (and `--partition-path` for partition scope).
+2. Use the exact metric names returned by your environment in
+ `gravitino.optimizer.monitor.gravitinoMetricsEvaluator.rules`.
+3. Ensure `--action-time` is inside the range where both before and after samples exist.
## `No StrategyHandler class configured for strategy type ...`
@@ -57,6 +71,47 @@ Set local filesystem explicitly in Spark config:
spark.hadoop.fs.defaultFS=file:///
```
+## Rewrite fails on multi-level partition (`identity + day(...)`)
+
+In release `1.2.0`, rewrite may fail for partition filters combining identity and day transforms
+(for example `PARTITIONED BY (p, days(ts))`) with error:
+
+```text
+Cannot translate Spark expression ... day(cast(ts as date)) ... to data source filter
+```
+
+How to verify:
+
+1. Check job run status by rewrite job id under
+ `/api/metalakes/<metalake>/jobs/runs/<job-id>`.
+2. Check staging log:
+   `/tmp/gravitino/jobs/staging/<metalake>/builtin-iceberg-rewrite-data-files/<job-id>/error.log`.
+
+Workaround:
+
+- Use identity-only partition compaction path for release `1.2.0`.
+- Keep this failure case as a reproducible regression test for later fix validation.
+
+Observed compatibility matrix in release `1.2.0` (rewrite path):
+
+- PASS: `p`, `p, c2` (identity-only partition transforms)
+- FAIL: `p, years(ts)`, `p, months(ts)`, `p, days(ts)`, `p, hours(ts)`,
+ `p, truncate(1, c2)`, `p, bucket(8, id)`
+
+## `submit-update-stats-job` fails with JDBC metrics errors
+
+When `--updater-options` includes `gravitino.optimizer.jdbcMetrics.*`, ensure the JDBC driver is
+available on the Spark runtime classpath. Typical failures include `ClassNotFoundException` for the
+driver class or `No suitable driver`.
+
+Example in `--spark-conf`:
+
+```json
+{
+ "spark.jars": "/path/to/postgresql-42.7.4.jar"
+}
+```
+
## `Specified optimizer config file does not exist`
Check your `--conf-path` and file permissions.
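The verification steps in the troubleshooting additions above can be scripted. A small illustrative sketch that only builds the REST path and staging-log path quoted in the docs; the helper names and the base URI value are assumptions for illustration, not a documented API:

```python
def job_run_url(base_uri: str, metalake: str, job_id: str) -> str:
    # REST path for a job run, as quoted in the troubleshooting section.
    return f"{base_uri}/api/metalakes/{metalake}/jobs/runs/{job_id}"

def staging_error_log(metalake: str, template: str, job_id: str) -> str:
    # Local staging-log layout, as quoted in the troubleshooting section.
    return f"/tmp/gravitino/jobs/staging/{metalake}/{template}/{job_id}/error.log"

# Hypothetical values for illustration only.
print(job_run_url("http://localhost:8090", "test", "job-123"))
print(staging_error_log("test", "builtin-iceberg-rewrite-data-files", "job-123"))
```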
diff --git a/docs/table-maintenance-service/optimizer.md b/docs/table-maintenance-service/optimizer.md
index ed19ead78f..d7c6cda38f 100644
--- a/docs/table-maintenance-service/optimizer.md
+++ b/docs/table-maintenance-service/optimizer.md
@@ -15,6 +15,30 @@ The Table Maintenance Service (Optimizer) automates table maintenance by connect
The CLI commands and configuration keys use the `optimizer` name.
+## Alpha status and current limitations
+
+The current Table Maintenance Service is in **alpha** stage.
+
+Current limitations:
+
+- It is operated through the optimizer CLI workflow.
+- The built-in maintenance strategy focuses on Iceberg table compaction.
+- Compaction support is currently limited to Iceberg tables with identity partition transforms.
+
+## Extensibility and roadmap
+
+Although the built-in capability is intentionally narrow in alpha, the framework is designed for
+extension:
+
+- Integrate external systems by implementing custom providers and adapters.
+- Add new strategies and handlers beyond built-in compaction.
+- Plug in custom metrics, evaluators, and job submitters for different environments.
+
+See [Optimizer Extension Guide](./optimizer-extension-guide.md) for extension points.
+
+Future versions will continue improving the out-of-the-box experience and evolve toward a more
+ready-to-use maintenance service.
+
## Architecture overview
The optimizer workflow is based on six parts:
@@ -26,6 +50,11 @@ The optimizer workflow is based on six parts:
5. Job executor: local or custom backend that runs submitted jobs.
6. Status and logs: REST job state plus local staging logs.
+
+
+The following diagram shows the end-to-end interactions between CLI, Gravitino server, Spark jobs,
+JDBC metrics repository, and the Recommender/Updater/Monitor modules.
+
Typical data flow:
1. Collect statistics and metrics for target tables.