This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/main by this push:
     new 3532ed965d [#10357] docs(table-maintenance-service): improve optimizer docs and architecture workflow (#10356)
3532ed965d is described below
commit 3532ed965df04db419ce057e2aa904b2d087c3d7
Author: FANNG <[email protected]>
AuthorDate: Wed Mar 11 15:02:22 2026 +0900
[#10357] docs(table-maintenance-service): improve optimizer docs and architecture workflow (#10356)
### What changes were proposed in this pull request?
This PR improves optimizer documentation under
`docs/table-maintenance-service/` and adds an architecture diagram
asset.
Main updates:
- Improved quick start prerequisites and end-to-end verification flow.
- Clarified optimizer configuration notes, especially status polling
interval behavior in local verification.
- Improved CLI reference around metrics naming and JDBC metrics driver
classpath requirements.
- Expanded troubleshooting for common runtime and configuration issues.
- Added explicit alpha-stage scope/limitations and extensibility
direction in optimizer overview.
- Added optimizer architecture workflow image and linked it from the
optimizer overview doc.
### Why are the changes needed?
Users can get blocked when following optimizer docs in real environments
because some prerequisites, sequencing, and verification checkpoints
were not explicit enough.
This PR makes the documentation more executable and reproducible, and
reduces setup/troubleshooting friction for the end-to-end flow.
Fix: #10357
### Does this PR introduce _any_ user-facing change?
Yes, documentation-only user-facing changes:
- Updated optimizer docs content and workflow guidance.
- Added one new docs asset image:
  - `docs/assets/table-maintenance-service/optimizer-architecture-workflow.png`
No user-facing API change and no runtime property key addition/removal in code.
### How was this patch tested?
- Verified markdown links and file paths in updated docs.
- Confirmed the new architecture image is referenced correctly from:
- `docs/table-maintenance-service/optimizer.md`
- Verified this is a docs-only change set (no runtime code changes).
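
The link/path verification described above can be approximated with a small script. This is an illustrative sketch only (not the project's actual release tooling): it checks that every markdown image reference in a doc resolves to a real file. All paths below are synthetic stand-ins.

```python
import pathlib
import re
import tempfile

def missing_image_refs(md_file: pathlib.Path) -> list:
    """Return image reference paths in a markdown file that do not exist on disk."""
    text = md_file.read_text()
    refs = re.findall(r"!\[[^\]]*\]\(([^)]+)\)", text)
    return [ref for ref in refs if not (md_file.parent / ref).exists()]

# Synthetic docs tree: one resolvable image, one broken reference.
root = pathlib.Path(tempfile.mkdtemp())
(root / "assets").mkdir()
(root / "assets" / "workflow.png").write_bytes(b"\x89PNG")
doc = root / "optimizer.md"
doc.write_text("![workflow](assets/workflow.png)\n![gone](assets/gone.png)\n")
print(missing_image_refs(doc))  # ['assets/gone.png']
```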
---
.../optimizer-architecture-workflow.png | Bin 0 -> 220473 bytes
.../optimizer-cli-reference.md | 7 +++
.../optimizer-configuration.md | 3 +-
.../optimizer-quick-start.md | 15 +++---
.../optimizer-troubleshooting.md | 57 ++++++++++++++++++++-
docs/table-maintenance-service/optimizer.md | 29 +++++++++++
6 files changed, 103 insertions(+), 8 deletions(-)
diff --git a/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png b/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png
new file mode 100644
index 0000000000..8b60984c6e
Binary files /dev/null and b/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png differ
diff --git a/docs/table-maintenance-service/optimizer-cli-reference.md b/docs/table-maintenance-service/optimizer-cli-reference.md
index 6df4e44696..7d5213b38f 100644
--- a/docs/table-maintenance-service/optimizer-cli-reference.md
+++ b/docs/table-maintenance-service/optimizer-cli-reference.md
@@ -150,6 +150,10 @@ Rule format is `scope:metricName:aggregation:comparison`:
- `aggregation`: `max|min|avg|latest`
- `comparison`: `lt|le|gt|ge|eq|ne`
+When metrics are produced by `submit-update-stats-job --update-mode metrics`, metric names are
+often `custom-*` (for example `custom-data-file-mse`). Use `list-table-metrics` first and
+configure rules with the exact metric names returned by your environment.
+
### Submit built-in update stats jobs
Submit built-in Iceberg update stats/metrics Spark jobs directly.
@@ -168,6 +172,9 @@ Notes:
- `--identifiers` supports `catalog.schema.table` or `schema.table` (when default catalog is configured).
- `--update-mode` supports `stats|metrics|all` (default `all`).
- For `stats` or `all`, `--updater-options` must include `gravitino_uri` and `metalake`.
+- If `--updater-options` includes external JDBC metrics settings
+  (`gravitino.optimizer.jdbcMetrics.*`), ensure the JDBC driver JAR is available to Spark
+  runtime classpath (for example via `spark.jars` in `--spark-conf`).
- `--spark-conf` and `--updater-options` are flat JSON maps.
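
The "flat JSON maps" requirement above can be sketched as follows. This is a hypothetical helper (not part of the optimizer CLI) that assumes a flat map means a single-level object of string keys to string values; the option values shown are examples only.

```python
import json

def flat_json_map(options: dict) -> str:
    """Serialize options as a flat JSON map, rejecting nested or non-string values."""
    for key, value in options.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError("flat JSON map: string keys and string values only")
    return json.dumps(options)

# Example --spark-conf payload; the JAR path is a placeholder.
spark_conf = flat_json_map({
    "spark.jars": "/path/to/postgresql-42.7.4.jar",   # JDBC driver JAR for jdbcMetrics.*
    "spark.hadoop.fs.defaultFS": "file:///",
})
print(spark_conf)
```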
### List table metrics
diff --git a/docs/table-maintenance-service/optimizer-configuration.md b/docs/table-maintenance-service/optimizer-configuration.md
index bb67584452..5ee0ac9d1f 100644
--- a/docs/table-maintenance-service/optimizer-configuration.md
+++ b/docs/table-maintenance-service/optimizer-configuration.md
@@ -25,7 +25,8 @@ gravitino.job.statusPullIntervalInMs=300000
gravitino.jobExecutor.local.sparkHome=/path/to/spark
```
-For local demo environments, you can reduce `gravitino.job.statusPullIntervalInMs` to get faster status updates.
+For local demo environments, you can reduce `gravitino.job.statusPullIntervalInMs` (for example
+`10000`) to get faster status updates. Restart Gravitino after changing this value.
## Built-in update stats `jobConf`
diff --git a/docs/table-maintenance-service/optimizer-quick-start.md b/docs/table-maintenance-service/optimizer-quick-start.md
--- a/docs/table-maintenance-service/optimizer-quick-start.md
+++ b/docs/table-maintenance-service/optimizer-quick-start.md
@@ -10,6 +10,8 @@ license: This software is licensed under the Apache License version 2.
- Prepare a running Gravitino server.
- Ensure target metalake exists (examples use `test`).
- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- For faster status feedback during verification, set `gravitino.job.statusPullIntervalInMs`
+  to a smaller value (for example `10000`) and restart Gravitino.
- If your Iceberg REST backend is in-memory, avoid restarting it during this quick start because
  restart resets metadata and data files.
@@ -17,7 +19,8 @@ For full config details, see [Optimizer Configuration](./optimizer-configuration
## Success criteria
-- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- Update-stats job finishes and table statistics/metrics include `custom-data-file-mse` and
+  `custom-delete-file-number`.
- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
@@ -98,12 +101,12 @@ Use Spark SQL to create enough small files so compaction has visible effect:
```bash
${SPARK_HOME}/bin/spark-sql \
--conf spark.hadoop.fs.defaultFS=file:/// \
- --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
- --conf spark.sql.catalog.rest_demo.type=rest \
- --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
- -e "CREATE NAMESPACE IF NOT EXISTS rest_demo.db; \
+ --conf spark.sql.catalog.rest_catalog=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.rest_catalog.type=rest \
+ --conf spark.sql.catalog.rest_catalog.uri=http://localhost:9001/iceberg \
+ -e "CREATE NAMESPACE IF NOT EXISTS rest_catalog.db; \
SET spark.sql.files.maxRecordsPerFile=1000; \
- INSERT INTO rest_demo.db.t1 \
+ INSERT INTO rest_catalog.db.t1 \
SELECT id, concat('name_', CAST(id AS STRING)) FROM range(0, 100000);"
```
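
Why this INSERT yields many small files can be checked with back-of-envelope arithmetic: with `spark.sql.files.maxRecordsPerFile=1000`, the 100,000 inserted rows are split into files of at most 1,000 records each, so at least 100 data files are written (assuming a single write task; more tasks only increase the file count).

```python
import math

rows = 100_000                # rows produced by range(0, 100000)
max_records_per_file = 1_000  # spark.sql.files.maxRecordsPerFile
min_files = math.ceil(rows / max_records_per_file)
print(min_files)  # 100
```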
diff --git a/docs/table-maintenance-service/optimizer-troubleshooting.md b/docs/table-maintenance-service/optimizer-troubleshooting.md
index bb9879da9d..7901e69d71 100644
--- a/docs/table-maintenance-service/optimizer-troubleshooting.md
+++ b/docs/table-maintenance-service/optimizer-troubleshooting.md
@@ -31,13 +31,27 @@ Check `gravitino.job.statusPullIntervalInMs` and local staging logs under:
`/tmp/gravitino/jobs/staging/<metalake>/<job-template-name>/<job-id>/error.log`.
+For local verification, reduce `gravitino.job.statusPullIntervalInMs` (for example `10000`) and
+restart Gravitino so REST status can refresh faster.
+
## `No identifiers matched strategy name ...`
`--strategy-name` must be the policy name (for example `iceberg_compaction_default`), not the policy
type (`system_iceberg_compaction`) and not the strategy type (`iceberg-data-compaction`).
## Dry-run returns no `DRY-RUN` or `SUBMIT` lines
-This usually means trigger conditions are not met. For compaction, verify `custom-data-file-mse` and `custom-delete-file-number` in table statistics are large enough to satisfy policy rules.
+This usually means trigger conditions are not met. For compaction, verify
+`custom-data-file-mse` and `custom-delete-file-number` in table statistics/metrics are large
+enough to satisfy policy rules.
+
+## `monitor-metrics` returns `evaluation=false` unexpectedly
+
+Check both rule names and metric samples:
+
+1. Query current metrics first with `list-table-metrics` (and `--partition-path` for partition scope).
+2. Use the exact metric names returned by your environment in
+ `gravitino.optimizer.monitor.gravitinoMetricsEvaluator.rules`.
+3. Ensure `--action-time` is inside the range where both before and after samples exist.
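
The rule strings referenced in step 2 follow the format documented in the CLI reference (`scope:metricName:aggregation:comparison`, with aggregation in `max|min|avg|latest` and comparison in `lt|le|gt|ge|eq|ne`). A hypothetical validator (not part of the optimizer CLI; the `table` scope value is an example, not a documented constant) can catch malformed rules before they silently evaluate to `false`:

```python
AGGREGATIONS = {"max", "min", "avg", "latest"}
COMPARISONS = {"lt", "le", "gt", "ge", "eq", "ne"}

def parse_rule(rule: str) -> dict:
    """Split and validate a scope:metricName:aggregation:comparison rule string."""
    parts = rule.split(":")
    if len(parts) != 4:
        raise ValueError(f"expected scope:metricName:aggregation:comparison, got {rule!r}")
    scope, metric, aggregation, comparison = parts
    if aggregation not in AGGREGATIONS:
        raise ValueError(f"unknown aggregation {aggregation!r}")
    if comparison not in COMPARISONS:
        raise ValueError(f"unknown comparison {comparison!r}")
    return {"scope": scope, "metricName": metric,
            "aggregation": aggregation, "comparison": comparison}

print(parse_rule("table:custom-data-file-mse:latest:gt"))
```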
## `No StrategyHandler class configured for strategy type ...`
@@ -57,6 +71,47 @@ Set local filesystem explicitly in Spark config:
spark.hadoop.fs.defaultFS=file:///
```
+## Rewrite fails on multi-level partition (`identity + day(...)`)
+
+In release `1.2.0`, rewrite may fail for partition filters combining identity and day transform
+(for example `PARTITIONED BY (p, days(ts))`) with error:
+
+```text
+Cannot translate Spark expression ... day(cast(ts as date)) ... to data source filter
+```
+
+How to verify:
+
+1. Check job run status by rewrite job id under
+ `/api/metalakes/<metalake>/jobs/runs/<job-id>`.
+2. Check staging log:
+
`/tmp/gravitino/jobs/staging/<metalake>/builtin-iceberg-rewrite-data-files/<job-id>/error.log`.
+
+Workaround:
+
+- Use identity-only partition compaction path for release `1.2.0`.
+- Keep this failure case as a reproducible regression test for later fix validation.
+
+Observed compatibility matrix in release `1.2.0` (rewrite path):
+
+- PASS: `p`, `p, c2` (identity-only partition transforms)
+- FAIL: `p, years(ts)`, `p, months(ts)`, `p, days(ts)`, `p, hours(ts)`,
+ `p, truncate(1, c2)`, `p, bucket(8, id)`
+
+## `submit-update-stats-job` fails with JDBC metrics errors
+
+When `--updater-options` includes `gravitino.optimizer.jdbcMetrics.*`, ensure the JDBC driver is
+available to Spark runtime classpath. Typical failures include `ClassNotFoundException` for driver
+class or `No suitable driver`.
+
+Example in `--spark-conf`:
+
+```json
+{
+ "spark.jars": "/path/to/postgresql-42.7.4.jar"
+}
+```
+
## `Specified optimizer config file does not exist`
Check your `--conf-path` and file permissions.
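
The check implied here can be sketched generically (this is not project code): the file passed via `--conf-path` must exist and be readable by the user running the CLI. The paths below are placeholders.

```python
import os
import tempfile

def conf_path_ok(path: str) -> bool:
    """True if the config file exists and is readable by the current user."""
    return os.path.isfile(path) and os.access(path, os.R_OK)

# A readable temp file stands in for a real optimizer config.
fd, conf = tempfile.mkstemp(suffix=".properties")
os.close(fd)
print(conf_path_ok(conf), conf_path_ok("/no/such/optimizer.conf"))  # True False
```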
diff --git a/docs/table-maintenance-service/optimizer.md b/docs/table-maintenance-service/optimizer.md
index ed19ead78f..d7c6cda38f 100644
--- a/docs/table-maintenance-service/optimizer.md
+++ b/docs/table-maintenance-service/optimizer.md
@@ -15,6 +15,30 @@ The Table Maintenance Service (Optimizer) automates table maintenance by connect
The CLI commands and configuration keys use the `optimizer` name.
+## Alpha status and current limitations
+
+The current Table Maintenance Service is in **alpha** stage.
+
+Current limitations:
+
+- It is operated through the optimizer CLI workflow.
+- The built-in maintenance strategy focuses on Iceberg table compaction.
+- Compaction support is currently limited to Iceberg tables with identity partition transforms.
+
+## Extensibility and roadmap
+
+Although the built-in capability is intentionally narrow in alpha, the framework is designed for
+extension:
+
+- Integrate external systems by implementing custom providers and adapters.
+- Add new strategies and handlers beyond built-in compaction.
+- Plug in custom metrics, evaluators, and job submitters for different environments.
+
+See [Optimizer Extension Guide](./optimizer-extension-guide.md) for extension points.
+
+Future versions will continue improving the out-of-the-box experience and evolve toward a more
+ready-to-use maintenance service.
+
## Architecture overview
The optimizer workflow is based on six parts:
@@ -26,6 +50,11 @@ The optimizer workflow is based on six parts:
5. Job executor: local or custom backend that runs submitted jobs.
6. Status and logs: REST job state plus local staging logs.
+
+
+The following diagram shows the end-to-end interactions between CLI, Gravitino server, Spark jobs,
+JDBC metrics repository, and the Recommender/Updater/Monitor modules.
+
Typical data flow:
1. Collect statistics and metrics for target tables.