This is an automated email from the ASF dual-hosted git repository.
jshao pushed a commit to branch branch-1.2
in repository https://gitbox.apache.org/repos/asf/gravitino.git
The following commit(s) were added to refs/heads/branch-1.2 by this push:
new f140d47ffb [Cherry-pick to branch-1.2] [#10357] docs(table-maintenance-service): improve optimizer docs and architecture workflow (#10356) (#10363)
f140d47ffb is described below
commit f140d47ffba7b23361fe59fa8820797e1bd208db
Author: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
AuthorDate: Wed Mar 11 14:07:25 2026 +0800
[Cherry-pick to branch-1.2] [#10357] docs(table-maintenance-service): improve optimizer docs and architecture workflow (#10356) (#10363)
**Cherry-pick Information:**
- Original commit: 3532ed965df04db419ce057e2aa904b2d087c3d7
- Target branch: `branch-1.2`
- Status: ✅ Clean cherry-pick (no conflicts)
Co-authored-by: FANNG <[email protected]>
---
.../optimizer-architecture-workflow.png | Bin 0 -> 220473 bytes
.../optimizer-cli-reference.md | 7 +++
.../optimizer-configuration.md | 3 +-
.../optimizer-quick-start.md | 15 +++---
.../optimizer-troubleshooting.md | 57 ++++++++++++++++++++-
docs/table-maintenance-service/optimizer.md | 29 +++++++++++
6 files changed, 103 insertions(+), 8 deletions(-)
diff --git a/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png b/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png
new file mode 100644
index 0000000000..8b60984c6e
Binary files /dev/null and b/docs/assets/table-maintenance-service/optimizer-architecture-workflow.png differ
diff --git a/docs/table-maintenance-service/optimizer-cli-reference.md b/docs/table-maintenance-service/optimizer-cli-reference.md
index 6df4e44696..7d5213b38f 100644
--- a/docs/table-maintenance-service/optimizer-cli-reference.md
+++ b/docs/table-maintenance-service/optimizer-cli-reference.md
@@ -150,6 +150,10 @@ Rule format is `scope:metricName:aggregation:comparison`:
- `aggregation`: `max|min|avg|latest`
- `comparison`: `lt|le|gt|ge|eq|ne`
+When metrics are produced by `submit-update-stats-job --update-mode metrics`, metric names are
+often `custom-*` (for example `custom-data-file-mse`). Use `list-table-metrics` first and
+configure rules with the exact metric names returned by your environment.
+
### Submit built-in update stats jobs
Submit built-in Iceberg update stats/metrics Spark jobs directly.
@@ -168,6 +172,9 @@ Notes:
- `--identifiers` supports `catalog.schema.table` or `schema.table` (when default catalog is configured).
- `--update-mode` supports `stats|metrics|all` (default `all`).
- For `stats` or `all`, `--updater-options` must include `gravitino_uri` and `metalake`.
+- If `--updater-options` includes external JDBC metrics settings
+  (`gravitino.optimizer.jdbcMetrics.*`), ensure the JDBC driver JAR is available on the Spark
+  runtime classpath (for example via `spark.jars` in `--spark-conf`).
- `--spark-conf` and `--updater-options` are flat JSON maps.
### List table metrics
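The rule format documented above (`scope:metricName:aggregation:comparison`) can be sanity-checked before a rule goes into configuration. A minimal illustrative Python sketch; the `validate_rule` helper is hypothetical and not part of the optimizer CLI, but the aggregation and comparison sets match the reference above:

```python
# Illustrative only: validate a monitor rule string of the form
# scope:metricName:aggregation:comparison before putting it in config.
AGGREGATIONS = {"max", "min", "avg", "latest"}
COMPARISONS = {"lt", "le", "gt", "ge", "eq", "ne"}

def validate_rule(rule: str) -> bool:
    parts = rule.split(":")
    if len(parts) != 4:
        return False
    scope, metric, agg, cmp_op = parts
    return bool(scope) and bool(metric) and agg in AGGREGATIONS and cmp_op in COMPARISONS

# Example using a custom metric name as returned by list-table-metrics.
print(validate_rule("table:custom-data-file-mse:avg:gt"))  # True
print(validate_rule("table:custom-data-file-mse:sum:gt"))  # False: `sum` is not a supported aggregation
```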
diff --git a/docs/table-maintenance-service/optimizer-configuration.md b/docs/table-maintenance-service/optimizer-configuration.md
index bb67584452..5ee0ac9d1f 100644
--- a/docs/table-maintenance-service/optimizer-configuration.md
+++ b/docs/table-maintenance-service/optimizer-configuration.md
@@ -25,7 +25,8 @@ gravitino.job.statusPullIntervalInMs=300000
gravitino.jobExecutor.local.sparkHome=/path/to/spark
```
-For local demo environments, you can reduce `gravitino.job.statusPullIntervalInMs` to get faster status updates.
+For local demo environments, you can reduce `gravitino.job.statusPullIntervalInMs` (for example
+`10000`) to get faster status updates. Restart Gravitino after changing this value.
## Built-in update stats `jobConf`
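Putting the keys shown above together, a local-demo override might look like the following fragment; the values are illustrative and the Spark path is a placeholder:

```properties
# Faster status polling for local demos (restart Gravitino after changing)
gravitino.job.statusPullIntervalInMs=10000
gravitino.jobExecutor.local.sparkHome=/path/to/spark
```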
diff --git a/docs/table-maintenance-service/optimizer-quick-start.md b/docs/table-maintenance-service/optimizer-quick-start.md
index 182402dbcd..1be5fe793d 100644
--- a/docs/table-maintenance-service/optimizer-quick-start.md
+++ b/docs/table-maintenance-service/optimizer-quick-start.md
@@ -10,6 +10,8 @@ license: This software is licensed under the Apache License version 2.
- Prepare a running Gravitino server.
- Ensure target metalake exists (examples use `test`).
- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- For faster status feedback during verification, set `gravitino.job.statusPullIntervalInMs`
+  to a smaller value (for example `10000`) and restart Gravitino.
- If your Iceberg REST backend is in-memory, avoid restarting it during this quick start because
  a restart resets metadata and data files.
@@ -17,7 +19,8 @@ For full config details, see [Optimizer Configuration](./optimizer-configuration
## Success criteria
-- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- Update-stats job finishes and table statistics/metrics include `custom-data-file-mse` and
+  `custom-delete-file-number`.
- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
@@ -98,12 +101,12 @@ Use Spark SQL to create enough small files so compaction has visible effect:
```bash
${SPARK_HOME}/bin/spark-sql \
--conf spark.hadoop.fs.defaultFS=file:/// \
- --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
- --conf spark.sql.catalog.rest_demo.type=rest \
- --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
- -e "CREATE NAMESPACE IF NOT EXISTS rest_demo.db; \
+ --conf spark.sql.catalog.rest_catalog=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.rest_catalog.type=rest \
+ --conf spark.sql.catalog.rest_catalog.uri=http://localhost:9001/iceberg \
+ -e "CREATE NAMESPACE IF NOT EXISTS rest_catalog.db; \
SET spark.sql.files.maxRecordsPerFile=1000; \
- INSERT INTO rest_demo.db.t1 \
+ INSERT INTO rest_catalog.db.t1 \
SELECT id, concat('name_', CAST(id AS STRING)) FROM range(0, 100000);"
```
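As a rough check on why the insert above produces compaction-worthy input: `spark.sql.files.maxRecordsPerFile=1000` caps each output file at 1000 records, so 100000 inserted rows yield at least 100 small data files (more with higher write parallelism). A quick illustrative computation:

```python
# Lower bound on data files produced by the quick-start insert above.
rows = 100_000
max_records_per_file = 1_000  # spark.sql.files.maxRecordsPerFile
min_files = -(-rows // max_records_per_file)  # ceiling division
print(min_files)  # 100
```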
diff --git a/docs/table-maintenance-service/optimizer-troubleshooting.md b/docs/table-maintenance-service/optimizer-troubleshooting.md
index bb9879da9d..7901e69d71 100644
--- a/docs/table-maintenance-service/optimizer-troubleshooting.md
+++ b/docs/table-maintenance-service/optimizer-troubleshooting.md
@@ -31,13 +31,27 @@ Check `gravitino.job.statusPullIntervalInMs` and local staging logs under:
`/tmp/gravitino/jobs/staging/<metalake>/<job-template-name>/<job-id>/error.log`.
+For local verification, reduce `gravitino.job.statusPullIntervalInMs` (for example `10000`) and
+restart Gravitino so REST status can refresh faster.
+
## `No identifiers matched strategy name ...`
`--strategy-name` must be the policy name (for example `iceberg_compaction_default`), not the policy type (`system_iceberg_compaction`) and not the strategy type (`iceberg-data-compaction`).
## Dry-run returns no `DRY-RUN` or `SUBMIT` lines
-This usually means trigger conditions are not met. For compaction, verify `custom-data-file-mse` and `custom-delete-file-number` in table statistics are large enough to satisfy policy rules.
+This usually means trigger conditions are not met. For compaction, verify
+`custom-data-file-mse` and `custom-delete-file-number` in table statistics/metrics are large
+enough to satisfy policy rules.
+
+## `monitor-metrics` returns `evaluation=false` unexpectedly
+
+Check both rule names and metric samples:
+
+1. Query current metrics first with `list-table-metrics` (and `--partition-path` for partition scope).
+2. Use the exact metric names returned by your environment in
+ `gravitino.optimizer.monitor.gravitinoMetricsEvaluator.rules`.
+3. Ensure `--action-time` is inside the range where both before and after samples exist.
## `No StrategyHandler class configured for strategy type ...`
@@ -57,6 +71,47 @@ Set local filesystem explicitly in Spark config:
spark.hadoop.fs.defaultFS=file:///
```
+## Rewrite fails on multi-level partition (`identity + day(...)`)
+
+In release `1.2.0`, rewrite may fail for partition filters combining identity and day transforms
+(for example `PARTITIONED BY (p, days(ts))`) with error:
+
+```text
+Cannot translate Spark expression ... day(cast(ts as date)) ... to data source filter
+```
+
+How to verify:
+
+1. Check job run status by rewrite job id under
+ `/api/metalakes/<metalake>/jobs/runs/<job-id>`.
+2. Check staging log:
+   `/tmp/gravitino/jobs/staging/<metalake>/builtin-iceberg-rewrite-data-files/<job-id>/error.log`.
+
+Workaround:
+
+- Use identity-only partition compaction path for release `1.2.0`.
+- Keep this failure case as a reproducible regression test for later fix validation.
+
+Observed compatibility matrix in release `1.2.0` (rewrite path):
+
+- PASS: `p`, `p, c2` (identity-only partition transforms)
+- FAIL: `p, years(ts)`, `p, months(ts)`, `p, days(ts)`, `p, hours(ts)`,
+ `p, truncate(1, c2)`, `p, bucket(8, id)`
+
+## `submit-update-stats-job` fails with JDBC metrics errors
+
+When `--updater-options` includes `gravitino.optimizer.jdbcMetrics.*`, ensure the JDBC driver is
+available on the Spark runtime classpath. Typical failures include `ClassNotFoundException` for the
+driver class or `No suitable driver`.
+
+Example in `--spark-conf`:
+
+```json
+{
+ "spark.jars": "/path/to/postgresql-42.7.4.jar"
+}
+```
+
## `Specified optimizer config file does not exist`
Check your `--conf-path` and file permissions.
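The verification steps in the troubleshooting additions above can be scripted. A small illustrative sketch that only builds the REST path and staging-log path quoted in the docs; the helper names and the base URI value are assumptions for illustration, not a documented API:

```python
def job_run_url(base_uri: str, metalake: str, job_id: str) -> str:
    # REST path for a job run, as quoted in the troubleshooting section.
    return f"{base_uri}/api/metalakes/{metalake}/jobs/runs/{job_id}"

def staging_error_log(metalake: str, template: str, job_id: str) -> str:
    # Local staging-log layout, as quoted in the troubleshooting section.
    return f"/tmp/gravitino/jobs/staging/{metalake}/{template}/{job_id}/error.log"

# Hypothetical values for illustration only.
print(job_run_url("http://localhost:8090", "test", "job-123"))
print(staging_error_log("test", "builtin-iceberg-rewrite-data-files", "job-123"))
```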
diff --git a/docs/table-maintenance-service/optimizer.md b/docs/table-maintenance-service/optimizer.md
index ed19ead78f..d7c6cda38f 100644
--- a/docs/table-maintenance-service/optimizer.md
+++ b/docs/table-maintenance-service/optimizer.md
@@ -15,6 +15,30 @@ The Table Maintenance Service (Optimizer) automates table maintenance by connect
The CLI commands and configuration keys use the `optimizer` name.
+## Alpha status and current limitations
+
+The current Table Maintenance Service is in **alpha** stage.
+
+Current limitations:
+
+- It is operated through the optimizer CLI workflow.
+- The built-in maintenance strategy focuses on Iceberg table compaction.
+- Compaction support is currently limited to Iceberg tables with identity partition transforms.
+
+## Extensibility and roadmap
+
+Although the built-in capability is intentionally narrow in alpha, the framework is designed for
+extension:
+
+- Integrate external systems by implementing custom providers and adapters.
+- Add new strategies and handlers beyond built-in compaction.
+- Plug in custom metrics, evaluators, and job submitters for different environments.
+
+See [Optimizer Extension Guide](./optimizer-extension-guide.md) for extension points.
+
+Future versions will continue improving the out-of-the-box experience and evolve toward a more
+ready-to-use maintenance service.
+
## Architecture overview
The optimizer workflow is based on six parts:
@@ -26,6 +50,11 @@ The optimizer workflow is based on six parts:
5. Job executor: local or custom backend that runs submitted jobs.
6. Status and logs: REST job state plus local staging logs.
+
+
+The following diagram shows the end-to-end interactions between CLI, Gravitino server, Spark jobs,
+JDBC metrics repository, and the Recommender/Updater/Monitor modules.
+
Typical data flow:
1. Collect statistics and metrics for target tables.