bobhan1 opened a new pull request, #63832:
URL: https://github.com/apache/doris/pull/63832
### What problem does this PR solve?
Issue Number: None
Related PR: #62501
Problem Summary:
This PR adds table-level event-driven cloud warm-up support and improves
active incremental warm-up progress observability.
Before this change, event-driven warm-up could be controlled only at
compute-group granularity. Once a load-event warm-up job was enabled for a
source and target compute group pair, every source-side table write could
trigger warm-up to the target compute group. That is inefficient for workloads
where only selected core tables, high-frequency query tables, or selected async
materialized views need to stay warm.
This PR lets users define the warm-up scope with `ON TABLES` when creating
an event-driven load warm-up job. FE persists the normalized table filter in
the warm-up job, resolves matched table ids dynamically, sends the table ids to
BE, and lets BE filter warm-up rowsets by table id.
User-visible behavior:
- `WARM UP ... ON TABLES` supports table-level event-driven warm-up.
- Table filters support `INCLUDE` and `EXCLUDE` rules.
- Rules support `*` and `?` wildcards, for example `db.table`, `db.*`,
`*.orders_*`, and `log_db.log_?`.
- `INCLUDE` defines the candidate warm-up scope, and `EXCLUDE` removes
tables from that included scope.
- Rules are canonicalized before duplicate checks, so semantically
equivalent filters do not create duplicate jobs just because rule order differs.
- Matching covers both regular OLAP tables and async materialized views.
- Matched table ids are refreshed as tables or async materialized views are
created, dropped, or renamed.
- A single source compute group can have independent table-level warm-up
jobs to different target compute groups, each with its own table filter.
- `SHOW WARM UP JOB` exposes the table-level job type, table filter, matched
tables, and SyncStats.
- The `SHOW WARM UP JOB` list output shows compact SyncStats, while a
single-job lookup shows detailed windowed SyncStats.
Example:
```sql
WARM UP COMPUTE GROUP query_cg WITH COMPUTE GROUP write_cg
ON TABLES (
    INCLUDE 'core_db.config',
    INCLUDE 'report_db.monthly_*',
    INCLUDE '*.sales_*',
    EXCLUDE '*.*_archive'
)
PROPERTIES (
    "sync_mode" = "event_driven",
    "sync_event" = "load"
);
```
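
The `INCLUDE`/`EXCLUDE` semantics described above can be illustrated with a small Python sketch. It is not the FE implementation: the function names are hypothetical, and it assumes that a rule is split at its first `.` into a database pattern and a table pattern, that matching is case-sensitive, and that canonicalization amounts to de-duplicating and sorting the rules.

```python
import re
from typing import Iterable, Tuple


def _glob_to_regex(glob: str) -> re.Pattern:
    """Translate a pattern using only `*` and `?` into a compiled regex."""
    parts = []
    for ch in glob:
        if ch == "*":
            parts.append(".*")       # any run of characters, including empty
        elif ch == "?":
            parts.append(".")        # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def rule_matches(rule: str, db: str, table: str) -> bool:
    """Assumption: the rule has the form `<db pattern>.<table pattern>`."""
    db_glob, _, table_glob = rule.partition(".")
    return (_glob_to_regex(db_glob).fullmatch(db) is not None
            and _glob_to_regex(table_glob).fullmatch(table) is not None)


def is_in_scope(db: str, table: str,
                includes: Iterable[str], excludes: Iterable[str]) -> bool:
    """INCLUDE defines the candidate scope; EXCLUDE removes tables from it."""
    included = any(rule_matches(r, db, table) for r in includes)
    excluded = any(rule_matches(r, db, table) for r in excludes)
    return included and not excluded


def canonical_filter(includes: Iterable[str],
                     excludes: Iterable[str]) -> Tuple[tuple, tuple]:
    """Order-independent form of a filter (assumed: de-duplicate, then sort)."""
    return tuple(sorted(set(includes))), tuple(sorted(set(excludes)))


includes = ["core_db.config", "report_db.monthly_*", "*.sales_*"]
excludes = ["*.*_archive"]
print(is_in_scope("shop", "sales_eu", includes, excludes))          # True
print(is_in_scope("shop", "sales_eu_archive", includes, excludes))  # False
```

With the rules from the SQL example, `shop.sales_eu` is in scope, while `shop.sales_eu_archive` matches an `INCLUDE` rule but is removed again by `EXCLUDE '*.*_archive'`.
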
Conflict and virtual compute group behavior:
- Table-level load-event warm-up and cluster-level load-event warm-up are
mutually exclusive for the same source and target compute group pair.
- If a conflicting job already exists, creation returns an error that
includes the conflicting job id; table-level conflicts also include the table
filter.
- Duplicate detection among jobs of the same type is unchanged and still
follows the existing duplicate-check logic.
- Creating a cluster-level load-event warm-up job managed by a virtual compute
group (VCG) does not fail on conflict. Because VCG jobs are created through the
MS HTTP API path, FE first cancels existing table-level load-event warm-up jobs
with the same source and target, then recreates the VCG-managed cluster-level job.
- Manually creating a table-level load-event warm-up job is rejected only
when both source and target compute groups are owned by the same VCG.
- SQL still cannot use a virtual compute group directly as the source or
target compute group.
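
The mutual-exclusion rule and the VCG override can be sketched as follows. This is a simplified model under stated assumptions, not the FE code: `WarmUpJob`, `find_conflict`, `conflict_message`, and `create_vcg_cluster_job` are hypothetical names, the error wording is invented for illustration, and the sketch reads "table-level conflicts also include the table filter" as "when the conflicting job is table-level, its filter is reported".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WarmUpJob:
    job_id: int
    source: str
    target: str
    level: str                       # "table" or "cluster" (hypothetical encoding)
    table_filter: Optional[str] = None


def find_conflict(new_job: WarmUpJob,
                  existing: List[WarmUpJob]) -> Optional[WarmUpJob]:
    """Return a job of the other level on the same source/target pair, if any."""
    for job in existing:
        same_pair = (job.source, job.target) == (new_job.source, new_job.target)
        if same_pair and job.level != new_job.level:
            return job
    return None


def conflict_message(conflict: WarmUpJob) -> str:
    """Build an illustrative error message naming the conflicting job."""
    msg = f"conflicts with existing {conflict.level}-level warm-up job {conflict.job_id}"
    if conflict.level == "table":
        msg += f" (table filter: {conflict.table_filter})"
    return msg


def create_vcg_cluster_job(new_job: WarmUpJob,
                           existing: List[WarmUpJob]) -> List[WarmUpJob]:
    """VCG path: cancel table-level jobs on the same pair, then add the new job."""
    pair = (new_job.source, new_job.target)
    kept = [j for j in existing
            if not (j.level == "table" and (j.source, j.target) == pair)]
    return kept + [new_job]
```

In this model, a manual `WARM UP` statement would call `find_conflict` and fail with `conflict_message`, whereas the VCG path replaces the conflicting table-level jobs instead of failing.
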
Warm-up progress observation:
- BE records windowed per-job statistics for requested, finished, and failed
warm-ups.
- BE exposes per-job warm-up statistics through
`/api/warmup_event_driven_stats`.
- FE aggregates BE statistics and caches the aggregated result in the
warm-up job.
- SyncStats includes source-side and target-side warm-up size/count progress
across windows.
- SyncStats includes trigger-time progress, so users can observe whether the
target compute group is behind the latest source-side warm-up trigger.
- FE `/metrics` exposes per-job active warm-up metadata, synchronized size,
and trigger gap metrics for cloud event-driven warm-up jobs.
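
As an illustration of how per-BE statistics could roll up into a per-job view with a trigger gap, here is a minimal sketch. The field names, the millisecond timestamps, and the use of `max` for both trigger times are assumptions made for this example; they do not describe the actual response of `/api/warmup_event_driven_stats` or the real SyncStats layout.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class BeJobStats:
    # All field names are hypothetical.
    requested_bytes: int      # warm-up size requested on the source side
    finished_bytes: int       # warm-up size completed on the target side
    failed_bytes: int         # warm-up size that failed
    latest_trigger_ms: int    # newest source-side trigger seen by this BE
    synced_trigger_ms: int    # newest trigger whose warm-up has finished


def aggregate(stats: Iterable[BeJobStats]) -> Dict[str, int]:
    """Sum sizes across BEs and derive how far the target lags the source."""
    stats = list(stats)
    latest = max((s.latest_trigger_ms for s in stats), default=0)
    synced = max((s.synced_trigger_ms for s in stats), default=0)
    return {
        "requested_bytes": sum(s.requested_bytes for s in stats),
        "finished_bytes": sum(s.finished_bytes for s in stats),
        "failed_bytes": sum(s.failed_bytes for s in stats),
        # A gap of 0 means the target has caught up with the latest trigger.
        "trigger_gap_ms": max(0, latest - synced),
    }
```

A gap that keeps growing in such a model would indicate that the target compute group is falling behind the source-side triggers.
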
### Release note
Support table-level event-driven cloud warm-up with `ON TABLES` filters and
per-job warm-up sync statistics.
### Check List (For Author)
- Test
    - [x] Regression test
    - [x] Unit Test
    - [x] Manual test
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason
- Behavior changed:
    - [ ] No.
    - [x] Yes. `WARM UP` supports table-level `ON TABLES` filters for
      event-driven load warm-up, and warm-up job output/metrics expose table filter,
      matched tables, SyncStats, and trigger-gap information.
- Does this need documentation?
    - [ ] No.
    - [x] Yes. Documentation for the new `ON TABLES` syntax and metrics
      should be added separately.
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label