liaoxin01 opened a new pull request, #64734:
URL: https://github.com/apache/doris/pull/64734
## Proposed changes
Add a `file_cache_warm_up_job_num` bvar metric that tracks the number of
warm up jobs currently held in a BE's memory, giving operators per-BE
visibility into outstanding warm up jobs.
### What changed
In `be/src/cloud/cloud_warm_up_manager.cpp`:
- New `bvar::Adder<int64_t> g_file_cache_warm_up_job_num("file_cache_warm_up_job_num")`.
- **+1** when FE dispatches a new job to this BE:
  - `check_and_set_job_id`: regular (cluster/table) warm up `SET_JOB`, on
    `_cur_job_id` transition `0 -> job_id`.
  - `check_and_set_batch_id`: defensive, same `0 -> job_id` transition
    (e.g. if a `SET_BATCH` were to arrive first).
  - `set_event`: event-driven `SET_JOB`, when a new `job_id` is inserted
    into `_tablet_replica_cache`.
- **-1** when the job is cleared:
  - `clear_job`: regular `CLEAR_JOB`, only when a live job actually existed.
  - `set_event` (clear): event-driven `CLEAR_JOB`, only when
    `_tablet_replica_cache.erase()` actually removed an entry.
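
The regular (single-slot) path above can be illustrated with a minimal sketch. This is not the code in `cloud_warm_up_manager.cpp`: the class `RegularJobSlotSketch`, the counter `g_job_num_sketch` (a `std::atomic` standing in for `bvar::Adder<int64_t>`), the `bool` return values, and the job-id match in `clear_job` are all assumptions made so the example compiles on its own.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>

// Stand-in for bvar::Adder<int64_t>; the PR uses the real bvar type.
static std::atomic<int64_t> g_job_num_sketch{0};

// Hypothetical, simplified model of the regular warm up job slot.
class RegularJobSlotSketch {
public:
    // Models SET_JOB. Counts only on the 0 -> job_id transition.
    // Returns true if job_id is the current job afterwards.
    bool check_and_set_job_id(int64_t job_id) {
        std::lock_guard<std::mutex> lock(_mtx);
        return set_if_empty(job_id);
    }

    // Models SET_BATCH arriving before SET_JOB: same guarded transition.
    bool check_and_set_batch_id(int64_t job_id) {
        std::lock_guard<std::mutex> lock(_mtx);
        return set_if_empty(job_id);
    }

    // Models CLEAR_JOB. Decrements only when a live job actually existed.
    // Assumption: the sketch also requires job_id to match the live job.
    bool clear_job(int64_t job_id) {
        std::lock_guard<std::mutex> lock(_mtx);
        if (_cur_job_id == 0 || _cur_job_id != job_id) {
            return false;  // nothing live to clear, counter untouched
        }
        _cur_job_id = 0;
        g_job_num_sketch.fetch_sub(1);
        return true;
    }

private:
    // Caller must hold _mtx.
    bool set_if_empty(int64_t job_id) {
        if (_cur_job_id == 0) {
            _cur_job_id = job_id;           // real state transition
            g_job_num_sketch.fetch_add(1);  // counted exactly once
        }
        return _cur_job_id == job_id;
    }

    std::mutex _mtx;
    int64_t _cur_job_id = 0;
};
```

In this sketch, calling `check_and_set_job_id(7)` twice followed by `clear_job(7)` twice leaves the counter at `0`, because only the first call of each kind changes `_cur_job_id`.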
### Why it is dedup-safe
All increments/decrements are gated on real state transitions rather than on
RPC arrival:
- `_cur_job_id` is a single slot whose only reset to `0` is in `clear_job`
(which carries the matching `-1`), so a regular job contributes exactly one
`+1` and one `-1` per lifecycle. Repeated `SET_JOB` (retry, FE failover replay)
hits the `_cur_job_id != 0` guard and does not double count.
- Event-driven counting is guarded by
`!_tablet_replica_cache.contains(job_id)` (add) and `erase(job_id) > 0`
(clear), so duplicate `SET_JOB`/`CLEAR_JOB` are no-ops.
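
The event-driven guard can be sketched the same way. The names `EventJobCacheSketch` and `g_event_job_num_sketch`, the two-argument `set_event` signature, and the map's value type are hypothetical; the sketch uses `find` rather than `contains` only to stay C++17-compatible, and a `std::atomic` again stands in for the bvar adder.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <unordered_map>
#include <vector>

// Stand-in for bvar::Adder<int64_t>; the PR uses the real bvar type.
static std::atomic<int64_t> g_event_job_num_sketch{0};

// Hypothetical, simplified model of the event-driven job cache.
class EventJobCacheSketch {
public:
    // clear == false models SET_JOB, clear == true models CLEAR_JOB.
    void set_event(int64_t job_id, bool clear) {
        std::lock_guard<std::mutex> lock(_mtx);
        if (clear) {
            // erase() returns the number of removed entries (0 or 1),
            // so a duplicate CLEAR_JOB does not decrement again.
            if (_cache.erase(job_id) > 0) {
                g_event_job_num_sketch.fetch_sub(1);
            }
            return;
        }
        // Count only when the job is not yet cached, so a replayed
        // SET_JOB is a no-op for the metric.
        if (_cache.find(job_id) == _cache.end()) {
            _cache.emplace(job_id, std::vector<int64_t>{});
            g_event_job_num_sketch.fetch_add(1);
        }
    }

private:
    std::mutex _mtx;
    // Value type is a placeholder; the real cache stores replica info.
    std::unordered_map<int64_t, std::vector<int64_t>> _cache;
};
```

Because both branches test the container's state under the same lock that mutates it, the counter in this sketch always equals the number of cached jobs, regardless of how many times FE retries either RPC.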
The metric is process-local: on BE restart it resets to `0` along with
`_cur_job_id` / `_tablet_replica_cache`, so a count left stale by an
abandoned-but-not-cleared job (e.g. BE down during `CLEAR_JOB`) is corrected
on restart.
The value is exposed via the BE `/vars` endpoint as
`file_cache_warm_up_job_num`.