nozjkoitop opened a new pull request, #19548:
URL: https://github.com/apache/druid/pull/19548
### Description
#### Fixed a race condition in off-heap cached lookups
This PR fixes a race between lookup cache refresh and concurrent query
execution for cached-global lookups, using off-heap mmap-backed storage.
Previously, query paths could obtain a `LookupExtractor` backed by the
current cache version while a lookup refresh concurrently replaced and deleted
that cache version. In off-heap mode, this could leave in-flight query work
referencing disposed mmap-backed data.
This PR adds lookup cache retirement and retained lookup extractors. When a
cache refresh replaces a lookup cache version, the old version is retired
instead of closed immediately. Retired cache versions remain alive while
retained references exist, and are disposed once those references are released
or after a configured timeout.
#### Added retained lookup extractor lifecycle support
`LookupExtractorFactory` now has an optional
`acquireRetainedLookupExtractor()` method. Factories that support retained
backing resources can return a `RetainedLookupExtractor`, which wraps the
actual `LookupExtractor` and closes the retained backing reference when query
work is finished.
`RetainedLookupExtractor` supports explicit `close()` for query paths with a
lifecycle hook, and also uses a `Cleaner` fallback for paths that hand lookup
extractors to APIs without an explicit close hook.
Retained lookup extractors are now used across lookup query paths, including:
- registered lookup extraction functions
- lookup dimension specs
- lookup joins
- lookup segment iteration
#### Added cache retirement to cached-global lookups
`CacheScheduler.VersionedCache` now tracks cache lifecycle states: live,
retired, and disposed. Cache versions can be retained by active query work.
When a refresh swaps in a new cache version, the previous version is retired
and tracked until it can be safely disposed.
To avoid unbounded retention, refreshes are skipped if the number of
retained retired cache versions reaches the configured limit. Retired cache
entries may also be forcibly disposed after a timeout to prevent abandoned
references from blocking future refreshes indefinitely.
Two new cached-global lookup configuration properties were added:
| Property | Description | Default |
|---|---|---:|
| `druid.lookup.namespace.maxRetiredCacheEntries` | The maximum number of
retired lookup cache versions to keep while they are still retained by
in-flight query work. Further cache refreshes are skipped until the retired
cache versions are released or time out. | `1` |
| `druid.lookup.namespace.retiredCacheEntryTimeoutMillis` | The amount of
time after retirement before a retained cache version may be disposed even if
its retained references have not been closed yet. This prevents abandoned
retained references from blocking future cache refreshes indefinitely. |
`900,000` |
#### Release note
Cached-global lookups now retain retired cache versions while in-flight
queries are still using them, preventing queries from referencing disposed
off-heap lookup cache data during concurrent lookup refreshes.
Two new configuration properties control this behavior:
`druid.lookup.namespace.maxRetiredCacheEntries` and
`druid.lookup.namespace.retiredCacheEntryTimeoutMillis`.
<hr>
##### Key changed/added classes in this PR
* `RetainedLookupExtractor`
* `LookupExtractorFactory`
* `NamespaceLookupExtractorFactory`
* `CacheScheduler`
* `NamespaceExtractionConfig`
* `LookupDimensionSpec`
* `RegisteredLookupExtractionFn`
* `LookupSegment`
* `LookupJoinable`
* `LookupJoinableFactory`
<hr>
This PR has:
- [x] been self-reviewed.
- [x] using the [concurrency
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
(Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [x] a release note entry in the PR description.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked
related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
- [x] added comments explaining the "why" and the intent of the code
wherever would not be obvious for an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths,
ensuring the threshold for [code
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]