kbuci opened a new pull request, #18280: URL: https://github.com/apache/hudi/pull/18280
…ion .requested instants exist ### Describe the issue this Pull Request addresses https://github.com/apache/hudi/issues/17907 ### Summary and Changelog **Summary:** When `PreferWriterConflictResolutionStrategy` is used for write conflict resolution, clustering commits can now be configured to self-abort if there are any pending ingestion instants (in `.requested` state) that have an active heartbeat but have not yet transitioned to `.inflight`. This prevents clustering from committing and potentially causing a conflict with an ongoing ingestion write, since ingestion should have precedence over clustering. The heartbeat timeout is inferred from the write config's heartbeat interval and tolerable misses, eliminating the need for a separate hardcoded timeout. A new dedicated exception (`HoodieWriteConflictAwaitingIngestionInflightException`) and metric (`conflict_resolution.awaiting_ingestion_inflight.counter`) are emitted when this scenario occurs, allowing operators to distinguish this specific failure mode from general write conflicts. **Changelog:** - **hudi-common** - `HoodieWriteConflictAwaitingIngestionInflightException`: New exception extending `HoodieWriteConflictException`, thrown when clustering detects a pending ingestion instant with an active heartbeat. - **hudi-client-common** - `ConflictResolutionStrategy`: Added a new default overload `getCandidateInstants(metaClient, currentInstant, lastSuccessfulInstant, Option<HoodieWriteConfig>)` that delegates to the existing method. Implementations can override to use the write config for additional behavior. - `PreferWriterConflictResolutionStrategy`: Overrides the new `getCandidateInstants` overload. When the current operation is clustering and `hoodie.clustering.block_for_pending_ingestion` is enabled, checks ingestion `.requested` instants for active heartbeats. If an active heartbeat is found, throws `HoodieWriteConflictAwaitingIngestionInflightException`. If the heartbeat is expired, the instant is ignored (assumed to be a failed write). - `HoodieWriteConfig`: Added `CLUSTERING_BLOCK_FOR_PENDING_INGESTION` config property (default `false`), getter `isClusteringBlockForPendingIngestion()`, and builder method `withClusteringBlockForPendingIngestion(boolean)`. - `TransactionUtils`: Updated `resolveWriteConflictIfAny` to pass `Option.of(config)` to the new `getCandidateInstants` overload, threading the write config through. - `BaseHoodieClient`: Updated `resolveWriteConflict` to catch `HoodieWriteConflictAwaitingIngestionInflightException` and emit a dedicated metric before re-throwing. - `HoodieMetrics`: Added `emitConflictResolutionAwaitingIngestionInflight()` counter metric. - `TestPreferWriterConflictResolutionStrategy`: Added 3 tests covering: (1) clustering self-aborts with active heartbeat ingestion `.requested` instant, (2) no blocking when config is disabled, (3) old method signature delegates with defaults (blocking disabled). ### Impact - **User-facing:** New opt-in config `hoodie.clustering.block_for_pending_ingestion` (default `false`). No change to existing behavior unless explicitly enabled. - **Behavior:** When enabled, clustering writes using `PreferWriterConflictResolutionStrategy` will fail with `HoodieWriteConflictAwaitingIngestionInflightException` if they detect an ingestion `.requested` instant with an active heartbeat, preventing clustering from committing and risking a conflict with ongoing ingestion. - **Performance:** Negligible. When enabled, adds heartbeat file existence checks for `.requested` ingestion instants during clustering commit conflict resolution. ### Risk Level **Low.** Config defaults to `false`, so existing behavior is unchanged. The new exception is a subclass of `HoodieWriteConflictException`, so existing catch blocks continue to work. All existing tests pass; 3 new tests validate the new behavior. ### Contributor's checklist - [x] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [x] Enough context is provided in the sections above - [x] Adequate tests were added if applicable Made-with: Cursor ### Describe the issue this Pull Request addresses <!-- Either describe the issue inline here with motivation behind the changes (or) link to an issue by including `Closes #<issue-number>` for context. If this PR includes changes to the storage format, public APIs, or has breaking changes, use `!` (e.g., feat!: ...) --> ### Summary and Changelog <!-- Short, plain-English summary of what users gain or what changed in behavior. Followed by a detailed log of all the changes. Highlight if any code was copied. --> ### Impact <!-- Describe any public API or user-facing feature change or any performance impact. --> ### Risk Level <!-- Accepted values: none, low, medium or high. Other than `none`, explain the risk. If medium or high, explain what verification was done to mitigate the risks. --> ### Documentation Update <!-- Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none". - The config description must be updated if new configs are added or the default value of the configs are changed. - Any new feature or user-facing change requires updating the Hudi website. Please follow the [instruction](https://hudi.apache.org/contribute/developer-setup#website) to make changes to the website. --> ### Contributor's checklist - [ ] Read through [contributor's guide](https://hudi.apache.org/contribute/how-to-contribute) - [ ] Enough context is provided in the sections above - [ ] Adequate tests were added if applicable -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
