kbuci commented on code in PR #12856:
URL: https://github.com/apache/hudi/pull/12856#discussion_r2025326598


##########
rfc/rfc-79/rfc-90.md:
##########
@@ -0,0 +1,310 @@
+<!-- Licensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the NOTICE
+file distributed with this work for additional information regarding copyright ownership. The ASF licenses this file to
+You under the Apache License, Version 2.0
+(the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License
+at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific
+language governing permissions and limitations under the License. -->
+
+# Add support for cancellable clustering table service plans
+
+## Proposers
+
+Krishen Bhan (kbuci)
+
+## Approvers
+
+Sivabalan Narayanan (nsivabalan)
+
+## Status
+
+In Progress
+
+JIRA: HUDI-7946
+
+## Abstract
+
+Clustering is a table service used to optimize table/file layout in HUDI in order to speed up read queries. Currently,
+ingestion writers will abort if they attempt to write to the same data targeted by a pending clustering write.
+As a result, clustering table service plans can indirectly delay ingestion writes from updating a dataset with recent data.
+Furthermore, a clustering plan that isn't executed to completion for a long time (due to repeated failures, application
+misconfiguration, or insufficient resources) will degrade the read/write performance of a dataset by delaying clean,
+archival, and metadata table compaction. This is because currently HUDI clustering plans, once scheduled, must be
+executed to completion. This RFC proposes support for "cancellable" clustering plans, which give HUDI an avenue to
+fully cancel a clustering plan and allow other table service and ingestion writers to proceed without starvation,
+based on user needs.
+
+## Background
+
+### Current state of execution of table service operations in Hudi
+
+As of now, the table service operations `COMPACT` and `CLUSTER` are implicitly "immutable" plans by default, meaning
+that once a plan is scheduled, it will stay a pending instant until a caller invokes the table service execute API on
+the table service instant and successfully completes it (referred to as "executing" a table service). Specifically, if an
+execution fails after transitioning the instant to inflight, the next execution attempt will implicitly create and execute
+a rollback plan (which deletes all new instant/data files) but will keep the table service plan itself. The table service
+is then re-attempted, and this process repeats until the instant is completed. The visualization below captures these
+transitions at a high level:
+
+![table service lifecycle (1)](https://github.com/user-attachments/assets/4a656bde-4046-4d37-9398-db96144207aa)
+
+## Goals

Review Comment:
   > Granularity of cancellation: Right now, cancellation is at the entire plan level. Could we consider more granular partial cancellation of only the file groups that conflict with an ingestion writer? This would be more complex but could preserve partial progress in some advanced scenarios.
   
   We could update this RFC to handle that by making the file in .cancel non-empty if only some file groups are "cancelled", and continuing the commit in that case. My main concern, though, is that I'm not sure of the best way to implement that on the clustering side: a committed clustering where only some file groups were "cancelled" will have replaced files in the final .replacecommit that don't appear in the clustering replacecommit.requested plan. So I'm not sure whether that might cause confusion or other issues.
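   
   To make that concrete, here is a minimal sketch of a partial-cancellation marker, assuming a hypothetical layout where the cancel file under a .cancel folder holds a newline-delimited list of cancelled file group IDs (the `CancelMarker` class and the file layout are illustrative stand-ins, not existing HUDI APIs):
   
   ```java
   import java.io.IOException;
   import java.nio.charset.StandardCharsets;
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.HashSet;
   import java.util.List;
   import java.util.Set;
   
   // Hypothetical helper for a cancel marker kept under .hoodie/.cancel/<instantTime>.
   // An empty marker would mean "cancel the whole plan"; a non-empty marker lists
   // only the cancelled file group IDs, so the rest of the plan could still commit.
   public class CancelMarker {
   
     // An ingestion writer records just the file groups it needs back.
     static void writePartialCancel(Path cancelDir, String instantTime,
                                    List<String> cancelledFileGroupIds) throws IOException {
       Files.createDirectories(cancelDir);
       Files.write(cancelDir.resolve(instantTime),
           String.join("\n", cancelledFileGroupIds).getBytes(StandardCharsets.UTF_8));
     }
   
     // At commit time the clustering writer would exclude exactly these groups.
     static Set<String> readCancelledFileGroups(Path cancelDir, String instantTime)
         throws IOException {
       Path marker = cancelDir.resolve(instantTime);
       if (!Files.exists(marker)) {
         return Set.of(); // no cancellation requested for this plan
       }
       return new HashSet<>(Files.readAllLines(marker, StandardCharsets.UTF_8));
     }
   }
   ```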
   
   > Applicability to other Table Services: I agree compaction is typically not as big a concern for blocking ingestion, but from a user's perspective, having a consistent approach to cancellation could be valuable across all table services. Moreover, I was wondering if introducing an abort state in the timeline that can be applicable for all actions might be more useful instead of having cancellable action with abort state. Any action could be aborted due to usual conflict resolution or force quit (cancelled).
   
   For some background context, when I first brainstormed this RFC I did not have an .aborted state in my design; instead, I was thinking that a clustering plan that was "cancelled" could be rolled back completely, similar to ingestion. But after discussing with @nsivabalan and other HUDI committers, I got the impression that HUDI users/committers wanted to move away from having non-archival operations delete instants from the active timeline forever, in favor of a system/format where the "aborted" plan would still be on the active timeline, the idea being that this would make debugging and future implementation changes easier. Because of this, I added an .aborted state to this RFC to align with that goal.
   
   Going back to your original question: as part of implementing the .aborted state, I was thinking of enabling HUDI to consider .aborted a valid state for all action types, since the implementation should involve ignoring data files/log file blocks associated with an aborted instant, the same way inflight instants are handled. In addition, .aborted instants should be deleted during archival (as mentioned in the rollout plan). In terms of implementing cancellation for compaction, though: if log blocks were added to a file group associated with an inflight compaction plan, and that plan was aborted, I'm not sure whether any other changes are needed to handle those log files. In 1.x I think this might not be a big issue, but we would need to verify. Because of this complication (and the fact that there isn't as strong a motivation, as far as I know), I was thinking of adding clean/compaction as an optional or later addition.
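   
   As a rough illustration of treating .aborted like inflight on the read path, here is a hedged sketch (the enum, record, and filter below are made-up stand-ins, not HUDI classes): readers only admit files whose producing instant completed, so files from inflight and aborted instants get skipped by the same check.
   
   ```java
   import java.util.List;
   import java.util.stream.Collectors;
   
   // Hypothetical model of timeline instant states, including the proposed ABORTED state.
   enum InstantState { REQUESTED, INFLIGHT, COMPLETED, ABORTED }
   
   // Hypothetical pairing of a data file with the instant that produced it.
   record DataFile(String path, String instantTime, InstantState instantState) {}
   
   public class VisibleFileFilter {
     // Only files from completed instants are visible; INFLIGHT and ABORTED are
     // filtered out by the same condition, which is why accepting ABORTED as a
     // valid state for all action types should be cheap on the read side.
     static List<DataFile> visibleFiles(List<DataFile> allFiles) {
       return allFiles.stream()
           .filter(f -> f.instantState() == InstantState.COMPLETED)
           .collect(Collectors.toList());
     }
   }
   ```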
   
   > Conflict Resolution vs. Hard Cancel: Instead of automatically canceling every inflight plan that conflicts, an alternative is to "wait or forcibly proceed" if the concurrency model supports it (such as custom lock with ingestion writer preference). But that naturally re-introduces the risk of ingestion blocking. So your approach is simpler and safer from ingestion's perspective, but it might be worth stating explicitly that partial or "smarter" concurrency resolution is out of scope.
   
   Sure, let me clarify in the RFC that we are opting for the simpler/safer approach where, if a clustering plan is cancellable, we assume that ingestion should take "precedence".
   
   > Performance Overheads - The design calls for heartbeats, cancel folder checks, new transactions, etc. In practice, these operations should be relatively light overhead compared to typical big Spark tasks, but it is good to note a minimal overhead might exist.
   
   Yes, these checks are lightweight (during ingestion we already read all inflight clustering plans while holding a table lock). But let me state at the end that there will be a bit more overhead during clustering execution, due to taking a lock and checking the heartbeat.
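   
   For reference, the per-commit overhead amounts to something like the following guard (the lock, marker path, and heartbeat supplier are illustrative placeholders, not HUDI's actual interfaces): one lock acquisition plus two cheap checks before the commit proceeds.
   
   ```java
   import java.nio.file.Files;
   import java.nio.file.Path;
   import java.util.concurrent.locks.Lock;
   import java.util.function.BooleanSupplier;
   
   public class ClusteringCommitGuard {
   
     // The added overhead per clustering commit is one lock acquisition plus two
     // cheap checks: a cancel-marker lookup and a heartbeat validity check.
     static void commitClustering(Lock tableLock,
                                  Path cancelDir,
                                  String instantTime,
                                  BooleanSupplier heartbeatStillValid,
                                  Runnable doCommit) {
       tableLock.lock();
       try {
         // An ingestion writer requests cancellation by dropping a marker file.
         if (Files.exists(cancelDir.resolve(instantTime))) {
           throw new IllegalStateException(
               "Clustering plan " + instantTime + " was cancelled; aborting commit");
         }
         // If the plan's heartbeat expired, another writer may already treat the
         // plan as abandoned, so committing is no longer safe.
         if (!heartbeatStillValid.getAsBoolean()) {
           throw new IllegalStateException(
               "Heartbeat for plan " + instantTime + " expired; aborting commit");
         }
         doCommit.run(); // transition the instant to completed while holding the lock
       } finally {
         tableLock.unlock();
       }
     }
   }
   ```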



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@hudi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
