nsivabalan commented on PR #13064:
URL: https://github.com/apache/hudi/pull/13064#issuecomment-2872951702

   hey @danny0405 : 
   wrt to not introducing TransactionManager in HoodieTable, here are the 
changes it might introduce. 
   
   1. Flink: CompactionPlanOperator instantiate TransactionManager and pass it 
to CompactionUtil.rollbackCompaction(table). Or instantiate TransactionManager 
in CompactionUtil. 
   2. RunCompactionActionExecutor.execute() calls into preCommit which can 
trigger rollback. And so, this percolates to Hoodie*Table.logCompact, 
Hoodie*Table.compact 
   -> which in turn goes into Hoodie*TableServiceClient.logCompact and 
Hoodie*TableServiceClient.comact(). 
   but this means, we need to change public apis in HoodieTable for compact() 
and logCompact(), rollback APIs to take in an Option<TransactionManager>. All 
of our public apis are very tightly designed. This causes divergence. 
   3. Flink: CompactionCommitSink calls into rollbackCompaction(). We need to 
introduce TransactionManager in CompactionCommitSink.
   4. HoodieFlinkCompactor.compact() could call into rollbackCompaction(). 
Might have to add TransactionManager to HoodieFlinkCompactor.
   5. BaseHoodieTableServiceClient.purgePendingClustering() can call into 
rollback. BaseHoodieTableServiceClient.cluster() also could call into rollback. 
We just need to pass in the TransactionManager. 
   
   
   May main concern is the change in public apis like HoodieTable.compact(), 
HoodieTable.logCompact(), HoodieTable.rollback*() APIs. 
   
   Attaching pic on the APIs we have in HoodieTable : 
   <img width="593" alt="image" 
src="https://github.com/user-attachments/assets/c374e817-afda-4fd4-b2a5-7dca8728bec7";
 />
   <img width="817" alt="image" 
src="https://github.com/user-attachments/assets/25f69a48-bedd-48ee-905c-5a8d4bd8383c";
 />
   
   For Flink specifically, I might need some help as well (whether 
CompactionCommitSink is the right place to introduce the TransactionManager or 
not). 
   
   Also, I was looking at places where we instantiate TransactionManager, looks 
like we don't have it clean either. 
   <img width="1185" alt="image" 
src="https://github.com/user-attachments/assets/dc4fdcaf-8c95-4b59-9078-cc5909c5b38e";
 />
   
   We have many usages in ActionExecutors which are expected to be called from 
Hoodie*Table. So, not sure if its worth cleaning it up in this patch where we 
are looking to fix rollback scheduling. This could be much larger scope if we 
wanted to clean up the instantiations of TransactionManager to standardize it 
and keep it at either WriteClient or TableServiceClient. 
   
   Let me know what do you think. 
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to