sanjiv1980 commented on issue #1070: How to do the bulk update .?
URL: https://github.com/apache/incubator-hudi/issues/1070#issuecomment-561019203
 
 
   @vinothchandar Thanks for the prompt reply. We have an existing data lake
that holds around five years of historical data (on the order of petabytes),
and many existing jobs run on top of it. Recently I got a use case where I
need to update or delete existing user data based on each user's unique
identifier.
   
   For that I did a POC on Delta Lake: I built a Delta Lake table from one
month of data and then ran the Delta Lake API to perform the update/delete
operations. That went well, but we are not satisfied with the time Delta Lake
took (compaction is one of the concerns). From your documentation, it looks
like Hudi handles this well, so I want to evaluate the same use case on Hudi.
   
   My job either updates or deletes those records. After that, the same data
should be shareable with other jobs for further evaluation.
   
   I hope that explains the nature of my job.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
