[GitHub] [hudi] Gatsby-Lee commented on issue #3975: [SUPPORT] Question on hudi's delete statment taking too long

GitBox Wed, 17 Nov 2021 13:58:12 -0800


Gatsby-Lee commented on issue #3975:
URL: https://github.com/apache/hudi/issues/3975#issuecomment-972113121



   > Hi @xushiyan, MOR is not possible because it is not supported by AWS tools 
like Athena and this particular dataset has no field guaranteed to be 100% 
immutable, and "near-immutable" fields would go through the same problem. In 
fact the date could be considered near-immutable, as on each load I am 
upserting over 100k rows and deleting only a few hundred.
   > 
   > Any other ideas on how to make the "getting small files from partitions" 
jobs run faster? And why are there 3 such jobs running sequentially with 
different numbers of stages and tasks?
   > 
   > Thanks
   
   Hi, I happened to see your issue.
   I am also using Apache Hudi in AWS Glue.
   
   I am using MoR and I can query data through Amazon Athena.
   
   I picked MoR over CoW since I want to avoid Hudi writes spending time 
on rewriting Parquet files.
   Do you have any reason to pick CoW over MoR?
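
For reference, the table type is a write-time option. The sketch below only builds the options dictionary that would be passed to a Spark DataFrame writer. The helper name `hudi_write_options` and the field names `id`, `updated_at` and `dt` are hypothetical placeholders, not taken from this issue, and should be adapted to the actual dataset.

```python
# Minimal sketch: build Hudi write options for a CoW or MoR table.
# The option keys are standard Hudi datasource configs; the helper name
# and the default field names are illustrative assumptions.

VALID_TABLE_TYPES = ("COPY_ON_WRITE", "MERGE_ON_READ")


def hudi_write_options(table_name, table_type="MERGE_ON_READ",
                       operation="upsert", record_key="id",
                       precombine_field="updated_at",
                       partition_field="dt"):
    """Return a dict of Hudi options for df.write.format("hudi")."""
    if table_type not in VALID_TABLE_TYPES:
        raise ValueError(f"table_type must be one of {VALID_TABLE_TYPES}")
    return {
        "hoodie.table.name": table_name,
        # MERGE_ON_READ appends updates to log files instead of
        # rewriting Parquet base files on every write.
        "hoodie.datasource.write.table.type": table_type,
        "hoodie.datasource.write.operation": operation,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
    }


opts = hudi_write_options("my_table")
# With a Spark session available, the options would be used like:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

Switching `table_type` to `"COPY_ON_WRITE"` keeps everything else the same; note that the table type is fixed when the table is first created, so this is a choice for new tables.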
   
   Thank you
   Gatsby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
