Gatsby-Lee commented on issue #3975: URL: https://github.com/apache/hudi/issues/3975#issuecomment-972113121
> Hi @xushiyan, MOR is not possible because it is not supported by AWS tools like Athena, and this particular dataset has no field guaranteed to be 100% immutable; "near-immutable" fields would run into the same problem. In fact, the data could be considered near-immutable, since on each load I am upserting over 100k rows and deleting only a few hundred.
>
> Any other ideas on how to make the "getting small files from partitions" jobs run faster? And why are there 3 such jobs running sequentially with different numbers of stages and tasks?
>
> Thanks

Hi,

I happened to see your issue. I am also using Apache Hudi in AWS Glue. I am using MoR, and I can query the data through Amazon Athena. I picked MoR over CoW because I want to avoid Hudi writes spending time rewriting Parquet files.

Do you have any reason to pick CoW over MoR?

Thank you,
Gatsby

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
