Rap70r commented on issue #5770: URL: https://github.com/apache/hudi/issues/5770#issuecomment-1164766592
I'm a bit confused. We are running Spark jobs on AWS EMR to merge changes into a Hudi table on S3, and we pass `hoodie.parquet.max.file.size`, which, according to the documentation, should cap file sizes at the configured value. During the initial load the files are split evenly per that parameter. However, after several upserts the files get merged into large files that greatly exceed the configured size. Could you point me to the configs in the documentation needed to keep files at the size specified by this parameter on follow-up upserts?
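For context, a minimal sketch of how such file-sizing options are typically passed on a Hudi upsert write (the table name, record key field, and S3 path below are hypothetical placeholders, not taken from this issue):

```python
# Sketch of Hudi write options controlling base-file sizing.
# Table name, record key, and path are hypothetical placeholders.
hudi_options = {
    "hoodie.table.name": "example_table",              # hypothetical
    "hoodie.datasource.write.recordkey.field": "id",   # hypothetical
    "hoodie.datasource.write.operation": "upsert",
    # Target maximum size (bytes) for parquet base files: 128 MB here.
    "hoodie.parquet.max.file.size": str(128 * 1024 * 1024),
    # Files below this size (bytes) are treated as "small files" and are
    # candidates for having new records bin-packed into them on upsert,
    # which is the mechanism that grows files toward the max size.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
}

# Typical usage on a Spark DataFrame (commented out; requires a Spark session):
# df.write.format("hudi").options(**hudi_options).mode("append") \
#     .save("s3://example-bucket/example_table")  # hypothetical path
```

The interaction between `hoodie.parquet.max.file.size` and `hoodie.parquet.small.file.limit` governs how upserts route new records into existing files.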
