Rap70r commented on issue #5770:
URL: https://github.com/apache/hudi/issues/5770#issuecomment-1164766592

   I'm a bit confused. We are running Spark jobs on AWS EMR to merge changes 
into a Hudi table on S3. We are passing the parameter 
`hoodie.parquet.max.file.size`, which, according to the documentation, should 
cap the size of the files the writer produces. During the initial load, the 
files are indeed split evenly according to that parameter. However, after 
several upserts, the files merge into a single large file that greatly exceeds 
the size specified above. Could you point me to the configs in the 
documentation needed to keep files at the specified size on follow-up upserts?
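For context, here is a minimal sketch of the file-sizing options in play, together with the two related configs I found while searching (`hoodie.parquet.small.file.limit` and `hoodie.copyonwrite.record.size.estimate`). The values are illustrative, not what we run in production:

```python
# Illustrative Hudi file-sizing options for a copy-on-write table.
hudi_file_sizing_opts = {
    # Target maximum size (bytes) for base parquet files; here 120 MB.
    "hoodie.parquet.max.file.size": str(120 * 1024 * 1024),
    # Files below this size (bytes) are treated as "small" and become
    # candidates for bin-packing incoming records during upserts; here 100 MB.
    # This should stay below hoodie.parquet.max.file.size.
    "hoodie.parquet.small.file.limit": str(100 * 1024 * 1024),
    # Estimated average record size (bytes), used to decide how many records
    # fit into a file before commit statistics are available.
    "hoodie.copyonwrite.record.size.estimate": "1024",
}

# These would be passed alongside the usual Hudi write config, e.g.:
#   df.write.format("hudi").options(**hudi_file_sizing_opts)...
for key, value in hudi_file_sizing_opts.items():
    print(key, "=", value)
```

Is the interplay between the small-file limit and the max file size what governs the file growth we are seeing on upserts?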


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
