sivabalan narayanan created HUDI-4919:
-----------------------------------------

             Summary: Sql MERGE INTO incurs too much memory overhead
                 Key: HUDI-4919
                 URL: https://issues.apache.org/jira/browse/HUDI-4919
             Project: Apache Hudi
          Issue Type: Bug
          Components: spark-sql
            Reporter: sivabalan narayanan


When using Spark SQL MERGE INTO, the memory requirement shoots up: merging new incoming data into a 120MB parquet file requires more than 10GB of memory.

 

from user:

We are trying to process input data of about 5 GB (Parquet, Snappy compression), which inserts/updates a Hudi table across 4 days (day is the partition field).
The data size in the Hudi target table is around 3.5GB to 10GB per partition. The job keeps failing with OOM
(java.lang.OutOfMemoryError: GC overhead limit exceeded).
We have tried 32GB and 64GB of executor memory as well, with 3 cores.
The process runs fine when we have fewer updates and more inserts.
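For reference, a minimal sketch of the kind of statement described above (table and column names are hypothetical, not taken from the report; Hudi's Spark SQL supports the shorthand UPDATE SET * / INSERT * forms):

```sql
-- Hypothetical example of the reported workload: upserting a day's worth of
-- incoming data into a day-partitioned Hudi target table via MERGE INTO.
MERGE INTO hudi_target t
USING incoming_data s
ON t.record_key = s.record_key AND t.day = s.day
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

With mostly-update batches, each matched base file's records must be merged in memory, which is consistent with the OOM appearing only when the update ratio is high.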

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)