[jira] [Created] (HUDI-2777) Data import performance deteriorates because multiple Spark jobs are started when data is written to disks.

liuhe0702 (Jira) Tue, 16 Nov 2021 18:15:07 -0800

liuhe0702 created HUDI-2777:
-------------------------------

             Summary: Data import performance deteriorates because multiple 
Spark jobs are started when data is written to disks.
                 Key: HUDI-2777
                 URL: https://issues.apache.org/jira/browse/HUDI-2777
             Project: Apache Hudi
          Issue Type: Bug
          Components: Spark Integration
    Affects Versions: 0.9.0
         Environment: Hudi 0.9.0, Spark 3.1.1, Hive 3.1.1, Hadoop 3.1.1
            Reporter: liuhe0702
         Attachments: image-2021-11-17-10-14-29-308.png


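If the RDD has multiple partitions and RDD.isEmpty ultimately returns true, 
Spark does not check all partitions in a single job. isEmpty is implemented 
with take(1), which scans one partition first and, while no record has been 
found, launches another job over a larger batch of partitions each time, so 
the cumulative number of scanned partitions grows about 5-fold per job 
(1, 5, 25, ...). These extra jobs degrade computing performance during the 
write.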
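
To make the job count concrete, here is a minimal sketch in plain Python (no 
Spark required) that models the partition-scan schedule of take(1) when every 
partition is empty. It is a simplified model and not Spark's actual code: the 
function name jobs_for_empty_rdd is hypothetical, and the default scale-up 
factor of 4 is an assumption based on the default of 
spark.rdd.limit.scaleUpFactor.

```python
from typing import List


def jobs_for_empty_rdd(total_parts: int, scale_up_factor: int = 4) -> List[int]:
    """Return how many partitions each successive job scans when no
    partition yields a record (simplified model of RDD.take(1))."""
    # Assumption: the factor is clamped to at least 2, as in Spark's take().
    scale = max(scale_up_factor, 2)
    scanned = 0
    per_job: List[int] = []
    while scanned < total_parts:
        # First job tries one partition; later jobs try scanned * scale more.
        to_try = 1 if scanned == 0 else scanned * scale
        batch = min(to_try, total_parts - scanned)
        per_job.append(batch)
        scanned += batch
    return per_job


if __name__ == "__main__":
    schedule = jobs_for_empty_rdd(200)
    print(len(schedule), "jobs, partitions per job:", schedule)
```

Under these assumptions, an empty RDD with 200 partitions needs five 
sequential jobs (1, 4, 20, 100 and 75 partitions) before isEmpty can return 
true, which matches the 5-fold growth described above.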



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
