liuhe0702 created HUDI-2777:
-------------------------------

             Summary: Data import performance deteriorates because multiple 
Spark jobs are started when data is written to disks.
                 Key: HUDI-2777
                 URL: https://issues.apache.org/jira/browse/HUDI-2777
             Project: Apache Hudi
          Issue Type: Bug
          Components: Spark Integration
    Affects Versions: 0.9.0
         Environment: hudi 0.9.0
spark 3.1.1
hive 3.1.1
hadoop 3.1.1
            Reporter: liuhe0702
         Attachments: image-2021-11-17-10-14-29-308.png

If the RDD has multiple partitions and RDD.isEmpty ultimately returns true, 
Spark launches multiple jobs in sequence, with the number of partitions scanned 
growing 5-fold from one job to the next. As a result, computing performance 
deteriorates.
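
To illustrate the 5-fold pattern described above, here is a minimal sketch in plain Python (no Spark required). It assumes that RDD.isEmpty is backed by take(1), that the first job scans one partition, and that each later job tries to scan `scale_up_factor` times the partitions scanned so far. The function name `simulate_is_empty_jobs` is hypothetical, and the default factor of 4 is an assumption based on the default of `spark.rdd.limit.scaleUpFactor`. It models only the case where every partition is empty, and is not the actual Spark implementation.

```python
def simulate_is_empty_jobs(total_partitions, scale_up_factor=4):
    """Model the jobs launched when every partition is empty.

    Returns the cumulative number of partitions scanned after each job.
    Hypothetical helper; the retry rule is an assumption, not Spark's code.
    """
    scanned = 0
    history = []
    while scanned < total_partitions:
        # First job tries one partition; later jobs try
        # scale_up_factor times what has been scanned so far.
        to_try = 1 if scanned == 0 else scanned * scale_up_factor
        # Never scan past the last partition.
        batch = min(to_try, total_partitions - scanned)
        scanned += batch
        history.append(scanned)
    return history


if __name__ == "__main__":
    # One list entry per simulated Spark job.
    print(simulate_is_empty_jobs(200))
```

Under these assumptions, an empty RDD with 200 partitions yields [1, 5, 25, 125, 200], that is, five separate jobs before isEmpty can return true.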



--
This message was sent by Atlassian Jira
(v8.20.1#820001)