liuhe0702 created HUDI-2777:
-------------------------------
Summary: Data import performance deteriorates because multiple
Spark jobs are started when data is written to disks.
Key: HUDI-2777
URL: https://issues.apache.org/jira/browse/HUDI-2777
Project: Apache Hudi
Issue Type: Bug
Components: Spark Integration
Affects Versions: 0.9.0
Environment: Hudi 0.9.0, Spark 3.1.1, Hive 3.1.1, Hadoop 3.1.1
Reporter: liuhe0702
Attachments: image-2021-11-17-10-14-29-308.png
If the RDD has multiple partitions and RDD.isEmpty ultimately returns true,
Spark launches several jobs in succession, with the cumulative number of
partitions scanned growing about 5-fold per job. As a result, the computing
performance deteriorates.
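
The job pattern described above can be approximated with a small simulation.
This is only a sketch, not code from Spark or Hudi. It assumes that RDD.isEmpty
is implemented via take(1), that the first job scans a single partition, and
that each later job tries scale_up_factor times the partitions scanned so far
(assumed here to be 4, which would give the 5-fold cumulative growth). The
function name and the 200-partition figure are hypothetical.

```python
def jobs_for_empty_rdd(total_partitions, scale_up_factor=4):
    """Simulate how many partitions each job scans when take(1) runs
    on an RDD whose partitions are all empty.

    Assumption: the first job scans 1 partition; every later job tries
    scale_up_factor * (partitions scanned so far), capped at what is left.
    """
    scanned = 0
    per_job = []
    while scanned < total_partitions:
        to_try = 1 if scanned == 0 else scanned * scale_up_factor
        # Never scan past the last partition.
        to_try = min(to_try, total_partitions - scanned)
        per_job.append(to_try)
        scanned += to_try
    return per_job


if __name__ == "__main__":
    plan = jobs_for_empty_rdd(200)
    print(f"jobs launched: {len(plan)}, partitions per job: {plan}")
```

Under these assumptions, an empty RDD with 200 partitions would need five jobs
(scanning 1, 4, 20, 100, and 75 partitions) before isEmpty can return true.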
--
This message was sent by Atlassian Jira
(v8.20.1#820001)