[ https://issues.apache.org/jira/browse/HUDI-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Raymond Xu updated HUDI-2777:
-----------------------------
    Sprint: Cont' improve - 2022/03/7

> Data import performance deteriorates because multiple Spark jobs are started when data is written to disks.
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HUDI-2777
>                 URL: https://issues.apache.org/jira/browse/HUDI-2777
>             Project: Apache Hudi
>          Issue Type: Improvement
>          Components: spark
>    Affects Versions: 0.9.0
>         Environment: hudi 0.9.0
> spark3.1.1
> hive3.1.1
> hadoop3.1.1
>            Reporter: liuhe0702
>            Assignee: liuhe0702
>            Priority: Critical
>              Labels: hudi-on-call, pull-request-available, query-eng, sev:high
>             Fix For: 0.11.0
>
>
> If multiple partitions exist and the final result of RDD.isEmpty is true,
> Spark starts multiple jobs in 5-fold increment mode. As a result, the
> computing performance deteriorates.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
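
For illustration only, the "5-fold increment" in the description above can be reproduced with a small Python simulation. It is a sketch under assumptions the issue does not state: that RDD.isEmpty is answered by take(1), that take scans a single partition in its first job, and that while no rows have been found each later job tries (partitions scanned so far) x spark.rdd.limit.scaleUpFactor further partitions, with the factor assumed to default to 4. The function name take_scan_schedule is hypothetical and is not part of Spark or Hudi.

```python
def take_scan_schedule(total_parts, scale_up_factor=4):
    """Simulate how many partitions each job scans when every partition is empty.

    Assumption (not taken from the issue): the first job scans 1 partition and
    each later job tries parts_scanned * scale_up_factor more partitions,
    capped at the number of partitions that remain.
    """
    jobs = []
    parts_scanned = 0
    while parts_scanned < total_parts:
        if parts_scanned == 0:
            parts_to_try = 1
        else:
            parts_to_try = parts_scanned * scale_up_factor
        # Never scan past the last partition.
        batch = min(parts_to_try, total_parts - parts_scanned)
        jobs.append(batch)
        parts_scanned += batch
    return jobs


if __name__ == "__main__":
    # Partitions scanned per job for an empty RDD with 200 partitions.
    print(take_scan_schedule(200))
```

Under these assumptions an empty 200-partition RDD needs five separate jobs, and the cumulative number of partitions scanned grows as 1, 5, 25, 125, 200, which matches the 5-fold pattern the reporter describes.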