Jun Zhang created FLINK-16818:
---------------------------------

             Summary: Optimize data skew when Flink writes data to a Hive dynamic partition table
                 Key: FLINK-16818
                 URL: https://issues.apache.org/jira/browse/FLINK-16818
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hive
    Affects Versions: 1.10.0
         Environment: 
            Reporter: Jun Zhang
             Fix For: 1.11.0
I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is a partitioned table. When one partition holds a particularly large share of the data, data skew occurs and the job runs for a very long time. With the default configuration and the same SQL, Hive on Spark takes five minutes while Flink takes about 40 minutes.

Example:

{code:sql}
-- the schema of myparttable, written out as a CREATE TABLE statement
CREATE TABLE myparttable (
  name string,
  age int
) PARTITIONED BY (
  type string,
  day string
);

-- dynamic partition insert: partition values come from the query result
INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
{code}
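For context, a quick way to confirm this kind of skew is to count rows per dynamic partition value. This is a minimal diagnostic sketch, assuming the {{sourcetable}} and the partition columns {{type}} and {{day}} from the example above:

{code:sql}
-- minimal diagnostic sketch (not part of the original report):
-- count rows per (type, day) partition value to see how uneven the data is.
-- One pair with a dominant count is the skewed partition that ends up
-- being written by a single task, stretching the total job runtime.
SELECT type, day, COUNT(*) AS cnt
FROM sourcetable
GROUP BY type, day
ORDER BY cnt DESC;
{code}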