Jun Zhang created FLINK-16818:
---------------------------------

             Summary: Optimize data skew when Flink writes data to a Hive dynamic partition table
                 Key: FLINK-16818
                 URL: https://issues.apache.org/jira/browse/FLINK-16818
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Hive
    Affects Versions: 1.10.0
         Environment: 
            Reporter: Jun Zhang
             Fix For: 1.11.0
I read data from a Hive source table through Flink SQL and then write it into a Hive target table. The target table is a partitioned table. When one partition holds a particularly large share of the data, data skew occurs and the job runs for a very long time. With the default configuration and the same SQL, Hive on Spark takes five minutes while Flink takes about 40 minutes.

Example:

{code:sql}
-- the schema of myparttable, written out as a CREATE TABLE statement
CREATE TABLE myparttable (
  name string,
  age int
) PARTITIONED BY (
  type string,
  day string
);

-- dynamic partition insert: partition values come from the query result
INSERT OVERWRITE myparttable SELECT name, age, type, day FROM sourcetable;
{code}
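For context, a quick way to confirm this kind of skew is to count rows per dynamic partition value. This is a minimal diagnostic sketch, assuming the {{sourcetable}} and the partition columns {{type}} and {{day}} from the example above:

{code:sql}
-- minimal diagnostic sketch (not part of the original report):
-- count rows per (type, day) partition value to see how uneven the data is.
-- One pair with a dominant count is the skewed partition that ends up
-- being written by a single task, stretching the total job runtime.
SELECT type, day, COUNT(*) AS cnt
FROM sourcetable
GROUP BY type, day
ORDER BY cnt DESC;
{code}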