[ https://issues.apache.org/jira/browse/HIVE-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Deepak Jaiswal updated HIVE-18350:
----------------------------------
    Attachment:     (was: HIVE-18350.4.patch)

> load data should rename files consistent with insert statements
> ---------------------------------------------------------------
>
>                 Key: HIVE-18350
>                 URL: https://issues.apache.org/jira/browse/HIVE-18350
>             Project: Hive
>          Issue Type: Bug
>            Reporter: Deepak Jaiswal
>            Assignee: Deepak Jaiswal
>            Priority: Major
>         Attachments: HIVE-18350.1.patch, HIVE-18350.2.patch,
> HIVE-18350.3.patch, HIVE-18350.4.patch
>
>
> Insert statements create files whose names end with 0000_0, 0001_0, etc.
> However, LOAD DATA keeps the input file name. This results in an
> inconsistent naming convention, which makes SMB joins difficult in some
> scenarios and may cause trouble for other types of queries in the future.
> We need a consistent naming convention.
> For a non-bucketed table, Hive renames all the files regardless of how the
> user named them.
> For a bucketed table, in non-strict mode Hive relies on the user to name the
> files to match the bucket, and assumes that all data in a file belongs to the
> same bucket. In strict mode, loading into a bucketed table is disabled.
> This will likely affect most of the tests that load data, which is a
> significant change, so the work is divided into two subtasks for a smoother
> merge.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
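
The naming rule described in the issue can be illustrated with a small sketch. This is not Hive's implementation: the helper names `insert_style_name` and `plan_renames`, the six-digit zero padding, the sorted assignment order, and the way a bucket number is read from the leading digits of a file name are all assumptions made for illustration only.

```python
import re


def insert_style_name(index, copy=0, width=6):
    """Hypothetical helper: build an insert-style name such as 000001_0.

    The six-digit width is an assumption; the issue only says the
    names end with 0000_0, 0001_0, and so on.
    """
    return f"{index:0{width}d}_{copy}"


def plan_renames(input_names, bucketed=False):
    """Map each user-supplied file name to an insert-style target name.

    Non-bucketed: every file is renamed by its position in sorted order,
    regardless of what the user called it.
    Bucketed (non-strict mode): the user-supplied name is assumed to start
    with the bucket number, and that number is reused for the target name.
    """
    plan = {}
    for position, name in enumerate(sorted(input_names)):
        if bucketed:
            match = re.match(r"(\d+)_\d+", name)
            if match is None:
                raise ValueError(f"cannot infer bucket from file name: {name}")
            plan[name] = insert_style_name(int(match.group(1)))
        else:
            plan[name] = insert_style_name(position)
    return plan


if __name__ == "__main__":
    # Arbitrary input names for a non-bucketed table.
    print(plan_renames(["part-b.txt", "part-a.txt"]))
    # Names that already encode the bucket for a bucketed table.
    print(plan_renames(["0_0", "1_0"], bucketed=True))
```

In the non-bucketed case the original names are discarded entirely, which is what gives loaded files the same naming as files written by insert statements. In the bucketed case the sketch raises an error for a name that does not encode a bucket, since the issue states that Hive relies on the user to provide matching names.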