[ https://issues.apache.org/jira/browse/HIVE-21100?focusedWorklogId=716215&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-716215 ]
ASF GitHub Bot logged work on HIVE-21100:
-----------------------------------------

                Author: ASF GitHub Bot
            Created on: 27/Jan/22 06:43
            Start Date: 27/Jan/22 06:43
    Worklog Time Spent: 10m
      Work Description: pvary commented on a change in pull request #2921:
URL: https://github.com/apache/hive/pull/2921#discussion_r793293292

##########
File path: ql/src/java/org/apache/hadoop/hive/ql/exec/MoveTask.java
##########
@@ -97,6 +101,40 @@ public MoveTask() {
     super();
   }

+  public void flattenUnionSubdirectories(Path sourcePath) throws HiveException {
+    try {
+      FileSystem fs = sourcePath.getFileSystem(conf);
+      LOG.info("Checking " + sourcePath + " for subdirectories to flatten");
+      Set<Path> unionSubdirs = new HashSet<>();
+      if (fs.exists(sourcePath)) {
+        RemoteIterator<LocatedFileStatus> i = fs.listFiles(sourcePath, true);
+        String prefix = AbstractFileMergeOperator.UNION_SUDBIR_PREFIX;
+        while (i.hasNext()) {
+          Path path = i.next().getPath();
+          Path parent = path.getParent();
+          if (parent.getName().startsWith(prefix)) {
+            // We do rename by including the name of parent directory into the filename so that there are no clashes
+            // when we move the files to the parent directory. Ex. HIVE_UNION_SUBDIR_1/000000_0 -> 1_000000_0
+            String parentOfParent = parent.getParent().toString();
+            String parentNameSuffix = parent.getName().substring(prefix.length());
+
+            fs.rename(path, new Path(parentOfParent + "/" + parentNameSuffix + "_" + path.getName()));

Review comment:
   What happens if this filename is already in use?
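To make the review question concrete: the quoted hunk calls `fs.rename(...)` and does not use its boolean return value, so the hunk as shown does not say what happens when the target name is taken. The following is only a sketch of one possible policy (fail loudly on a clash); it is not the PR's code. It uses `java.nio.file` in place of the Hadoop `FileSystem` API so that it runs standalone, and the class name `UnionSubdirFlattener`, the method `flatten`, and the literal prefix `HIVE_UNION_SUBDIR_` (taken from the example in the diff comment) are assumptions made for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical stand-in for the flattening step; not part of Hive.
final class UnionSubdirFlattener {
    // Assumed prefix, taken from the example in the diff comment.
    static final String PREFIX = "HIVE_UNION_SUBDIR_";

    /**
     * Moves every regular file whose parent directory name starts with PREFIX
     * into the directory above it, prepending the subdirectory suffix to the
     * file name (HIVE_UNION_SUBDIR_1/000000_0 -> 1_000000_0).
     * Throws FileAlreadyExistsException if the target name is already taken.
     */
    static List<Path> flatten(Path source) throws IOException {
        List<Path> files;
        // Collect first, so that the moves do not disturb the directory walk.
        try (Stream<Path> walk = Files.walk(source)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> moved = new ArrayList<>();
        for (Path file : files) {
            Path parent = file.getParent();
            String parentName = parent.getFileName().toString();
            // Only flatten union subdirectories strictly below the source.
            if (parent.equals(source) || !parentName.startsWith(PREFIX)) {
                continue;
            }
            String suffix = parentName.substring(PREFIX.length());
            Path target = parent.getParent().resolve(suffix + "_" + file.getFileName());
            // Without REPLACE_EXISTING, Files.move refuses to overwrite an
            // existing target and throws FileAlreadyExistsException instead.
            Files.move(file, target);
            moved.add(target);
        }
        return moved;
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("union-flatten");
        Path sub = Files.createDirectories(root.resolve(PREFIX + "1"));
        Files.write(sub.resolve("000000_0"), "row".getBytes());
        System.out.println(flatten(root));
    }
}
```

Failing on a clash is only one option; skipping the file or choosing a fresh name would be others, and which one suits `MoveTask` is a question for the PR discussion. The sketch also leaves the emptied subdirectories in place.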

Issue Time Tracking
-------------------

    Worklog Id:     (was: 716215)
    Time Spent: 2h  (was: 1h 50m)

> Allow flattening of table subdirectories resulted when using TEZ engine and UNION clause
> ----------------------------------------------------------------------------------------
>
>                 Key: HIVE-21100
>                 URL: https://issues.apache.org/jira/browse/HIVE-21100
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: George Pachitariu
>            Assignee: George Pachitariu
>            Priority: Minor
>              Labels: pull-request-available
>         Attachments: HIVE-21100.1.patch, HIVE-21100.2.patch, HIVE-21100.3.patch, HIVE-21100.patch
>
>          Time Spent: 2h
>  Remaining Estimate: 0h
>
> Right now, when writing data into a table with Tez engine and the clause UNION ALL is the last step of the query, Hive on Tez will create a subdirectory for each branch of the UNION ALL.
> With this patch the subdirectories are removed, and the files are renamed and moved to the parent directory.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)