[ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412096#comment-15412096 ]
Sergio Peña commented on HIVE-14270:
------------------------------------

[~ashutoshc] I need a little advice here. The tests are failing because of the new MR scratch directory created for the stats, so I would need to update all of those files. Instead, I was thinking of doing this:

{code}
String statsTmpLoc;
if (BlobStorageUtils.isBlobStoragePath(conf, dest_path)) {
  // Destination is on a blobstore (e.g. S3): keep stats temp data in the MR scratch dir
  statsTmpLoc = ctx.getMRScratchDir().toString();
} else {
  // Otherwise keep the current behavior: an external tmp path relative to queryTmpdir
  statsTmpLoc = ctx.getExtTmpPathRelTo(queryTmpdir).toString();
}
fileSinkDesc.setStatsTmpDir(statsTmpLoc);
LOG.debug("Set stats collection dir : " + statsTmpLoc);
{code}

As you can see in the code, if 'dest_path' is on S3, I use the scratch dir; otherwise I use the 'queryTmpdir' external path as the scratch dir (which is how it currently works). I don't like the conditional, but the alternative is to update all 72 tests to use the new scratch directory name.

Btw, this code fixes the issue with INSERT OVERWRITE leaving .hive-staging directories on S3.

Any advice here?

> Write temporary data to HDFS when doing inserts on tables located on S3
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14270
>                 URL: https://issues.apache.org/jira/browse/HIVE-14270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch,
> HIVE-14270.3.patch, HIVE-14270.4.patch
>
>
> Currently, when doing INSERT statements on tables located at S3, Hive writes
> and reads temporary (or intermediate) files to S3 as well.
> If HDFS is still the default filesystem on Hive, then we can keep such
> temporary files on HDFS to make things run faster.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
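The stats temp-dir conditional proposed in the HIVE-14270 comment can be sketched as a standalone Java program for readers outside the Hive codebase. This is a minimal sketch under stated assumptions, not Hive's implementation: `isBlobStoragePath` here is a hypothetical stand-in that only checks the URI scheme against `s3`, `s3a` and `s3n` (assumed to match Hive's default blobstore schemes), and the class name, `chooseStatsTmpLoc`, and every path are made up for illustration.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class StatsTmpDirChooser {
    // Assumption: these schemes are treated as blob storage (S3 variants).
    private static final Set<String> BLOBSTORE_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // Hypothetical stand-in for BlobStorageUtils.isBlobStoragePath(conf, path):
    // a path counts as blob storage when its URI scheme is in the set above.
    static boolean isBlobStoragePath(URI path) {
        String scheme = path.getScheme();
        return scheme != null
            && BLOBSTORE_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Mirrors the proposed branch: blobstore destinations get the MR scratch
    // dir (e.g. on HDFS); everything else keeps the external tmp path that is
    // derived from the query tmp dir.
    static String chooseStatsTmpLoc(URI destPath, String mrScratchDir, String extTmpPath) {
        return isBlobStoragePath(destPath) ? mrScratchDir : extTmpPath;
    }

    public static void main(String[] args) {
        // Made-up example paths, for illustration only.
        String scratch = "hdfs://nn:8020/tmp/hive/scratch";
        String ext = "hdfs://nn:8020/warehouse/t/.hive-staging/-ext-10001";

        // prints hdfs://nn:8020/tmp/hive/scratch
        System.out.println(chooseStatsTmpLoc(URI.create("s3a://bucket/warehouse/t"), scratch, ext));
        // prints hdfs://nn:8020/warehouse/t/.hive-staging/-ext-10001
        System.out.println(chooseStatsTmpLoc(URI.create("hdfs://nn:8020/warehouse/t"), scratch, ext));
    }
}
```

Keeping the non-S3 branch identical to the current behavior is what avoids touching the 72 test outputs; only queries whose destination is on S3 see the new scratch directory name.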