[jira] [Commented] (HIVE-14270) Write temporary data to HDFS when doing inserts on tables located on S3

JIRA Mon, 08 Aug 2016 10:25:06 -0700

    [ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412096#comment-15412096 ]


Sergio Peña commented on HIVE-14270:
------------------------------------

[~ashutoshc] I need a little advice here. The tests are failing because of the 
new MR scratch directory created for the stats, so I would need to update all 
of the affected test output files. Instead of that, I was thinking of doing this:
{code}
      String statsTmpLoc;
      if (BlobStorageUtils.isBlobStoragePath(conf, dest_path)) {
        statsTmpLoc = ctx.getMRScratchDir().toString();
      } else {
        statsTmpLoc = ctx.getExtTmpPathRelTo(queryTmpdir).toString();
      }
      fileSinkDesc.setStatsTmpDir(statsTmpLoc);
      LOG.debug("Set stats collection dir : " + statsTmpLoc);
{code}

As the code shows, if 'dest_path' is on S3, I use the MR scratch dir; otherwise 
I keep using the external tmp path relative to 'queryTmpdir' as the stats 
directory (which is how it works currently). I don't like the conditional, but 
it is either this or updating all 72 tests to use the new scratch directory name.
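
To make the trade-off easier to discuss, here is a minimal standalone sketch of 
the same decision, outside of Hive. Everything in it is an assumption for 
illustration only: the class name, the helper methods, and the hard-coded list 
of blob-store schemes are hypothetical and are not the actual 
BlobStorageUtils or Context APIs.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Illustrative sketch only: none of these names are Hive APIs.
class StatsTmpDirSketch {

    // Assumed set of blob-store URI schemes; a real implementation would
    // most likely read this from configuration instead of hard-coding it.
    private static final Set<String> BLOB_SCHEMES =
        new HashSet<>(Arrays.asList("s3", "s3a", "s3n"));

    // Returns true when the path's URI scheme is one of the assumed
    // blob-store schemes. Paths without a scheme are treated as non-blob.
    static boolean isBlobStoragePath(String path) {
        String scheme = URI.create(path).getScheme();
        return scheme != null
            && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Picks the stats directory: the MR scratch dir for blob-store
    // destinations, otherwise the external tmp dir used today.
    static String chooseStatsTmpDir(String destPath,
                                    String mrScratchDir,
                                    String extTmpDir) {
        return isBlobStoragePath(destPath) ? mrScratchDir : extTmpDir;
    }
}
```

With this shape, only destinations on a blob store pick up the new scratch 
location, so outputs for non-S3 tables would keep the directory name they have 
today.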

Btw, this code is to fix the issue with INSERT OVERWRITE that was leaving 
.hive-staging directories on S3.

Any advice here?

> Write temporary data to HDFS when doing inserts on tables located on S3
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14270
>                 URL: https://issues.apache.org/jira/browse/HIVE-14270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> temporary (or intermediate) files to S3 and reads them back from there as well. 
> If HDFS is still the default filesystem in Hive, then we can keep such 
> temporary files on HDFS to make things run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
