[jira] [Commented] (HIVE-15121) Last MR job in Hive should be able to write to a different scratch directory

Lefty Leverenz (JIRA) Tue, 22 Nov 2016 16:10:05 -0800

    [ https://issues.apache.org/jira/browse/HIVE-15121?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688365#comment-15688365 ]


Lefty Leverenz commented on HIVE-15121:
---------------------------------------

Doc note:  This adds *hive.blobstore.optimizations.enabled* to HiveConf.java, 
so it needs to be documented in the wiki for release 2.2.0.  I recommend using 
the description in patch 2 (revision 3 on the Review Board) instead of 
referring back here for details:

{quote}
This parameter enables a number of optimizations when running on blobstores:
(1) If hive.blobstore.use.blobstore.as.scratchdir is false, force the last Hive 
job to write to the blobstore. This is a performance optimization that forces 
the final FileSinkOperator to write to the blobstore. The advantage is that any 
copying of data that needs to be done from the scratch directory to the final 
table directory can be done server-side, within the blobstore. The MoveTask 
simply renames data from the scratch directory to the final table location, 
which should translate to a server-side COPY request. This way HiveServer2 
doesn't have to actually copy any data, it just tells the blobstore to do all 
the work.
{quote}
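
For wiki readers, the rule in the description above can be sketched roughly as follows. This is only an illustration of the decision as described in the quote, not the code from the patch: the function names, the scheme set, and the ".staging" suffix are all hypothetical, and the two boolean arguments stand in for *hive.blobstore.optimizations.enabled* and *hive.blobstore.use.blobstore.as.scratchdir*.

```python
from urllib.parse import urlparse

# Assumption: URI schemes treated as blobstores. Illustrative only; this
# set is hard-coded here and is not read from any Hive configuration.
BLOBSTORE_SCHEMES = {"s3", "s3a", "s3n"}


def is_blobstore(path):
    """Return True if the path's URI scheme is in the assumed blobstore set."""
    return urlparse(path).scheme.lower() in BLOBSTORE_SCHEMES


def pick_scratch_dir(dest, is_final_job, optimizations_enabled,
                     use_blobstore_as_scratchdir, hdfs_scratch):
    """Choose where a job writes its output (hypothetical helper).

    optimizations_enabled       ~ hive.blobstore.optimizations.enabled
    use_blobstore_as_scratchdir ~ hive.blobstore.use.blobstore.as.scratchdir
    """
    if not is_blobstore(dest):
        # Destination is not on a blobstore: nothing to optimize.
        return hdfs_scratch

    # Hypothetical staging location next to the final table directory.
    blob_scratch = dest.rstrip("/") + "/.staging"

    if use_blobstore_as_scratchdir:
        # Every job, intermediate or final, stages on the blobstore.
        return blob_scratch
    if optimizations_enabled and is_final_job:
        # Only the last job is forced onto the blobstore, so the final
        # move can be a rename within the blobstore.
        return blob_scratch
    # Intermediate jobs (or optimizations disabled) stay on HDFS.
    return hdfs_scratch


if __name__ == "__main__":
    table = "s3a://bucket/warehouse/t"
    hdfs = "hdfs://nn/tmp/hive"
    print(pick_scratch_dir(table, False, True, False, hdfs))
    print(pick_scratch_dir(table, True, True, False, hdfs))
```

With the optimization on and the scratch-directory setting off, only the call for the final job returns a blobstore path; intermediate jobs keep using the HDFS scratch directory.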

I'm not sure whether *hive.blobstore.optimizations.enabled* belongs in the 
general query execution section or in a new blobstore section that would also 
hold the two parameters created by HIVE-14270.

* [Hive Configuration Properties | https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties]

Added a TODOC2.2 label.

> Last MR job in Hive should be able to write to a different scratch directory
> ----------------------------------------------------------------------------
>
>                 Key: HIVE-15121
>                 URL: https://issues.apache.org/jira/browse/HIVE-15121
>             Project: Hive
>          Issue Type: Sub-task
>          Components: Hive
>            Reporter: Sahil Takiar
>            Assignee: Sahil Takiar
>              Labels: TODOC2.2
>             Fix For: 2.2.0
>
>         Attachments: HIVE-15121.1.patch, HIVE-15121.2.patch, 
> HIVE-15121.3.patch, HIVE-15121.WIP.1.patch, HIVE-15121.WIP.2.patch, 
> HIVE-15121.WIP.patch, HIVE-15121.patch
>
>
> Hive should be able to configure all intermediate MR jobs to write to HDFS, 
> but the final MR job to write to S3.
> This will be useful for implementing parallel renames on S3. The idea is that 
> for a multi-job query, all intermediate MR jobs write to HDFS, and then the 
> final job writes to S3. Writing to HDFS should be faster than writing to S3, 
> so it makes more sense to write intermediate data to HDFS.
> The advantage is that any copying of data that needs to be done from the 
> scratch directory to the final table directory can be done server-side, 
> within the blobstore. The MoveTask simply renames data from the scratch 
> directory to the final table location, which should translate to a 
> server-side COPY request. This way HiveServer2 doesn't have to actually copy 
> any data, it just tells the blobstore to do all the work.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
