[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15409688#comment-15409688
 ] 

Ashutosh Chauhan commented on HIVE-14270:
-----------------------------------------

There is also a use case where the data on S3 is much larger than what the HDFS 
cluster can hold. Imagine a multi-TB table on S3 with only a 3-node cluster 
that has minimal space for processing. This is a corner case, but it may exist. 
In such cases it is better to still use S3 (although slow) so that the query 
succeeds. Until we figure out an automated way to detect such a scenario, one 
option could be to introduce another boolean config variable that optionally 
keeps the scratch dir on blob storage even when blob storage is detected.
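
To make the proposal concrete, here is a minimal sketch of the decision such a flag would control. It is not taken from any of the attached patches: the class name, the method names, the flag parameter, and the list of schemes treated as blob storage are all hypothetical and chosen only for illustration.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of Hive.
final class ScratchDirChooser {
    // Assumption: these URI schemes are the ones treated as blob storage.
    private static final Set<String> BLOB_SCHEMES = Set.of("s3", "s3a", "s3n");

    // Returns true when the location's scheme is in the assumed blob storage list.
    static boolean isBlobStorage(URI location) {
        String scheme = location.getScheme();
        return scheme != null && BLOB_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
    }

    // Picks the filesystem location that should hold temporary data.
    // useBlobStorageForScratch stands in for the proposed boolean config
    // variable: when true, scratch data stays with the table on blob storage.
    static URI chooseScratchDir(URI tableLocation,
                                URI hdfsScratchDir,
                                boolean useBlobStorageForScratch) {
        if (isBlobStorage(tableLocation) && !useBlobStorageForScratch) {
            // Blob storage detected and no override: keep temporary files on HDFS.
            return hdfsScratchDir;
        }
        // Either the table is not on blob storage, or the user opted in to
        // keeping scratch data next to the table (slow, but not limited by
        // the HDFS capacity of a small cluster).
        return tableLocation;
    }
}
```

With the flag left at false, an insert into a table on S3 would stage its temporary data on HDFS. Setting it to true covers the small-cluster case above by keeping everything on S3.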

> Write temporary data to HDFS when doing inserts on tables located on S3
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14270
>                 URL: https://issues.apache.org/jira/browse/HIVE-14270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-14270.1.patch, HIVE-14270.2.patch, 
> HIVE-14270.3.patch, HIVE-14270.4.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well. 
> If HDFS is still the default filesystem for Hive, we can keep such 
> temporary files on HDFS so that things run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
