[ 
https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396090#comment-15396090
 ] 

Chris Nauroth commented on HIVE-14270:
--------------------------------------

Any approach that involves matching on scheme is going to be incomplete and 
error-prone, regardless of whether the logic lives in Hive or Hadoop Common.  
Users have the flexibility to define new schemes or even remap existing schemes 
in their runtime configuration by setting the configuration property 
{{fs.<SCHEME>.impl}}.  In practice, it's rare, but I have seen it done.
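
To make the failure mode concrete, here is a minimal sketch in plain Java.  It 
does not use the Hadoop API.  The {{conf}} map is only a stand-in for the 
runtime configuration, and the {{mystore}} scheme and the 
{{example.ObjectStoreFileSystem}} class name are hypothetical names made up for 
illustration.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

class SchemeMatchSketch {
    // Hypothetical stand-in for runtime configuration:
    // property name -> implementation class name.
    static final Map<String, String> conf = new HashMap<>();

    // Naive check: a path counts as an object store only if its scheme
    // appears on a hard-coded list.
    static boolean looksLikeObjectStoreByScheme(URI uri) {
        String scheme = uri.getScheme();
        return "s3a".equals(scheme) || "s3n".equals(scheme);
    }

    // In this model, the implementation serving a path is whatever
    // fs.<SCHEME>.impl names, not the scheme string itself.
    static String configuredImpl(URI uri) {
        return conf.get("fs." + uri.getScheme() + ".impl");
    }

    public static void main(String[] args) {
        // A user-defined scheme mapped to a (hypothetical) object store class.
        conf.put("fs.mystore.impl", "example.ObjectStoreFileSystem");

        URI table = URI.create("mystore://bucket/warehouse/t1");
        System.out.println("scheme check says object store: "
                + looksLikeObjectStoreByScheme(table));
        System.out.println("configured implementation: "
                + configuredImpl(table));
    }
}
```

In this sketch the scheme check misses the table because {{mystore}} is not on 
the hard-coded list, even though the configuration maps it to an object store 
implementation.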

An API or an {{instanceof}} check to identify an object store would be more 
reliable, but then there is the additional challenge of 
[ViewFs|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ViewFs.html]
 defining a client-side mount table.  In that case, there is a single 
{{FileSystem}} instance visible to the caller, but it may route different 
{{Path}} instances to HDFS vs. S3A vs. something else.  This is also rare in 
practice, but I know at least Twitter does it.  This might imply that the 
HADOOP-9565 API needs to be sensitive to {{Path}}, not only the 
{{FileSystem}} instance.
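
As a rough illustration of why the answer can depend on the {{Path}}, here is a 
small sketch in plain Java.  It is not the ViewFs implementation or any 
proposed HADOOP-9565 API.  The {{MountTableSketch}} class, the {{Backing}} enum, 
and the mount points are all hypothetical, chosen only to model a single 
client-side mount table that routes different paths to different stores.

```java
import java.util.HashMap;
import java.util.Map;

class MountTableSketch {
    enum Backing { HDFS, OBJECT_STORE }

    // Hypothetical client-side mount table: mount point -> backing store.
    private final Map<String, Backing> mounts = new HashMap<>();

    void mount(String mountPoint, Backing backing) {
        mounts.put(mountPoint, backing);
    }

    // Resolve a path to its backing store using the longest matching
    // mount point.
    Backing resolve(String path) {
        Backing best = null;
        int bestLen = -1;
        for (Map.Entry<String, Backing> e : mounts.entrySet()) {
            String mp = e.getKey();
            String prefix = mp.endsWith("/") ? mp : mp + "/";
            boolean matches = path.equals(mp) || path.startsWith(prefix);
            if (matches && mp.length() > bestLen) {
                best = e.getValue();
                bestLen = mp.length();
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no mount point for " + path);
        }
        return best;
    }

    // The question is asked per path, not per file system instance.
    boolean isObjectStore(String path) {
        return resolve(path) == Backing.OBJECT_STORE;
    }

    public static void main(String[] args) {
        MountTableSketch fs = new MountTableSketch();
        fs.mount("/user", Backing.HDFS);
        fs.mount("/data/s3", Backing.OBJECT_STORE);

        System.out.println(fs.isObjectStore("/user/hive/tmp"));
        System.out.println(fs.isObjectStore("/data/s3/warehouse/t1"));
    }
}
```

The caller holds one object here, yet the two paths resolve to different 
backing stores, so a check that takes only the instance cannot give a single 
correct answer.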

> Write temporary data to HDFS when doing inserts on tables located on S3
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14270
>                 URL: https://issues.apache.org/jira/browse/HIVE-14270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-14270.1.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes 
> and reads temporary (or intermediate) files on S3 as well.
> If HDFS is still the default filesystem in Hive, then we can keep such 
> temporary files on HDFS to make things run faster.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
