[ https://issues.apache.org/jira/browse/HIVE-14270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15396090#comment-15396090 ]
Chris Nauroth commented on HIVE-14270:
--------------------------------------

Any approach that involves matching on scheme is going to be incomplete and error-prone, regardless of whether the logic lives in Hive or Hadoop Common. Users have flexibility to define new schemes or even remap existing schemes in their runtime configuration by setting the configuration property {{fs.<SCHEME>.impl}}. In practice, it's rare, but I have seen it done.

An API or an {{instanceof}} check to identify an object store would be more reliable, but then there is the additional challenge of [ViewFs|http://hadoop.apache.org/docs/r2.7.2/hadoop-project-dist/hadoop-hdfs/ViewFs.html] defining a client-side mount table. In that case, there is a single {{FileSystem}} instance visible to the caller, but it may route different {{Path}} instances to HDFS vs. S3A vs. something else. This is also a bit rare in practice, but I know at least Twitter does it. This might imply that the HADOOP-9565 API needs to be sensitive to {{Path}}, not only the {{FileSystem}} instance.

> Write temporary data to HDFS when doing inserts on tables located on S3
> -----------------------------------------------------------------------
>
>                 Key: HIVE-14270
>                 URL: https://issues.apache.org/jira/browse/HIVE-14270
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-14270.1.patch
>
>
> Currently, when doing INSERT statements on tables located on S3, Hive writes
> and reads temporary (or intermediate) files to S3 as well.
> If HDFS is still the default filesystem on Hive, then we can keep such
> temporary files on HDFS so that things run faster.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
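
As a rough illustration of the scheme-remapping concern raised in the comment above, the following sketch contrasts a hard-coded scheme match with a lookup of {{fs.<SCHEME>.impl}}. It is only a simulation under stated assumptions: the configuration is a plain {{Map}} rather than Hadoop's own configuration object, and the class name {{SchemeMatchSketch}}, the scheme {{mystore}}, the allow-list, and both helper methods are hypothetical names invented for the example. It does not address the ViewFs case, where a single scheme may route different paths to different stores.

```java
import java.net.URI;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SchemeMatchSketch {

    // Hypothetical allow-list of implementation class names treated as object stores.
    static final Set<String> OBJECT_STORE_IMPLS = new HashSet<>(Arrays.asList(
            "org.apache.hadoop.fs.s3a.S3AFileSystem"));

    // Naive check: compares the URI scheme against a hard-coded value.
    static boolean isObjectStoreByScheme(String uri) {
        String scheme = URI.create(uri).getScheme();
        return "s3a".equals(scheme);
    }

    // Alternative check: resolves fs.<SCHEME>.impl in the simulated configuration
    // and tests the configured implementation class name instead of the scheme.
    static boolean isObjectStoreByImpl(Map<String, String> conf, String uri) {
        String scheme = URI.create(uri).getScheme();
        String impl = conf.get("fs." + scheme + ".impl");
        return impl != null && OBJECT_STORE_IMPLS.contains(impl);
    }

    public static void main(String[] args) {
        // Simulated runtime configuration in which a user has defined a new
        // scheme ("mystore", hypothetical) backed by the S3A implementation.
        Map<String, String> conf = new HashMap<>();
        conf.put("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
        conf.put("fs.mystore.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");

        String uri = "mystore://bucket/warehouse/t1";
        System.out.println(isObjectStoreByScheme(uri));      // false: scheme match misses it
        System.out.println(isObjectStoreByImpl(conf, uri));  // true: impl lookup catches it
    }
}
```

The implementation lookup still keys on the scheme of the URI, so it would report a single answer for a {{viewfs}} mount table even when individual paths resolve to different backing stores, which is why a {{Path}}-sensitive API is suggested above.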