[ https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15767027#comment-15767027 ]

Thomas Poepping commented on HIVE-1620:
---------------------------------------

Hi Sahil,
 
Yes, direct write works well in production. There are definitely some difficult 
design decisions to be made, and as you say, there is no great solution for 
cleaning up after a failure. Some other issues are: data loss on 
self-referencing insert overwrite, metadata loss in dynamic partitioning, and 
no good visibility into partial results. There are workarounds and best 
practices for each of these, though. We are happy to engage in conversation 
about the pros and cons.
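To make the first of those issues concrete, here is a hypothetical sketch of 
the self-referencing insert overwrite hazard (table and column names are 
invented for illustration):

```sql
-- Hypothetical illustration: the target and source are the same table.
-- With direct write, output files replace the table's S3 location while
-- the SELECT may still be reading from it; a failure mid-query can leave
-- the original data partially overwritten, with no staging copy to
-- roll back to.
INSERT OVERWRITE TABLE events_s3
SELECT * FROM events_s3 WHERE event_date >= '2016-01-01';
```

With a staging-directory write, the source stays intact until the final move; 
a direct-write implementation has to handle this case explicitly.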
 
The biggest thing we would like to stress with these implementations is that 
they should be pluggable. The solution should be as generic as possible to 
avoid spaghetti code.
 
We think the best solution is to make this a conversation about the best 
design. We are happy to participate in a community design and implementation, 
drawing on our experience with these types of issues.

> Patch to write directly to S3 from Hive
> ---------------------------------------
>
>                 Key: HIVE-1620
>                 URL: https://issues.apache.org/jira/browse/HIVE-1620
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive that allows users to write files directly 
> to S3.
> This patch allows users to specify an S3 location as the table output 
> location, eliminating the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.
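For readers following the thread, the usage the issue describes looks roughly 
like the following sketch (bucket and table names are hypothetical):

```sql
-- Hypothetical usage: point the table's location directly at S3, so that
-- INSERT results land in S3 without a separate HDFS-to-S3 copy step.
CREATE EXTERNAL TABLE logs (
  id  BIGINT,
  msg STRING
)
LOCATION 's3://my-bucket/warehouse/logs/';

-- Query results are written straight to the S3 location above.
INSERT OVERWRITE TABLE logs
SELECT id, msg FROM staging_logs;
```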



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
