[ https://issues.apache.org/jira/browse/HIVE-1620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12920808#action_12920808 ]
Joydeep Sen Sarma commented on HIVE-1620:
-----------------------------------------

I agree that the speed efficiency may be worth the tradeoff in consistency. As you say, the messaging is critical. Can we gate this feature on a new Hive option that makes the user conscious of this tradeoff?

Regarding the cleanup: please look at the jobClose method in FileSinkOperator (I think). If the Hive client is still functioning at the time the job fails, we can make an attempt to clean things up there, assuming that the file names are unique, which I am not sure about right now because we made some changes to shorten file names (changes that might have to be undone for this feature).

One thing we have experienced in the past is that Hadoop tasks continue to do work even after the job is technically 'complete'. So while the cleanup can help the 99% use case, there will be marginal cases where the output directory gets written to when it shouldn't. So having this gated on an option would still be worthwhile IMHO (for users who cannot afford the speed-accuracy tradeoff).

> Patch to write directly to S3 from Hive
> ---------------------------------------
>
>                 Key: HIVE-1620
>                 URL: https://issues.apache.org/jira/browse/HIVE-1620
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Vaibhav Aggarwal
>            Assignee: Vaibhav Aggarwal
>         Attachments: HIVE-1620.patch
>
>
> We want to submit a patch to Hive which allows users to write files directly
> to S3.
> This patch allows users to specify an S3 location as the table output location
> and hence eliminates the need to copy data from HDFS to S3.
> Users can run Hive queries directly over the data stored in S3.
> This patch helps integrate Hive with S3 better and more quickly.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
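For readers following along: with a patch like this applied, pointing a table's output location at S3 would presumably look something like the sketch below. This is a hypothetical illustration, not code from the attachment; the table name, bucket, path, and the `s3n://` scheme are all assumptions for the example.

```sql
-- Hypothetical sketch: table, columns, and bucket/path are illustrative only.
CREATE EXTERNAL TABLE page_views (
  view_time STRING,
  user_id   BIGINT,
  url       STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://my-bucket/warehouse/page_views/';

-- With direct S3 writes, the INSERT's output lands in the S3 location
-- itself, eliminating the separate HDFS-to-S3 copy step.
INSERT OVERWRITE TABLE page_views
SELECT view_time, user_id, url FROM staging_views;
```

This is exactly the path where the consistency tradeoff discussed above bites: if the job fails partway through, partially written files may remain in the S3 output location unless the jobClose-style cleanup succeeds.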