[jira] [Comment Edited] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

Sahil Takiar (JIRA) Wed, 16 Nov 2016 10:35:13 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15671234#comment-15671234
 ]


Sahil Takiar edited comment on HIVE-15199 at 11/16/16 6:34 PM:
---------------------------------------------------------------

[~spena] a few comments:

* It may be better to use a hybrid of the list files approach and the exists 
approach. For blobstores like S3, listfiles is only eventually consistent, so 
it may not return all the files that are actually there. One way around this 
is to do the listfiles first, and then check whether the targetFilename 
exists. This keeps the perf gains of listfiles while avoiding the consistency 
problems.
* I remember we discussed offline our concerns about multiple INSERT INTO 
queries running against the same table, but I just remembered that Hive Locking 
(https://cwiki.apache.org/confluence/display/Hive/Locking) should prevent that 
from ever happening, correct?
* It would be nice (although not necessary) to rename {{renameNonLocal}} to 
something more descriptive.
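
To make the hybrid idea in the first bullet concrete, here is a rough sketch. It is not the code in the patch: {{BlobStore}}, {{InMemoryStore}} and {{pickTargetName}} are hypothetical names invented for illustration, the {{_copy_N}} suffix is only a stand-in for whatever naming the rename logic really uses, and it assumes the exists check is reliable even when the listing is stale.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class HybridTargetName {

    /** Hypothetical, minimal view of a blobstore; NOT Hadoop's FileSystem API. */
    interface BlobStore {
        Set<String> listNames(String dir);        // may lag behind the real contents
        boolean exists(String dir, String name);  // assumed reliable for this sketch
    }

    /** In-memory stand-in whose listing can differ from what is actually stored. */
    static class InMemoryStore implements BlobStore {
        private final Set<String> listed;
        private final Set<String> actual;

        InMemoryStore(Set<String> listed, Set<String> actual) {
            this.listed = listed;
            this.actual = actual;
        }

        public Set<String> listNames(String dir) {
            return listed;
        }

        public boolean exists(String dir, String name) {
            return actual.contains(name);
        }
    }

    /**
     * Picks a file name that does not collide with an existing file.
     * One listing call rules out most candidates cheaply; the exists
     * check is only made for a candidate the listing did not report.
     */
    static String pickTargetName(BlobStore store, String dir, String baseName) {
        Set<String> listed = store.listNames(dir);
        String candidate = baseName;
        int copy = 0;
        while (listed.contains(candidate) || store.exists(dir, candidate)) {
            copy++;
            candidate = baseName + "_copy_" + copy; // illustrative suffix only
        }
        return candidate;
    }

    public static void main(String[] args) {
        // The store really holds two files, but the stale listing reports one.
        Set<String> actual = new HashSet<>(Arrays.asList("000000_0", "000000_0_copy_1"));
        Set<String> stale = new HashSet<>(Arrays.asList("000000_0"));
        BlobStore store = new InMemoryStore(stale, actual);
        System.out.println(pickTargetName(store, "t1", "000000_0"));
    }
}
```

With the listing alone, the second file would be missed and overwritten; the extra exists check on the one candidate the listing did not report is what catches it, at the cost of a single additional request.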


> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch
>
>
> Any INSERT INTO statement run against an S3 table while the scratch directory 
> is also stored on S3 deletes the existing rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1       name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2       name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
