[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15672593#comment-15672593 ]
Sahil Takiar commented on HIVE-15199:
-------------------------------------

* Is the goal to trigger {{mvFile}} when the destination file is on a blobstore? I don't think that's the right approach, because {{FileUtils.copy}} does a client-side copy when running on S3: data is downloaded from HDFS to HS2 and then uploaded to S3. The target should be a server-side copy (which happens internally on S3). A server-side copy can only be triggered by calling {{FileSystem.rename}}.
* The listing optimization can be applied to HDFS too, right? It should improve performance when running on HDFS as well.
* A bit orthogonal to this JIRA, but {{mvFile}} should probably be called {{copyFile}} because it always copies data.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch,
> HIVE-15199.3.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is
> also stored on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1       name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2       name2
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
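
The distinction the comment draws between a client-side copy and a server-side rename can be sketched roughly as follows. This is only an illustration, not the logic of any HIVE-15199 patch: {{MoveStrategy}}, {{sameFileSystem}} and {{choose}} are hypothetical names, and the sketch assumes that two paths live on the same filesystem exactly when their URI scheme and authority match. Under that assumption, a move within one S3 bucket could go through {{FileSystem.rename}}, while a move from HDFS to S3 would have to fall back to a copy through the client.

```java
import java.net.URI;

/**
 * Hypothetical helper (not part of Hive or Hadoop) that decides whether a
 * file move could be handed to the filesystem as a rename, or whether the
 * data would have to be copied through the client.
 */
final class MoveStrategy {

    enum Kind { RENAME, COPY }

    private MoveStrategy() {}

    /**
     * Assumption: two locations are on the same filesystem when both the
     * URI scheme (e.g. "s3a") and the authority (e.g. the bucket name)
     * are equal, ignoring case.
     */
    static boolean sameFileSystem(URI src, URI dst) {
        return equalsIgnoreCase(src.getScheme(), dst.getScheme())
            && equalsIgnoreCase(src.getAuthority(), dst.getAuthority());
    }

    /** RENAME when both locations share a filesystem, otherwise COPY. */
    static Kind choose(URI src, URI dst) {
        return sameFileSystem(src, dst) ? Kind.RENAME : Kind.COPY;
    }

    // Null-safe, case-insensitive comparison; two nulls count as equal.
    private static boolean equalsIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }
}
```

With the scratch directory on the same bucket as the table (the {{hive.blobstore.use.blobstore.as.scratchdir=true}} case from the description), {{choose}} returns {{RENAME}}. With a scratch directory on HDFS it returns {{COPY}}. The scratch paths and the HDFS host used to try this out are made up for illustration.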