[ https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674630#comment-15674630 ]
Sahil Takiar commented on HIVE-15199:
-------------------------------------

* The patch does a {{listStatus}} for each file it needs to rename; is that necessary? It may be more efficient to do a single {{listStatus}} outside the for loop in {{copyFiles}}.

I would suggest modifying the following code block:

{code}
fs.listStatus(dirPath, new PathFilter() {
  @Override
  public boolean accept(Path path) {
    if (path.getName().startsWith(filename)) {
      fileSet.add(path);
    }
    return false;
  }
});
{code}

Is the use of the {{PathFilter}} necessary? A {{PathFilter}} is typically used to filter out certain files from a {{listStatus}} call, but here it is being used to populate the {{fileSet}} object; we should be able to accomplish the same thing without using the {{PathFilter}}.

> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch,
> HIVE-15199.3.patch, HIVE-15199.4.patch
>
>
> Any INSERT INTO statement run on S3 tables and when the scratch directory is
> saved on S3 is deleting old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1 name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2 name2
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
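
A minimal sketch of the review suggestion in the comment above, not the code from any of the attached patches. It assumes the directory is listed once (in Hive this would presumably be a single {{fs.listStatus(dirPath)}} call before the loop in {{copyFiles}}) and that the per-file match then runs over that in-memory listing with a plain loop instead of a {{PathFilter}} with side effects. The class name {{ListOnceSketch}}, the method {{matchingNames}}, and the sample file names are hypothetical, and plain strings stand in for Hadoop {{Path}} objects so the example runs without Hadoop on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ListOnceSketch {

    /**
     * Returns the names from an already-fetched directory listing that start
     * with the given prefix. The listing is passed in, so the caller can fetch
     * it once and reuse it for every file it needs to rename.
     */
    static List<String> matchingNames(List<String> listedNames, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String name : listedNames) {
            if (name.startsWith(prefix)) {
                matches.add(name);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the result of one directory listing.
        List<String> listed = Arrays.asList("000000_0", "000000_0_copy_1", "000001_0");

        // The same listing is reused for each file name; no further listing calls.
        for (String filename : Arrays.asList("000000_0", "000001_0")) {
            System.out.println(filename + " -> " + matchingNames(listed, filename));
        }
    }
}
```

Whether a single listing is safe depends on whether the loop itself adds files to the directory between iterations; if it does, the in-memory listing would need to be updated after each rename.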