[jira] [Commented] (HIVE-15199) INSERT INTO data on S3 is replacing the old rows with the new ones

Sahil Takiar (JIRA) Thu, 17 Nov 2016 11:48:12 -0800

    [ 
https://issues.apache.org/jira/browse/HIVE-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15674630#comment-15674630
 ]


Sahil Takiar commented on HIVE-15199:
-------------------------------------

* The patch does a {{listStatus}} for each file it needs to rename. Is that 
necessary? It may be more efficient to do a single {{listStatus}} outside 
the for loop in {{copyFiles}}.

I would suggest modifying the following code block:

{code}
    fs.listStatus(dirPath, new PathFilter() {
      @Override
      public boolean accept(Path path) {
        if (path.getName().startsWith(filename)) {
          fileSet.add(path);
        }

        return false;
      }
    });
{code}

Is the use of the {{PathFilter}} necessary? A {{PathFilter}} is typically used 
to filter out certain files from a {{listStatus}} call, but here it is being 
used to populate the {{fileSet}} object. We should be able to accomplish the 
same thing without using the {{PathFilter}}.
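
As a rough illustration of both points, here is a minimal sketch that uses only plain Java collections rather than the Hadoop API. It assumes the directory is listed once, for example with a single {{fs.listStatus(dirPath)}} call whose results are mapped to names via {{getPath().getName()}}. The class name {{SingleListingSketch}}, the method {{groupByPrefix}}, and the file names in the comments are hypothetical and are not taken from the patch.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the patch: it works on plain file names so
// the idea can be shown without a Hadoop dependency.
final class SingleListingSketch {

  /**
   * Groups the names obtained from ONE directory listing by the source file
   * name they start with, using an ordinary loop instead of a PathFilter
   * that collects matches as a side effect.
   *
   * @param listedNames names returned by a single listing of the directory
   * @param filenames   names of the files that are about to be renamed
   */
  static Map<String, List<String>> groupByPrefix(List<String> listedNames,
                                                 List<String> filenames) {
    Map<String, List<String>> matches = new LinkedHashMap<>();
    for (String filename : filenames) {
      matches.put(filename, new ArrayList<>());
    }
    // One pass over the listing; no further listing calls are needed.
    for (String listed : listedNames) {
      for (String filename : filenames) {
        if (listed.startsWith(filename)) {
          matches.get(filename).add(listed);
        }
      }
    }
    return matches;
  }
}
```

With this shape, the loop in {{copyFiles}} would only look up {{matches.get(filename)}} for each file, so the number of listing calls no longer grows with the number of files to rename.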

> INSERT INTO data on S3 is replacing the old rows with the new ones
> ------------------------------------------------------------------
>
>                 Key: HIVE-15199
>                 URL: https://issues.apache.org/jira/browse/HIVE-15199
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>            Priority: Critical
>         Attachments: HIVE-15199.1.patch, HIVE-15199.2.patch, 
> HIVE-15199.3.patch, HIVE-15199.4.patch
>
>
> Any INSERT INTO statement run on an S3 table while the scratch directory is 
> also stored on S3 deletes the old rows of the table.
> {noformat}
> hive> set hive.blobstore.use.blobstore.as.scratchdir=true;
> hive> create table t1 (id int, name string) location 's3a://spena-bucket/t1';
> hive> insert into table t1 values (1,'name1');
> hive> select * from t1;
> 1       name1
> hive> insert into table t1 values (2,'name2');
> hive> select * from t1;
> 2       name2
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
