[ 
https://issues.apache.org/jira/browse/HIVE-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14163012#comment-14163012
 ] 

Sushanth Sowmyan commented on HIVE-8371:
----------------------------------------

This will flip Hive's behaviour: insert-into will be disallowed if there is 
already data. That was intentional, to keep Hive and HCatalog consistent. The 
question is whether we want to allow appends to data. If so, Hive and HCatalog 
should both allow it. If not, Hive and HCatalog should both deny it.

I do understand the concern that HCatStorer's behaviour has changed after being 
out for a long time, but by that same reasoning, the new HCatStorer behaviour 
has also been out for a while now in publicly released versions of Hive.

The old behaviour could still be preserved with yet another warehouse-level 
parameter for legacy behaviour, one that makes HCatStorer default to immutable 
and Hive default to mutable, but honestly, I think that's ugly and will cause 
more maintainability problems going forward.

> HCatStorer should fail by default when publishing to an existing partition
> --------------------------------------------------------------------------
>
>                 Key: HIVE-8371
>                 URL: https://issues.apache.org/jira/browse/HIVE-8371
>             Project: Hive
>          Issue Type: Bug
>          Components: HCatalog
>    Affects Versions: 0.13.0, 0.14.0, 0.13.1
>            Reporter: Thiruvel Thirumoolan
>            Assignee: Thiruvel Thirumoolan
>              Labels: hcatalog, partition
>
> In Hive 0.12 and before (or in previous HCatalog releases), HCatStorer would 
> fail if the partition already existed (either before launching the job or 
> during commit, depending on the partitioning). HIVE-6406 changed that behavior 
> and by default does an append. This causes data quality issues, since a rerun 
> (or duplicate run) won't fail (as it used to) and will just append to the 
> partition.
> A preferable approach would be to leave HCatStorer behavior as it was (fail 
> during a duplicate publish) and support append through an option. Overwrite 
> can also be implemented in a similar fashion. E.g.:
> store A into 'db.table' using 
> org.apache.hive.hcatalog.pig.HCatStorer('partspec', '', ' -append');
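
The three publish policies discussed above (fail on an existing partition, 
append, overwrite) can be illustrated with a small standalone model. This is 
only a sketch of the semantics, not HCatalog code: the names PublishMode, 
PartitionExistsError and publish are hypothetical, and a plain dict stands in 
for a partitioned table.

```python
from enum import Enum


class PublishMode(Enum):
    # Hypothetical names, used only for this illustration.
    FAIL_IF_EXISTS = "fail"      # behaviour described for Hive 0.12 and before
    APPEND = "append"            # default behaviour after HIVE-6406
    OVERWRITE = "overwrite"      # suggested as a possible further option


class PartitionExistsError(Exception):
    """Raised when publishing to a partition that already exists."""


def publish(table, partition, rows, mode=PublishMode.FAIL_IF_EXISTS):
    """Publish rows into table[partition] according to the given mode.

    `table` is a dict mapping a partition spec string to a list of rows.
    """
    exists = partition in table
    if exists and mode is PublishMode.FAIL_IF_EXISTS:
        # A rerun or duplicate run is rejected instead of silently appending.
        raise PartitionExistsError(partition)
    if exists and mode is PublishMode.APPEND:
        table[partition] = table[partition] + list(rows)
    else:
        # New partition, or an explicit overwrite of an existing one.
        table[partition] = list(rows)
    return table[partition]


if __name__ == "__main__":
    table = {}
    publish(table, "dt=2014-10-07", ["a", "b"])
    try:
        publish(table, "dt=2014-10-07", ["a", "b"])  # duplicate run
    except PartitionExistsError as err:
        print("duplicate publish rejected:", err)
    publish(table, "dt=2014-10-07", ["c"], PublishMode.APPEND)
    print(table)
```

With fail-if-exists as the default, a duplicate run surfaces as an error, 
whereas an append default would silently double the data in the partition.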



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)