Hi Shivam,

ACID changed a lot with the Hive 3.0 release, so I will assume below that
your question is about a Hive 3.x release.

For performance reasons, Hive ACID v2 implements UPDATE as a delete of the old
row plus an insert of a new one. See Eugene's nice presentation for the details:
https://www.slideshare.net/Hadoop_Summit/transactional-operations-in-apache-hive-present-and-future-102803358
https://www.youtube.com/watch?v=GyzU9wG0cFQ&t=834s

So if your UPDATE command changes every row in the partition, then yes, 
essentially the whole partition is rewritten.
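
To make the delete-plus-insert idea concrete, here is a toy Python model. It is
only a sketch of the concept, not Hive code: the class and field names
(ToyAcidPartition, delete_delta, insert_delta) are made up for illustration, and
real Hive stores these records in ORC delta files rather than in memory.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAcidPartition:
    """Toy model of one partition: untouched base rows plus delta records.

    Hypothetical illustration only; this is not how Hive is implemented.
    """
    base: dict                                         # row_id -> row, never modified in place
    delete_delta: set = field(default_factory=set)     # ids of rows marked as deleted
    insert_delta: dict = field(default_factory=dict)   # row_id -> newly written row
    next_id: int = 0

    def __post_init__(self):
        # New row ids continue after the highest id in the base data.
        self.next_id = max(self.base, default=-1) + 1

    def visible_rows(self):
        """Rows a reader would see: base plus inserts, minus deleted ids."""
        merged = {**self.base, **self.insert_delta}
        return {rid: row for rid, row in merged.items()
                if rid not in self.delete_delta}

    def update(self, predicate, change):
        """Model UPDATE as 'delete the old row, insert a new one'.

        Returns the number of rows that were rewritten.
        """
        matches = [(rid, row) for rid, row in self.visible_rows().items()
                   if predicate(row)]
        for rid, row in matches:
            self.delete_delta.add(rid)                         # mark the old version deleted
            self.insert_delta[self.next_id] = change(dict(row))  # write the new version
            self.next_id += 1
        return len(matches)
```

In this model, updating a single row adds one delete record and one insert
record, while an update whose predicate matches every row produces as many new
rows as the partition holds, which is the "whole partition is rewritten" case.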

Just a side note: currently UPDATE only works on full ACID tables, and with 
the current implementation full ACID tables have to be stored in ORC file 
format, so a table stored as Parquet cannot be updated this way.

I hope this helps,
Peter

> On Nov 20, 2019, at 08:34, Shivam Sharma <28shivamsha...@gmail.com> wrote:
> 
> Hi All,
> 
> If we do update column in Hive with data stored in parquet format does Hive 
> rewrite the whole partition or it upsert the only subset of files in that 
> partition?
> 
> Thanks
> 
> -- 
> Shivam Sharma
> Indian Institute Of Information Technology, Design and Manufacturing Jabalpur
> Email:- 28shivamsha...@gmail.com
> LinkedIn:- https://www.linkedin.com/in/28shivamsharma
