Update Performance in Hive with data stored as Parquet, ORC

2019-11-19 Thread Shivam Sharma
Hi All, If we update a column in Hive with data stored in Parquet format, does Hive rewrite the whole partition, or does it upsert only a subset of the files in that partition? Thanks -- Shivam Sharma Indian Institute Of Information Technology, Design and Manufacturing Jabalpur Email:- 28shivamsha...@g

Re: ORC: duplicate record - rowid meaning ?

2019-11-19 Thread David Morin
Here are more details about the ORC content and the fact that we have duplicate rows: /delta_0011365_0011365_/bucket_3 {"operation":0,"originalTransaction":11365,"bucket":3,"rowId":0,"currentTransaction":11365,"row":{"TS":1574156027915254212,"cle":5218,...}} {"operation":0,"originalTransaction":

Hive server issue when accessing table created by spark shell

2019-11-19 Thread Justin Zhang (Gongming)
Dear All, We have the following issue. *Environment:* OS: CentOS 7.5; Hive Server: 2.3.4; Metastore: MySQL Server 5.7.25. *Purpose:* I want to create a Hive table using spark-shell/spark-submit jobs under the account robot.prod, so that the table can be queried (SELECT) from Beeline when logged in with the same account.