Understanding hive query plan for Join operation

2016-09-17 Thread Nitin Kumar
highly appreciate it if someone could clear the inconsistencies I observe in the query plan and the actual result. Thanks and regards, Nitin Kumar

Re: Populating tables using hive and spark

2016-08-22 Thread Nitin Kumar

Re: Populating tables using hive and spark

2016-08-22 Thread Nitin Kumar
or with Spark), you end up with an inconsistency. So I guess we can call it a bug: Hive should detect that the files changed and invalidate its pre-calculated count. Optionally, Spark should be nice with Hive and update the count when inserting.
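
The stale pre-calculated count discussed in this reply can usually be worked around from the Hive side. The following is a minimal sketch, not the fix proposed in the thread; the table name my_table is hypothetical and stands in for whichever table was written to by both Hive and Spark.

```sql
-- Hypothetical table name: replace my_table with the affected table.
-- Recompute table-level statistics (including the row count) from the
-- files currently on disk.
ANALYZE TABLE my_table COMPUTE STATISTICS;

-- Alternatively, tell Hive not to answer COUNT(*) from stored statistics,
-- so the query scans the data instead.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM my_table;
```

The first statement refreshes the stored count once; the second approach trades a slower query for a count that reflects the files actually present.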

Populating tables using hive and spark

2016-08-22 Thread Nitin Kumar
Hi! I've noticed that Hive has problems registering new data records if the same table is written to from both the Hive terminal and Spark SQL. The problem is demonstrated by the commands listed below: hive> use default;

Varying vcores/ram for hive queries running Tez engine

2016-04-25 Thread Nitin Kumar
I was trying to benchmark some Hive queries. I am using the Tez execution engine. I varied the values of the following properties: 1. hive.tez.container.size 2. tez.task.resource.memory.mb 3. tez.task.resource.cpu.vcores Changes in the value of property 1 are reflected properly.
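
For readers who want to repeat this kind of benchmark, the three properties can be set per session from the Hive prompt. This is a sketch only; the numeric values are illustrative assumptions and do not come from the original message.

```sql
-- Run queries on Tez for this session.
SET hive.execution.engine=tez;

-- Illustrative values only (assumed, not taken from the thread).
SET hive.tez.container.size=4096;       -- container size in MB
SET tez.task.resource.memory.mb=4096;   -- Tez task memory in MB
SET tez.task.resource.cpu.vcores=2;     -- Tez task vcores

-- SET with a property name and no value prints the current setting,
-- which helps confirm what the session will actually use.
SET hive.tez.container.size;
SET tez.task.resource.memory.mb;
SET tez.task.resource.cpu.vcores;
```

Printing the values back only confirms the session configuration; whether the containers actually allocated by YARN match them has to be checked separately, for example in the ResourceManager UI.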

Managing input split sizes in Hive running the tez engine

2016-04-20 Thread Nitin Kumar
Hi, I want to gain a better understanding of how the input splits are calculated in the Tez engine. I am aware that the *hive.input.format* property can be set to either *HiveInputFormat* (default) or to *CombineHiveInputFormat* (generally accepted for a large number of files having sizes << hdf
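
As a rough illustration of the setting mentioned above, the input format can be switched per session using its fully qualified class name. The split-grouping properties shown afterwards are an assumption about what is relevant on Tez; they are not named in the visible part of the message, and the byte values are illustrative.

```sql
-- Choose the input format for this session (fully qualified class names).
SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
-- or:
-- SET hive.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat;

-- Assumption: on Tez, splits are additionally grouped between these bounds.
-- Values are in bytes and purely illustrative.
SET tez.grouping.min-size=16777216;
SET tez.grouping.max-size=1073741824;
```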