Re: Populating tables using hive and spark

2016-08-22 Thread Nitin Kumar
Hi Mich! There is no problem in displaying records or performing any aggregations on the records after inserting data from Spark into the Hive table. It is the count query (in Hive) that returns the wrong result prior to issuing the compute statistics command. On Mon, Aug 22, 2016 at 4:50

Re: Populating tables using hive and spark

2016-08-22 Thread Mich Talebzadeh
OK, this is my test.

1) Create a table in Hive and populate it with two rows

hive> create table testme (col1 int, col2 string);
OK
hive> insert into testme values (1,'London');
Query ID = hduser_20160821212812_2a8384af-23f1-4f28-9395-a99a5f4c1a4a
OK
hive> insert into testme values (2,'NY');
Query ID

Re: Populating tables using hive and spark

2016-08-22 Thread Nitin Kumar
Hi Furcy, If I execute the command "ANALYZE TABLE TEST_ORC COMPUTE STATISTICS" before checking the count from Hive, Hive returns the correct count, although it does not spawn a map-reduce job to compute it. I'm running an HDP 2.4 cluster with Hive 1.2.1.2.4 and Spark 1.6.1. If others can co
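
The behaviour described above (a correct count after ANALYZE TABLE, with no map-reduce job launched) is consistent with Hive answering the count from table statistics in the metastore rather than from the data. The excerpts here do not confirm that this is the cause, so the following is only a sketch under that assumption. It uses the test_orc table named in the thread and the hive.compute.query.using.stats setting:

```sql
-- Sketch only: assumes the stale count comes from metastore statistics.

-- Show the current value of the setting that lets Hive answer
-- simple aggregates such as COUNT(*) from stored statistics.
SET hive.compute.query.using.stats;

-- Inspect the statistics Hive currently holds for the table
-- (look for numRows under "Table Parameters").
DESCRIBE FORMATTED test_orc;

-- Option 1: refresh the statistics after the Spark insert, then count.
ANALYZE TABLE test_orc COMPUTE STATISTICS;
SELECT COUNT(*) FROM test_orc;

-- Option 2: for this session, make Hive read the data instead of
-- relying on statistics, then count.
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM test_orc;
```

If the count under option 2 matches the number of rows written from Spark while numRows in the DESCRIBE FORMATTED output does not, that would point to stale statistics rather than missing data.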

Re: Populating tables using hive and spark

2016-08-22 Thread Furcy Pin
Hi Nitin, I confirm that there is something odd here. I did the following test:

create table test_orc (id int, name string, dept string) stored as ORC;
insert into table test_orc values (1, 'abc', 'xyz');
insert into table test_orc values (2, 'def', 'xyz');
insert into table test_orc values (3,