Hi Mich!
There is no problem in displaying records or performing any aggregations on
the records after inserting data from Spark into the Hive table. Only the
count query in Hive returns the wrong result, and only until the compute
statistics command is issued.
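A minimal diagnostic sketch follows. It assumes the stale count comes from the table statistics stored in the metastore. When `hive.compute.query.using.stats` is enabled, Hive can answer a bare `COUNT(*)` from those statistics without scanning the data. The table name `test_orc` comes from the test further down in this thread, and the `SET` only affects the current session.

```sql
-- Show table parameters; numRows is the row count recorded in the metastore
DESCRIBE FORMATTED test_orc;

-- Make Hive scan the data instead of answering COUNT(*) from statistics
SET hive.compute.query.using.stats=false;
SELECT COUNT(*) FROM test_orc;
```

Suppose the scanned count is correct while `numRows` still shows the older value. That would suggest the Spark write did not update the table statistics.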
On Mon, Aug 22, 2016 at 4:50
OK, this is my test:
1) Create a table in Hive and populate it with two rows
hive> create table testme (col1 int, col2 string);
OK
hive> insert into testme values (1,'London');
Query ID = hduser_20160821212812_2a8384af-23f1-4f28-9395-a99a5f4c1a4a
OK
hive> insert into testme values (2,'NY');
Query ID
Hi Furcy,
If I execute the command "ANALYZE TABLE TEST_ORC COMPUTE STATISTICS" before
checking the count from Hive, Hive returns the correct count, although it does
not spawn a map-reduce job to compute it.
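The workaround described above can be sketched as a short HiveQL session. This sketch assumes the statistics are refreshed after every Spark write. The `EXPLAIN` line is only a way to check whether the count is answered from statistics (no map-reduce stage in the plan) or by a real scan.

```sql
-- Recompute basic table statistics (numRows, numFiles, totalSize, ...)
ANALYZE TABLE test_orc COMPUTE STATISTICS;

-- With hive.compute.query.using.stats=true, this can be answered from the
-- refreshed metastore statistics without launching a map-reduce job
SELECT COUNT(*) FROM test_orc;

-- Inspect the plan to see whether a map-reduce stage would be launched
EXPLAIN SELECT COUNT(*) FROM test_orc;
```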
I'm running an HDP 2.4 cluster with Hive 1.2.1.2.4 and Spark 1.6.1.
If others can co
Hi Nitin,
I confirm that there is something odd here.
I did the following test:
create table test_orc (id int, name string, dept string) stored as ORC;
insert into table test_orc values (1, 'abc', 'xyz');
insert into table test_orc values (2, 'def', 'xyz');
insert into table test_orc values (3,