Na Yang created HIVE-8756:
-----------------------------
Summary: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
Key: HIVE-8756
URL: https://issues.apache.org/jira/browse/HIVE-8756
Project: Hive
Issue Type: Bug
Reporter: Na Yang
Run the following Hive queries:
{noformat}
set datanucleus.cache.collections=false;
set hive.stats.autogather=true;
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
set hive.map.aggr=true;
create table tmptable(key string, value string);
INSERT OVERWRITE TABLE tmptable
SELECT unionsrc.key, unionsrc.value
FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
UNION ALL
SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
DESCRIBE FORMATTED tmptable;
{noformat}
Hive on Spark prints the following table parameters:
{noformat}
COLUMN_STATS_ACCURATE true
numFiles 2
numRows 0
rawDataSize 0
totalSize 225
{noformat}
Hive on MR prints the following table parameters:
{noformat}
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 2
numRows 26
rawDataSize 199
totalSize 225
{noformat}
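As shown above, numRows and rawDataSize are not collected by the Hive on Spark
stats: both are reported as 0, while Hive on MR reports numRows 26 and
rawDataSize 199. numFiles and totalSize are the same for both engines.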
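To make the engine comparison mechanical when re-testing, here is a minimal Python sketch. It parses the two parameter listings copied from the outputs above and reports which keys differ. The helpers `parse_table_params` and `diff_table_params` are hypothetical and are not part of Hive. The parser also assumes each parameter line has the form "key value", as the listings above do.

```python
from typing import Dict, Optional, Tuple

# Parameter listings copied from the DESCRIBE FORMATTED outputs in this issue.
SPARK_OUTPUT = """\
COLUMN_STATS_ACCURATE true
numFiles 2
numRows 0
rawDataSize 0
totalSize 225
"""

MR_OUTPUT = """\
Table Parameters:
COLUMN_STATS_ACCURATE true
numFiles 2
numRows 26
rawDataSize 199
totalSize 225
"""


def parse_table_params(text: str) -> Dict[str, str]:
    """Hypothetical helper: parse 'key value' lines into a dict."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and section headers such as "Table Parameters:".
        if not stripped or stripped.endswith(":"):
            continue
        parts = stripped.split()
        if len(parts) == 2:
            params[parts[0]] = parts[1]
    return params


def diff_table_params(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Hypothetical helper: return {key: (a_value, b_value)} for differing keys."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    spark = parse_table_params(SPARK_OUTPUT)
    mr = parse_table_params(MR_OUTPUT)
    for key, (s, m) in diff_table_params(spark, mr).items():
        print(f"{key}: spark={s} mr={m}")
```

Run against the listings above, the sketch reports only numRows (0 vs 26) and rawDataSize (0 vs 199). numFiles, totalSize and COLUMN_STATS_ACCURATE agree between the two engines.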
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)