Na Yang created HIVE-8756:
-----------------------------

             Summary: numRows and rawDataSize are not collected by the Spark stats [Spark Branch]
                 Key: HIVE-8756
                 URL: https://issues.apache.org/jira/browse/HIVE-8756
             Project: Hive
          Issue Type: Bug
            Reporter: Na Yang


Run the following Hive queries:
{noformat}
set datanucleus.cache.collections=false;
set hive.stats.autogather=true;
set hive.merge.mapfiles=false;
set hive.merge.mapredfiles=false;
set hive.map.aggr=true;

create table tmptable(key string, value string);
INSERT OVERWRITE TABLE tmptable
SELECT unionsrc.key, unionsrc.value 
FROM (SELECT 'tst1' AS key, cast(count(1) AS string) AS value FROM src s1
      UNION ALL
      SELECT s2.key AS key, s2.value AS value FROM src1 s2) unionsrc;
DESCRIBE FORMATTED tmptable;
{noformat}

Hive on Spark prints the following table parameters:
{noformat}
        COLUMN_STATS_ACCURATE   true
        numFiles                2                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               225
{noformat}

Hive on MapReduce (MR) prints the following table parameters for the same queries:
{noformat}
Table Parameters:
        COLUMN_STATS_ACCURATE   true                
        numFiles                2                   
        numRows                 26                  
        rawDataSize             199                 
        totalSize               225 
{noformat}

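As shown above, numFiles and totalSize agree between the two engines, but numRows and rawDataSize are reported as 0 on Hive on Spark versus 26 and 199 on Hive on MR. In other words, the Hive on Spark stats collection does not gather numRows and rawDataSize.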
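A minimal sketch of how the comparison above could be scripted as a regression check. This is a hypothetical helper, not part of Hive's test framework: the names parse_table_parameters and diff_parameters are illustrative only. The sketch assumes every parameter line holds exactly one key and one whitespace-free value, as in the two DESCRIBE FORMATTED excerpts quoted in this report.

```python
# Parameter blocks copied from the DESCRIBE FORMATTED output quoted above.
SPARK_OUTPUT = """
        COLUMN_STATS_ACCURATE   true
        numFiles                2
        numRows                 0
        rawDataSize             0
        totalSize               225
"""

MR_OUTPUT = """
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        numFiles                2
        numRows                 26
        rawDataSize             199
        totalSize               225
"""


def parse_table_parameters(text):
    """Return {key: value} for lines of the form '<key> <whitespace> <value>'.

    Assumption (hypothetical helper): values never contain spaces.
    """
    params = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines and section headers such as 'Table Parameters:'.
        if len(parts) == 2 and not line.rstrip().endswith(":"):
            params[parts[0]] = parts[1]
    return params


def diff_parameters(spark, mr):
    """Return {key: (spark_value, mr_value)} for keys whose values differ."""
    keys = sorted(set(spark) | set(mr))
    return {k: (spark.get(k), mr.get(k)) for k in keys if spark.get(k) != mr.get(k)}


if __name__ == "__main__":
    diffs = diff_parameters(parse_table_parameters(SPARK_OUTPUT),
                            parse_table_parameters(MR_OUTPUT))
    # Prints one line per mismatched parameter.
    for key, (spark_value, mr_value) in diffs.items():
        print(f"{key}: spark={spark_value} mr={mr_value}")
```

Run against the two excerpts, it reports only numRows (0 vs 26) and rawDataSize (0 vs 199), matching the observation above.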



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
