Thanks. I think using metastore api is what I wanted.

Thanks,
Bharath

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Thursday, August 18, 2016 2:04 PM
To: user
Subject: Re: Getting column statistics on paritioned Hive tables

In general in Hive 2 you can get statistics for partitions by running:

hive> analyze table sales partition (year, month) compute statistics;
Partition oraclehadoop.sales{year=2000, month=10} stats: [numFiles=256, 
numRows=21034, totalSize=1651890, rawDataSize=6226064]
Partition oraclehadoop.sales{year=1999, month=4} stats: [numFiles=256, 
numRows=16512, totalSize=1533145, rawDataSize=4887552]
Partition oraclehadoop.sales{year=1999, month=8} stats: [numFiles=256, 
numRows=22979, totalSize=1697346, rawDataSize=6801784]
Partition oraclehadoop.sales{year=2001, month=8} stats: [numFiles=256, 
numRows=23879, totalSize=1744781, rawDataSize=7068184]
Partition oraclehadoop.sales{year=1998, month=2} stats: [numFiles=256, 
numRows=14149, totalSize=1438496, rawDataSize=4188104]
Partition oraclehadoop.sales{year=1999, month=7} stats: [numFiles=256, 
numRows=21648, totalSize=1657439, rawDataSize=6407808]
Partition oraclehadoop.sales{year=1999, month=5} stats: [numFiles=256, 
numRows=19733, totalSize=1623643, rawDataSize=5840968]
Partition oraclehadoop.sales{year=1999, month=1} stats: [numFiles=256, 
numRows=20637, totalSize=1638403, rawDataSize=6108552]

The partition statistics are shown for each partition as above.

Here not only there are partitions but also each partition is bucketed into 256 
buckets.

Individual column stats can be obtained from the metadata table  part_col_stats

[Inline images 1]

If you are using ORC file, the statistics can be obtained from

 hive --orcfiledump --rowindex <FILE_PATH_ON_HDFS>


HTH



Dr Mich Talebzadeh



LinkedIn  
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



Disclaimer: Use it at your own risk. Any and all responsibility for any loss, 
damage or destruction of data or any other property which may arise from 
relying on this email's technical content is explicitly disclaimed. The author 
will in no case be liable for any monetary damages arising from such loss, 
damage or destruction.



On 18 August 2016 at 21:32, Gopal Vijayaraghavan 
<gop...@apache.org<mailto:gop...@apache.org>> wrote:
> Is there any way to access the column statistics for the whole table?

There's no column statistics for the whole table - the only way to get one
is to merge all the partition column statistics.

The metastore API actually exposes this (if you're looking for schema info
to read in a program).

https://hive.apache.org/javadocs/r2.0.1/api/org/apache/hadoop/hive/metastor
e/api/ThriftHiveMetastore.Processor.get_aggr_stats_for.html

+
https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/o
rg/apache/hadoop/hive/metastore/hbase/TestHBaseAggrStatsCacheIntegration.ja
va#L184



Cheers,
Gopal



Reply via email to