Thanks. I think using the metastore API is what I wanted.

Thanks,
Bharath

From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Thursday, August 18, 2016 2:04 PM
To: user
Subject: Re: Getting column statistics on paritioned Hive tables

In general, in Hive 2 you can get statistics for partitions by running:

hive> analyze table sales partition (year, month) compute statistics;
Partition oraclehadoop.sales{year=2000, month=10} stats: [numFiles=256, numRows=21034, totalSize=1651890, rawDataSize=6226064]
Partition oraclehadoop.sales{year=1999, month=4} stats: [numFiles=256, numRows=16512, totalSize=1533145, rawDataSize=4887552]
Partition oraclehadoop.sales{year=1999, month=8} stats: [numFiles=256, numRows=22979, totalSize=1697346, rawDataSize=6801784]
Partition oraclehadoop.sales{year=2001, month=8} stats: [numFiles=256, numRows=23879, totalSize=1744781, rawDataSize=7068184]
Partition oraclehadoop.sales{year=1998, month=2} stats: [numFiles=256, numRows=14149, totalSize=1438496, rawDataSize=4188104]
Partition oraclehadoop.sales{year=1999, month=7} stats: [numFiles=256, numRows=21648, totalSize=1657439, rawDataSize=6407808]
Partition oraclehadoop.sales{year=1999, month=5} stats: [numFiles=256, numRows=19733, totalSize=1623643, rawDataSize=5840968]
Partition oraclehadoop.sales{year=1999, month=1} stats: [numFiles=256, numRows=20637, totalSize=1638403, rawDataSize=6108552]

The statistics are shown for each partition as above. Here the table is not only partitioned; each partition is also bucketed into 256 buckets, which is why every partition reports numFiles=256.

Individual column stats can be obtained from the metadata table part_col_stats.

If you are using ORC files, the statistics can be obtained from:

hive --orcfiledump --rowindex <FILE_PATH_ON_HDFS>

HTH

Dr Mich Talebzadeh

LinkedIn https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

Disclaimer: Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction.

On 18 August 2016 at 21:32, Gopal Vijayaraghavan <gop...@apache.org> wrote:

> Is there any way to access the column statistics for the whole table?

There are no column statistics for the whole table - the only way to get them is to merge all the partition column statistics.

The metastore API actually exposes this (if you're looking for schema info to read in a program).

https://hive.apache.org/javadocs/r2.0.1/api/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.Processor.get_aggr_stats_for.html
+
https://github.com/apache/hive/blob/master/itests/hive-unit/src/test/java/org/apache/hadoop/hive/metastore/hbase/TestHBaseAggrStatsCacheIntegration.java#L184

Cheers,
Gopal
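
To make the "merge all the partition column statistics" idea concrete, here is a minimal Python sketch of what such a merge could look like for a single integer column. It is only an illustration and not Hive's actual aggregation code: the class names, field names, and sample numbers are all hypothetical, and it assumes each partition already has its own low/high value, null count, and distinct-value count. Min, max, and null counts merge exactly, but the number of distinct values does not, so the sketch reports only a lower and an upper bound for it.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LongColumnStats:
    """Statistics for one integer column in one partition (hypothetical shape)."""
    low: int
    high: int
    num_nulls: int
    num_distincts: int


@dataclass(frozen=True)
class MergedColumnStats:
    """Table-level view obtained by merging the per-partition statistics."""
    low: int
    high: int
    num_nulls: int
    ndv_lower_bound: int
    ndv_upper_bound: int


def merge_partition_stats(
    parts: Iterable[LongColumnStats],
) -> Optional[MergedColumnStats]:
    """Merge per-partition stats; returns None when there are no partitions."""
    parts = list(parts)
    if not parts:
        return None

    low = min(p.low for p in parts)
    high = max(p.high for p in parts)

    # Distinct counts cannot be added up exactly, because the same value may
    # occur in several partitions. The table has at least as many distinct
    # values as its "widest" partition, and at most the sum of all partitions
    # (capped by how many integers fit between low and high).
    ndv_lower = max(p.num_distincts for p in parts)
    ndv_upper = min(sum(p.num_distincts for p in parts), high - low + 1)

    return MergedColumnStats(
        low=low,
        high=high,
        num_nulls=sum(p.num_nulls for p in parts),
        ndv_lower_bound=ndv_lower,
        ndv_upper_bound=ndv_upper,
    )


if __name__ == "__main__":
    # Made-up statistics for two partitions of the same column.
    partitions = [
        LongColumnStats(low=1, high=100, num_nulls=0, num_distincts=50),
        LongColumnStats(low=50, high=200, num_nulls=3, num_distincts=80),
    ]
    print(merge_partition_stats(partitions))
```

In practice you would let the metastore do this aggregation through the API linked above rather than merging by hand; the sketch is only meant to show why a table-level distinct count is an estimate while min, max, and null counts are exact.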