RE: Getting column statistics on partitioned Hive tables

2016-08-18 Thread Mankalale, Bharath
Thanks. I think using the metastore API is what I wanted.
Thanks, Bharath
From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com]
Sent: Thursday, August 18, 2016 2:04 PM
To: user
Subject: Re: Getting column statistics on partitioned Hive tables
In general in Hive 2 you can get statistics for

Re: Getting column statistics on partitioned Hive tables

2016-08-18 Thread Mich Talebzadeh
In general in Hive 2 you can get statistics for partitions by running:
    hive> analyze table sales partition (year, month) compute statistics;
    Partition oraclehadoop.sales{year=2000, month=10} stats: [numFiles=256, numRows=21034, totalSize=1651890, rawDataSize=6226064]
    Partition oraclehadoop.sales{y
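If you want table-level totals from that per-partition output, one option is to parse the `stats:` lines and sum the counters yourself. The following is only a rough sketch: the regular expression assumes the exact line shape quoted above, and `parse_partition_stats` and `sum_partition_stats` are hypothetical helper names, not part of Hive.

```python
import re
from collections import Counter

# Assumed line shape, taken from the ANALYZE output quoted above:
#   Partition <db.table>{k=v, k=v} stats: [name=value, name=value, ...]
STATS_LINE = re.compile(r"Partition\s+(\S+?)\{(.*?)\}\s+stats:\s+\[(.*?)\]")


def _pairs(text):
    """Split 'a=1, b=2' into [('a', '1'), ('b', '2')]."""
    return [tuple(item.split("=", 1)) for item in re.split(r",\s*", text.strip())]


def parse_partition_stats(line):
    """Return (table, partition_spec, counters), or None if the line does not match."""
    match = STATS_LINE.search(line)
    if match is None:
        return None
    table, spec, stats = match.groups()
    partition = dict(_pairs(spec))
    counters = {name: int(value) for name, value in _pairs(stats)}
    return table, partition, counters


def sum_partition_stats(lines):
    """Add up every counter across all lines that look like partition stats."""
    totals = Counter()
    for line in lines:
        parsed = parse_partition_stats(line)
        if parsed is not None:
            totals.update(parsed[2])
    return dict(totals)
```

Lines that do not match the pattern are skipped, so the whole console output can be passed in unfiltered. Summing is only meaningful for additive counters such as numFiles, numRows, totalSize and rawDataSize.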

Re: Getting column statistics on partitioned Hive tables

2016-08-18 Thread Gopal Vijayaraghavan
> Is there any way to access the column statistics for the whole table?
There are no column statistics for the whole table; the only way to get them is to merge all the partition column statistics. The metastore API actually exposes this (if you're looking for schema info to read in a program). ht
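To make "merge all the partition column statistics" concrete, here is a minimal sketch of what such a merge could look like for a numeric column. It is not the metastore's own implementation: the class and field names are hypothetical, and it assumes each partition reports a row count, null count, min, max and distinct count. Counts add up and min/max combine directly, but distinct counts cannot be merged exactly without the underlying values, so the sketch only reports bounds.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class PartitionColumnStats:
    """Hypothetical per-partition statistics for one numeric column."""
    num_rows: int
    num_nulls: int
    min_value: float
    max_value: float
    distinct_count: int


@dataclass(frozen=True)
class MergedColumnStats:
    """Hypothetical table-level view derived from the partitions."""
    num_rows: int
    num_nulls: int
    min_value: float
    max_value: float
    distinct_lower_bound: int
    distinct_upper_bound: int


def merge_column_stats(parts: Iterable[PartitionColumnStats]) -> MergedColumnStats:
    parts = list(parts)
    if not parts:
        raise ValueError("need at least one partition")
    num_rows = sum(p.num_rows for p in parts)
    num_nulls = sum(p.num_nulls for p in parts)
    return MergedColumnStats(
        num_rows=num_rows,
        num_nulls=num_nulls,
        min_value=min(p.min_value for p in parts),
        max_value=max(p.max_value for p in parts),
        # At least as many distinct values as the most varied partition.
        distinct_lower_bound=max(p.distinct_count for p in parts),
        # At most the sum (no overlap), and never more than the non-null rows.
        distinct_upper_bound=min(
            sum(p.distinct_count for p in parts), num_rows - num_nulls
        ),
    )
```

If the per-partition distinct counts are themselves estimates, the two bounds are approximate as well.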

Getting column statistics on partitioned Hive tables

2016-08-18 Thread Mankalale, Bharath
Hi, I was trying to get column statistics for a partitioned Hive table. Generally, for a non-partitioned table, I can run `ANALYZE TABLE TABLENAME COMPUTE STATISTICS FOR COLUMNS` and then access the column statistics with `DESCRIBE FORMATTED TABLENAME COLUMNNAME` from the Hive client. This does