Hi Ranjith, Here are the steps for using an index in Hive 0.7.1. 1) Create the index CREATE INDEX x ON TABLE t(j) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD;
2) Build the index (since you specified the 'DEFERRED REBUILD' flag in the create index statement ALTER INDEX x ON t REBUILD; 3) Use the index If your original query is: select a, count(*) from t where j='and' group by a; and you wanted to use the index on column j, do something like: INSERT OVERWRITE DIRECTORY '/tmp/indexes/x' SELECT `_bucketname`, `_offsets` FROM default__t_x__ where j='and'; (The name default__t_x__ can be found in the output of step 2. Also, /tmp/indexes directory needs to exist in HDFS. You can substitute this to be any pre-existing directory in HDFS) SET hive.index.compact.file=/tmp/indexes/x; SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat; SELECT a, count(*) from t where j='and' group by a; Since the semantics of this usage make you specify the compact file, I have not been able to figure out a way to use multiple indexes in the same query. In this case we are using the index on j, the column in the where clause. I hope you now understand why indexing in Hive is a work in progress:-) Good luck! Mark Mark Grover, Business Intelligence Analyst OANDA Corporation www: oanda.com www: fxtrade.com ----- Original Message ----- From: "Ranjith Raghunath" <ranjith.raghuna...@usaa.com> To: "user@hive.apache.org" <user@hive.apache.org> Cc: "terry.zhao...@huawei.com" <terry.zhao...@huawei.com> Sent: Wednesday, May 16, 2012 9:46:23 PM Subject: Re: Indexing in hive Thanks Mark, Carl, and Ransom. I really appreciate the answers here. I am using Hive 0.7.1 and currently trying to create an index to help with performance associated to a particular where clause. I have not set any properties as mentioned below. I can try the options you listed below. Based on the feedback, I do have a few questions: 1. It seems like if you optimize for a groupby you cannot optimize for the where clause. I am reading this correctly? 2. How do you build the index? 3. Does the build process create a dataset that contains the index keys along with an offset value associated with the row? 4. And I am guessing that you need to use the columns in the same order as it is defined in the index? Thanks, Ranjith ----- Original Message ----- From: Mark Grover [mailto:mgro...@oanda.com] Sent: Wednesday, May 16, 2012 07:52 PM To: user@hive.apache.org <user@hive.apache.org> Cc: Zhaojun (Terry) <terry.zhao...@huawei.com> Subject: Re: Indexing in hive Ransom, >From this JIRA (https://issues.apache.org/jira/browse/HIVE-1644), it looks >like automatic use of indexes using hive.optimize.index.filter was introduced >in Hive 0.8. However, Ranjith seems to be using Hive 0.7.1 which doesn't >support those properties. Ranjith, you need to set the appropriate priorities before calling your query to make use of indexes. Are you setting any properties? If so, what? I will try to dig up what those properties are, in the meanwhile. Mark ----- Original Message ----- From: "Hezhiqiang (Ransom)" <ransom.hezhiqi...@huawei.com> To: user@hive.apache.org Cc: "Zhaojun (Terry)" <terry.zhao...@huawei.com> Sent: Wednesday, May 16, 2012 8:32:55 PM Subject: RE: Indexing in hive “ hive.optimize.index.filter ” is the conf automatically use indexes If u set hive.optimize.index.groupby = true. It will set hive.optimize.index.filter =false. See your configurations. And you need to build index after create index. Best regards Ransom.