Hi Rohan As of now, we support queries with aggregate function count on columns which are the same as index key column and are used in group-by construct. For example, CREATE TABLE tbl(key int, value int); CREATE INDEX tbl_key_idx ON TABLE tbl(key) AS 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED REBUILD IDXPROPERTIES("AGGREGATES"="count(key)"); ALTER INDEX tbl_key_idx ON tbl REBUILD;
It is valid to use the following query: select key, count(key) from tbl where key < 1000 group by key; But not valid to use: select value, count(key) from tbl group by value; (I think this is the same as your case) As of now, you cannot use the aggregate index in case of join queries. For the semantic analysis error, I apologize for the typo in the code. It worked for us as it was uniform in all the classes :) You forgot to change it in a few more instances and hence the semantic error. The aggregate index itself creates the column references with a 'count_Of..' in it. We have fixed the issue and should be updated in the trunk soon. Thanks for noticing it though. :) Regards, Prajakta On Wed, Sep 21, 2011 at 12:28 AM, rohan monga <monga.ro...@gmail.com> wrote: > Hi Prajakta, > Thanks, I was able to generate a plan which was using indices by > following your advise. > However, I want to group on one key and count the others > "select count(id2) from table_t1 group by id1", would that not be possible > ? > > Also, could you give me an example of using indices to speed up joins? > > And for the semantic analysis thing, I sent the error in my first > email, here are the lines that I modified to get it to work. Basically > changed "_count_Of" to "_count_of" > > Regards, > -- > Rohan Monga > > > > On Wed, Sep 21, 2011 at 4:22 AM, Prajakta Kalmegh <pkalm...@gmail.com> > wrote: > > Hi Rohan > > I run your queries on the same version from trunk and did not get any > > errors. Not sure why you are getting a semantic analysis error. It would > be > > good if you could send me a snapshot of the error message. > > About the second issue of getting the same plan even if you > > set hive.optimize.index.groupby to true or false, we do not optimize for > > cases where the count (aggregate functions) are applied on keys other > than > > the group-by keys. Since you are trying to get a count of id2 and have > the > > index key (as well as the group-by key) on id1, the optimization is not > > applied. Hence the same plan. > > Hope this helps. Please let me know if you have any questions. > > Regards, > > Prajakta > > > > > >> From: rohan monga <monga.ro...@gmail.com> > >> Date: September 19, 2011 11:26:29 PM PDT > >> To: <user@hive.apache.org> > >> Subject: problems with indices > >> Reply-To: <user@hive.apache.org> > >> > >> Hi, > >> I have a table and index that look like > >> > >> <snip> > >> CREATE TABLE table_t1(id1 int, id2 int) > >> CREATE INDEX table_t1_idx ON TABLE table_t1(id1) AS > >> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED > >> REBUILD IDXPROPERTIES("AGGREGATES"="count(id2)"); > >> ALTER INDEX table_t1_idx ON table_t1 REBUILD; > >> </snip> > >> > >> Now, I am seeing 2 problems with the query 'set > >> hive.optimize.index.groupby=false;select count(id2) from table_t1 > >> group by id1', > >> firstly, > >> > >> <snip> > >> FAILED: Error in semantic analysis: Line 1:11 Invalid table alias or > >> column reference '`_count_Of_id2`': (possible column names are: id1, > >> _bucketname, _offsets, _count_of_id2) > >> </snip> > >> > >> So, I assumed that it was a typo or something, and I change all 'Of' > >> to 'of' in the relevant places in the code. > >> > >> Then I ran the query again, with the option set to true and then to > >> false, the run times were almost the same. Also 'explain' on the > >> queries show identical query plans. > >> I am using hive revision 1172989 from trunk > >> > >> Could someone help me with this? > >> > >> Regards, > >> > >> -- > >> Rohan Monga > > > > > > >