Hi Prajakta, Thanks, I was able to generate a plan which was using indices by following your advise. However, I want to group on one key and count the others "select count(id2) from table_t1 group by id1", would that not be possible ?
Also, could you give me an example of using indices to speed up joins? And for the semantic analysis thing, I sent the error in my first email, here are the lines that I modified to get it to work. Basically changed "_count_Of" to "_count_of" Regards, -- Rohan Monga On Wed, Sep 21, 2011 at 4:22 AM, Prajakta Kalmegh <pkalm...@gmail.com> wrote: > Hi Rohan > I run your queries on the same version from trunk and did not get any > errors. Not sure why you are getting a semantic analysis error. It would be > good if you could send me a snapshot of the error message. > About the second issue of getting the same plan even if you > set hive.optimize.index.groupby to true or false, we do not optimize for > cases where the count (aggregate functions) are applied on keys other than > the group-by keys. Since you are trying to get a count of id2 and have the > index key (as well as the group-by key) on id1, the optimization is not > applied. Hence the same plan. > Hope this helps. Please let me know if you have any questions. > Regards, > Prajakta > > >> From: rohan monga <monga.ro...@gmail.com> >> Date: September 19, 2011 11:26:29 PM PDT >> To: <user@hive.apache.org> >> Subject: problems with indices >> Reply-To: <user@hive.apache.org> >> >> Hi, >> I have a table and index that look like >> >> <snip> >> CREATE TABLE table_t1(id1 int, id2 int) >> CREATE INDEX table_t1_idx ON TABLE table_t1(id1) AS >> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED >> REBUILD IDXPROPERTIES("AGGREGATES"="count(id2)"); >> ALTER INDEX table_t1_idx ON table_t1 REBUILD; >> </snip> >> >> Now, I am seeing 2 problems with the query 'set >> hive.optimize.index.groupby=false;select count(id2) from table_t1 >> group by id1', >> firstly, >> >> <snip> >> FAILED: Error in semantic analysis: Line 1:11 Invalid table alias or >> column reference '`_count_Of_id2`': (possible column names are: id1, >> _bucketname, _offsets, _count_of_id2) >> </snip> >> >> So, I assumed that it was a typo or something, and I change all 'Of' >> to 'of' in the relevant places in the code. >> >> Then I ran the query again, with the option set to true and then to >> false, the run times were almost the same. Also 'explain' on the >> queries show identical query plans. >> I am using hive revision 1172989 from trunk >> >> Could someone help me with this? >> >> Regards, >> >> -- >> Rohan Monga > > >
rohan.diff
Description: Binary data