Hi Prajakta,
Thanks, I was able to generate a plan which was using indices by
following your advise.
However, I want to group on one key and count the others
"select count(id2) from table_t1 group by id1", would that not be possible ?

Also, could you give me an example of using indices to speed up joins?

And for the semantic analysis thing, I sent the error in my first
email, here are the lines that I modified to get it to work. Basically
changed "_count_Of" to "_count_of"

Regards,
--
Rohan Monga



On Wed, Sep 21, 2011 at 4:22 AM, Prajakta Kalmegh <pkalm...@gmail.com> wrote:
> Hi Rohan
> I run your queries on the same version from trunk and did not get any
> errors. Not sure why you are getting a semantic analysis error. It would be
> good if you could send me a snapshot of the error message.
> About the second issue of getting the same plan even if you
> set hive.optimize.index.groupby to true or false, we do not optimize for
> cases where the count (aggregate functions) are applied on keys other than
> the group-by keys. Since you are trying to get a count of id2 and have the
> index key (as well as the group-by key) on id1, the optimization is not
> applied. Hence the same plan.
> Hope this helps. Please let me know if you have any questions.
> Regards,
> Prajakta
>
>
>> From: rohan monga <monga.ro...@gmail.com>
>> Date: September 19, 2011 11:26:29 PM PDT
>> To: <user@hive.apache.org>
>> Subject: problems with indices
>> Reply-To: <user@hive.apache.org>
>>
>> Hi,
>> I have a table and index that look like
>>
>> <snip>
>> CREATE TABLE table_t1(id1 int,  id2 int)
>> CREATE INDEX table_t1_idx ON TABLE table_t1(id1) AS
>> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED
>> REBUILD IDXPROPERTIES("AGGREGATES"="count(id2)");
>> ALTER INDEX table_t1_idx ON table_t1 REBUILD;
>> </snip>
>>
>> Now, I am seeing 2 problems with the query 'set
>> hive.optimize.index.groupby=false;select count(id2) from table_t1
>> group by id1',
>> firstly,
>>
>> <snip>
>> FAILED: Error in semantic analysis: Line 1:11 Invalid table alias or
>> column reference '`_count_Of_id2`': (possible column names are: id1,
>> _bucketname, _offsets, _count_of_id2)
>> </snip>
>>
>> So, I assumed that it was a typo or something, and I change all 'Of'
>> to 'of' in the relevant places in the code.
>>
>> Then I ran the query again, with the option set to true and then to
>> false, the run times were almost the same. Also 'explain' on the
>> queries show identical query plans.
>> I am using hive revision 1172989 from trunk
>>
>> Could someone help me with this?
>>
>> Regards,
>>
>> --
>> Rohan Monga
>
>
>

Attachment: rohan.diff
Description: Binary data

Reply via email to