Hi Rohan

As of now, we support queries with aggregate function count on columns which
are the same as index key column and are used in group-by construct. For
example,
CREATE TABLE tbl(key int, value int);
CREATE INDEX tbl_key_idx ON TABLE tbl(key) AS
'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED
REBUILD IDXPROPERTIES("AGGREGATES"="count(key)");
ALTER INDEX tbl_key_idx ON tbl REBUILD;

It is valid to use the following query:
select key, count(key) from tbl where key < 1000 group by key;
But not valid to use:
select value, count(key) from tbl group by value; (I think this is the same
as your case)

As of now, you cannot use the aggregate index in case of join queries.

For the semantic analysis error, I apologize for the typo in the code. It
worked for us as it was uniform in all the classes :)
You forgot to change it in a few more instances and hence the semantic
error. The aggregate index itself creates the column references with a
'count_Of..' in it. We have fixed the issue and should be updated in the
trunk soon. Thanks for noticing it though. :)

Regards,
Prajakta








On Wed, Sep 21, 2011 at 12:28 AM, rohan monga <monga.ro...@gmail.com> wrote:

> Hi Prajakta,
> Thanks, I was able to generate a plan which was using indices by
> following your advise.
> However, I want to group on one key and count the others
> "select count(id2) from table_t1 group by id1", would that not be possible
> ?
>
> Also, could you give me an example of using indices to speed up joins?
>
> And for the semantic analysis thing, I sent the error in my first
> email, here are the lines that I modified to get it to work. Basically
> changed "_count_Of" to "_count_of"
>
> Regards,
> --
> Rohan Monga
>
>
>
> On Wed, Sep 21, 2011 at 4:22 AM, Prajakta Kalmegh <pkalm...@gmail.com>
> wrote:
> > Hi Rohan
> > I run your queries on the same version from trunk and did not get any
> > errors. Not sure why you are getting a semantic analysis error. It would
> be
> > good if you could send me a snapshot of the error message.
> > About the second issue of getting the same plan even if you
> > set hive.optimize.index.groupby to true or false, we do not optimize for
> > cases where the count (aggregate functions) are applied on keys other
> than
> > the group-by keys. Since you are trying to get a count of id2 and have
> the
> > index key (as well as the group-by key) on id1, the optimization is not
> > applied. Hence the same plan.
> > Hope this helps. Please let me know if you have any questions.
> > Regards,
> > Prajakta
> >
> >
> >> From: rohan monga <monga.ro...@gmail.com>
> >> Date: September 19, 2011 11:26:29 PM PDT
> >> To: <user@hive.apache.org>
> >> Subject: problems with indices
> >> Reply-To: <user@hive.apache.org>
> >>
> >> Hi,
> >> I have a table and index that look like
> >>
> >> <snip>
> >> CREATE TABLE table_t1(id1 int,  id2 int)
> >> CREATE INDEX table_t1_idx ON TABLE table_t1(id1) AS
> >> 'org.apache.hadoop.hive.ql.index.AggregateIndexHandler' WITH DEFERRED
> >> REBUILD IDXPROPERTIES("AGGREGATES"="count(id2)");
> >> ALTER INDEX table_t1_idx ON table_t1 REBUILD;
> >> </snip>
> >>
> >> Now, I am seeing 2 problems with the query 'set
> >> hive.optimize.index.groupby=false;select count(id2) from table_t1
> >> group by id1',
> >> firstly,
> >>
> >> <snip>
> >> FAILED: Error in semantic analysis: Line 1:11 Invalid table alias or
> >> column reference '`_count_Of_id2`': (possible column names are: id1,
> >> _bucketname, _offsets, _count_of_id2)
> >> </snip>
> >>
> >> So, I assumed that it was a typo or something, and I change all 'Of'
> >> to 'of' in the relevant places in the code.
> >>
> >> Then I ran the query again, with the option set to true and then to
> >> false, the run times were almost the same. Also 'explain' on the
> >> queries show identical query plans.
> >> I am using hive revision 1172989 from trunk
> >>
> >> Could someone help me with this?
> >>
> >> Regards,
> >>
> >> --
> >> Rohan Monga
> >
> >
> >
>

Reply via email to