Hi Guys,

We have a Hive 0.12 ORC table that is partitioned by year, month, day, and hour,
and bucketed by one column.

So far, so good: we are seeing good speedups compared to the
non-ORC format.

   - Now we want to add an index on another commonly used column. My
   question is: given that ORC already has inline indexes, is this worth
   the effort?
   - Are indexes in Hive production-ready in general? I have heard mixed
   feedback about this.
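For comparison with a separate index: ORC's inline indexes (min/max statistics kept per stripe and per row group) only help if the reader is allowed to consult them. A minimal sketch, assuming that in Hive 0.12 ORC predicate pushdown is switched on by `hive.optimize.index.filter` (off by default); the partition values and the `akey` literal are hypothetical.

```sql
-- Assumption: this setting lets the ORC reader use its inline
-- row-group statistics to skip data that cannot match the predicate.
SET hive.optimize.index.filter=true;

-- Hypothetical query: the partition predicates prune directories,
-- and the akey predicate is pushed down into the ORC reader.
SELECT COUNT(*)
FROM events_indexed
WHERE year = 2014 AND month = 1 AND day = 15 AND hour = 0
  AND akey = 'some_value';
```

Min/max skipping works best when rows with similar `akey` values sit close together in each file, so how the data is written matters as much as the setting.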

I tried creating an index on the column 'akey' like this:

CREATE INDEX events_akey_index
ON TABLE events_indexed (akey)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE events_akey_index_table
PARTITIONED BY (year, month, day, hour)
COMMENT 'Events table indexed by akey';

But I get a parse error like this:

FAILED: ParseException line 7:0 missing EOF at 'PARTITIONED' near
'events_akey_index_table'

If I drop the PARTITIONED BY clause, the statement succeeds and the index is
created, but the corresponding query using the index is actually slower than
the same query on the non-indexed version of the table, since it doesn't use
the partition information.
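For reference, a sketch of the variant that parses, following the documented Hive CREATE INDEX grammar (it has no PARTITIONED BY clause, and COMMENT comes last). My understanding, not verified here, is that the index table picks up the base table's partition columns on its own, and that WITH DEFERRED REBUILD leaves the index empty until each partition is rebuilt explicitly. The partition values in the ALTER INDEX statement are hypothetical.

```sql
-- Same index, without PARTITIONED BY (not part of the CREATE INDEX grammar).
CREATE INDEX events_akey_index
ON TABLE events_indexed (akey)
AS 'BITMAP'
WITH DEFERRED REBUILD
IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
IN TABLE events_akey_index_table
COMMENT 'Events table indexed by akey';

-- WITH DEFERRED REBUILD creates an empty index; build it one base-table
-- partition at a time. Hypothetical values: match the real partitions.
ALTER INDEX events_akey_index ON events_indexed
PARTITION (year = 2014, month = 1, day = 15, hour = 0)
REBUILD;

-- Assumption: the index table mirrors the base table's partitions.
SHOW PARTITIONS events_akey_index_table;
```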

Any idea what I am missing here?

Sagar
