Hi guys,

We have a Hive 0.12 ORC table that is partitioned on year, month, day, and hour, and is bucketed by one column.
So far so good: we are seeing good speedups compared to the non-ORC format. Now we want to add an index on another commonly used column. My questions are:

- Given that ORC already has inline indexes, is this worth the effort?
- Are indexes in Hive production-ready in general? I have heard mixed feedback about this.

I tried creating an index on a column 'akey' like this:

    CREATE INDEX events_akey_index
    ON TABLE events_indexed (akey)
    AS 'BITMAP'
    WITH DEFERRED REBUILD
    IDXPROPERTIES ('creator' = 'me', 'created_at' = 'some_time')
    IN TABLE events_akey_index_table
    PARTITIONED BY (year , month , day, hour);
    COMMENT 'Events table indexed by akey';

But I get a parse error like this:

    FAILED: ParseException line 7:0 missing EOF at 'PARTITIONED' near 'events_akey_index_table'

If I drop the PARTITIONED BY clause, things work fine and the index is created, but the corresponding query using the index is actually slower than on the non-indexed version of the table, since it doesn't use the partition information.

Any idea what I am missing here?

Sagar