Dear all, I created a compact index for a table with several hundred million records as follows. The table is partitioned by the month. The index on A and B was created successfully, but I can't see it getting used in the queries. It would be great if one of you experts can shed some light on what am I missing. I'm using hive 0.9.
set hive.exec.parallel=false; CREATE INDEX idx_yyyy ON TABLE yyyy(a,b) AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler' WITH DEFERRED REBUILD COMMENT 'Index for yyyy table. Indexing on A and B'; ALTER INDEX idx_yyyy on yyyy REBUILD; hive> describe yyyy; OK a bigint ... b bigint .... month int hive> show index on yyyy; OK idx_yyyy yyyy a, b default__yyyy_p_idx_yyyy__ compact Index for tm top50 table. Indexing on A and B hive> explain select a,b from tm_top50_p where a=113231 and month=201308; OK ABSTRACT SYNTAX TREE: (TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TABNAME yyyy))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (TOK_TABLE_OR_COL a)) (TOK_SELEXPR (TOK_TABLE_OR_COL b))) (TOK_WHERE (and (= (TOK_TABLE_OR_COL a) 113231) (= (TOK_TABLE_OR_COL month) 201308))))) STAGE DEPENDENCIES: Stage-1 is a root stage Stage-0 is a root stage STAGE PLANS: Stage: Stage-1 Map Reduce Alias -> Map Operator Tree: yyyy TableScan alias: yyyy Filter Operator predicate: expr: (a = 113231) type: boolean Select Operator expressions: expr: a type: bigint expr: b type: bigint outputColumnNames: _col0, _col1 File Output Operator compressed: false GlobalTableId: 0 table: input format: org.apache.hadoop.mapred.TextInputFormat output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-0 Fetch Operator limit: -1 thanks a lot, Thilina -- https://www.cs.indiana.edu/~tgunarat/ http://www.linkedin.com/in/thilina http://thilina.gunarathne.org