Hi, I have a table pokes(foo int, bar string) on which I created a compact index on foo. I set hive.optimize.autoindex=true and then ran the query select * from pokes where bar = 'John'. However, when I enable logging, I do not see any phase in which the optimizer makes use of the index, so I suspect it did not use the index to execute the query. If so, how can I force it to use the index?
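
For reference, here is roughly how I set things up (pokes_index is just the name I gave my index), together with the extra settings I have seen recommended for getting the optimizer to rewrite queries against a compact index; I am not certain all of them apply to my build:

CREATE INDEX pokes_index ON TABLE pokes (foo)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- The index table stays empty until it is rebuilt.
ALTER INDEX pokes_index ON pokes REBUILD;

-- Settings I have seen suggested for automatic index usage; the
-- minsize override matters because the default threshold is 5 GB,
-- far larger than my toy table.
SET hive.optimize.index.filter=true;
SET hive.optimize.index.filter.compact.minsize=0;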
Thank you in advance for your reply,
Mahsa

Below is the complete log:

12/05/19 17:01:37 INFO ql.Driver: <PERFLOG method=Driver.run>
12/05/19 17:01:37 INFO ql.Driver: <PERFLOG method=compile>
12/05/19 17:01:37 INFO parse.ParseDriver: Parsing command: select * from pokes where pokes.bar = 'John'
12/05/19 17:01:37 INFO parse.ParseDriver: Parse Completed
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Starting Semantic Analysis
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Completed phase 1 of Semantic Analysis
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Get metadata for source tables
12/05/19 17:01:37 INFO metastore.HiveMetaStore: 0: get_table : db=default tbl=pokes
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Get metadata for subqueries
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Get metadata for destination tables
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Completed getting MetaData in Semantic Analysis
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for FS(3)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for SEL(2)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for FIL(1)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Pushdown Predicates of FIL For Alias : pokes
12/05/19 17:01:37 INFO ppd.OpProcFactory: (bar = 'John')
12/05/19 17:01:37 INFO ppd.OpProcFactory: Processing for TS(0)
12/05/19 17:01:37 INFO ppd.OpProcFactory: Pushdown Predicates of TS For Alias : pokes
12/05/19 17:01:37 INFO ppd.OpProcFactory: (bar = 'John')
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO hive.log: DDL: struct pokes { i32 foo, string bar}
12/05/19 17:01:37 INFO physical.MetadataOnlyOptimizer: Looking for table scans where optimization is applicable
12/05/19 17:01:37 INFO physical.MetadataOnlyOptimizer: Found 0 metadata only table scans
12/05/19 17:01:37 INFO parse.SemanticAnalyzer: Completed plan generation
12/05/19 17:01:37 INFO ql.Driver: Semantic Analysis Completed
12/05/19 17:01:37 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:foo, type:int, comment:null), FieldSchema(name:bar, type:string, comment:null)], properties:null)
12/05/19 17:01:37 INFO ql.Driver: </PERFLOG method=compile start=1337461297242 end=1337461297288 duration=46>
12/05/19 17:01:37 INFO ql.Driver: <PERFLOG method=Driver.execute>
12/05/19 17:01:37 INFO ql.Driver: Starting command: select * from pokes where pokes.bar = 'John'
12/05/19 17:01:37 INFO ql.Driver: Total MapReduce jobs = 1
12/05/19 17:01:37 INFO ql.Driver: Launching Job 1 out of 1
12/05/19 17:01:37 INFO exec.Task: Number of reduce tasks is set to 0 since there's no reduce operator
12/05/19 17:01:37 INFO exec.ExecDriver: Using org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
12/05/19 17:01:37 INFO exec.ExecDriver: adding libjars: file:///usr/local/hive/build/dist/lib/hive-builtins-0.10.0-SNAPSHOT.jar
12/05/19 17:01:37 INFO exec.ExecDriver: Processing alias pokes
12/05/19 17:01:37 INFO exec.ExecDriver: Adding input file hdfs://localhost:54310/user/hive/warehouse/pokes
12/05/19 17:01:37 INFO exec.Utilities: Content Summary not cached for hdfs://localhost:54310/user/hive/warehouse/pokes
12/05/19 17:01:37 INFO exec.ExecDriver: Making Temp Directory: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/-ext-10001
12/05/19 17:01:37 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/19 17:01:37 INFO io.CombineHiveInputFormat: CombineHiveInputSplit creating pool for hdfs://localhost:54310/user/hive/warehouse/pokes; using filter path hdfs://localhost:54310/user/hive/warehouse/pokes
12/05/19 17:01:37 INFO mapred.FileInputFormat: Total input paths to process : 1
12/05/19 17:01:37 INFO io.CombineHiveInputFormat: number of splits 1
12/05/19 17:01:37 INFO exec.Task: Starting Job = job_201205131854_0113, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201205131854_0113
12/05/19 17:01:37 INFO exec.Task: Kill Command = /usr/local/hadoop/bin/../bin/hadoop job -Dmapred.job.tracker=localhost:54311 -kill job_201205131854_0113
12/05/19 17:01:43 INFO exec.Task: Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
12/05/19 17:01:43 INFO exec.Task: 2012-05-19 17:01:43,764 Stage-1 map = 0%, reduce = 0%
12/05/19 17:01:46 INFO exec.Task: 2012-05-19 17:01:46,773 Stage-1 map = 100%, reduce = 0%
12/05/19 17:01:49 INFO exec.Task: 2012-05-19 17:01:49,782 Stage-1 map = 100%, reduce = 100%
12/05/19 17:01:49 INFO exec.Task: Ended Job = job_201205131854_0113
12/05/19 17:01:49 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/_tmp.-ext-10001 to: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/_tmp.-ext-10001.intermediate
12/05/19 17:01:49 INFO exec.FileSinkOperator: Moving tmp dir: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/_tmp.-ext-10001.intermediate to: hdfs://localhost:54310/tmp/hive-hduser/hive_2012-05-19_17-01-37_242_5525578509473503196/-ext-10001
12/05/19 17:01:49 INFO ql.Driver: </PERFLOG method=Driver.execute start=1337461297288 end=1337461309816 duration=12528>
12/05/19 17:01:49 INFO ql.Driver: MapReduce Jobs Launched:
12/05/19 17:01:49 INFO ql.Driver: Job 0: Map: 1 HDFS Read: 5812 HDFS Write: 0 SUCCESS
12/05/19 17:01:49 INFO ql.Driver: Total MapReduce CPU Time Spent: 0 msec
12/05/19 17:01:49 INFO ql.Driver: OK
12/05/19 17:01:49 INFO ql.Driver: <PERFLOG method=releaseLocks>
12/05/19 17:01:49 INFO ql.Driver: </PERFLOG method=releaseLocks start=1337461309816 end=1337461309817 duration=1>
12/05/19 17:01:49 INFO ql.Driver: </PERFLOG method=Driver.run start=1337461297242 end=1337461309817 duration=12575>
12/05/19 17:01:49 INFO CliDriver: Time taken: 12.599 seconds
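Would running EXPLAIN be the right way to confirm this? My understanding (an assumption on my part, based on the wiki) is that if the optimizer had rewritten the query, the plan would include an extra stage scanning the generated index table, which I believe gets a name along the lines of default__pokes_pokes_index__:

-- Check that the index is actually registered on the table.
SHOW FORMATTED INDEX ON pokes;

-- Inspect the plan: an index-rewritten query should reference
-- the generated index table in one of its stages.
EXPLAIN SELECT * FROM pokes WHERE bar = 'John';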