Hi, I am still having problems building my index. In an attempt to find someone who can help me, I'll go through all of the steps that I'm taking.
1) First I load my data into Hive:

    hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
    Loading data to table default.score
    Deleted hdfs://localhost/data/warehouse/score
    OK
    Time taken: 7.817 seconds

2) Then I try to create the index:

    hive> CREATE INDEX bigIndex
        > ON TABLE score(Ath_Seq_Num)
        > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
    FAILED: Error in metadata: java.lang.RuntimeException: Please specify deferred rebuild using " WITH DEFERRED REBUILD ".
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

3) OK, so it suggests that I use "WITH DEFERRED REBUILD", and so I do:

    hive> CREATE INDEX bigIndex
        > ON TABLE score(Ath_Seq_Num)
        > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
        > WITH DEFERRED REBUILD;
    OK
    Time taken: 0.603 seconds

4) Now, to build the index I assume that I use ALTER INDEX, as follows:

    hive> ALTER INDEX bigIndex ON score REBUILD;
    Total MapReduce jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 138
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapred.reduce.tasks=<number>
    Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
    Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
    Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
    2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%

5) This all looks promising, and after increasing my heap size to get the Map/Reduce job to complete, I get this an hour later:

    2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
    MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
    Ended Job = job_201210311448_0001
    Loading data to table default.default__score_bigindex__
    Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
    Invalid alter operation: Unable to alter index.
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

So what have I done wrong, and what do I need to do to get this index to build successfully?

Any help appreciated.

Peter Marron

From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
Sent: 24 October 2012 13:27
To: user@hive.apache.org
Subject: RE: Indexes

Hi Shreepadma,

Thanks for this. It looks exactly like the information I need. I was going to reply when I had tried it all out, but I'm having problems creating the index at the moment (I'm getting an OutOfMemoryError). So I thought that I had better reply now to say thank you.
Peter Marron

From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
Sent: 23 October 2012 19:49
To: user@hive.apache.org
Subject: Re: Indexes

Hi Peter,

Indexing support was added to Hive in 0.7, and in 0.8 the query compiler was enhanced to optimize some classes of queries (certain group-bys and joins) using indexes. Assuming you are using the built-in index handler, you need to do the following _after_ you have created and rebuilt the index:

    SET hive.index.compact.file='/tmp/index_result';
    SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

You will then notice a speed-up for a query of the form:

    select count(*) from tab where indexed_col = some_val

Thanks,
Shreepadma

On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

Hi,

I'm very much a Hive newbie, but I've been looking at HIVE-417 and this page in particular:

http://cwiki.apache.org/confluence/display/Hive/IndexDev

Using this information I've been able to create an index (using Hive 0.8.1), and when I look at the contents it all looks very promising indeed. However, on the same page there's this comment:

"...This document currently only covers index creation and maintenance. A follow-on will explain how indexes are used to optimize queries (building on FilterPushdownDev <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."

However, I can't find the "follow-on" which tells me how to exploit the index that I've created to "optimize" subsequent queries. Now I've been told that I can create and use indexes with the current release of Hive _without_ writing and developing any Java code of my own. Is this true? If so, how?

Any help appreciated.

Peter Marron.
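[Editor's note: for reference, the statements scattered through this thread can be collected into one sketch of the end-to-end sequence. Table and column names are the ones from Peter's session; the `/tmp/index_result` path is the one from Shreepadma's example and would need to point at your actual index result data; the literal `12345` is a hypothetical placeholder value for the indexed column.]

```sql
-- 1) Create the index with deferred rebuild, then build it
--    (the compact index handler is the built-in one):
CREATE INDEX bigIndex
ON TABLE score(Ath_Seq_Num)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
WITH DEFERRED REBUILD;

ALTER INDEX bigIndex ON score REBUILD;

-- 2) Point Hive at the index result file and switch the input format:
SET hive.index.compact.file='/tmp/index_result';
SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;

-- 3) Queries of this shape should then be able to use the index:
SELECT count(*) FROM score WHERE Ath_Seq_Num = 12345;
```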