It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed.
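If that's the cause, two things might be worth a try (untested sketches; these are standard Hive properties, but double-check them against your 0.8.1 install):

    -- The failing class is loaded by the stats publisher, so turning off
    -- automatic stats collection should sidestep the Derby driver entirely:
    SET hive.stats.autogather=false;

    -- Alternatively, ship the driver jar with the job (path taken from the
    -- listing quoted below):
    SET hive.aux.jars.path=file:///data/hive/lib/derby-10.4.2.0.jar;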
dean

On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

> Hi Shreepadma,
>
> I agree that the error looks odd. However, I can't believe that I would
> have got this far with Hive if there were no derby jar. Nevertheless, I
> checked. Here is a directory listing of the Hive install:
>
> pmarron@pmarron-ubuntu:/data/hive/lib$ ls
> ant-contrib-1.0b3.jar          commons-pool-1.5.4.jar                hive-common-0.8.1.jar         hive-shims-0.8.1.jar   mockito-all-1.8.2.jar
> antlr-2.7.7.jar                datanucleus-connectionpool-2.0.3.jar  hive-contrib-0.8.1.jar        javaewah-0.3.jar       php
> antlr-3.0.1.jar                datanucleus-core-2.0.3.jar            hive_contrib.jar              jdo2-api-2.3-ec.jar    py
> antlr-runtime-3.0.1.jar        datanucleus-enhancer-2.0.3.jar        hive-exec-0.8.1.jar           jline-0.9.94.jar       slf4j-api-1.6.1.jar
> asm-3.1.jar                    datanucleus-rdbms-2.0.3.jar           hive-hbase-handler-0.8.1.jar  json-20090211.jar      slf4j-log4j12-1.6.1.jar
> commons-cli-1.2.jar            *derby-10.4.2.0.jar*                  hive-hwi-0.8.1.jar            junit-4.10.jar         stringtemplate-3.1-b1.jar
> commons-codec-1.3.jar          guava-r06.jar                         hive-hwi-0.8.1.war            libfb303-0.7.0.jar     velocity-1.5.jar
> commons-collections-3.2.1.jar  hbase-0.89.0-SNAPSHOT.jar             hive-jdbc-0.8.1.jar           libfb303.jar           zookeeper-3.3.1.jar
> commons-dbcp-1.4.jar           hbase-0.89.0-SNAPSHOT-tests.jar       hive-metastore-0.8.1.jar      libthrift-0.7.0.jar
> commons-lang-2.4.jar           hive-anttasks-0.8.1.jar               hive-pdk-0.8.1.jar            libthrift.jar
> commons-logging-1.0.4.jar      hive-builtins-0.8.1.jar               hive-serde-0.8.1.jar          log4j-1.2.15.jar
> commons-logging-api-1.0.4.jar  hive-cli-0.8.1.jar                    hive-service-0.8.1.jar        log4j-1.2.16.jar
>
> Also, I found a derby.log in my home directory, which I have attached.
>
> Regards,
>
> Z
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 31 October 2012 21:58
> To: user@hive.apache.org
> Subject: Re: Creating Indexes
>
> Hi Peter,
>
> From the execution log:
>
> java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:186)
>         at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>         at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> It appears that the error is due to the derby classes not being found.
> Can you check if the derby jars are present?
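> For example, on the node that ran the failed attempt, something along
> these lines (untested; adjust the paths to your install):
>
>   ls /data/hive/lib | grep -i derby
>   hadoop classpath | tr ':' '\n' | grep -i derby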
> Thanks,
> Shreepadma
>
> On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi Shreepadma,
>
> Happy to attach the logs; not quite sure which one is going to be most
> useful. Please find attached one which contained an error of some sort.
> Not sure if it's related to the index error or not. Found the file in
> this location:
>
> /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog
>
> so maybe that will help you locate any other file that you might want to
> see.
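> If it's useful, the other attempts that hit the same error should be easy
> to list with something like this (untested sketch):
>
>   grep -Rl ClassNotFoundException /data/hadoop/logs/userlogs/job_201210311448_0001/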
> Thanks for your efforts.
>
> Peter Marron
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 31 October 2012 18:38
> To: user@hive.apache.org
> Subject: Re: Creating Indexes
>
> Hi Peter,
>
> Can you attach the execution logs? What is the exception that you see in
> the execution logs?
>
> Thanks,
> Shreepadma
>
> On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
> I am still having problems building my index. In an attempt to find
> someone who can help me, I'll go through all the steps that I try.
>
> 1) First I load my data into Hive.
>
> hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
> Loading data to table default.score
> Deleted hdfs://localhost/data/warehouse/score
> OK
> Time taken: 7.817 seconds
>
> 2) Then I try to create the index.
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
> FAILED: Error in metadata: java.lang.RuntimeException: Please specify
> deferred rebuild using " WITH DEFERRED REBUILD ".
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> 3) OK, so it suggests that I use "DEFERRED REBUILD", and so I do:
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>     > WITH DEFERRED REBUILD;
> OK
> Time taken: 0.603 seconds
>
> 4) Now, to create the index I assume that I use ALTER INDEX as follows:
>
> hive> ALTER INDEX bigIndex ON score REBUILD;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 138
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
> Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
> Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
> 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
>
> 5) This all looks promising, and after increasing my heap size to get the
> MapReduce job to complete, I get this an hour later:
>
> 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
> MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
> Ended Job = job_201210311448_0001
> Loading data to table default.default__score_bigindex__
> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
> Invalid alter operation: Unable to alter index.
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> So what have I done wrong, and what am I to do to get this index to build
> successfully?
>
> Any help appreciated.
>
> Peter Marron
>
> From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> Sent: 24 October 2012 13:27
> To: user@hive.apache.org
> Subject: RE: Indexes
>
> Hi Shreepadma,
>
> Thanks for this. It looks exactly like the information I need. I was
> going to reply when I had tried it all out, but I'm having problems
> creating the index at the moment (I'm getting an OutOfMemoryError). So I
> thought that I had better reply now to say thank you.
>
> Peter Marron
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 23 October 2012 19:49
> To: user@hive.apache.org
> Subject: Re: Indexes
>
> Hi Peter,
>
> Indexing support was added to Hive in 0.7, and in 0.8 the query compiler
> was enhanced to optimize some classes of queries (certain group-bys and
> joins) using indexes. Assuming you are using the built-in index handler,
> you need to do the following _after_ you have created and rebuilt the
> index:
>
> SET hive.index.compact.file='/tmp/index_result';
> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>
> You will then notice a speed-up for a query of the form:
>
> select count(*) from tab where indexed_col = some_val
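> With the table from earlier in this thread, that would be something like
> the following (the literal value is only an illustration):
>
>   SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;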
> Thanks,
> Shreepadma
>
> On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
> I'm very much a Hive newbie, but I've been looking at HIVE-417 and this
> page in particular:
>
> http://cwiki.apache.org/confluence/display/Hive/IndexDev
>
> Using this information I've been able to create an index (using Hive
> 0.8.1), and when I look at the contents it all looks very promising
> indeed. However, on the same page there's this comment:
>
> "...This document currently only covers index creation and maintenance.
> A follow-on will explain how indexes are used to optimize queries
> (building on FilterPushdownDev
> <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."
>
> However, I can't find the "follow-on" which tells me how to exploit the
> index that I've created to "optimize" subsequent queries. Now, I've been
> told that I can create and use indexes with the current release of Hive
> _without_ writing and developing any Java code of my own. Is this true?
> If so, how?
>
> Any help appreciated.
>
> Peter Marron.

--
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330