It looks like you're using Derby with a real cluster, not just a single machine in local or pseudo-distributed mode. I haven't tried this myself, but the derby jar is probably not on the machine that ran the reducer task that failed.
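If that's the cause, two things might be worth a try (untested sketches; these are standard Hive properties, but double-check them against your 0.8.1 install):

    -- The failing class is loaded by the stats publisher, so turning off
    -- automatic stats collection should sidestep the Derby driver entirely:
    SET hive.stats.autogather=false;

    -- Alternatively, ship the driver jar with the job (path taken from the
    -- listing quoted below):
    SET hive.aux.jars.path=file:///data/hive/lib/derby-10.4.2.0.jar;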
dean

On Thu, Nov 1, 2012 at 4:31 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:

> Hi Shreepadma,
>
> I agree that the error looks odd. However, I can't believe that I would
> have got this far with Hive if there were no derby jar. Nevertheless, I
> checked. Here is a directory listing of the Hive install:
>
> pmarron@pmarron-ubuntu:/data/hive/lib$ ls
> ant-contrib-1.0b3.jar          commons-pool-1.5.4.jar                hive-common-0.8.1.jar         hive-shims-0.8.1.jar   mockito-all-1.8.2.jar
> antlr-2.7.7.jar                datanucleus-connectionpool-2.0.3.jar  hive-contrib-0.8.1.jar        javaewah-0.3.jar       php
> antlr-3.0.1.jar                datanucleus-core-2.0.3.jar            hive_contrib.jar              jdo2-api-2.3-ec.jar    py
> antlr-runtime-3.0.1.jar        datanucleus-enhancer-2.0.3.jar        hive-exec-0.8.1.jar           jline-0.9.94.jar       slf4j-api-1.6.1.jar
> asm-3.1.jar                    datanucleus-rdbms-2.0.3.jar           hive-hbase-handler-0.8.1.jar  json-20090211.jar      slf4j-log4j12-1.6.1.jar
> commons-cli-1.2.jar            *derby-10.4.2.0.jar*                  hive-hwi-0.8.1.jar            junit-4.10.jar         stringtemplate-3.1-b1.jar
> commons-codec-1.3.jar          guava-r06.jar                         hive-hwi-0.8.1.war            libfb303-0.7.0.jar     velocity-1.5.jar
> commons-collections-3.2.1.jar  hbase-0.89.0-SNAPSHOT.jar             hive-jdbc-0.8.1.jar           libfb303.jar           zookeeper-3.3.1.jar
> commons-dbcp-1.4.jar           hbase-0.89.0-SNAPSHOT-tests.jar       hive-metastore-0.8.1.jar      libthrift-0.7.0.jar
> commons-lang-2.4.jar           hive-anttasks-0.8.1.jar               hive-pdk-0.8.1.jar            libthrift.jar
> commons-logging-1.0.4.jar      hive-builtins-0.8.1.jar               hive-serde-0.8.1.jar          log4j-1.2.15.jar
> commons-logging-api-1.0.4.jar  hive-cli-0.8.1.jar                    hive-service-0.8.1.jar        log4j-1.2.16.jar
>
> Also, I found a derby.log in my home directory, which I have attached.
>
> Regards,
>
> Z
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 31 October 2012 21:58
> To: user@hive.apache.org
> Subject: Re: Creating Indexes
>
> Hi Peter,
>
> From the execution log:
>
> java.lang.ClassNotFoundException: org.apache.derby.jdbc.EmbeddedDriver
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>         at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
>         at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>         at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
>         at java.lang.Class.forName0(Native Method)
>         at java.lang.Class.forName(Class.java:186)
>         at org.apache.hadoop.hive.ql.stats.jdbc.JDBCStatsPublisher.connect(JDBCStatsPublisher.java:68)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.publishStats(FileSinkOperator.java:778)
>         at org.apache.hadoop.hive.ql.exec.FileSinkOperator.closeOp(FileSinkOperator.java:723)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:557)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>         at org.apache.hadoop.hive.ql.exec.Operator.close(Operator.java:566)
>         at org.apache.hadoop.hive.ql.exec.ExecReducer.close(ExecReducer.java:303)
>         at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:529)
>         at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:420)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at javax.security.auth.Subject.doAs(Subject.java:415)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
> It appears that the error is due to the derby classes not being found.
> Can you check if the derby jars are present?
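> For example, on the node that ran the failed attempt, something along
> these lines (untested; adjust the paths to your install):
>
>   ls /data/hive/lib | grep -i derby
>   hadoop classpath | tr ':' '\n' | grep -i derby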
> Thanks,
> Shreepadma
>
> On Wed, Oct 31, 2012 at 12:52 PM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi Shreepadma,
>
> Happy to attach the logs; not quite sure which one is going to be most
> useful. Please find attached one which contained an error of some sort.
> Not sure if it's related to the index error or not. Found the file in
> this location:
>
> /data/hadoop/logs/userlogs/job_201210311448_0001/attempt_201210311448_0001_r_000137_0/syslog
>
> so maybe that will help you locate any other file that you might want to
> see.
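> If it's useful, the other attempts that hit the same error should be easy
> to list with something like this (untested sketch):
>
>   grep -Rl ClassNotFoundException /data/hadoop/logs/userlogs/job_201210311448_0001/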
> Thanks for your efforts.
>
> Peter Marron
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 31 October 2012 18:38
> To: user@hive.apache.org
> Subject: Re: Creating Indexes
>
> Hi Peter,
>
> Can you attach the execution logs? What is the exception that you see in
> the execution logs?
>
> Thanks,
> Shreepadma
>
> On Wed, Oct 31, 2012 at 10:42 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
> I am still having problems building my index. In an attempt to find
> someone who can help me, I'll go through all the steps that I try.
>
> 1) First I load my data into Hive.
>
> hive> LOAD DATA INPATH 'E3/score.csv' OVERWRITE INTO TABLE score;
> Loading data to table default.score
> Deleted hdfs://localhost/data/warehouse/score
> OK
> Time taken: 7.817 seconds
>
> 2) Then I try to create the index.
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
> FAILED: Error in metadata: java.lang.RuntimeException: Please specify
> deferred rebuild using " WITH DEFERRED REBUILD ".
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> 3) OK, so it suggests that I use "DEFERRED REBUILD", and so I do:
>
> hive> CREATE INDEX bigIndex
>     > ON TABLE score(Ath_Seq_Num)
>     > AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler'
>     > WITH DEFERRED REBUILD;
> OK
> Time taken: 0.603 seconds
>
> 4) Now, to create the index I assume that I use ALTER INDEX as follows:
>
> hive> ALTER INDEX bigIndex ON score REBUILD;
> Total MapReduce jobs = 1
> Launching Job 1 out of 1
> Number of reduce tasks not specified. Estimated from input data size: 138
> In order to change the average load for a reducer (in bytes):
>   set hive.exec.reducers.bytes.per.reducer=<number>
> In order to limit the maximum number of reducers:
>   set hive.exec.reducers.max=<number>
> In order to set a constant number of reducers:
>   set mapred.reduce.tasks=<number>
> Starting Job = job_201210311448_0001, Tracking URL = http://localhost:50030/jobdetails.jsp?jobid=job_201210311448_0001
> Kill Command = /data/hadoop-1.0.3/libexec/../bin/hadoop job -Dmapred.job.tracker=localhost:8021 -kill job_201210311448_0001
> Hadoop job information for Stage-1: number of mappers: 511; number of reducers: 138
> 2012-10-31 15:59:27,076 Stage-1 map = 0%, reduce = 0%
>
> 5) This all looks promising, and after increasing my heap size to get the
> MapReduce job to complete, I get this an hour later:
>
> 2012-10-31 17:08:23,572 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4135.47 sec
> MapReduce Total cumulative CPU time: 0 days 1 hours 8 minutes 55 seconds 470 msec
> Ended Job = job_201210311448_0001
> Loading data to table default.default__score_bigindex__
> Deleted hdfs://localhost/data/warehouse/default__score_bigindex__
> Invalid alter operation: Unable to alter index.
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask
>
> So what have I done wrong, and what am I to do to get this index to build
> successfully?
>
> Any help appreciated.
>
> Peter Marron
>
> From: Peter Marron [mailto:peter.mar...@trilliumsoftware.com]
> Sent: 24 October 2012 13:27
> To: user@hive.apache.org
> Subject: RE: Indexes
>
> Hi Shreepadma,
>
> Thanks for this. It looks exactly like the information I need. I was
> going to reply when I had tried it all out, but I'm having problems
> creating the index at the moment (I'm getting an OutOfMemoryError). So I
> thought that I had better reply now to say thank you.
>
> Peter Marron
>
> From: Shreepadma Venugopalan [mailto:shreepa...@cloudera.com]
> Sent: 23 October 2012 19:49
> To: user@hive.apache.org
> Subject: Re: Indexes
>
> Hi Peter,
>
> Indexing support was added to Hive in 0.7, and in 0.8 the query compiler
> was enhanced to optimize some classes of queries (certain group-bys and
> joins) using indexes. Assuming you are using the built-in index handler,
> you need to do the following _after_ you have created and rebuilt the
> index:
>
> SET hive.index.compact.file='/tmp/index_result';
> SET hive.input.format=org.apache.hadoop.hive.ql.index.compact.HiveCompactIndexInputFormat;
>
> You will then notice a speed-up for a query of the form:
>
> select count(*) from tab where indexed_col = some_val
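> With the table from earlier in this thread, that would be something like
> the following (the literal value is only an illustration):
>
>   SELECT COUNT(*) FROM score WHERE Ath_Seq_Num = 12345;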
> Thanks,
> Shreepadma
>
> On Tue, Oct 23, 2012 at 5:44 AM, Peter Marron <peter.mar...@trilliumsoftware.com> wrote:
>
> Hi,
>
> I'm very much a Hive newbie, but I've been looking at HIVE-417 and this
> page in particular:
>
> http://cwiki.apache.org/confluence/display/Hive/IndexDev
>
> Using this information I've been able to create an index (using Hive
> 0.8.1), and when I look at the contents it all looks very promising
> indeed. However, on the same page there's this comment:
>
> "...This document currently only covers index creation and maintenance.
> A follow-on will explain how indexes are used to optimize queries
> (building on FilterPushdownDev
> <https://cwiki.apache.org/confluence/display/Hive/FilterPushdownDev>)..."
>
> However, I can't find the "follow-on" which tells me how to exploit the
> index that I've created to "optimize" subsequent queries. Now, I've been
> told that I can create and use indexes with the current release of Hive
> _without_ writing and developing any Java code of my own. Is this true?
> If so, how?
>
> Any help appreciated.
>
> Peter Marron.

--
Dean Wampler, Ph.D.
thinkbiganalytics.com
+1-312-339-1330