Hmm, the example code there may not have been run in distributed mode recently, or perhaps Pig performs some magic to automatically register Jars containing classes directly referenced as UDFs.
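A quick way to test the frontend theory is to ask a plain JVM, started with the same classpath the Pig frontend gets, whether the classes named in the errors resolve at all. Below is a minimal sketch of such a check. `ClasspathCheck` is a hypothetical helper, not part of Pig or Cassandra. It assumes PIG_CLASSPATH has been exported the way contrib/pig/bin/pig_cassandra sets it up, so that it can be run as `java -cp "$PIG_CLASSPATH:." ClasspathCheck org.apache.thrift.TBase`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Pig or Cassandra): reports which of the
 * given class names can be loaded by this JVM. Run it with the same classpath
 * the Pig frontend sees, e.g.
 *   java -cp "$PIG_CLASSPATH:." ClasspathCheck org.apache.thrift.TBase
 */
public class ClasspathCheck {

    /** Returns the subset of classNames that cannot be loaded on this JVM's classpath. */
    static List<String> missingClasses(String... classNames) {
        List<String> missing = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: load the class (and its superclasses/interfaces)
                // without running static initializers.
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // ClassNotFoundException: the class itself is absent.
                // LinkageError (e.g. NoClassDefFoundError): a type it depends on is absent.
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Default to the two classes mentioned in this thread if no arguments are given.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.thrift.TBase",
            "org.apache.cassandra.hadoop.pig.CassandraStorage"
        };
        List<String> missing = missingClasses(names);
        for (String name : names) {
            System.out.println((missing.contains(name) ? "MISSING " : "found   ") + name);
        }
    }
}
```

Because it loads with `initialize=false`, it only checks that each class and its supertypes can be found. Passing `true` would also run static initializers, which is where a `java.lang.ExceptionInInitializerError` such as the one in the second trace comes from: the class was found, but its static initializer threw.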
-----Original Message-----
From: "Christian Decker" <decker.christ...@gmail.com>
Sent: Friday, August 13, 2010 12:16pm
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

Wow, that was extremely quick, thanks Stu :-)

I'm still a bit unclear on what the pig_cassandra script does. It sets
some variables (PIG_CLASSPATH, for one) and then starts the original pig
binary with some libraries injected (libthrift and pig-core), but,
strangely, not the Cassandra loadfunc. Why not?

Anyway, now I understand why I was getting different errors when running
directly through Pig compared to running through pig_cassandra. I still
get an exception, though, and I can't tell where it comes from
(http://pastebin.com/JYfSSfny):

Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.ExceptionInInitializerError
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
    at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)

Any idea? Thanks again for your fast answer :)

On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> That error is coming from the frontend: the jars must also be on the local
> classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> $PIG_CLASSPATH.
>
> -----Original Message-----
> From: "Christian Decker" <decker.christ...@gmail.com>
> Sent: Friday, August 13, 2010 11:30am
> To: user@cassandra.apache.org
> Subject: Cassandra and Pig
>
> Hi all,
>
> I'm trying to get Pig to read data from a Cassandra cluster, which I
> thought would be trivial since Cassandra already provides the
> CassandraStorage class.
> The problem is that when I try to execute a simple script like this:
>
> register /path/to/pig-0.7.0-core.jar;
> register /path/to/libthrift-r917130.jar;
> register /path/to/cassandra_loadfunc.jar;
> rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> cols = FOREACH rows GENERATE flatten($1);
> colnames = FOREACH cols GENERATE $0;
> namegroups = GROUP colnames BY $0;
> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> orderednames = ORDER namecounts BY $0;
> topnames = LIMIT orderednames 50;
> dump topnames;
>
> I just end up with a NoClassDefFoundError:
>
> ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
>     at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>     at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
>     at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>     at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>     at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>     at org.apache.pig.Main.main(Main.java:391)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
>     at org.apache.pig.PigServer.store(PigServer.java:577)
>     at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>     ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
>     at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
>     at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>     at org.apache.pig.PigServer.store(PigServer.java:569)
>     ... 7 more
> Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
>     at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>     at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
> I can't think of a reason why. As far as I understand it, Pig takes the
> jar files registered in the script, unpacks them, creates the execution
> plan for the script itself, bundles everything into a single jar again,
> and then submits it to HDFS, from where it is executed by Hadoop, right?
> I also checked that the class in question actually is in the libthrift
> jar, so what's going wrong?
>
> Regards,
> Chris
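Checking the jar by hand only proves half of the story: a NoClassDefFoundError concerns the classpath of the JVM that threw it, and according to Stu's reply that JVM is the local Pig frontend. Below is a minimal sketch that scripts the jar check itself with the standard `java.util.jar.JarFile` API. `JarContains` is a hypothetical helper, not part of Pig, Cassandra or Thrift. It would be run as, for example, `java JarContains /path/to/libthrift-r917130.jar org.apache.thrift.TBase`.

```java
import java.io.IOException;
import java.util.jar.JarFile;

/**
 * Hypothetical helper (not part of Pig, Cassandra or Thrift): checks whether a
 * jar contains the .class entry for a class name, e.g.
 *   java JarContains /path/to/libthrift-r917130.jar org.apache.thrift.TBase
 */
public class JarContains {

    /** Maps "org.apache.thrift.TBase" to the jar entry "org/apache/thrift/TBase.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns true if the jar at jarPath has an entry for className. */
    static boolean containsClass(String jarPath, String className) throws IOException {
        // try-with-resources closes the jar even if the lookup throws
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.getJarEntry(entryName(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java JarContains <jar> <fully.qualified.ClassName>");
            System.exit(2);
        }
        boolean found = containsClass(args[0], args[1]);
        System.out.println((found ? "found " : "NOT found ") + entryName(args[1]) + " in " + args[0]);
        // Non-zero exit status when the class is absent, so it can be used in scripts.
        System.exit(found ? 0 : 1);
    }
}
```

The name-to-entry mapping assumes a top-level class. A nested class is stored as `Outer$Inner.class`, so it has to be passed in that form (e.g. `org.example.Outer$Inner`, a made-up name).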