Re: Cassandra and Pig

Stu Hood Fri, 13 Aug 2010 12:31:56 -0700

Hmm, the example code there may not have been run in distributed mode recently, 
or perhaps Pig performs some magic to automatically register Jars containing 
classes directly referenced as UDFs.
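
For anyone following along, here is a minimal sketch of putting the same jars on the local (frontend) classpath before launching Pig. It assumes the placeholder /path/to/ locations from the script quoted below; the real pig_cassandra script may build the variable differently.

```shell
# Hypothetical jar locations (placeholders taken from the quoted script);
# substitute the real paths on your machine.
JARS="/path/to/pig-0.7.0-core.jar /path/to/libthrift-r917130.jar /path/to/cassandra_loadfunc.jar"

# Join the jars with ':' so the Pig frontend can load the classes locally.
PIG_CLASSPATH=""
for jar in $JARS; do
  PIG_CLASSPATH="${PIG_CLASSPATH:+$PIG_CLASSPATH:}$jar"
done
export PIG_CLASSPATH

echo "PIG_CLASSPATH=$PIG_CLASSPATH"
# Then launch Pig as usual with your script.
```

The register statements in the script still take care of shipping the jars to the backend; this only covers the local side.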


-----Original Message-----
From: "Christian Decker" <decker.christ...@gmail.com>
Sent: Friday, August 13, 2010 12:16pm
To: user@cassandra.apache.org
Subject: Re: Cassandra and Pig

Wow, that was extremely quick, thanks Stu :-)
I'm still a bit unclear on what the pig_cassandra script does. It sets some
variables (PIG_CLASSPATH for one) and then starts the original pig binary,
injecting some libraries into it (libthrift and pig-core) but, strangely, not
the cassandra loadfunc. Why not?

Anyway, now I understand why I was getting different errors when executing
directly via Pig compared to through pig_cassandra. Still, I get an exception
whose origin I cannot explain (http://pastebin.com/JYfSSfny):

Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.ExceptionInInitializerError
 at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
 at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)


Any idea? Thanks again for your fast answer :)

On Fri, Aug 13, 2010 at 6:55 PM, Stu Hood <stu.h...@rackspace.com> wrote:

> That error is coming from the frontend: the jars must also be on the local
> classpath. Take a look at how contrib/pig/bin/pig_cassandra sets up
> $PIG_CLASSPATH.
>
> -----Original Message-----
> From: "Christian Decker" <decker.christ...@gmail.com>
> Sent: Friday, August 13, 2010 11:30am
> To: user@cassandra.apache.org
> Subject: Cassandra and Pig
>
> Hi all,
>
> I'm trying to get Pig to read data from a Cassandra cluster, which I
> thought would be trivial since Cassandra already provides me with the
> CassandraStorage class.
> The problem is that once I try executing a simple script like this:
>
> register /path/to/pig-0.7.0-core.jar;
> register /path/to/libthrift-r917130.jar;
> register /path/to/cassandra_loadfunc.jar;
> rows = LOAD 'cassandra://Keyspace1/Standard1' USING org.apache.cassandra.hadoop.pig.CassandraStorage();
> cols = FOREACH rows GENERATE flatten($1);
> colnames = FOREACH cols GENERATE $0;
> namegroups = GROUP colnames BY $0;
> namecounts = FOREACH namegroups GENERATE COUNT($1), group;
> orderednames = ORDER namecounts BY $0;
> topnames = LIMIT orderednames 50;
> dump topnames;
>
> I just end up with a NoClassDefFoundError:
>
> ERROR org.apache.pig.tools.grunt.Grunt - org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias topnames
>  at org.apache.pig.PigServer.openIterator(PigServer.java:521)
>  at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
>  at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
>  at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
>  at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
>  at org.apache.pig.Main.main(Main.java:391)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: Unable to store alias topnames
>  at org.apache.pig.PigServer.store(PigServer.java:577)
>  at org.apache.pig.PigServer.openIterator(PigServer.java:504)
>  ... 6 more
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2117: Unexpected error when launching map reduce job.
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:209)
>  at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:308)
>  at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:835)
>  at org.apache.pig.PigServer.store(PigServer.java:569)
>  ... 7 more
> Caused by: java.lang.RuntimeException: Could not resolve error that occured when launching map reduce job: java.lang.NoClassDefFoundError: org/apache/thrift/TBase
>  at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher$JobControlThreadExceptionHandler.uncaughtException(MapReduceLauncher.java:510)
>  at java.lang.Thread.dispatchUncaughtException(Thread.java:1845)
>
> I cannot think of a reason why. As far as I understand it, Pig takes
> the jar files in the script, unpacks them, creates the execution plan for
> the script itself, bundles everything into a single jar again, and then
> submits it to HDFS, from where it is executed in Hadoop, right?
> I also checked that the class in question actually is in the libthrift jar,
> so what's going wrong?
>
> Regards,
> Chris
>
>
>
