I have been trying to get the Pig integration working with
Cassandra 0.8.0.

1.  Downloaded the source for 0.8.0 and ran ant build.
2.  Went into contrib/pig and ran ant, which produces
/usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar
and copies it into the lib/ directory.
3.  Downloaded pig-0.8.1, modified ivy/libraries.properties so
that it uses Jackson 1.8.2, and ran ant.  It compiles and gives me
two jars:  pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar
    I first tried Jackson 1.4, as contrib/pig/README.txt suggests,
    but that failed.  The referenced JIRA ticket (PIG-1863) suggests
    1.6.0, which still produces the same results.

Java version and environment variables:
java version "1.6.0_24"

PIG_INITIAL_ADDRESS=localhost
PIG_HOME=/usr/local/src/pig-0.8.1
PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
PIG_RPC_PORT=9160
CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src

I then start up Cassandra with no issues.  I connect and create a new
keyspace called foo with two column families, bar and foo.  Inside the
CF bar, I create 4 rows with random columns.

From contrib/pig I run bin/pig_cassandra -x local and immediately
get the error:

[: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator

This refers to the line:  if [ ! -e $PIG_JAR ]; then

The problem is that $PIG_JAR expands to two files,
pig-0.8.1-core.jar and pig.jar, so the test receives too many
arguments.

Changing line 44 to PIG_JAR=$PIG_HOME/pig*core*.jar fixes this (as
does referencing $PIG_HOME/build/pig*core*.jar or just pig.jar).
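
To illustrate the glob problem, here is a minimal sketch in a scratch
directory.  The helper first_match and the variable demo_home are
hypothetical names for this example only; they are not part of
pig_cassandra.  The sketch takes the first existing file that matches
a pattern, so a later [ -e ... ] test always sees a single path:

```shell
#!/bin/sh
# Hypothetical helper (not part of pig_cassandra): print the first
# existing file that matches the glob pattern given in $1.
first_match() {
    # $1 is deliberately unquoted so the shell expands the glob.
    # Assumes the paths contain no whitespace.
    for candidate in $1; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Scratch directory standing in for $PIG_HOME, holding the two jars
# named in the error above.
demo_home=$(mktemp -d)
touch "$demo_home/pig-0.8.1-core.jar" "$demo_home/pig.jar"

# The narrower pattern matches only the core jar, and quoting the
# result keeps the test a single-argument check.
PIG_JAR=$(first_match "$demo_home/pig*core*.jar")
if [ ! -e "$PIG_JAR" ]; then
    echo "no core jar found in $demo_home" >&2
else
    echo "using $PIG_JAR"
fi
```

Quoting "$PIG_JAR" in the test also avoids the "unexpected operator"
message if a pattern ever matches more than one file again.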

Running bin/pig_cassandra -x local again, everything loads up nicely:

2011-06-21 02:07:23,671 [main] INFO  org.apache.pig.Main - Logging
error messages to:
/usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log
2011-06-21 02:07:23,778 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
Connecting to hadoop file system at: file:///
grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar;
register /usr/local/src/pig-0.8.1/pig.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar;
register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar;
grunt>
grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage();
grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage();
2011-06-21 02:04:53,271 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN
2011-06-21 02:04:53,271 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-21 02:04:53,324 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics
with processName=JobTracker, sessionId=
2011-06-21 02:04:53,447 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
(Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1
Operator Key: scope-1)
2011-06-21 02:04:53,458 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false
2011-06-21 02:04:53,477 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig script settings are
added to the job
2011-06-21 02:04:53,556 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2011-06-21 02:04:59,700 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2011-06-21 02:04:59,718 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,980 [Thread-5] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2011-06-21 02:05:00,220 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2011-06-21 02:05:00,322 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,340 [Thread-14] INFO
org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total
input paths (combined) to process : 1
2011-06-21 02:05:00,372 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,374 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,378 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,381 [Thread-14] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,491 [Thread-14] WARN
org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoClassDefFoundError: org/apache/cassandra/db/marshal/TypeParser
        at 
org.apache.cassandra.hadoop.pig.CassandraStorage.getDefaultMarshallers(Unknown
Source)
        at 
org.apache.cassandra.hadoop.pig.CassandraStorage.columnToTuple(Unknown
Source)
        at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown
Source)
        at 
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
        at 
org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
        at 
org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
        at 
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.ClassNotFoundException:
org.apache.cassandra.db.marshal.TypeParser
        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
        ... 10 more
2011-06-21 02:05:00,818 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- HadoopJobId: job_local_0001
2011-06-21 02:05:05,408 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- job job_local_0001 has failed! Stop running all dependent jobs
2011-06-21 02:05:05,411 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2011-06-21 02:05:05,412 [main] ERROR
org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s)
failed!
2011-06-21 02:05:05,412 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats
reported below may be incomplete
2011-06-21 02:05:05,413 [main] INFO
org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
0.20.2  0.8.1   root    2011-06-21 02:04:53     2011-06-21 02:05:05     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_local_0001  rows    MAP_ONLY        Message: Job failed!
cassandra://foo/foo,

Input(s):
Failed to read data from "cassandra://foo/bar"

Output(s):
Failed to produce result in "cassandra://foo/foo"

Job DAG:
job_local_0001


2011-06-21 02:05:05,413 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2011-06-21 02:05:05,416 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
grunt>


Any help or insight is appreciated.