I have been trying for a while to get the Pig integration working with Cassandra 0.8.0. Here is what I did:

1. Downloaded the source for 0.8.0 and ran "ant build".
2. Went into contrib/pig and ran "ant". This gives me /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/build/cassandra_storage.jar, which is copied into the lib/ directory.
3. Downloaded pig-0.8.1, modified ivy/libraries.properties so that it uses Jackson 1.8.2, and ran "ant". It compiles and gives me two jars: pig-0.8.1-SNAPSHOT-core.jar and pig-0.8.1-SNAPSHOT.jar.

-----

I did try to run it with Jackson 1.4 as contrib/pig/README.txt suggests, but that failed. The referenced JIRA ticket (PIG-1863) suggests 1.6.0, which still produces the same results.

Environment:

    java version "1.6.0_24"
    PIG_INITIAL_ADDRESS=localhost
    PIG_HOME=/usr/local/src/pig-0.8.1
    PIG_PARTITIONER=org.apache.cassandra.dht.RandomPartitioner
    PIG_RPC_PORT=9160
    CASSANDRA_HOME=/usr/local/src/apache-cassandra-0.8.0-src

I then start up Cassandra with no issues. I connect and create a new keyspace called foo with two column families, bar and foo. Inside the CF bar I create 4 rows with random columns.

From contrib/pig I run:

    bin/pig_cassandra -x local

and immediately get the error:

    [: 45: /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar: unexpected operator

This refers to the following line of the script:

    if [ ! -e $PIG_JAR ]; then

The problem is that $PIG_JAR expands to two files (pig-0.8.1-core.jar and pig.jar), so the test receives more arguments than it expects. Changing line 44 to

    PIG_JAR=$PIG_HOME/pig*core*.jar

fixes this (referencing $PIG_HOME/build/pig*core*.jar, or just pig.jar, also works).

Running bin/pig_cassandra -x local again, everything loads up nicely:

2011-06-21 02:07:23,671 [main] INFO org.apache.pig.Main - Logging error messages to: /usr/local/src/apache-cassandra-0.8.0-src/contrib/pig/pig_1308593243668.log
2011-06-21 02:07:23,778 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
grunt> register /usr/local/src/pig-0.8.1/pig-0.8.1-core.jar;
       register /usr/local/src/pig-0.8.1/pig.jar;
       register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-fixes.jar;
       register /usr/local/src/apache-cassandra-0.8.0-src/lib/avro-1.4.0-sources-fixes.jar;
       register /usr/local/src/apache-cassandra-0.8.0-src/lib/libthrift-0.6.jar;
grunt>
grunt> rows = LOAD 'cassandra://foo/bar' USING CassandraStorage();
grunt> STORE rows into 'cassandra://foo/foo' USING CassandraStorage();

The STORE then fails with a NoClassDefFoundError for org/apache/cassandra/db/marshal/TypeParser:

2011-06-21 02:04:53,271 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2011-06-21 02:04:53,271 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - pig.usenewlogicalplan is set to true. New logical plan will be used.
2011-06-21 02:04:53,324 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2011-06-21 02:04:53,447 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - (Name: rows: Store(cassandra://foo/foo:CassandraStorage) - scope-1 Operator Key: scope-1)
2011-06-21 02:04:53,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2011-06-21 02:04:53,477 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2011-06-21 02:04:53,480 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:53,494 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2011-06-21 02:04:53,556 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2011-06-21 02:04:59,700 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2011-06-21 02:04:59,718 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,719 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2011-06-21 02:04:59,948 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,960 [Thread-5] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:04:59,980 [Thread-5] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-06-21 02:05:00,220 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2011-06-21 02:05:00,322 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,340 [Thread-14] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2011-06-21 02:05:00,372 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,374 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,378 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,381 [Thread-14] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2011-06-21 02:05:00,491 [Thread-14] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
java.lang.NoClassDefFoundError: org/apache/cassandra/db/marshal/TypeParser
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getDefaultMarshallers(Unknown Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.columnToTuple(Unknown Source)
    at org.apache.cassandra.hadoop.pig.CassandraStorage.getNext(Unknown Source)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:187)
    at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:423)
    at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
Caused by: java.lang.ClassNotFoundException: org.apache.cassandra.db.marshal.TypeParser
    at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
    ... 10 more
2011-06-21 02:05:00,818 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local_0001
2011-06-21 02:05:05,408 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local_0001 has failed! Stop running all dependent jobs
2011-06-21 02:05:05,411 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2011-06-21 02:05:05,412 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2011-06-21 02:05:05,412 [main] INFO org.apache.pig.tools.pigstats.PigStats - Detected Local mode. Stats reported below may be incomplete
2011-06-21 02:05:05,413 [main] INFO org.apache.pig.tools.pigstats.PigStats - Script Statistics:

HadoopVersion   PigVersion   UserId   StartedAt             FinishedAt            Features
0.20.2          0.8.1        root     2011-06-21 02:04:53   2011-06-21 02:05:05   UNKNOWN

Failed!

Failed Jobs:
JobId            Alias   Feature    Message                Outputs
job_local_0001   rows    MAP_ONLY   Message: Job failed!   cassandra://foo/foo,

Input(s):
Failed to read data from "cassandra://foo/bar"

Output(s):
Failed to produce result in "cassandra://foo/foo"

Job DAG:
job_local_0001

2011-06-21 02:05:05,413 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2011-06-21 02:05:05,416 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
grunt>

Any help or insight is appreciated.
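
For anyone hitting the same two symptoms, here is a rough POSIX shell sketch of how both could be checked. It is not part of bin/pig_cassandra: the function names first_match and jars_containing are hypothetical helpers made up for illustration, the second one assumes that unzip is installed, and the pattern handling assumes paths without spaces. The first helper reduces a glob such as $PIG_HOME/pig*core*.jar to a single file so that a later [ -e ... ] test sees exactly one argument. The second only reports which jars, if any, contain a given class file; it does not by itself explain why TypeParser is missing from the classpath.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the Cassandra or Pig distributions.

# Print the first existing file matching the glob pattern in $1.
# Returns 1 if nothing matches. Assumes the path contains no spaces.
first_match() {
    # $1 is deliberately left unquoted so the shell expands the pattern.
    for candidate in $1; do
        if [ -e "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Print every jar in the given directories whose listing mentions the
# class file named in $1, for example
#   org/apache/cassandra/db/marshal/TypeParser.class
# Assumes the unzip command is available.
jars_containing() {
    class_file=$1
    shift
    for dir in "$@"; do
        for jar in "$dir"/*.jar; do
            # Skip the literal pattern when a directory has no jars.
            [ -e "$jar" ] || continue
            if unzip -l "$jar" 2>/dev/null | grep -qF -- "$class_file"; then
                printf '%s\n' "$jar"
            fi
        done
    done
}

# Example usage (paths taken from the environment described above):
#   PIG_JAR=$(first_match "$PIG_HOME/pig*core*.jar") || echo "no Pig jar found"
#   jars_containing org/apache/cassandra/db/marshal/TypeParser.class \
#       "$CASSANDRA_HOME/lib" "$CASSANDRA_HOME/build"
```

If jars_containing prints nothing for the directories that end up on the classpath, then no jar there provides the class that CassandraStorage is asking for, which would at least narrow down where to look.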