I'm currently trying to get Cassandra (1.2.5) and Pig (0.11.1) to play nice together. I'm running a basic script:
    rows = LOAD 'cassandra://keyspace/colfam' USING CassandraStorage();
    DUMP rows;

This fails for my column family, which has ~100,000 rows. However, if I modify the script to this:

    rows = LOAD 'cassandra://betable_games/bets' USING CassandraStorage();
    rows = LIMIT rows 7000;
    DUMP rows;

then it seems to work. 7000 is about as high as I've been able to push the limit before it fails. The error I keep getting is:

    2013-06-07 14:58:49,119 [Thread-4] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local_0001
    java.lang.RuntimeException: org.apache.thrift.TException: Message length exceeded: 4480
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:384)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:390)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.computeNext(ColumnFamilyRecordReader.java:313)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader.getProgress(ColumnFamilyRecordReader.java:103)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.getProgress(PigRecordReader.java:169)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.getProgress(MapTask.java:514)
        at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:539)
        at org.apache.hadoop.mapreduce.MapContext.nextKeyValue(MapContext.java:67)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:214)
    Caused by: org.apache.thrift.TException: Message length exceeded: 4480
        at org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:393)
        at org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
        at org.apache.cassandra.thrift.Column.read(Column.java:535)
        at org.apache.cassandra.thrift.ColumnOrSuperColumn.read(ColumnOrSuperColumn.java:507)
        at org.apache.cassandra.thrift.KeySlice.read(KeySlice.java:408)
        at org.apache.cassandra.thrift.Cassandra$get_range_slices_result.read(Cassandra.java:12905)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_get_range_slices(Cassandra.java:734)
        at org.apache.cassandra.thrift.Cassandra$Client.get_range_slices(Cassandra.java:718)
        at org.apache.cassandra.hadoop.ColumnFamilyRecordReader$StaticRowIterator.maybeInit(ColumnFamilyRecordReader.java:346)
        ... 13 more

I've seen a similar problem reported on this mailing list with Cassandra 1.2.3; however, the fixes suggested in that thread (increasing thrift_framed_transport_size_in_mb and thrift_max_message_length_in_mb in cassandra.yaml) did not appear to have any effect.
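For reference, the cassandra.yaml change I made was along these lines; the exact numbers here are illustrative, not necessarily the values I tried:

    # cassandra.yaml (the 1.2 defaults are 15 and 16 respectively)
    thrift_framed_transport_size_in_mb: 60
    # keep the max message length at least as large as the frame size
    thrift_max_message_length_in_mb: 64

Has anyone else seen this issue, and how can I fix it?

Thanks,
-Mark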