Either this is a bug, or I'm insane. Here's my table in Cassandra:
CREATE TABLE test_source (
    id int,
    PRIMARY KEY (id)
);

INSERT INTO test_source (ID) VALUES (1);
INSERT INTO test_source (ID) VALUES (2);
INSERT INTO test_source (ID) VALUES (3);
INSERT INTO test_source (ID) VALUES (4);

cqlsh:blogindex> select * from test_source;

 id
----
  1
  2
  4
  3

(4 rows)

Now I load that into Pig and run:

test_source = LOAD 'cassandra://blogindex/test_source'
              USING CassandraStorage()
              AS (source, target: bag {T: tuple(name, value)});
dump test_source;

(4,{((),)})
(1,{((),)})
(2,{((),)})
(4,{((),)})
(1,{((),)})
(3,{((),)})
(3,{((),)})
(2,{((),)})

Now it COULD be a bug with 'dump' ... but even then that's a bug. I suspect that Cassandra might be getting confused and handing Pig too many rows, perhaps by duplicating input splits?

--

Founder/CEO Spinn3r.com
Location: San Francisco, CA
Skype: burtonator
blog: http://burtonator.wordpress.com
... or check out my Google+ profile
<https://plus.google.com/102718274791889610666/posts>
<http://spinn3r.com>

War is peace. Freedom is slavery. Ignorance is strength. Corporations are people.
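P.S. One way to tell whether 'dump' itself is at fault is to count the tuples inside Pig before any output formatting happens. A minimal diagnostic sketch against the same relation as above (GROUP ... ALL and COUNT are standard Pig builtins; I haven't run this against this exact table, so treat it as a suggestion):

grouped = GROUP test_source ALL;
counted = FOREACH grouped GENERATE COUNT(test_source);
dump counted;

If that reports 8 rather than 4, the duplication is happening at load time (consistent with duplicated input splits), not in 'dump'. A DISTINCT on the loaded relation would also be an obvious stopgap, but it only hides the underlying problem.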