For posterity, I ended up hacking around this by renaming the repeated 'value' alias in CassandraStorage and rebuilding it. Here's the patch:
--- src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java.original 2011-10-11 23:42:19.000000000 -0700
+++ src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java 2011-10-11 23:44:26.000000000 -0700
@@ -357,7 +357,7 @@
             validator = validators.get(cdef.getName());
             if (validator == null)
                 validator = marshallers.get(1);
-            valSchema.setName("value");
+            valSchema.setName("value_"+new String(cdef.getName()));
             valSchema.setType(getPigType(validator));
             tupleFields.add(valSchema);
         }

I'm not suggesting this is a correct fix, but it does allow me to move forward.

Another suggestion was to try Pig 0.8.1 instead, but I ran into the problem described here:
https://cwiki.apache.org/confluence/display/PIG/FAQ#FAQ-Q%3AWhatshallIdoifIsaw%22FailedtocreateDataStorage%22%3F

On Tue, Oct 11, 2011 at 10:34 PM, Pete Warden <p...@jetpac.com> wrote:
> Thanks for all your help Brandon and Jeremy, that got me to the point where
> I could load data.
>
> I'm now hitting a new issue that seems like it could possibly be related.
> When I try to access the data like this:
>
> grunt> rows = LOAD 'cassandra://Frap/FriendsAlreadyRanked' USING CassandraStorage();
> grunt> parts = FOREACH rows GENERATE key, FromCassandraBag('time_last_ranked', columns);
>
> I see the following error:
>
> 2011-10-11 22:23:43,877 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1108:
> <line 4, column 71> Duplicate schema alias: value in "columns"
>
> At first I thought it might be related to the Pygmalion helper functions,
> so I tried to strip it back to basics using this second line instead:
>
> parts = FOREACH rows GENERATE key,$1;
>
> and I still get an identical error.
>
> Any further thoughts on how I can dig into this?
>
> Thanks again,
> Pete
>
> On Tue, Oct 11, 2011 at 3:37 PM, Brandon Williams <dri...@gmail.com> wrote:
>
>> On Tue, Oct 11, 2011 at 4:24 PM, Pete Warden <p...@petewarden.com> wrote:
>> > I'm trying to run the most basic example for pig_cassandra, counting the
>> > number of rows in a column family, and I'm hitting the following error:
>> > 2011-10-11 14:13:32,321 [main] ERROR org.apache.pig.tools.grunt.Grunt -
>> > ERROR 1031: Incompatable field schema: left is
>> > "columns:bag{:tuple(name:bytearray,value:bytearray)}", right is
>> > "columns:bag{:tuple(name:chararray,value:bytearray,time_last_ranked:chararray,value:bytearray)}"
>>
>> After https://issues.apache.org/jira/browse/CASSANDRA-2777 you need to
>> remove the 'AS' and everything after it; your schema definition
>> conflicts with what was inferred.
>>
>> -Brandon
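
For anyone trying to follow why the patch to CassandraStorage works, here is a minimal standalone sketch of the alias collision. It assumes the tuple layout shown in the ERROR 1031 message (a default name/value pair, then a column-named field plus a "value" field for each column with metadata, such as time_last_ranked); it is inferred from that message, not copied from the CassandraStorage source. The class and method names (AliasRenameSketch, buildAliases, firstDuplicate) are hypothetical and not part of Cassandra or Pig. One deliberate difference: the sketch decodes column names as UTF-8 explicitly, whereas the patch's new String(cdef.getName()) uses the JVM's default charset.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class AliasRenameSketch {

    // Hypothetical helper: builds the field aliases for the "columns" tuple,
    // mirroring the shape in the error message: a default (name, value) pair,
    // then a (columnName, value) pair for each column with metadata.
    // When renameValues is true, each per-column "value" alias gets the column
    // name as a suffix, like the patch's "value_" + new String(cdef.getName()).
    static List<String> buildAliases(List<byte[]> columnNames, boolean renameValues) {
        List<String> aliases = new ArrayList<>();
        aliases.add("name");
        aliases.add("value");
        for (byte[] raw : columnNames) {
            // Decode explicitly as UTF-8 instead of the platform default charset.
            String column = new String(raw, StandardCharsets.UTF_8);
            aliases.add(column);
            aliases.add(renameValues ? "value_" + column : "value");
        }
        return aliases;
    }

    // Returns the first alias that appears more than once, or null if all are unique.
    // Pig rejects a tuple schema that contains such a duplicate (ERROR 1108).
    static String firstDuplicate(List<String> aliases) {
        Set<String> seen = new HashSet<>();
        for (String alias : aliases) {
            if (!seen.add(alias)) {
                return alias;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<byte[]> columns = new ArrayList<>();
        columns.add("time_last_ranked".getBytes(StandardCharsets.UTF_8));

        List<String> before = buildAliases(columns, false);
        List<String> after = buildAliases(columns, true);

        // Prints: [name, value, time_last_ranked, value] duplicate=value
        System.out.println(before + " duplicate=" + firstDuplicate(before));
        // Prints: [name, value, time_last_ranked, value_time_last_ranked] duplicate=null
        System.out.println(after + " duplicate=" + firstDuplicate(after));
    }
}
```

Suffixing with the column name keeps the aliases unique as long as column names are unique, though a column literally named "name" or "value" would still collide with the default pair in this scheme.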