Hey guys, I'm having a problem with Pig and Cassandra and was hoping someone could point me in the right direction. I've set up Pig and Cassandra, and I'm able to run through the example shown in the README.txt: I can view a list of top column names. That's all good stuff.
What I would like to do next is just dump out the column values. Suppose I have a very simple column family called User. To that column family, I've added 2 rows of data; each row has just 1 column, 'userName'. I'm using a GUID as my key. When I load and dump my rows, I get data like:

    (6c7fef29-16dd-44ca-bde1-f53995b2e818,{(userName,someUserName1)})
    (8be0b934-45aa-444f-90e2-ce7137a73b68,{(userName,someUserName2)})
    (c51fc8ce-2a53-46bb-b872-0f644b972f62,{(userName,someUserName3)})

As I understand it, at this point the GUID is $0 and $1 is the bag that contains my columns. So, like in the README, I run:

    cols = FOREACH rows GENERATE flatten($1);

As I understand it, when I flatten a bag, I get a set of tuples. When I dump cols, I get the following:

    (userName,someUserName1)
    (userName,someUserName2)
    (userName,someUserName3)

If I continue with the README, I would run

    colnames = FOREACH cols GENERATE $0;

to give me the column names. I'm a little confused about why I only get column names. When I do a describe on cols, I get the following:

    cols: {bytearray}

It seems like $0 should be the entire line (userName,someUserName1), not just the column name.

Anyway, what I really want is the column value, not the name. Is there a way to do that? Here are all of the failed attempts I made:

- colnames = FOREACH cols GENERATE $1;
  I was told $1 was out of bounds.
- casted = FOREACH cols GENERATE (tuple(chararray, chararray))$0;
  All I got back were empty tuples.
- values = FOREACH cols GENERATE $0.$1;
  I got an error telling me a data byte array can't be cast to a tuple.

So I'm stuck. Any help would be greatly appreciated.

Thanks!
Eric
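
For reference, the flatten step described above can be modelled in plain Python. This is only a rough sketch of the data shapes, not Pig and not the Cassandra loader: the names rows and flatten_bag are hypothetical, and the row values are copied from the dump in the question. It assumes that flattening a bag emits one output tuple per tuple in the bag, with that tuple's fields promoted to top-level fields.

```python
# Plain-Python model of the data shapes only. This is NOT Pig and NOT the
# Cassandra loader; `rows` and `flatten_bag` are hypothetical names.

# Each row is (key, bag), where the bag is a list of (name, value) tuples.
rows = [
    ("6c7fef29-16dd-44ca-bde1-f53995b2e818", [("userName", "someUserName1")]),
    ("8be0b934-45aa-444f-90e2-ce7137a73b68", [("userName", "someUserName2")]),
    ("c51fc8ce-2a53-46bb-b872-0f644b972f62", [("userName", "someUserName3")]),
]


def flatten_bag(rows, bag_index=1):
    """Return one output tuple per tuple found in each row's bag.

    The inner tuple's fields become the top-level fields of the output,
    so a (name, value) column becomes a two-field record.
    """
    out = []
    for row in rows:
        for inner in row[bag_index]:
            out.append(tuple(inner))
    return out


# Models: cols = FOREACH rows GENERATE flatten($1);
cols = flatten_bag(rows)

# Models: FOREACH cols GENERATE $0  (field 0 of each flattened tuple)
colnames = [t[0] for t in cols]

# Under this model, the column value would be field 1 of each flattened tuple.
colvalues = [t[1] for t in cols]
```

Under this model, each flattened record has two fields, which would be consistent with $0 yielding only the column name. The describe output showing a single bytearray may mean Pig is not aware of the second field when it checks $1, but that is a guess and is not confirmed by anything in the question.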