Hi Will,

That's partly why I like to use FromCassandraBag and ToCassandraBag from pygmalion: they do the work of getting the data back into a form that Cassandra understands.
Others may know better how to massage the data into that form using just Pig, but if all else fails, you could write a UDF to do that.

Jeremy

On Jun 15, 2011, at 1:17 PM, William Oberman wrote:

> I think I'm stuck on typing issues trying to store data in cassandra. To
> verify, cassandra wants (key, {tuples})
>
> My pig script is fairly brief:
> raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS
> (key:chararray, columns:bag {column:tuple (name, value)});
> --columns == timeUUID -> JSON
> rows = FOREACH raw GENERATE key, FLATTEN(columns);
> alias_target_day = FOREACH rows {
> --I wrote a specialized parser that does exactly what I need
> observation_map = com.civicscience.pig.ParseObservation($2);
> GENERATE $0 as alias, observation_map#'_fqt' as target,
> observation_map#'_day' as day;
> };
> grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
> X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1,
> COUNT($1)) as day_count;
>
> This gets me:
> (targetA, (day1, count))
> (targetA, (day2, count))
> (targetB, (day1, count))
> ....
>
> But, cassandra wants the 2nd item to be a bag. So, I tried:
> X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1,
> COUNT($1))) as day_count;
>
> But this results in:
> (targetA, {((day1, count))})
> (targetA, {((day2, count))})
> (targetB, {((day1, count))})
> It's hard to see, but the 2nd item now has a nested tuple as the first value,
> which is still bad.
>
> How do I get (key, {tuple})??? I wasn't sure where to post this (pig or
> cassandra), so I'm posting to the pig list too.
>
> will
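
P.S. For the "just Pig" route, one possibility is to regroup by target and project the fields out of the grouped bag, instead of wrapping a tuple with TOBAG. This is only a rough, untested sketch that assumes the `grouping` alias from the quoted script. The names `counts`, `by_target` and `cnt` are made up here for illustration, and the sketch does not check whether CassandraStorage accepts the resulting field types as-is.

```pig
-- Assumes `grouping` from the quoted script; `counts`, `by_target` and `cnt`
-- are illustrative names, not part of the original script.
counts = FOREACH grouping GENERATE
    group.$0 AS target,
    group.$1 AS day,
    COUNT($1) AS cnt;

-- Regroup so each target appears once, carrying all of its (day, cnt) pairs.
by_target = GROUP counts BY target;

-- Projecting two fields out of the grouped bag should give a bag of 2-tuples,
-- i.e. rows shaped like (targetA, {(day1, cnt), (day2, cnt)}).
X = FOREACH by_target GENERATE group AS target, counts.(day, cnt) AS day_count;
```

A side effect of this shape is that each target becomes a single row whose bag holds one (day, count) tuple per day, rather than several rows sharing the same key.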