I think I'm stuck on a typing issue while trying to store data in Cassandra. To verify: Cassandra wants (key, {tuples}).

My Pig script is fairly brief:

    raw = LOAD 'cassandra://test_in/test_cf' USING CassandraStorage() AS (key:chararray, columns:bag {column:tuple (name, value)});
    --columns == timeUUID -> JSON

    rows = FOREACH raw GENERATE key, FLATTEN(columns);
    alias_target_day = FOREACH rows {
        --I wrote a specialized parser that does exactly what I need
        observation_map = com.civicscience.pig.ParseObservation($2);
        GENERATE $0 as alias, observation_map#'_fqt' as target, observation_map#'_day' as day;
    };
    grouping = GROUP alias_target_day BY ((chararray)target,(chararray)day);
    X = FOREACH grouping GENERATE group.$0 as target, TOTUPLE(group.$1, COUNT($1)) as day_count;

This gets me:

    (targetA, (day1, count))
    (targetA, (day2, count))
    (targetB, (day1, count))
    ....

But Cassandra wants the 2nd item to be a bag. So I tried:

    X = FOREACH grouping GENERATE group.$0 as target, TOBAG(TOTUPLE(group.$1, COUNT($1))) as day_count;

But this results in:

    (targetA, {((day1, count))})
    (targetA, {((day2, count))})
    (targetB, {((day1, count))})

It's hard to see, but the bag in the 2nd item now holds a tuple nested inside another tuple, which is still bad. How do I get (key, {tuple})?

I wasn't sure where to post this (Pig or Cassandra), so I'm posting to the Pig list too.

will