I tried this same script on data closer to production, and I'm getting errors. I'm 50% sure it's this: https://issues.apache.org/jira/browse/PIG-1283
One of my rows in Cassandra may have no columns, which might produce a null bag, which in turn causes COUNT to blow up (at least, that's my theory). As a workaround, can I have COUNT ignore/skip rows with null columns? I'll start digging through the docs as well.

will

On Fri, Jun 3, 2011 at 4:09 PM, William Oberman <ober...@civicscience.com> wrote:

> That is exactly what I wanted, thanks for the confirm!
>
> On Fri, Jun 3, 2011 at 4:06 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
>
>> I am not sure what you mean by "count all columns". The code you have
>> counts all *cells*.
>> So:
>> id1: col1, col2
>> id2: col1, col2, col3
>>
>> has 3 columns in a conventional sense, but your code will return 5. Is
>> that what you want? If so, your code seems correct.
>>
>> D
>>
>> On Fri, Jun 3, 2011 at 12:53 PM, William Oberman
>> <ober...@civicscience.com> wrote:
>> > Howdy,
>> >
>> > I'm coming from cassandra, and I'm actually trying to count all columns
>> > in a column family. I believe that is similar to counting the number of
>> > tuples in a bag in the lingo in the pig manual. It was harder than I
>> > expected, but I think this works:
>> >
>> > rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
>> >     AS (key, columns: bag {T: tuple(name, value)});
>> > counts = FOREACH rows GENERATE COUNT(columns);
>> > counts_in_bag = GROUP counts ALL;
>> > sum_of_bag = FOREACH counts_in_bag GENERATE SUM($1);
>> > dump sum_of_bag;
>> >
>> > My question is: am I right that it works? I started with 3 keys having
>> > a total of 5 columns and got (5). Then I added a new key/column, and
>> > another column on an existing key and got (7). So, it seems like it's
>> > working. But, was there a better way to write it?
>> >
>> > Thanks!
>> >
>> > will
>
> --
> Will Oberman
> Civic Science, Inc.
> 3030 Penn Avenue., First Floor
> Pittsburgh, PA 15201
> (M) 412-480-7835
> (E) ober...@civicscience.com

--
Will Oberman
Civic Science, Inc.
3030 Penn Avenue., First Floor
Pittsburgh, PA 15201
(M) 412-480-7835
(E) ober...@civicscience.com
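
For reference, a minimal sketch of the kind of workaround asked about above. It assumes the null-bag theory is correct (that has not been confirmed here) and reuses the keyspace, column family, and schema from the original script. The aliases non_null_rows and n are made up for illustration.

```pig
-- Sketch only: assumes the errors come from COUNT receiving a null bag.
rows = LOAD 'cassandra://MyKeyspace/MyColumnFamily' USING CassandraStorage()
    AS (key, columns: bag {T: tuple(name, value)});

-- Option 1: drop rows whose bag is null before counting.
non_null_rows = FILTER rows BY columns IS NOT NULL;
counts = FOREACH non_null_rows GENERATE COUNT(columns) AS n;

-- Option 2 (alternative to option 1): keep every row, count a null bag as zero.
-- counts = FOREACH rows GENERATE ((columns IS NULL) ? 0L : COUNT(columns)) AS n;

-- Same aggregation as the original script, with the field named explicitly.
counts_in_bag = GROUP counts ALL;
sum_of_bag = FOREACH counts_in_bag GENERATE SUM(counts.n);
DUMP sum_of_bag;
```

Either option should give the same total, since a row with a null bag contributes no cells. If the errors persist with the null rows filtered out, the cause is probably something other than the null bag.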