Hi all,

A little while back, I started a project called pygmalion to collect example scripts and UDFs for people using Pig with Cassandra. It currently has a few handy UDFs:
FromCassandraBag: a way to convert from what Cassandra returns (key:chararray, columns:bag {column:tuple (name, value)}) to something more tabular (key, value1, value2, value3). You specify the values you want to project, so it's good for tabular data.

ToCassandraBag: a way to convert from (key, value1, value2, value3) to what Cassandra expects when writing (key:chararray, columns:bag {column:tuple (name, value)}). The column names are extracted from the variable names in the Pig script.

(Both of the above were contributed by Jacob Perkins, with slight revisions by Jeremy Hanna.)

StringConcat: probably something everyone implements, but instead of CONCAT, which only takes two strings, it takes any number of strings.

GenerateTimeUUID: a UDF that generates a time UUID, with or without a time to base it on.

https://github.com/jeromatron/pygmalion/

It definitely needs more work and examples, but I've been using the UDFs in there for a while with Cassandra 0.7.5 (previously the 0.7 branch). Now that 0.7.5 is released, I'd just like to let people know about it in case they would like to contribute or even just use it.
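
For anyone who hasn't seen the two row shapes side by side, here is a rough sketch of the reshaping that FromCassandraBag and ToCassandraBag are described as doing above. It is plain Python, not Pig, and it is not pygmalion's actual code: the function names, the sample row, and the choice to return None for a missing column are all assumptions made for illustration. In particular, the real ToCassandraBag takes the column names from the variable names in the Pig script, whereas this sketch takes them as an explicit argument.

```python
from typing import Any, List, Sequence, Tuple

Column = Tuple[str, Any]                 # (name, value)
CassandraRow = Tuple[str, List[Column]]  # (key, bag of columns)


def from_cassandra_bag(row: CassandraRow, wanted: Sequence[str]) -> Tuple[Any, ...]:
    """Flatten (key, [(name, value), ...]) into (key, value1, value2, ...).

    Values come out in the order given by `wanted`. A column that is not
    present in the row becomes None (an assumption of this sketch).
    """
    key, columns = row
    lookup = dict(columns)
    return (key,) + tuple(lookup.get(name) for name in wanted)


def to_cassandra_bag(flat_row: Sequence[Any], names: Sequence[str]) -> CassandraRow:
    """Rebuild (key, [(name, value), ...]) from (key, value1, value2, ...).

    `names` stands in for the Pig variable names that the real UDF
    is described as reading from the script.
    """
    key, values = flat_row[0], list(flat_row[1:])
    if len(values) != len(names):
        raise ValueError("need exactly one column name per value")
    return (key, list(zip(names, values)))


if __name__ == "__main__":
    # Hypothetical sample row in the shape Cassandra returns to Pig.
    row = ("user42", [("name", "Ada"), ("email", "ada@example.com"), ("age", 36)])
    flat = from_cassandra_bag(row, ["name", "age"])
    print(flat)
    print(to_cassandra_bag(flat, ["name", "age"]))
```

The point of the pair is that you can project a few columns into a flat tuple, work on it with ordinary Pig operations, and then wrap the result back into the bag shape before storing.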