I was able to solve the issue. The DAO was applying a second layer of compression with java.util.zip.Deflater/Inflater, on top of the snappy compression defined on the CF. The solution was to extend CassandraStorage and override the getNext() method. The new implementation calls super.getNext() and inflates the values in the returned Tuples where appropriate.
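
For anyone hitting the same thing, here is a rough sketch of just the inflate step, not the actual subclass I ended up with. It assumes the DAO used a default java.util.zip.Deflater (zlib header, no preset dictionary); if yours was built with nowrap=true you would need new Inflater(true) instead. The class and method names (InflateSketch, inflate, deflate) are made up for illustration, and the deflate() helper only exists to produce test input. In an override of getNext() you would call something like inflate() on the bytes of each compressed field of the Tuple returned by super.getNext().

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper: names are illustrative, not part of Pig or Cassandra.
final class InflateSketch {

    // Inflate a zlib-wrapped byte array (assumption: default Deflater settings).
    static byte[] inflate(byte[] compressed) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No progress and nothing left to read: input is truncated
                // or needs a preset dictionary, so fail instead of looping.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("truncated or dictionary-compressed input");
                }
                out.write(buf, 0, n);
            }
        } finally {
            inflater.end(); // release native zlib memory
        }
        return out.toByteArray();
    }

    // Counterpart used only to build sample input for the round trip below.
    static byte[] deflate(byte[] raw) {
        Deflater deflater = new Deflater();
        deflater.setInput(raw);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws DataFormatException {
        byte[] json = "{\"id\": 1}".getBytes(StandardCharsets.UTF_8);
        byte[] roundTrip = inflate(deflate(json));
        System.out.println(new String(roundTrip, StandardCharsets.UTF_8));
    }
}
```

The helper fails fast on truncated input rather than spinning, which matters if some rows in the CF were written without the extra compression.
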

-Marlon

On Wed, Apr 23, 2014 at 1:39 PM, marlon hendred <mhend...@gmail.com> wrote:

> Hi,
>
> I'm attempting to dump a pig relation of a compressed column family. Its a
> single column whose value is a json blob. It's compressed via snappy
> compression and the value validator is BytesType. After I create the
> relation and dump I get garbage. Here is the describe:
>
> ColumnFamily: CF
> Key Validation Class: org.apache.cassandra.db.marshal.TimeUUIDType
> Default column value validator:
> org.apache.cassandra.db.marshal.BytesType
> Cells sorted by: org.apache.cassandra.db.marshal.UTF8Type
> GC grace seconds: 86400
> Compaction min/max thresholds: 2/32
> Read repair chance: 0.1
> DC Local Read repair chance: 0.0
> Populate IO Cache on flush: false
> Replicate on write: true
> Caching: KEYS_ONLY
> Bloom Filter FP chance: default
> Built indexes: []
> Compaction Strategy:
> org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
> Compression Options:
> sstable_compression:
> org.apache.cassandra.io.compress.SnappyCompressor
>
> Pig stuff:
> rows = LOAD 'cql://Keyspace/CF' using CqlStorage();
>
> I've tried to overwrite the schema by adding 'as (key: chararray, col1:
> chararray, value: chararray)' but when I dump this it still looks like its
> binary.
>
> Do I need to implement my own CqlStorage() here that uncompress or am I
> just missing something? I've done some googling but haven't seen anything
> on the subject. Also I am using Datastax Enterprise. 3.1. Thanks in
> advance!
>
> -m
>