Hello -

Forgive me if this has been asked before, but I'm trying to determine the
best way to add compression to DataSink outputs (starting with
TextOutputFormat). Ideally, each partition file (one per parallel
instance) would be compressed independently with gzip, but I'm open to
other solutions.

My first thought was to extend TextOutputFormat with a new class that
compresses the file after closing it and before returning, but I'm not sure
that would work across all the file systems I need (S3, local, and HDFS).
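For comparison, here is a minimal sketch of the per-partition gzip idea written with plain java.util.zip rather than Flink classes. The class GzipPartitionWriter, its methods, and the "<taskNumber>.gz" file naming are hypothetical illustrations, not Flink API. Each parallel task wraps its own output stream in a GZIPOutputStream, so records are compressed as they are written instead of in a second pass after the file is closed.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for a text output format: one instance per parallel task.
class GzipPartitionWriter implements AutoCloseable {
    private final Writer writer;

    // Opens "<dir>/<taskNumber>.gz" and wraps the raw stream in gzip
    // before any text is written, so every partition gets its own .gz file.
    GzipPartitionWriter(Path dir, int taskNumber) throws IOException {
        OutputStream raw = Files.newOutputStream(dir.resolve(taskNumber + ".gz"));
        this.writer = new BufferedWriter(
                new OutputStreamWriter(new GZIPOutputStream(raw), StandardCharsets.UTF_8));
    }

    // Writes one record as a newline-terminated line, as a text format would.
    void writeRecord(String record) throws IOException {
        writer.write(record);
        writer.write('\n');
    }

    // Closing the writer flushes the buffer, writes the gzip trailer,
    // and closes the underlying file stream.
    @Override
    public void close() throws IOException {
        writer.close();
    }

    // Helper for checking the result: decompresses a partition file into lines.
    static List<String> readGzipLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(file)), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

Because compression happens while the stream is still open, this approach does not need to re-read or rewrite a file that has already been closed, which is the step most likely to behave differently across file systems.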

Any thoughts?

Thanks!

Wes
