Hello, and forgive me if this has been asked before. I'm trying to determine the best way to add compression to DataSink outputs (starting with TextOutputFormat). Ideally, I would like each partition file (one per parallel task) to be compressed independently with gzip, but I am open to other solutions.
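To make the goal concrete, here is a rough sketch of the per-partition result I'm after. It uses only java.util.zip and no Flink classes. The class and method names (GzipPartitionSketch, writePartition, readPartition) are made up for illustration, and the OutputStream is assumed to be whatever stream the target file system hands back for that partition.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration only; this is NOT a Flink class.
public class GzipPartitionSketch {

    // Writes one partition's records as newline-terminated UTF-8 text,
    // gzip-compressed, into the stream supplied for that partition.
    public static void writePartition(List<String> records, OutputStream rawOut) throws IOException {
        try (Writer writer = new OutputStreamWriter(new GZIPOutputStream(rawOut), StandardCharsets.UTF_8)) {
            for (String record : records) {
                writer.write(record);
                writer.write('\n');
            }
        } // closing the writer writes the gzip trailer and closes rawOut
    }

    // Decompresses a partition back into text; only here to check the round trip.
    public static String readPartition(byte[] compressed) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for one partition file.
        ByteArrayOutputStream partition = new ByteArrayOutputStream();
        writePartition(List.of("first record", "second record"), partition);
        System.out.print(readPartition(partition.toByteArray()));
    }
}
```

Each partition would produce its own self-contained .gz file this way, since every parallel writer wraps its own stream.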
My first thought was to extend TextOutputFormat with a new class that compresses the file after closing it and before returning, but I'm not sure that would work across all possible file systems (S3, local, and HDFS). Any thoughts? Thanks! Wes