That looks good. Thanks!

On Fri, Aug 19, 2016 at 6:15 AM Robert Metzger <rmetz...@apache.org> wrote:
> Hi Wes,
>
> Flink's own OutputFormats don't support compression, but we have some
> tools to use Hadoop's OutputFormats with Flink [1], and those support
> compression:
> https://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapreduce/lib/output/FileOutputFormat.html
>
> Let me know if you need more information.
>
> Regards,
> Robert
>
> [1]:
> https://ci.apache.org/projects/flink/flink-docs-master/apis/batch/hadoop_compatibility.html
>
> On Thu, Aug 18, 2016 at 2:13 AM, Wesley Kerr <wesley.n.k...@gmail.com> wrote:
>
>> Hello -
>>
>> Forgive me if this has been asked before, but I'm trying to determine the
>> best way to add compression to DataSink Outputs (starting with
>> TextOutputFormat). Realistically I would like each partition file
>> (based on parallelism) to be compressed independently with gzip, but am
>> open to other solutions.
>>
>> My first thought was to extend TextOutputFormat with a new class that
>> compresses after closing and before returning, but I'm not sure that would
>> work across all possible file systems (S3, Local, and HDFS).
>>
>> Any thoughts?
>>
>> Thanks!
>>
>> Wes
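A minimal sketch of the Hadoop-compatibility approach Robert points to, assuming the Flink Hadoop compatibility tools and the Hadoop client libraries are on the classpath; the class name, the input DataSet built from fromElements, and the /tmp/gzip-output path are placeholders, not part of the original thread:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.hadoop.mapreduce.HadoopOutputFormat;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.GzipCodec;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class GzipOutputExample {

    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // Placeholder input; any DataSet<String> would do here.
        DataSet<String> lines = env.fromElements("first line", "second line");

        // Wrap Hadoop's mapreduce TextOutputFormat in Flink's HadoopOutputFormat.
        Job job = Job.getInstance();
        HadoopOutputFormat<Text, NullWritable> hadoopOF =
                new HadoopOutputFormat<>(new TextOutputFormat<Text, NullWritable>(), job);

        // Compression is configured on the Hadoop FileOutputFormat; each
        // parallel task then writes its own independently gzipped part file.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
        FileOutputFormat.setOutputPath(job, new Path("/tmp/gzip-output")); // placeholder path

        // Hadoop OutputFormats expect (key, value) pairs, so wrap each String;
        // with a NullWritable value, TextOutputFormat writes only the key.
        lines
            .map(new MapFunction<String, Tuple2<Text, NullWritable>>() {
                @Override
                public Tuple2<Text, NullWritable> map(String value) {
                    return new Tuple2<>(new Text(value), NullWritable.get());
                }
            })
            .output(hadoopOF);

        env.execute("Gzip-compressed TextOutputFormat example");
    }
}

Because the compression codec is applied as each record is written, this avoids the compress-after-close step the original question worried about, and it behaves the same on local, HDFS, and S3 file systems that Hadoop can address.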