I think that would be very worthwhile :-) Happy to hear that you want to
contribute that!

Decorating the input stream sounds like a great approach and would also
work for other compression formats.
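To make the idea concrete, here is a minimal sketch of such a decoration step in plain Java. The hook name `decorateStream` and the rule "decompress if the path ends in `.gz`" are assumptions for illustration, not an existing Flink API. The sketch wraps the raw stream in a `GZIPInputStream` and then drops a leading UTF-8 byte order mark, so the record parsing underneath stays unchanged.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamDecoration {

    // Hypothetical hook: an input format would call this on the raw file
    // stream before parsing records. Not an existing Flink method.
    static InputStream decorateStream(InputStream raw, String path) throws IOException {
        InputStream in = raw;
        // Transparent decompression, selected by file extension (assumption).
        if (path.endsWith(".gz")) {
            in = new GZIPInputStream(in);
        }
        // Further decorators (e.g. other codecs) could be stacked here.
        return skipUtf8Bom(in);
    }

    // Removes a leading UTF-8 byte order mark (EF BB BF) if present.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int n = pb.readNBytes(head, 0, 3);
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            pb.unread(head, 0, n); // not a BOM: give the bytes back to the reader
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        // Build a gzip-compressed payload that starts with a BOM.
        byte[] text = "\uFEFFhello".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write(text);
        }
        InputStream in = decorateStream(new ByteArrayInputStream(buf.toByteArray()), "data.csv.gz");
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints "hello"
    }
}
```

Because each concern is just another `InputStream` wrapper, adding a different codec would mean adding one more branch in the hook rather than touching the format's parsing code.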

The other thing that needs to be taken into account is that GZIP files are
not splittable in the same way as uncompressed files: a gzip stream can only
be decoded from its beginning. You may have to invent something clever
there, or simply restrict the format to one input split per file (rather
than one split per block).
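A rough sketch of the simple option, one split per compressed file, could look like this. The `Split` class, `isUnsplittable`, and `computeSplits` are hypothetical stand-ins written for illustration, not Flink's actual split classes or methods; detecting compression by the `.gz` extension is likewise an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitSketch {

    // Hypothetical split descriptor (not Flink's FileInputSplit).
    static final class Split {
        final String path;
        final long start;
        final long length;

        Split(String path, long start, long length) {
            this.path = path;
            this.start = start;
            this.length = length;
        }
    }

    // Assumption: compression is detected by file extension.
    static boolean isUnsplittable(String path) {
        return path.endsWith(".gz");
    }

    static List<Split> computeSplits(String path, long fileLength, long blockSize) {
        List<Split> splits = new ArrayList<>();
        if (isUnsplittable(path)) {
            // A gzip stream must be read from byte 0, so one reader gets the whole file.
            splits.add(new Split(path, 0, fileLength));
            return splits;
        }
        // Uncompressed: one split per block, the last one possibly shorter.
        for (long start = 0; start < fileLength; start += blockSize) {
            splits.add(new Split(path, start, Math.min(blockSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        System.out.println(computeSplits("data.csv", 300 * mb, 128 * mb).size());    // prints 3
        System.out.println(computeSplits("data.csv.gz", 300 * mb, 128 * mb).size()); // prints 1
    }
}
```

The trade-off is parallelism: a single large .gz file is then read by one task, which is usually acceptable when the input consists of many compressed files.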

On Thu, Apr 30, 2015 at 5:41 PM, Kruse, Sebastian <sebastian.kr...@hpi.de>
wrote:

> Hi everyone,
>
> I just recently came across a use-case where I needed to read gzip files
> and handle byte order marks transparently. I know that gzip can be read
> with Hadoop input formats but that did not work for me since I wanted to
> reuse my existing custom Flink input formats.
>
> It turned out that both requirements (and more) can be dealt with by
> allowing the input formats to decorate the input stream. Do you think it is
> worthwhile to include these changes in Flink? I could take care of it.
>
> Cheers,
> Sebastian
>
