There is already support for inflate-compressed files, and I introduced logic to handle unsplittable formats.
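
To illustrate the idea, here is a minimal sketch that uses only java.util.zip. The class and method names (DecoratedStreams, decorate, isSplittable) are made up for this mail and are not the actual Flink code, and choosing the decorator by file extension is just an assumption:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

public class DecoratedStreams {

    /** Hypothetical helper: wraps the raw stream based on the file extension. */
    public static InputStream decorate(String path, InputStream raw) throws IOException {
        if (path.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        if (path.endsWith(".deflate")) {
            return new InflaterInputStream(raw);
        }
        return raw; // uncompressed: read the bytes as they are
    }

    /** Compressed files cannot be read from an arbitrary offset, so one split per file. */
    public static boolean isSplittable(String path) {
        return !(path.endsWith(".gz") || path.endsWith(".deflate"));
    }

    /** Test helper: gzip a string in memory. */
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    /** Reads the whole (decorated) stream and decodes it as UTF-8. */
    public static String readAll(String path, byte[] content) throws IOException {
        ByteArrayOutputStream result = new ByteArrayOutputStream();
        try (InputStream in = decorate(path, new ByteArrayInputStream(content))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                result.write(chunk, 0, n);
            }
        }
        return new String(result.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        byte[] compressed = gzip("hello flink");
        System.out.println(readAll("part-0.gz", compressed));
        System.out.println(isSplittable("part-0.gz"));
    }
}
```

An input format would call something like decorate once when it opens a split, and would ask isSplittable before creating more than one split per file.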
Sent from my iPhone

> On 30.04.2015, at 19:39, Stephan Ewen <se...@apache.org> wrote:
>
> I think that would be very worthwhile :-) Happy to hear that you want to
> contribute that!
>
> Decorating the input stream sounds like a great approach and would also
> work for other compression formats.
>
> The other thing that needs to be taken into account is that GZIP files are
> not splittable in the same way as uncompressed files. You may have to
> invent something clever there, or simply restrict the format to have one
> input split per file (rather than block).
>
> On Thu, Apr 30, 2015 at 5:41 PM, Kruse, Sebastian <sebastian.kr...@hpi.de>
> wrote:
>
>> Hi everyone,
>>
>> I just recently came across a use-case where I needed to read gzip files
>> and handle byte order marks transparently. I know that gzip can be read
>> with Hadoop input formats but that did not work for me since I wanted to
>> reuse my existing custom Flink input formats.
>>
>> It turned out that both requirements (and more) can be dealt with by
>> allowing the input formats to decorate the input stream. Do you think it is
>> worthwhile to include these changes in Flink? I could take care of it.
>>
>> Cheers,
>> Sebastian