Great. Please file a JIRA and open a pull request for the feature!

On Mon, May 4, 2015 at 10:37 AM, Kruse, Sebastian <sebastian.kr...@hpi.de> wrote:

> Right, I saw the .deflate file support and the unsplittable flag and built
> upon that code. I just tried to generalize it and expose it as a hook, so
> that unforeseen issues like new exotic compression formats or handling
> custom preambles can be implemented by the users themselves.
> I can create a ticket and a pull request by this week, so that you can
> have a look at it.
>
> Cheers,
> Sebastian
> ________________________________________
> From: Robert Metzger [metrob...@gmail.com]
> Sent: Thursday, April 30, 2015 21:01
> To: dev@flink.apache.org
> Subject: Re: Gzip support
>
> There is already support for inflate compressed files and I introduced
> logic to handle unsplittable formats.
>
>
> Sent from my iPhone
>
> > On 30.04.2015, at 19:39, Stephan Ewen <se...@apache.org> wrote:
> >
> > I think that would be very worthwhile :-) Happy to hear that you want to
> > contribute that!
> >
> > Decorating the input stream sounds like a great approach and would also
> > work for other compression formats.
> >
> > The other thing that needs to be taken into account is that GZIP files are
> > not splittable in the same way as uncompressed files. You may have to
> > invent something clever there, or simply restrict the format to have one
> > input split per file (rather than block).
> >
> > On Thu, Apr 30, 2015 at 5:41 PM, Kruse, Sebastian <sebastian.kr...@hpi.de>
> > wrote:
> >
> >> Hi everyone,
> >>
> >> I just recently came across a use-case where I needed to read gzip files
> >> and handle byte order marks transparently. I know that gzip can be read
> >> with Hadoop input formats but that did not work for me since I wanted to
> >> reuse my existing custom Flink input formats.
> >>
> >> It turned out that both requirements (and more) can be dealt with by
> >> allowing the input formats to decorate the input stream. Do you think it is
> >> worthwhile to include these changes in Flink? I could take care of it.
> >>
> >> Cheers,
> >> Sebastian
> >>
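
For readers following the thread, here is a rough sketch of what "decorating the input stream" could look like. It is not the code from the proposed pull request and it does not use any Flink API: the interface `InputStreamDecorator` and the constants `GZIP` and `SKIP_UTF8_BOM` are hypothetical names chosen for illustration, and only `java.util.zip` and `java.io` classes from the JDK are assumed.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class StreamDecorationSketch {

    /** Hypothetical hook: an input format would call this once per opened file. */
    interface InputStreamDecorator {
        InputStream decorate(InputStream raw) throws IOException;
    }

    /** Wraps the raw stream so that downstream code sees decompressed bytes. */
    static final InputStreamDecorator GZIP = raw -> new GZIPInputStream(raw);

    /** Skips a leading UTF-8 byte order mark (EF BB BF) if one is present. */
    static final InputStreamDecorator SKIP_UTF8_BOM = raw -> {
        PushbackInputStream in = new PushbackInputStream(raw, 3);
        byte[] head = new byte[3];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 3 or EOF.
        while (n < 3) {
            int r = in.read(head, n, 3 - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        boolean isBom = n == 3
                && (head[0] & 0xFF) == 0xEF
                && (head[1] & 0xFF) == 0xBB
                && (head[2] & 0xFF) == 0xBF;
        if (!isBom && n > 0) {
            in.unread(head, 0, n); // not a BOM: hand the bytes back unchanged
        }
        return in;
    };

    /** Test helper: gzip-compresses a byte array in memory. */
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bos)) {
            out.write(data);
        }
        return bos.toByteArray();
    }

    /** Test helper: drains a stream and decodes it as UTF-8. */
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int r;
        while ((r = in.read(buf)) >= 0) {
            bos.write(buf, 0, r);
        }
        return new String(bos.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // A gzip file whose decompressed content starts with a UTF-8 BOM.
        byte[] text = "hello".getBytes(StandardCharsets.UTF_8);
        byte[] withBom = new byte[text.length + 3];
        withBom[0] = (byte) 0xEF;
        withBom[1] = (byte) 0xBB;
        withBom[2] = (byte) 0xBF;
        System.arraycopy(text, 0, withBom, 3, text.length);

        InputStream raw = new ByteArrayInputStream(gzip(withBom));
        // Decorators compose: decompress first, then strip the BOM.
        InputStream decorated = SKIP_UTF8_BOM.decorate(GZIP.decorate(raw));
        System.out.println(readAll(decorated));
    }
}
```

Note that a gzip stream can only be decoded from its beginning, which is why the thread suggests one input split per file: a format using a decorator like `GZIP` above would presumably have to read each file from offset 0 rather than from an arbitrary block boundary.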