Re: read .gz files

2015-02-19 Thread Sebastian
Upgrading to 0.8.1 helped, thx! On 19.02.2015 22:08, Robert Metzger wrote: Hey, are you using Flink 0.8.0? I think we've added support for Hadoop input formats with scala in 0.8.1 and 0.9 (master). The following code just printed me the List of all page titles of the Catalan Wikipedia ;) (bui

Re: read .gz files

2015-02-19 Thread Robert Metzger
Hey, are you using Flink 0.8.0? I think we've added support for Hadoop input formats with scala in 0.8.1 and 0.9 (master). The following code just printed me the List of all page titles of the Catalan Wikipedia ;) (build against master) def main(args: Array[String]) { val env = ExecutionEnvi

Re: read .gz files

2015-02-19 Thread Sebastian
I tried to follow the example on the web page like this: --- implicit val env = ExecutionEnvironment.getExecutionEnvironment val job = Job.getInstance val hadoopInput = new HadoopInputFormat[LongWritable,Text]( new TextInput

Re: read .gz files

2015-02-19 Thread Robert Metzger
I just had a look at Hadoop's TextInputFormat. hadoop-common-2.2.0.jar contains the following compression codecs: org.apache.hadoop.io.compress.BZip2Codec org.apache.hadoop.io.compress.DefaultCodec org.apache.hadoop.io.compress.DeflateCodec org.apache.hadoop.io.compress.GzipCodec org
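Hadoop's TextInputFormat picks a decompression codec from the file name suffix (through CompressionCodecFactory), which is why a .gz input can be read without extra code. The sketch below only mimics that suffix lookup in plain Scala; it is not Hadoop's API. The `codecs` table and the `codecFor` helper are hypothetical, cover just three of the codecs listed above, and assume their default extensions (.gz, .bz2, .deflate).

```scala
object CodecByExtension {
  // Hypothetical lookup table: default file extension -> Hadoop codec class name.
  val codecs: Seq[(String, String)] = Seq(
    ".gz"      -> "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2"     -> "org.apache.hadoop.io.compress.BZip2Codec",
    ".deflate" -> "org.apache.hadoop.io.compress.DefaultCodec"
  )

  // Pick the codec whose extension is the longest matching suffix of the file name.
  // None means no codec matched, so the file would be read as uncompressed text.
  def codecFor(fileName: String): Option[String] =
    codecs
      .filter { case (ext, _) => fileName.endsWith(ext) }
      .sortBy { case (ext, _) => -ext.length }
      .headOption
      .map(_._2)

  def main(args: Array[String]): Unit =
    Seq("input.txt.gz", "data.bz2", "part-0000.deflate", "plain.txt")
      .foreach(f => println(s"$f -> ${codecFor(f).getOrElse("no codec (plain text)")}"))
}
```

Running `main` prints one line per (made-up) file name; `plain.txt` matches no codec and would be read as-is.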

Re: read .gz files

2015-02-19 Thread Robert Metzger
Hi, right now Flink itself only supports reading ".deflate" files. It's basically the same algorithm as gzip, but gzip files seem to have an extra header that makes the two formats incompatible. But you can easily use HadoopInputFormats with Flink. I'm sure there is a Hadoop IF for reading gzip
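To illustrate the header difference, here is a minimal sketch that uses only the JDK's java.util.zip from Scala (no Flink or Hadoop code). It assumes the ".deflate" files are zlib-wrapped streams like those written by `DeflaterOutputStream`; that is an assumption, not a statement about Flink's reader. Both formats carry the same DEFLATE-compressed data, but gzip wraps it in its own header (magic bytes 0x1f 0x8b) and trailer, so a zlib/deflate reader rejects a .gz stream while a gzip reader decodes it. The object and helper names are made up for this example.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, IOException, InputStream}
import java.nio.charset.StandardCharsets
import java.util.zip.{DeflaterOutputStream, GZIPInputStream, GZIPOutputStream, InflaterInputStream}

object GzipVsDeflate {
  // gzip container: header starting with 0x1f 0x8b, DEFLATE data, CRC-32 + length trailer.
  def gzip(data: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new GZIPOutputStream(bos)
    try out.write(data) finally out.close()
    bos.toByteArray
  }

  // DeflaterOutputStream with its default Deflater writes the zlib format:
  // a 2-byte header, the same kind of DEFLATE data, and an Adler-32 trailer.
  def zlibDeflate(data: Array[Byte]): Array[Byte] = {
    val bos = new ByteArrayOutputStream()
    val out = new DeflaterOutputStream(bos)
    try out.write(data) finally out.close()
    bos.toByteArray
  }

  // Open a stream lazily (GZIPInputStream checks the header in its constructor),
  // read everything as UTF-8, and return None if the format is rejected.
  def readAll(open: () => InputStream): Option[String] =
    try {
      val in = open()
      try {
        val bos = new ByteArrayOutputStream()
        val buf = new Array[Byte](4096)
        var n = in.read(buf)
        while (n != -1) { bos.write(buf, 0, n); n = in.read(buf) }
        Some(new String(bos.toByteArray, StandardCharsets.UTF_8))
      } finally in.close()
    } catch { case _: IOException => None } // ZipException is an IOException

  def main(args: Array[String]): Unit = {
    val text = "line one\nline two\n".getBytes(StandardCharsets.UTF_8)
    val gz = gzip(text)
    val zl = zlibDeflate(text)
    println(f"gzip header: ${gz(0) & 0xff}%02x ${gz(1) & 0xff}%02x")
    println(f"zlib header: ${zl(0) & 0xff}%02x ${zl(1) & 0xff}%02x")
    // A zlib ("deflate") reader fails on the gzip header...
    println(readAll(() => new InflaterInputStream(new ByteArrayInputStream(gz))))
    // ...while a gzip reader decodes the same bytes.
    println(readAll(() => new GZIPInputStream(new ByteArrayInputStream(gz))))
  }
}
```

In a Flink job the decoding would be left to Hadoop's GzipCodec via a HadoopInputFormat, as suggested above; the sketch only shows why the two formats are not interchangeable.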

read .gz files

2015-02-19 Thread Sebastian
Hi, does Flink support reading gzipped files? Haven't found any info about this on the website. Best, Sebastian