Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Thank you Fabian and Flavio for your help. Best, Yassine

2016-10-11 14:02 GMT+02:00 Flavio Pompermaier:
> I posted a workaround for that at
> https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java

Re: Handling decompression exceptions

2016-10-11 Thread Flavio Pompermaier
I posted a workaround for that at https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java

On 11 Oct 2016 1:57 p.m., "Fabian Hueske" wrote:
> Hi,
> Flink's String parser does not support escaped quotes.

Re: Handling decompression exceptions

2016-10-11 Thread Fabian Hueske
Hi,

Flink's String parser does not support escaped quotes. Your data contains a doubled double quote (""). The parser identifies this as the end of the string field. As a workaround, you can read the file as a regular text file, line by line, and do the parsing in a MapFunction.

Best, Fabian

2016-1…
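The line-by-line MapFunction workaround Fabian describes can be sketched in plain Java (the class and method names below are illustrative, not Flink API): treat a doubled quote inside a quoted field as an escaped quote, RFC 4180 style, instead of as the end of the field.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the workaround: read the file as plain text and parse each
// line yourself, so that "" inside a quoted field becomes a literal quote
// rather than terminating the field.
public class QuoteAwareCsvParser {

    public static List<String> splitLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    // "" inside a quoted field is an escaped quote
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"');
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // opening quote
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }
}
```

Inside a Flink job one would call something like `splitLine(line, ',')` on each line delivered by `env.readTextFile(...)` from within a MapFunction.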

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Forgot to add parseQuotedStrings('"'). After adding it I'm getting the same exception with the second code too.

2016-10-11 13:29 GMT+02:00 Yassine MARZOUGUI:
> Hi Fabian,
> I tried to debug the code, and it turns out a line in my csv data is
> causing the ArrayIndexOutOfBoundsException, here i…

Re: Handling decompression exceptions

2016-10-11 Thread Yassine MARZOUGUI
Hi Fabian,

I tried to debug the code, and it turns out a line in my csv data is causing the ArrayIndexOutOfBoundsException. Here is the exception stacktrace:

java.lang.ArrayIndexOutOfBoundsException: -1
    at org.apache.flink.types.parser.StringParser.parseField(StringParser.java:49)
    at org.apache.f…

Re: Handling decompression exceptions

2016-10-11 Thread Fabian Hueske
Hi Yassine,

I ran your code without problems and got the correct result. Can you provide the stacktrace of the exception?

Thanks, Fabian

2016-10-10 10:57 GMT+02:00 Yassine MARZOUGUI:
> Thank you Fabian and Stephan for the suggestions.
> I couldn't override "readLine()" because it's final, so w…

Re: Handling decompression exceptions

2016-10-10 Thread Yassine MARZOUGUI
Thank you Fabian and Stephan for the suggestions. I couldn't override "readLine()" because it's final, so I went with Fabian's solution, but I'm struggling with csv field masks. Any help is appreciated. I created an InputFormat which is basically TupleCsvInputFormat for which I overrode the nextRecord…
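For what it's worth, the semantics of an included-fields mask can be illustrated outside Flink with a small helper. The names and behavior here are simplified assumptions for illustration, not the actual CsvInputFormat API:

```java
// Illustrative sketch of how an included-fields mask projects parsed CSV
// columns: keep only the columns whose mask entry is true, in order.
// This mirrors the spirit of Flink's CSV field masks but is not its API.
public class FieldMask {

    public static String[] project(String[] record, boolean[] includedMask) {
        int n = 0;
        for (boolean b : includedMask) {
            if (b) n++;
        }
        String[] out = new String[n];
        int j = 0;
        for (int i = 0; i < includedMask.length && i < record.length; i++) {
            if (includedMask[i]) {
                out[j++] = record[i];
            }
        }
        return out;
    }
}
```

With a mask of `{true, false, true, false}` over four parsed columns, only the first and third survive, which is why the arity of the output tuple must match the number of `true` entries.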

Re: Handling decompression exceptions

2016-10-04 Thread Stephan Ewen
How about just overriding the "readLine()" method to call "super.readLine()" and catching EOF exceptions?

On Tue, Oct 4, 2016 at 5:56 PM, Fabian Hueske wrote:
> Hi Yassine,
> AFAIK, there is no built-in way to ignore corrupted compressed files.
> You could try to implement a FileInputFormat th…

Re: Handling decompression exceptions

2016-10-04 Thread Fabian Hueske
Hi Yassine,

AFAIK, there is no built-in way to ignore corrupted compressed files. You could try to implement a FileInputFormat that wraps the CsvInputFormat and forwards all calls to the wrapped CsvIF. The wrapper would also catch and ignore the EOFException. If you do that, you would not be able…
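The catch-and-ignore idea at the heart of this wrapper can be demonstrated without Flink, using plain java.util.zip: read a possibly truncated gzip stream and keep whatever decompressed cleanly instead of failing the whole read. This is a minimal sketch of the exception-handling pattern, not the wrapper input format itself:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.zip.GZIPInputStream;

// Minimal illustration of "catch and ignore the EOFException": decompress
// a (possibly truncated) gzip byte stream and return whatever bytes were
// recovered before the stream broke off.
public class LenientGzipReader {

    public static byte[] readAvailable(byte[] gzipBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(gzipBytes))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } catch (EOFException e) {
            // Truncated or corrupted archive: keep what decompressed so far
            // instead of failing the job.
        }
        return out.toByteArray();
    }
}
```

In the actual wrapper input format, the same try/catch would surround the delegated read call, so a corrupted file yields a short (or empty) result rather than an exception.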

Handling decompression exceptions

2016-10-04 Thread Yassine MARZOUGUI
Hi all,

I am reading a large number of GZip-compressed csv files, nested in an HDFS directory:

Configuration parameters = new Configuration();
parameters.setBoolean("recursive.file.enumeration", true);
DataSet> hist = env.readCsvFile("hdfs:///shared/logs/")
    .ignoreFirstLine()…
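For reference, the effect of recursive file enumeration over nested .gz files can be sketched in plain Java NIO, outside Flink (class and method names here are illustrative, and this sketch does not yet handle corrupted archives, which is what the rest of the thread addresses):

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.GZIPInputStream;

// Plain-Java sketch of recursive.file.enumeration: walk a directory tree,
// pick up every .gz file, and read each one line by line.
public class GzipCsvWalker {

    public static List<String> readAllLines(Path root) throws IOException {
        List<String> lines = new ArrayList<>();
        List<Path> gzFiles;
        try (Stream<Path> paths = Files.walk(root)) {
            gzFiles = paths.filter(p -> p.toString().endsWith(".gz"))
                           .collect(Collectors.toList());
        }
        for (Path p : gzFiles) {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(
                    new GZIPInputStream(Files.newInputStream(p))))) {
                String line;
                while ((line = r.readLine()) != null) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

A single truncated archive would make `GZIPInputStream` throw mid-read here, failing the whole traversal, which is exactly the behavior the question is trying to avoid.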