[ https://issues.apache.org/jira/browse/FLINK-21562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nico Kruber updated FLINK-21562:
--------------------------------
    Labels: auto-deprioritized-major usability  (was: auto-deprioritized-major)

> Add more informative message on CSV parsing errors
> --------------------------------------------------
>
>                 Key: FLINK-21562
>                 URL: https://issues.apache.org/jira/browse/FLINK-21562
>             Project: Flink
>          Issue Type: Improvement
>          Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
>    Affects Versions: 1.11.3
>            Reporter: Nico Kruber
>            Priority: Minor
>              Labels: auto-deprioritized-major, usability
>
> I was parsing a CSV file with comments in it and used {{'csv.allow-comments' = 'true'}} without also passing {{'csv.ignore-parse-errors' = 'true'}} to the table DDL to not hide any actual format errors.
> Since I didn't just have strings in my table, this did of course stumble on the commented-out line with the following error:
> {code}
> 2021-02-16 17:45:53,055 WARN  org.apache.flink.runtime.taskmanager.Task [] - Source: TableSourceScan(table=[[default_catalog, default_database, airports]], fields=[IATA_CODE, AIRPORT, CITY, STATE, COUNTRY, LATITUDE, LONGITUDE]) -> SinkConversionToTuple2 -> Sink: SQL Client Stream Collect Sink (1/1)#0 (9f3a3965f18ed99ee42580bdb559ba66) switched from RUNNING to FAILED.
> java.io.IOException: Failed to deserialize CSV row.
> 	at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:257) ~[flink-csv-1.12.1.jar:1.12.1]
> 	at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:162) ~[flink-csv-1.12.1.jar:1.12.1]
> 	at org.apache.flink.streaming.api.functions.source.InputFormatSourceFunction.run(InputFormatSourceFunction.java:90) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:110) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> 	at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:66) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> 	at org.apache.flink.streaming.runtime.tasks.SourceStreamTask$LegacySourceFunctionThread.run(SourceStreamTask.java:241) ~[flink-dist_2.12-1.12.1.jar:1.12.1]
> Caused by: java.lang.NumberFormatException: empty String
> 	at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1842) ~[?:1.8.0_275]
> 	at sun.misc.FloatingDecimal.parseDouble(FloatingDecimal.java:110) ~[?:1.8.0_275]
> 	at java.lang.Double.parseDouble(Double.java:538) ~[?:1.8.0_275]
> 	at org.apache.flink.formats.csv.CsvToRowDataConverters.convertToDouble(CsvToRowDataConverters.java:203) ~[flink-csv-1.12.1.jar:1.12.1]
> 	at org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createNullableConverter$ac6e531e$1(CsvToRowDataConverters.java:113) ~[flink-csv-1.12.1.jar:1.12.1]
> 	at org.apache.flink.formats.csv.CsvToRowDataConverters.lambda$createRowConverter$18bb1dd$1(CsvToRowDataConverters.java:98) ~[flink-csv-1.12.1.jar:1.12.1]
> 	at org.apache.flink.formats.csv.CsvFileSystemFormatFactory$CsvInputFormat.nextRecord(CsvFileSystemFormatFactory.java:251) ~[flink-csv-1.12.1.jar:1.12.1]
> 	... 5 more
> {code}
> Two things should be improved here:
> # commented-out lines should be ignored by default (potentially, FLINK-17133 addresses this or at least gives the user the power to do so)
> # the error message itself is not very informative: "empty String".
> This ticket is about the latter. I would suggest to have at least a few more pointers to direct the user to finding the source in the CSV file/item/... - here, the data type could just be wrong or the CSV file itself may be wrong/corrupted and the user would need to investigate.
> What exactly may help here, probably depends on the actual input connector this format is currently working with, e.g. line number in a csv file would be best, otherwise that may not be possible but we could show the whole line or at least a few surrounding fields...



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
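
For illustration only, the kind of message the ticket asks for could be sketched as below. This is not Flink code: the class `CsvErrorContextSketch`, the method `parseDoubleField`, its parameters, and the sample raw line are all hypothetical, and the sketch assumes the caller already knows the field name, field index, line number and raw line, which (as the ticket notes) may not be available for every input connector.

```java
import java.io.IOException;

// Hypothetical helper, NOT part of Flink: shows one way to attach context
// (field name, field index, line number, raw line) to a CSV parse failure.
final class CsvErrorContextSketch {

    /**
     * Parses a DOUBLE field. On failure, rethrows as an IOException whose
     * message names the field and line, keeping the original exception as cause.
     */
    static double parseDoubleField(
            String rawValue, String fieldName, int fieldIndex, long lineNumber, String rawLine)
            throws IOException {
        try {
            return Double.parseDouble(rawValue);
        } catch (NumberFormatException e) {
            throw new IOException(
                    String.format(
                            "Failed to deserialize CSV row: cannot parse value '%s' of field "
                                    + "'%s' (index %d) as DOUBLE at line %d. Raw line: '%s'",
                            rawValue, fieldName, fieldIndex, lineNumber, rawLine),
                    e);
        }
    }

    public static void main(String[] args) {
        try {
            // Assumed input: a commented-out line whose LATITUDE column ends up empty.
            parseDoubleField("", "LATITUDE", 5, 1L, "# a commented-out line");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With a message of this shape, the user would see which column and which input line triggered the "empty String" failure instead of only the bare NumberFormatException text.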