[ https://issues.apache.org/jira/browse/FLINK-4785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15905779#comment-15905779 ]

Luke Hutchison edited comment on FLINK-4785 at 3/10/17 11:48 PM:
-----------------------------------------------------------------

I'm pretty sure I have seen backslash escaping in CSV before, but the 
old-school way of escaping a quote character (doubling it, as in "") is the 
one that made it into the RFC, presumably for backwards compatibility with 
spreadsheets.

From my dup bug report, https://issues.apache.org/jira/browse/FLINK-6107 :

--

RFC 4180, which specifies the CSV format, allows double quotes inside quoted 
fields, provided each one is escaped by preceding it with another double quote:

https://tools.ietf.org/html/rfc4180

However, when Flink parses a CSV file containing escaped quotes, such as:

bob,"The name is ""Bob"""

you get this exception:

org.apache.flink.api.common.io.ParseException: Line could not be parsed: 
'bob,"The name is ""Bob"""'
ParserError UNQUOTED_CHARS_AFTER_QUOTED_STRING 
Expect field types: class java.lang.String, class java.lang.String

--

See also https://issues.apache.org/jira/browse/FLINK-6016 (quoted strings in 
CSV should be able to contain newlines).
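To make the RFC 4180 rule concrete, here is a minimal sketch of a single-line field splitter in plain Java (no Flink dependency). The class `Rfc4180Fields` and its `parseLine` method are hypothetical names for illustration, not part of Flink's API. It assumes a comma delimiter, `"` as the quote character, and no newlines inside quoted fields (the FLINK-6016 case is out of scope). Inside a quoted field, `""` becomes one literal quote; a lone `"` closes the field, and anything other than a delimiter after it is rejected, roughly analogous to the UNQUOTED_CHARS_AFTER_QUOTED_STRING case above.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an RFC 4180-style splitter for one CSV line.
 * Assumptions: ',' delimiter, '"' quote character, no embedded newlines.
 */
public class Rfc4180Fields {

    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently inside a quoted field
        boolean fieldStart = true;  // next char is the first char of a field

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // "" inside a quoted field is an escaped, literal quote.
                    field.append('"');
                    i++; // skip the second quote of the pair
                } else {
                    // A lone quote closes the field; only ',' or end of line may follow.
                    inQuotes = false;
                    if (i + 1 < line.length() && line.charAt(i + 1) != ',') {
                        throw new IllegalArgumentException(
                                "Unquoted chars after quoted field at index " + (i + 1));
                    }
                }
            } else if (c == ',') {
                // Delimiter outside quotes ends the current field.
                fields.add(field.toString());
                field.setLength(0);
                fieldStart = true;
                continue;
            } else if (c == '"' && fieldStart) {
                // A quote is only special as the first character of a field.
                inQuotes = true;
            } else {
                field.append(c);
            }
            fieldStart = false;
        }
        if (inQuotes) {
            throw new IllegalArgumentException("Unterminated quoted field");
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = parseLine("bob,\"The name is \"\"Bob\"\"\"");
        System.out.println(fields); // prints [bob, The name is "Bob"]
    }
}
```

The strict check after a closing quote is a design choice: it keeps doubled quotes legal while still reporting genuinely malformed input such as `"a"b`, rather than silently accepting it.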


was (Author: lukehutch):
I'm pretty sure I have seen backslash escaping in CSV before, but the 
old-school way of quoting quote characters (double double quotes) is the one 
that made it into the RFC, presumably for backwards compatibility with 
spreadsheets.

Fabian -- you copied the text from the wrong bug report, 
https://issues.apache.org/jira/browse/FLINK-6016 , rather than 
https://issues.apache.org/jira/browse/FLINK-6107 , whose description is 
reproduced in the current version of the comment above.

> Flink string parser doesn't handle string fields containing two consecutive 
> double quotes
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-4785
>                 URL: https://issues.apache.org/jira/browse/FLINK-4785
>             Project: Flink
>          Issue Type: Improvement
>          Components: Core
>    Affects Versions: 1.1.2
>            Reporter: Flavio Pompermaier
>              Labels: csv
>
> To reproduce the error run 
> https://github.com/okkam-it/flink-examples/blob/master/src/main/java/it/okkam/datalinks/batch/flink/datasourcemanager/importers/Csv2RowExample.java



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
