2014-07-09 4:15 GMT+02:00 Gary Gregory <garydgreg...@gmail.com>:
> We do have a discrepancy between our format class and the lexer (which is
> hardwired with CR & LF).
>
> Ideally, it seems the lexer should pick up its set of EOL strings from the
> format.
>
> I recall reading concerns about the performance impact of changing this, but
> either we support all of the EOL strings, including some of the oddball ones
> like the Unicode ones, or we do not. Perhaps we can have an alternate Lexer
> that takes a set of EOL strings if performance really is that much worse.
>

Sounds reasonable, but it seems to be a lot of work. Maybe we can just
document that 1.0 can only handle CR & LF and add support for more exotic
record separators in 1.1. I'm hoping for higher adoption and more patches
once we have a release on Maven Central.

Benedikt

>
> Gary
>
>
> On Mon, Jul 7, 2014 at 1:47 PM, Benedikt Ritter <brit...@apache.org>
> wrote:
>
> > Any thoughts about this fix? It could be a way to push out 1.0. If we
> > come up with a more generic solution afterwards, we can still deprecate
> > escapeCRLFOnce.
> >
> > Benedikt
> >
> > ---------- Forwarded message ----------
> > From: Tillmann Gaida (JIRA) <j...@apache.org>
> > Date: 2014-06-30 10:36 GMT+02:00
> > Subject: [jira] [Comment Edited] (CSV-35) Escaped line separators are not
> > supported
> > To: brit...@apache.org
> >
> >
> > [
> > https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047460#comment-14047460
> > ]
> >
> > Tillmann Gaida edited comment on CSV-35 at 6/30/14 8:34 AM:
> > ------------------------------------------------------------
> >
> > I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
> > introduces a CSVFormat setting "escapeCRLFOnce" that enables the desired
> > behaviour in Lexer. It is false by default, and I did not change
> > CSVFormat.MYSQL, although doing so might be appropriate. I am not exactly
> > happy with the naming of the setting. Consider renaming it if you happen
> > to build upon the patch.
> >
> > EDIT: clarity
> >
> > EDIT: This is a very specific setting. A cleaner solution would probably
> > be to allow escaping of record separators by a single escape char. However,
> > it appears that the MYSQL format uses LF as a record separator, so we would
> > need to have multiple record separators, which in this case would not be
> > actual record separators.
> >
> > I'd argue that CRLF is special enough to have an individual setting, but
> > I would also agree with having a cleaner CSVFormat. The only real
> > alternative would be a way to individually specify character sequences,
> > each with a replacement that applies when the sequence is preceded by the
> > escape char.
> >
> >
> > > Escaped line separators are not supported
> > > -----------------------------------------
> > >
> > > Key: CSV-35
> > > URL: https://issues.apache.org/jira/browse/CSV-35
> > > Project: Commons CSV
> > > Issue Type: Bug
> > > Reporter: Emmanuel Bourg
> > > Fix For: 1.0
> > >
> > > Attachments: CSV-35.patch, commons-csv CSV-35 escapeCRLFOnce test.patch,
> > > commons-csv CSV-35 escapeCRLFOnce.patch,
> > > mysql-export-line-terminated-by-crlf.csv,
> > > mysql-export-line-terminated-by-lf.csv
> > >
> > >
> > > Commons CSV doesn't handle escaped line separators, for example:
> > > {code}
> > > value1;value2;value3a\
> > > value3b
> > > {code}
> > > In this case the expected result is:
> > > {code}["value1", "value2", "value3a\nvalue3b"]{code}
> > > This kind of escaping is produced by MySQL, whether field enclosing
> > > is enabled or not.
> > > It's possible to see enclosing quotes and escaped line
> > > separators like this:
> > > {code}
> > > "value1";"value2";"value3a\
> > > value3b"
> > > {code}
> >
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter
>
>
> --
> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> Java Persistence with Hibernate, Second Edition
> <http://www.manning.com/bauer3/>
> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> Spring Batch in Action <http://www.manning.com/templier/>
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory
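
For illustration, here is a minimal Java sketch of the behaviour requested in CSV-35, under stated assumptions. The class EscapedSeparatorSketch and its parse method are hypothetical: they are not the Commons CSV Lexer and not Tillmann's escapeCRLFOnce patch. The sketch takes the record separators as a list, in the spirit of Gary's suggestion that the lexer pick up its EOL strings from the format, and treats an escape character immediately followed by one of those separators as part of the field value rather than as the end of the record. It assumes a ';' delimiter and a '\' escape (as in the MySQL examples above) and double quotes that simply toggle enclosure. Other escape sequences are not decoded specially, only taken literally. It requires Java 9+ for List.of.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: splits MySQL-style exports into records, keeping an
 * escaped record separator inside the field value. Not the Commons CSV API.
 */
public final class EscapedSeparatorSketch {

    private EscapedSeparatorSketch() {
    }

    /** Returns the first listed separator that starts at pos, or null if none does. */
    private static String separatorAt(String input, int pos, List<String> separators) {
        for (String sep : separators) {
            if (input.startsWith(sep, pos)) {
                return sep;
            }
        }
        return null;
    }

    public static List<List<String>> parse(String input, char delimiter, char escape,
                                           List<String> separators) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean enclosed = false;
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                String sep = separatorAt(input, i + 1, separators);
                if (sep != null) {
                    // Escaped record separator: keep it verbatim in the value.
                    field.append(sep);
                    i += 1 + sep.length();
                } else {
                    // Simplification: any other escaped char is taken literally.
                    field.append(input.charAt(i + 1));
                    i += 2;
                }
                continue;
            }
            if (c == '"') {
                // Enclosing quote: toggles whether delimiters/separators are literal.
                enclosed = !enclosed;
                i++;
                continue;
            }
            if (!enclosed && c == delimiter) {
                record.add(field.toString());
                field.setLength(0);
                i++;
                continue;
            }
            String sep = enclosed ? null : separatorAt(input, i, separators);
            if (sep != null) {
                // Unescaped record separator: finish the current record.
                record.add(field.toString());
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
                i += sep.length();
                continue;
            }
            field.append(c);
            i++;
        }
        // Flush a final record that is not terminated by a separator.
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> seps = List.of("\r\n", "\n");
        System.out.println(parse("value1;value2;value3a\\\nvalue3b", ';', '\\', seps));
        System.out.println(parse("\"value1\";\"value2\";\"value3a\\\nvalue3b\"", ';', '\\', seps));
    }
}
```

Both inputs from the issue yield a single record whose third value is "value3a\nvalue3b". Listing "\r\n" before "\n" lets a single escape character cover a whole CRLF pair, which is roughly the case the escapeCRLFOnce setting targets. A lexer built along these lines would pay for the per-position separator lookup, and that cost is the performance concern Gary mentions.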