2014-07-09 4:15 GMT+02:00 Gary Gregory <garydgreg...@gmail.com>:

> We do have a discrepancy between our format class and lexer (which is
> hardwired with CR & LF).
>
> Ideally, it seems the lexer should pick up its set of EOL Strings from the
> format.
>
> I recall reading worries about performance issues with changing this, but
> either we support all of the EOL strings, including some of the oddball
> ones like Unicode, or we do not. Perhaps we can have an alternate Lexer that
> takes a set of EOL strings if performance is really that much worse.
>
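
To make the suggestion above concrete, here is a minimal sketch of a lexer helper that takes its EOL strings from a caller-supplied set instead of hardwiring CR and LF. It is an illustration only, not the Commons CSV Lexer: the class name EolMatcherSketch and the method eolLengthAt are hypothetical, and the set of EOL strings in main is just an example.

```java
import java.util.Arrays;
import java.util.List;

public class EolMatcherSketch {

    /**
     * Returns the length of the longest configured EOL string that starts
     * at position pos in input, or 0 if none of them starts there.
     * Preferring the longest match keeps "\r\n" from being read as a bare "\r".
     */
    public static int eolLengthAt(String input, int pos, List<String> eols) {
        int best = 0;
        for (String eol : eols) {
            if (eol.length() > best && input.startsWith(eol, pos)) {
                best = eol.length();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Example set only: CRLF, LF, CR and the Unicode line separator U+2028.
        List<String> eols = Arrays.asList("\r\n", "\n", "\r", "\u2028");
        System.out.println(eolLengthAt("a\r\nb", 1, eols));
        System.out.println(eolLengthAt("a\u2028b", 1, eols));
        System.out.println(eolLengthAt("ab", 1, eols));
    }
}
```

The loop over the set on every character is the kind of extra work the performance worry refers to; a real implementation would probably want something cheaper, such as a first-character check.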

Sounds reasonable, but it seems to be a lot of work. Maybe we can just
document that 1.0 can only handle CR & LF and add the ability for more
exotic record separators in 1.1. I'm hoping for higher adoption and more
patches once we have a release on Maven Central.
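
For reference, the behaviour CSV-35 asks for can be sketched with CR and LF hardwired, roughly as follows. This is only an illustration under simplifying assumptions, not the Lexer code and not the escapeCRLFOnce patch: the class EscapedEolSketch and the method parseRecord are hypothetical, quoting is ignored, and escape letters such as \n are not translated.

```java
import java.util.ArrayList;
import java.util.List;

public class EscapedEolSketch {

    /**
     * Reads the first record of input. An escape char followed by LF, CR or
     * CRLF puts the line break into the current field instead of ending the
     * record; an unescaped CR or LF ends the record.
     */
    public static List<String> parseRecord(String input, char delimiter, char escape) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == escape && i + 1 < input.length()) {
                char next = input.charAt(i + 1);
                if (next == '\r' && i + 2 < input.length() && input.charAt(i + 2) == '\n') {
                    field.append("\r\n"); // one escape covers the whole CRLF pair
                    i += 3;
                } else {
                    field.append(next);   // escaped char is taken literally
                    i += 2;
                }
            } else if (c == delimiter) {
                fields.add(field.toString());
                field.setLength(0);
                i++;
            } else if (c == '\r' || c == '\n') {
                break;                    // unescaped line break ends the record
            } else {
                field.append(c);
                i++;
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public static void main(String[] args) {
        // The example from the issue: the third field spans two lines.
        String input = "value1;value2;value3a\\\nvalue3b";
        System.out.println(parseRecord(input, ';', '\\').size());
    }
}
```

With the input from the issue this yields three fields, the last one being "value3a\nvalue3b".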

Benedikt


>
> Gary
>
>
> On Mon, Jul 7, 2014 at 1:47 PM, Benedikt Ritter <brit...@apache.org>
> wrote:
>
> > Any thoughts about this fix? Could be a solution to push out 1.0. If we
> > come up with a more generic solution afterwards, we can still deprecate
> > escapeCRLFOnce.
> >
> > Benedikt
> >
> > ---------- Forwarded message ----------
> > From: Tillmann Gaida (JIRA) <j...@apache.org>
> > Date: 2014-06-30 10:36 GMT+02:00
> > Subject: [jira] [Comment Edited] (CSV-35) Escaped line separators are not
> > supported
> > To: brit...@apache.org
> >
> >
> >
> >     [
> > https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047460#comment-14047460
> > ]
> >
> > Tillmann Gaida edited comment on CSV-35 at 6/30/14 8:34 AM:
> > ------------------------------------------------------------
> >
> > I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
> > introduces a CSVFormat setting "escapeCRLFOnce", which enables the desired
> > behaviour in Lexer. It is false by default and I did not change
> > CSVFormat.MYSQL, although doing so might be appropriate. I am not exactly
> > happy with the naming of the setting. Consider renaming it if you happen
> > to build upon the patch.
> >
> > EDIT: clarity
> >
> > EDIT: This is a very specific setting. A cleaner solution would probably
> > be to allow escaping of record separators by a single escape char.
> > However, it appears that the MYSQL format uses LF as a record separator,
> > so we would need to have multiple record separators, which in this case
> > would not be actual record separators.
> >
> > I'd argue that CRLF is special enough to have an individual setting, but
> > I would also agree with having a cleaner CSVFormat. The only real
> > alternative would be having a way to individually specify character
> > sequences and a replacement if they are preceded by the escape char.
> >
> >
> > > Escaped line separators are not supported
> > > -----------------------------------------
> > >
> > >                 Key: CSV-35
> > >                 URL: https://issues.apache.org/jira/browse/CSV-35
> > >             Project: Commons CSV
> > >          Issue Type: Bug
> > >            Reporter: Emmanuel Bourg
> > >             Fix For: 1.0
> > >
> > >         Attachments: CSV-35.patch,
> > >                      commons-csv CSV-35 escapeCRLFOnce test.patch,
> > >                      commons-csv CSV-35 escapeCRLFOnce.patch,
> > >                      mysql-export-line-terminated-by-crlf.csv,
> > >                      mysql-export-line-terminated-by-lf.csv
> > >
> > >
> > > Commons CSV doesn't handle escaped line separators, for example:
> > > {code}
> > > value1;value2;value3a\
> > > value3b
> > > {code}
> > > In this case the expected result is:
> > > {code}["value1", "value2", "value3a\nvalue3b"]{code}
> > > This kind of escaping is produced by MySQL, whether or not field
> > > enclosing is enabled. It's possible to see enclosing quotes and escaped
> > > line separators like this:
> > > {code}
> > > "value1";"value2";"value3a\
> > > value3b"
> > > {code}
> >
> >
> >
> > --
> > This message was sent by Atlassian JIRA
> > (v6.2#6252)
> >
> >
> >
> > --
> > http://people.apache.org/~britter/
> > http://www.systemoutprintln.de/
> > http://twitter.com/BenediktRitter
> > http://github.com/britter
> >
>
>
>
> --
> E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
> Java Persistence with Hibernate, Second Edition
> <http://www.manning.com/bauer3/>
> JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
> Spring Batch in Action <http://www.manning.com/templier/>
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory
>



-- 
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter
