I agree that we should stop worrying about edge cases and release a version that covers the majority of needs.

Adrian Crum
Sandglass Software
www.sandglass-software.com

On 7/10/2014 9:12 AM, Benedikt Ritter wrote:
2014-07-09 4:15 GMT+02:00 Gary Gregory <garydgreg...@gmail.com>:

We do have a discrepancy between our format class and lexer (which is
hardwired with CR & LF).

Ideally, it seems the lexer should pick up its set of EOL strings from the
format.

I recall reading concerns about the performance impact of changing this, but
either we support all of the EOL strings, including some of the oddball
Unicode ones, or we do not. Perhaps we can have an alternate Lexer that takes
a set of EOL strings if performance really is that much worse.
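For illustration, a minimal sketch of what the lexer side could look like if it
took a set of EOL strings from the format. EolMatcher is a hypothetical class,
not part of Commons CSV. It tries longer strings first so that CRLF is not
split into CR followed by LF:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

/**
 * Hypothetical helper (not part of Commons CSV): recognises any of a configurable
 * set of end-of-line strings, e.g. CRLF, LF, CR or the Unicode line separator U+2028.
 */
final class EolMatcher {
    private final List<String> eols;

    EolMatcher(Collection<String> eolStrings) {
        this.eols = new ArrayList<>(eolStrings);
        // Longest first, so that CRLF is matched before a bare CR.
        this.eols.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    /** Returns the length of the EOL string starting at pos, or 0 if there is none. */
    int matchAt(String buf, int pos) {
        for (String eol : eols) {
            if (buf.startsWith(eol, pos)) {
                return eol.length();
            }
        }
        return 0;
    }
}
```

Checking every configured string at every position is presumably where the
performance worry comes from; a lexer hardwired to CR and LF only needs two
char comparisons.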


Sounds reasonable, but it seems to be a lot of work. Maybe we can just
document that 1.0 can only handle CR & LF and add support for more
exotic record separators in 1.1. I'm hoping for higher adoption and more
patches once we have a release on Maven Central.

Benedikt



Gary


On Mon, Jul 7, 2014 at 1:47 PM, Benedikt Ritter <brit...@apache.org> wrote:

Any thoughts about this fix? Could be a solution to push out 1.0. If we
come up with a more generic solution afterwards, we can still deprecate
escapeCRLFOnce.

Benedikt

---------- Forwarded message ----------
From: Tillmann Gaida (JIRA) <j...@apache.org>
Date: 2014-06-30 10:36 GMT+02:00
Subject: [jira] [Comment Edited] (CSV-35) Escaped line separators are not supported
To: brit...@apache.org



     [ https://issues.apache.org/jira/browse/CSV-35?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14047460#comment-14047460 ]

Tillmann Gaida edited comment on CSV-35 at 6/30/14 8:34 AM:
------------------------------------------------------------

I added a patch "commons-csv CSV-35 escapeCRLFOnce[ test].patch", which
introduces a CSVFormat setting "escapeCRLFOnce" that enables the desired
behaviour in Lexer. It is false by default, and I did not change
CSVFormat.MYSQL, although that might be appropriate. I am not exactly happy
with the naming of the setting; consider renaming it if you happen to build
upon the patch.

EDIT: clarity

EDIT: This is a very specific setting. A cleaner solution would probably be
to allow escaping of record separators by a single escape char. However, it
appears that the MYSQL format uses LF as its record separator, so we would
need to have multiple record separators, which in this case would not be
actual record separators.

I'd argue that CRLF is special enough to have an individual setting, but I
would also agree with having a cleaner CSVFormat. The only real alternative
would be having a way to individually specify character sequences and a
replacement if they are preceded by the escape char.
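A rough sketch of that alternative, using a hypothetical EscapeTable class (not
from the patch and not Commons CSV API): each entry maps a sequence that may
follow the escape char to its replacement. Longer sequences are tried first, so
an escaped CRLF is consumed as one unit, which is presumably what
escapeCRLFOnce is after. An unknown escape sequence just keeps the following
char:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not Commons CSV API): sequences that get a replacement
 * when they directly follow the escape character.
 */
final class EscapeTable {
    private final char escape;
    private final Map<String, String> replacements;
    private final List<String> keysLongestFirst;

    EscapeTable(char escape, Map<String, String> replacements) {
        this.escape = escape;
        this.replacements = new HashMap<>(replacements);
        this.keysLongestFirst = new ArrayList<>(replacements.keySet());
        // Longest first, so an escaped CRLF is not consumed as an escaped CR plus a bare LF.
        this.keysLongestFirst.sort((a, b) -> Integer.compare(b.length(), a.length()));
    }

    String unescape(String in) {
        StringBuilder out = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            char c = in.charAt(i);
            if (c == escape && i + 1 < in.length()) {
                String matched = null;
                for (String key : keysLongestFirst) {
                    if (in.startsWith(key, i + 1)) {
                        matched = key;
                        break;
                    }
                }
                if (matched != null) {
                    out.append(replacements.get(matched));
                    i += 1 + matched.length();
                } else {
                    // Not in the table: drop the escape char, keep the next char literally.
                    out.append(in.charAt(i + 1));
                    i += 2;
                }
                continue;
            }
            // Ordinary char, or a lone escape char at the very end of the input.
            out.append(c);
            i++;
        }
        return out.toString();
    }
}
```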


Escaped line separators are not supported
-----------------------------------------

                 Key: CSV-35
                 URL: https://issues.apache.org/jira/browse/CSV-35
             Project: Commons CSV
          Issue Type: Bug
            Reporter: Emmanuel Bourg
             Fix For: 1.0

         Attachments: CSV-35.patch, commons-csv CSV-35 escapeCRLFOnce test.patch,
                      commons-csv CSV-35 escapeCRLFOnce.patch,
                      mysql-export-line-terminated-by-crlf.csv,
                      mysql-export-line-terminated-by-lf.csv


Commons CSV doesn't handle escaped line separators, for example:
{code}
value1;value2;value3a\
value3b
{code}
In this case the expected result is:
{code}["value1", "value2", "value3a\nvalue3b"]{code}
This kind of escaping is produced by MySQL whether or not field enclosing
is enabled. It's possible to see enclosing quotes combined with escaped line
separators, like this:
{code}
"value1";"value2";"value3a\
value3b"
{code}






--
http://people.apache.org/~britter/
http://www.systemoutprintln.de/
http://twitter.com/BenediktRitter
http://github.com/britter




--
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
Java Persistence with Hibernate, Second Edition
<http://www.manning.com/bauer3/>
JUnit in Action, Second Edition <http://www.manning.com/tahchiev/>
Spring Batch in Action <http://www.manning.com/templier/>
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory





