Hi Sven,

my use case was hand-edited CSVs (therefore the quotes are extra clutter), 
which imples that I would be hand-viewing/editing only small CSVs (no 
performance issues).

I agree that we should err on the safe side with CR & LF (e.g. tools may 
sometimes autoconvert line endings).

Regarding performance:

#findFirstInString:inSet:startingAt: didn't work for me (not sure if bug, or I 
don't know how to use), so I've trried with inCharacterSet:

Tested on worst-case scenario - strings don't contain tested symbols.

s10 := 'a' repeat: 10.
s100 := 'a' repeat: 100.

[ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s10 
includesSubstring: each ] ] bench. "'1,200,046 per second'"
[ {'"'. ','. String cr. String lf } anySatisfy: [ :each | s100 
includesSubstring: each ] ] bench. "'495,482 per second'"

[ s10 includesAny: { $,. $". Character cr. Character lf } ] bench. "'2,819,416 
per second'"
[ s100 includesAny: { $,. $". Character cr. Character lf } ] bench. "'2,200,668 
per second'"

[ ByteString findFirstInString: s10 inCharacterSet: ',"', String crlf 
startingAt: 1 ] bench. "'1,187,324 per second'"
[ ByteString findFirstInString: s100 inCharacterSet: ',"', String crlf 
startingAt: 1 ] bench. "'165,526 per second'"


#includesAny: seems to be the best by far.

Storing the tested characters didn't improve it by much.

Peter

On Sat, Jul 22, 2017 at 06:51:31PM +0200, Sven Van Caekenberghe wrote:
> Hi Peter,
> 
> > On 22 Jul 2017, at 14:12, Peter Uhnak <i.uh...@gmail.com> wrote:
> > 
> > Hi,
> > 
> > this is a continuation of an older thread about quoting fields only when 
> > necessary. ( 
> > http://forum.world.st/NeoCSVWriter-automatic-quotes-td4924781.html )
> > 
> > I've attached fileouts of NeoCSV packages with the addition (I don't know 
> > if Metacello can file-out only changesets).
> > 
> > The change doesn't affect the default behavior, only when explicitly 
> > requested.
> > 
> > Peter
> 
> I accepted your changes as such, the .mcz's were copied over to the main 
> repositories. This is a pure & clean extension, so that is good. Thank you.
> 
> This option is always going to be slower, but the current implementation 
> might be improved, I think.
> 
> The key test in #writeOptionalQuotedField:
> 
> {
>   lineEnd asString.
>   separator asString.
>   '"' } anySatisfy: [ :each | string includesSubstring: each ]
> 
> will be quite slow. 
> 
> If we accept a little bit of (over safe) error on EOL and use any occurrence 
> of CR or LF as needing a quote, we could switch to characters to test for. 
> There exists a fast (primitive) test, #findFirstInString:inSet:startingAt: 
> that can do all the testing in one go. If your version turns out to be slow, 
> we could try that, if measurements show a difference.
> 
> Regards,
> 
> Sven 
> 
> 

Reply via email to