Re: [Pharo-users] NeoCSV on Irregular Files

Esteban A. Maringolo Wed, 26 Jul 2017 10:07:27 -0700

2017-07-26 13:04 GMT-03:00 Sven Van Caekenberghe <s...@stfx.eu>:
> I agree.
>
> If the file is non-homogeneous, it is no longer CSV by definition.
>
> Holding on to the original stream and creating new readers for each section
> is one option; another one could be to add a #reset method.
>
> The big question is how to know when one section begins/ends.


In my experience, I looked for certain delimiters, such as a header row
with the field names.
Oil & Gas telemetry instruments generate output like that: effectively a
concatenation of several CSVs into one, sometimes even preceded by a
non-CSV header of 10 rows of data.

What I had to do to deal with that was either:
a) Reading it line by line, buffering the whole "segment" until EOF or
the next delimiter is found, or...
b) Pre-scanning the whole file, marking the start and end positions of
each segment, and creating a new readStream over each segment's contents
to pass to the CSV parser (which neither knows nor cares about segments).
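Both approaches can be sketched roughly as follows. The sketch is in Python rather than Pharo, with the standard csv module standing in for NeoCSVReader. It assumes each segment begins with a header row starting with a known prefix. The function names, the `header_prefix` test and the sample data are all hypothetical.

```python
import csv
import io


def segments_by_buffering(lines, header_prefix):
    """Approach (a): read line by line, buffering the current segment until
    EOF or the next header row, then hand the buffer to a fresh csv.reader."""
    buffer = None  # stays None while skipping a non-CSV preamble
    for line in lines:
        if line.startswith(header_prefix):  # hypothetical delimiter test
            if buffer:
                yield list(csv.reader(buffer))
            buffer = []
        if buffer is not None:
            buffer.append(line)
    if buffer:
        yield list(csv.reader(buffer))


def segments_by_prescan(text, header_prefix):
    """Approach (b): pre-scan the whole text to record where each segment
    starts, then parse each slice through its own in-memory stream."""
    starts, offset = [], 0
    for line in text.splitlines(keepends=True):
        if line.startswith(header_prefix):
            starts.append(offset)
        offset += len(line)
    ends = starts[1:] + [len(text)]  # each segment ends where the next begins
    for start, end in zip(starts, ends):
        yield list(csv.reader(io.StringIO(text[start:end])))


# Hypothetical sample: a non-CSV preamble followed by two concatenated CSVs.
sample = (
    "Instrument X-1\n"
    "Operator: someone\n"
    "time,pressure\n"
    "0,10.5\n"
    "1,10.7\n"
    "time,temperature\n"
    "0,80\n"
)

for segment in segments_by_prescan(sample, "time,"):
    print(segment)
```

Both functions return the same segments for this sample. Option (b) keeps the parser ignorant of segments. This version holds the whole text in memory. With a real file, you could record stream positions instead. Option (a) only buffers one segment at a time.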

> NeoCSVReader holds a one char buffer, so you could peek for something, just
> maybe. Then you could discover the section switches while parsing (a bit like
> #atEnd is used from #upToEnd, add an #atSectionEnd). But it all depends on
> your specific format.

That's harder to do if it is char-based rather than line-based, or at
least harder to code.
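A line-based version of the #atSectionEnd idea might look something like this. It uses one line of lookahead instead of NeoCSVReader's one-char buffer. This is again a Python sketch, not NeoCSV API. `SectionedReader`, `at_section_end`, `reset` and `skip_to_header` are hypothetical names. The sketch assumes that no quoted field spans multiple lines and that a header test is available.

```python
import csv


class SectionedReader:
    """Hypothetical line-based analogue of #atEnd/#atSectionEnd/#reset:
    one line of lookahead lets the caller see a section switch early."""

    def __init__(self, lines, is_header):
        self._lines = iter(lines)
        self._is_header = is_header
        self._peeked = next(self._lines, None)  # one-line lookahead buffer
        self._in_section = False

    def at_end(self):
        return self._peeked is None

    def at_section_end(self):
        # A section ends at EOF, or when the next line is a new header row
        # (the current section's own header does not count).
        return self.at_end() or (self._in_section and self._is_header(self._peeked))

    def next_row(self):
        line, self._peeked = self._peeked, next(self._lines, None)
        self._in_section = True
        return next(csv.reader([line]))  # parse exactly one line as one record

    def reset(self):
        # Begin a new section at the current position.
        self._in_section = False

    def skip_to_header(self):
        # Discard non-CSV preamble lines until a header row (or EOF) is next.
        while not self.at_end() and not self._is_header(self._peeked):
            self._peeked = next(self._lines, None)

    def read_section(self):
        rows = []
        while not self.at_section_end():
            rows.append(self.next_row())
        self.reset()
        return rows


lines = [
    "Instrument X-1\n", "Operator: someone\n",
    "time,pressure\n", "0,10.5\n", "1,10.7\n",
    "time,temperature\n", "0,80\n",
]
reader = SectionedReader(lines, lambda line: line.startswith("time,"))
reader.skip_to_header()
sections = []
while not reader.at_end():
    sections.append(reader.read_section())
```

As with the char-based variant, this still depends on the specific format. Something has to say what a section boundary looks like; here that is the header prefix.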

Regards!

Esteban A. Maringolo
