[CSV] Thoughts on CSV-293 - Add support for multiple null String values

Dávid Szigecsán Wed, 14 Aug 2024 00:38:55 -0700

Hi All,

There is a JIRA feature request
(https://issues.apache.org/jira/browse/CSV-293) for supporting multiple null
strings, not just one.


To be honest, I did not know much about the CSVW definitions mentioned in
the ticket, but it had a comment saying "Feel free to provide a PR on
GitHub", and the requested change did not seem like a big deal to me. So I
created the PR (https://github.com/apache/commons-csv/pull/456).

After I created it, things changed. I got a comment on GitHub from Gary
asking me to check the comment in JIRA.
He explained two concerns about the feature, so I started to dig into the
topic and learn a bit about CSVW.

I think Gary has a point. This particular feature alone is far too little
to support CSVW.
As I understand it, CSVW introduces a lot of (let's say) validation rules
(in a JSON metadata file) that are not really needed for a "simple" CSV
parser.
E.g.:
- it introduces a way to represent data like a relational database with
multiple CSV files and calls it a TableGroup.
- the TableGroup contains multiple tables (CSV files) that are related to
each other. There are keys (ForeignKey, PrimaryKey).
- there are data types (string as the base, plus several others: number,
integer, non-negative integer, float, boolean, date, datetime, etc.)
- fields can be required
And much more.

So the feature to support multiple null values is nowhere near enough to
support CSVW. And I agree with Gary: I can't see a real-life use case
where multiple null values would be used in a simple, single CSV file.
Maybe it is useful in a TableGroup with multiple CSV files in a CSVW
dataset (because the different files could contain different null values),
but not at the level of a CSV parser that handles a single file.
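
For anyone who does need several null markers today, here is a minimal
sketch of how it could be done on the caller's side, after the values have
already been parsed. The class MultiNullStrings and its normalize method are
hypothetical names made up for illustration; they are not part of Commons
CSV or of the PR, and the sketch assumes each row is available as a list of
strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not part of Commons CSV.
final class MultiNullStrings {

    private MultiNullStrings() {
    }

    // Returns a copy of the row in which every value that equals one of the
    // given null markers is replaced by null. Other values are kept as-is.
    static List<String> normalize(List<String> values, Set<String> nullStrings) {
        List<String> result = new ArrayList<>(values.size());
        for (String value : values) {
            // Guard against null first: immutable sets reject contains(null).
            boolean isNullMarker = value != null && nullStrings.contains(value);
            result.add(isNullMarker ? null : value);
        }
        return result;
    }

    public static void main(String[] args) {
        // Assumed example markers; a real data set would define its own.
        Set<String> nullStrings = Set.of("NULL", "N/A", "-");
        List<String> row = Arrays.asList("42", "N/A", "-", "Budapest");
        System.out.println(normalize(row, nullStrings));
    }
}
```

With something like this, the set of null markers can differ per file
without the parser itself needing to know about more than one of them.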

What do you think?

Regards,
David
