RE: XML Considered Harmful

Avi Gross via Python-list Thu, 23 Sep 2021 15:02:42 -0700

What you are describing, Stefan, is what I meant by emulating a relational 
database with tables.

And, FYI, there is no guarantee that two authors with the same name will not be 
assumed to be the same person.

Besides the lack of any one official CSV format, there are oodles of features I 
have seen that are normally external to the CSV. For example, I have often read 
in data from a CSV or similar where you could tell the software to treat a 
blank or 999 as NA, what marks a line in the file as a comment to be ignored, 
whether the separator is a single space or any run of whitespace, which quote 
character lets you hide a comma inside a field, how to handle escapes, whether 
to skip blank lines, and more.
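
To make that concrete, here is a minimal sketch using only Python's standard 
csv module. The function name read_rows, its parameters, and the sample data 
are made up for illustration; the point is that every one of these choices 
has to be handed to the reader from outside, because the file itself does not 
record them.

```python
import csv

def read_rows(text, delimiter=",", quotechar='"', comment="#",
              na_values=("", "999"), skip_blank=True):
    """Parse delimited text using options the CSV file itself does not record."""
    lines = []
    for line in text.splitlines():
        stripped = line.strip()
        if skip_blank and not stripped:
            continue  # drop blank lines
        if comment and stripped.startswith(comment):
            continue  # drop comment lines
        lines.append(line)
    # Caveat: filtering line by line assumes no quoted field contains a newline.
    reader = csv.reader(lines, delimiter=delimiter, quotechar=quotechar,
                        skipinitialspace=True)
    # Map every NA marker to None so "missing" stays distinct from real values.
    return [[None if cell.strip() in na_values else cell for cell in row]
            for row in reader]

# Hypothetical sample: a comment line, a quoted comma, a blank line, a 999.
sample = '''# id, title, count
1, "Signature change, revisited", 12

2, "Untitled", 999
'''
rows = read_rows(sample)
```

Change any of the defaults (say, na_values=("-1",)) and the same bytes yield 
different data, which is exactly the ambiguity I mean.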

Now a really good design might place some metadata into the file that can be 
used to set defaults for things like that or incorporate them into the format 
unambiguously. It might calculate the likely data type for various fields and 
store that in the metadata. So even if you stored rectangular data in a CSV 
file, perhaps the early lines would be in some format that can be read as 
comments and supply some info like the above.
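
A rough sketch of what I have in mind, in Python. The header layout 
("# key: value" lines) and the keys delimiter, na and types are invented here 
for illustration; I am not aware of an existing CSV variant that defines them 
this way.

```python
import csv

def read_with_metadata(text, comment="#"):
    """Split leading '# key: value' lines from the data rows that follow."""
    meta, data_lines = {}, []
    in_header = True
    for line in text.splitlines():
        if in_header and line.startswith(comment):
            key, sep, value = line[len(comment):].partition(":")
            if sep:
                meta[key.strip()] = value.strip()
            continue
        in_header = False  # the first non-comment line ends the header
        if line.strip():
            data_lines.append(line)

    # Hypothetical type names a file could declare per column.
    known_types = {"int": int, "float": float, "str": str}
    converters = [known_types[name] for name in meta.get("types", "").split()]
    delimiter = meta.get("delimiter", ",")
    na = meta.get("na")

    rows = []
    for raw in csv.reader(data_lines, delimiter=delimiter):
        row = []
        for i, cell in enumerate(raw):
            cell = cell.strip()
            if na is not None and cell == na:
                row.append(None)  # declared NA marker
            else:
                convert = converters[i] if i < len(converters) else str
                row.append(convert(cell))
        rows.append(row)
    return meta, rows

# Made-up file: three metadata lines, then two data rows.
sample = """# delimiter: ;
# na: 999
# types: int str int
1;alpha;3
2;beta;999
"""
meta, rows = read_with_metadata(sample)
```

A reader that does not know the convention would still see the first lines 
as comments, provided it has been told that "#" starts a comment.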

Are any of the CSV variants more like that?

-----Original Message-----
From: Python-list <[email protected]> On 
Behalf Of Stefan Ram
Sent: Thursday, September 23, 2021 5:43 PM
To: [email protected]
Subject: Re: XML Considered Harmful

"Avi Gross" <[email protected]> writes:
>But scientific papers seemingly allow oodles of authors and any time 
>you update the data, you may need yet another column.

  You can use three CSV files: papers, persons, and authors:

  papers.csv

1, "Is the accelerated expansion evidence of a change of signature?"

  persons.csv

1, Marc Mars

  authors.csv

1, 1

  I.e., paper 1 is authored by person 1.

  Now, when we learn that José M. M. Senovilla also is a
  co-author of "Is the accelerated expansion evidence of a
  forthcoming change of signature?", we only have to add
  new rows, no new columns.

  papers.csv

1, "Is the accelerated expansion evidence of a change of signature?"

  persons.csv

1, "Marc Mars"
2, "José M. M. Senovilla"

  authors.csv

1, 1
1, 2
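
  To read the three tables back together, a minimal Python sketch could
  look like this. It assumes the quoted, comma-plus-space layout shown
  above; the helper name load and the variable names are illustrative.

```python
import csv

# The three tables from the example above, inlined as strings.
papers_csv = '1, "Is the accelerated expansion evidence of a change of signature?"\n'
persons_csv = '1, "Marc Mars"\n2, "José M. M. Senovilla"\n'
authors_csv = "1, 1\n1, 2\n"

def load(text):
    """Parse rows; skipinitialspace lets a quote follow the ', ' separator."""
    return list(csv.reader(text.splitlines(), skipinitialspace=True))

# Key each entity table by its numeric id.
papers = {int(pid): title for pid, title in load(papers_csv)}
persons = {int(pid): name for pid, name in load(persons_csv)}

# authors.csv is the link table: one (paper id, person id) pair per row.
authors_of = {}
for paper_id, person_id in load(authors_csv):
    authors_of.setdefault(int(paper_id), []).append(persons[int(person_id)])

for paper_id, names in authors_of.items():
    print(papers[paper_id], "by", ", ".join(names))
```

  A further co-author is one more row in persons.csv and one more row
  in authors.csv; the code does not change.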

  The real problem with CSV is that there is no CSV.

  This is not a specific data language with a specific
  specification. Instead it is a vague designation for
  a plethora of CSV dialects, which usually do not even
  have a specification. Compare this with XML. XML has
  a sole specification managed by the W3C.

--
https://mail.python.org/mailman/listinfo/python-list
