What you are describing, Stephen, is what I meant by emulating a relational database with tables.
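To make that concrete, here is a rough sketch of the emulation in Python, joining three CSV tables through integer ids. The in-memory strings stand in for the papers, persons and authors files from the quoted message, and the helper names `load_lookup` and `authors_by_paper` are my own invention, not anything from a library. Note that `skipinitialspace=True` is needed only because the rows have a space after the comma.

```python
import csv
import io

# Hypothetical in-memory stand-ins for papers.csv, persons.csv and authors.csv.
PAPERS = '1, "Is the accelerated expansion evidence of a change of signature?"\n'
PERSONS = '1, "Marc Mars"\n2, "José M. M. Senovilla"\n'
AUTHORS = "1, 1\n1, 2\n"  # (paper_id, person_id) pairs


def load_lookup(text):
    """Read two-column CSV text into a dict keyed by the integer id in column one."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    return {int(key): value for key, value in reader}


def authors_by_paper(papers_text, persons_text, authors_text):
    """Join the link table against both lookups: paper title -> list of author names."""
    papers = load_lookup(papers_text)
    persons = load_lookup(persons_text)
    result = {title: [] for title in papers.values()}
    links = csv.reader(io.StringIO(authors_text), skipinitialspace=True)
    for paper_id, person_id in links:
        result[papers[int(paper_id)]].append(persons[int(person_id)])
    return result


if __name__ == "__main__":
    for title, names in authors_by_paper(PAPERS, PERSONS, AUTHORS).items():
        print(title, "<-", ", ".join(names))
```

Because the join goes through ids rather than names, two different people who happen to share a name would remain separate rows in the persons table.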
And, FYI, there is no guarantee that two authors with the same name will not be assumed to be the same person.

Besides the lack of any one official CSV format, there are oodles of features I have seen that are normally external to the CSV. For example, I have often read in data from a CSV or similar file where you could tell the software to consider a blank or 999 to mean NA, what denotes a line in the file to be ignored as a comment, whether a separator is a space or any combination of whitespace, what quotes something (so that, say, you can hide a comma), how to handle escapes, whether to skip blank lines, and more.

Now a really good design might place some metadata into the file that can be used to set defaults for things like that, or incorporate them into the format unambiguously. It might calculate the likely data type for various fields and store that in the metadata. So even if you stored rectangular data in a CSV file, perhaps the early lines would be in some format that can be read as comments and supply some info like the above.

Are any of the CSV variants more like that?

-----Original Message-----
From: Python-list <python-list-bounces+avigross=verizon....@python.org> On Behalf Of Stefan Ram
Sent: Thursday, September 23, 2021 5:43 PM
To: python-list@python.org
Subject: Re: XML Considered Harmful

"Avi Gross" <avigr...@verizon.net> writes:
>But scientific papers seemingly allow oodles of authors and any time
>you update the data, you may need yet another column.

You can use three CSV files: papers, persons, and authors:

papers.csv

1, "Is the accelerated expansion evidence of a change of signature?"

persons.csv

1, Marc Mars

authors.csv

1, 1

I.e., paper 1 is authored by person 1.

Now, when we learn that José M. M. Senovilla also is a co-author of "Is the accelerated expansion evidence of a forthcoming change of signature?", we only have to add new rows, no new columns.

papers.csv

1, "Is the accelerated expansion evidence of a change of signature?"

persons.csv

1, "Marc Mars"
2, "José M. M. Senovilla"

authors.csv

1, 1
1, 2

The real problem with CSV is that there is no CSV. This is not a specific data language with a specific specification. Instead it is a vague designation for a plethora of CSV dialects, which usually do not even have a specification.

Compare this with XML. XML has a sole specification managed by the W3C.

-- https://mail.python.org/mailman/listinfo/python-list
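
Since there is no single CSV specification, a reader has to be told the dialect somehow. As a rough illustration of the idea of carrying that information in leading comment lines, here is a sketch in Python. The `# key=value` header convention, the option names `delimiter`, `quotechar` and `na`, the function `read_with_header`, and the sample data are all invented for this example; I am not aware of any CSV variant that standardizes them.

```python
import csv

# Hypothetical file content: the "# key=value" lines are a made-up convention.
SAMPLE = """\
# delimiter=;
# na=999
id;label;score
1;"a; b";999

2;c;
3;d;7
"""


def read_with_header(text):
    """Parse CSV text whose leading '#' lines set the dialect and the NA marker.

    Limitation of this sketch: input is split line by line, so quoted fields
    containing embedded newlines are not supported.
    """
    options = {"delimiter": ",", "quotechar": '"', "na": ""}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            # Comment line: treat it as an option only if it looks like key=value.
            key, sep, value = line[1:].partition("=")
            if sep:
                options[key.strip()] = value.strip()
        elif line.strip():
            # Blank lines are skipped; everything else is data.
            data_lines.append(line)
    na_values = {"", options["na"]}  # a blank cell always counts as NA here
    reader = csv.reader(
        data_lines,
        delimiter=options["delimiter"],
        quotechar=options["quotechar"],
    )
    header = next(reader)
    return [
        {name: (None if cell in na_values else cell) for name, cell in zip(header, row)}
        for row in reader
    ]


if __name__ == "__main__":
    for row in read_with_header(SAMPLE):
        print(row)
```

A reader that does not know the convention would still see the first two lines as ordinary comments, which is the attraction of putting the metadata there.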