On 16/06/16 16:39, Shaun Cutts wrote:
The opinion of database developers is that using more than 1664 columns is bad
design, so that the current maximum number of columns is not onerous.
When I simply built applications on top of databases, I was of the same opinion.
However, now my job is to perform some sort of analysis on data from elsewhere.
Over the course of a couple weeks I need to do something with data that often is
structured into tables of many thousands of columns. Figuring out what the
appropriate form is part of manipulating the data — something for which I’d
like to use postgres. My applications, such as they are, handle metadata not
data.
Operations such as crosstab or json_populate_record can easily create rows with
many more than 1664 fields. I also use data such as census data — with many
different survey fields all applying to geographic areas. For a given
application only some are relevant, but ideally I’d create a materialized view
for a given project with relevant data. In the best case, the “library” version
of the dataset would have just a few very wide tables. Storing in json is
possible but inconvenient, slower and means that often type info must be stored
separately.
In short, IMHO the database shouldn’t force me to structure my data before I
understand it, and being able to query it is how I come to understand it. At
some scale, practicality dictates that one needs to take “special measures” to
handle large volumes of data — large in breadth as well as width. But this
boundary should be on the order of millions of columns, not thousands.
Is there a reason besides “its bad design” to disallow tables with many columns?
Not had to deal with extremely fat database tables.
However, back when I was a COBOL programmer: one system had a very big
record type, more than 10 times bigger than anything else I'd ever dealt
with. It was an extreme pain to debug problems with such a large record
(try poring over hex dumps on lineflow!) - so I expect it would be
painful for a large database table.
Could you spread the huge number of columns into a number or database
tables with some sort of logical grouping?
Cheers,
Gavin
--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general