On Tue, May 12, 2009 at 7:06 AM, Ow Mun Heng <ow.mun.h...@wdc.com> wrote:
> -----Original Message-----
> From: pgsql-general-ow...@postgresql.org [mailto:pgsql-general-
> On Tue, May 12, 2009 at 01:23:14PM +0800, Ow Mun Heng wrote:
> >> | sum of count | sum_of_count_squared | qty | qty < 100 | qty < 500 |
> >>
> >> I'm thinking of lumping them into 1 column via an array instead of
> >> into 5 different columns. Not sure how to go about this, hence the
> >> email to the list.
>
> > The normal array constructor should work:
> >
> >   SELECT ARRAY[MIN(v),MAX(v),AVG(v),STDDEV(v)]
> >   FROM (VALUES (1),(3),(4)) x(v);
> >
> > Not sure why this is better than using separate columns though. Maybe a
> > new datatype and a custom aggregate would be easier to work with?
>
> The issue here is the number of columns needed to populate the table.
>
> The table I'm summarizing has between 50 and 100+ columns; if the 1:5
> ratio is used as a yardstick, the table will get awfully wide quickly.
>
> I need to know how to do it first, then test accordingly for performance
> and corner cases.

I apologize for coming into this conversation late. I used to analyze a
public-use flat data file that had one row per patient and up to 24
diagnosis codes, each in a different column. Is this analogous to your
situation?

I found it was worth the effort to convert the flat file into a relational
data model in which the patients' diagnosis codes were in a single column
in a separate table. This model also makes more complex analysis easier.

Since there were several groups of fields that needed to be split out into
their own tables, I found it took less time to convert the flat file to the
relational model with a script before importing the data into the database
server. A Python script would read the original file and create 5 clean,
tab-delimited files that were ready to be imported.

I hope this helps.

Andrew
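
For illustration, a minimal sketch of that kind of conversion script is
below. The input layout, column names, and file names here are assumptions
made up for the example (the real file was split into 5 output files, not
the 2 shown):

    #!/usr/bin/env python3
    # Sketch only: assumes a tab-delimited input with columns
    #   patient_id, age, sex, dx1 ... dx24
    # and unpivots the 24 diagnosis columns into one row per code.
    import csv

    INPUT = "flatfile.txt"           # hypothetical input file
    PATIENTS_OUT = "patients.tab"    # one row per patient
    DIAGNOSES_OUT = "diagnoses.tab"  # one row per (patient_id, dx_code)

    with open(INPUT, newline="") as src, \
         open(PATIENTS_OUT, "w", newline="") as pat, \
         open(DIAGNOSES_OUT, "w", newline="") as dx:
        reader = csv.reader(src, delimiter="\t")
        pat_writer = csv.writer(pat, delimiter="\t")
        dx_writer = csv.writer(dx, delimiter="\t")

        next(reader)  # skip the header row
        for row in reader:
            patient_id, age, sex = row[0], row[1], row[2]
            pat_writer.writerow([patient_id, age, sex])

            # Emit one diagnosis row per non-empty code column.
            for code in row[3:27]:
                if code.strip():
                    dx_writer.writerow([patient_id, code.strip()])

The resulting tab-delimited files can then be loaded on the server with
COPY ... FROM (or \copy from psql).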