Very interesting points. Thanks for the documentation link and the point about 
alignment.

As a test of Tom's suggestion to group smallints together to avoid alignment 
problems, I changed the column order from
smallint, date, smallint, integer, real TO
smallint, smallint, date, integer, real

This resulted in a "Small" table that is 15% smaller than the original "Small" 
table (414MB vs 487MB). I wasn't aware that tables could be optimized by 
switching column order like this. This could be a good note to make in either 
the "Data Types" portion of the PostgreSQL manual or the "Performance 
Optimization" portion of the PostgreSQL Wiki.

One might even imagine a future version of PostgreSQL using an efficient disk 
layout that may not match the table layout in order to avoid wasted space from 
padding. I suppose this already happens to some extent with the different 
storage modes (plain, extended, external).

Steve also correctly pointed out that my text string probably only takes up 3-4 
bytes rather than 7 bytes. Meaning the "Small" table uses only 7-8 bytes/row 
less than the "Big" table. For 10M rows, the expected savings should be about 
70-80MB. This matches the 79MB of savings I see once I reorder the columns 
following Tom's suggestion. All is right in the world!

Thanks to all,
Robert


SC> More questions than answers:
SC> What version of PostgreSQL?

9.1.1 (I should have mentioned that)

SC> How are your determining the space used by a table?

\d+

SC> Why are you assuming 7 bytes for a 3-character value? (Character values 
SC> up to 126 characters long only have 1-character overhead.)

You are correct. I read the wrong documentation (docs for 8.2 says overhead is 
4 bytes, but docs for my 9.1 version says overhead is only one byte for small 
strings). So I believe I should assume only 4 bytes of total usage here, maybe 
less if it's compressed.

SC> What is the fill-factor on the tables? (Should default to 100% but don't 
SC> know how you are configured.)

I'm using 100%, but it shouldn't matter as I was giving sizes without the index 
anyway.

SC> Do the tables have OIDs or not?

No

-----Original Message-----
From: Tom Lane [mailto:t...@sss.pgh.pa.us] 
Sent: Tuesday, July 24, 2012 10:00 PM
To: Adrian Klaver
Cc: McGehee, Robert; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Smaller data types use same disk space

Adrian Klaver <adrian.kla...@gmail.com> writes:
> On 07/24/2012 03:21 PM, McGehee, Robert wrote:
>> I've created two tables labeled "Big" and "Small" that both store the same 
>> 10 million rows of data using 493MB and 487MB of disk space respectively. 
>> The difference is that the "Big" table uses data types that take up more 
>> space (integer rather than smallint, float rather than real, etc). The "Big" 
>> table should need about 27 bytes/row versus 16 bytes/row for the "Small" 
>> table, indicating to me that the "Big" table should be 70% bigger in actual 
>> disk size. In reality, it's only 1% bigger or 6MB (after clustering, 
>> vacuuming and analyzing). Why is this? Shouldn't the "Small" table be about 
>> 110MB smaller (11 bytes for 10 million rows)? I'm estimating table size with 
>> \d+

> See here for the gory details:
> http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html

> See in particular:
> Table 55-4. HeapTupleHeaderData Layout
>  From the text:
> ""All table rows are structured in the same way. There is a fixed-size 
> header (occupying 23 bytes on most machines.." which breaks you 
> assumption of the Big/Small row size comparison.

Aside from the tuple header, there are alignment considerations that you
have to allow for.  So for instance, if your columns are int, smallint,
int, that will take the same amount of space as 3 ints, because the
savings disappears into alignment of the third int.  You need two
adjacent smallints to get any benefit.

                        regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to