On 03/11/13 20:37, David Rowley wrote:
I've just been looking at how alignment of columns in tuples can make
the tuple larger than needed.
I created 2 tables... None of which are very real world, but I was
hunting for the extremes here...
The first table contained an int primary key and then a total of 10
int columns and 10 boolean columns, I placed one boolean column after
an int column so that it was int,bool, int bool, etc
With the 2nd table I had all the ints first then all the booleans at
the end of the table. I then inserted 1 million records per table and
checked the sizes of each table.
The first table was 112 MB and the 2nd table was 81MB, so naturally
there is a difference when it comes to running queries on these tables.
postgres=# select sum(Value1) from test1;
sum
--------------
500000500000
(1 row)
Time: 239.306 ms
postgres=# select sum(Value1) from test2;
sum
--------------
500000500000
(1 row)
Time: 186.926 ms
So in this example a full scan and aggregate of a single column is 28%
faster.
I'm sure in the real world there are many cases where a better choice
in column ordering would save space and save processing times, but is
this something that we want to leave up to our users?
I've not yet looked at the code to see how hard implementing
separation of column physical order and logical order would be. I
really just want to get an idea of what the thoughts would be on such
a change.
I would imagine it should be possible to have a function which
optimises column orders which is run when a table is created or
rewritten. New columns would still go onto the end of the tuple unless
the table had to be rewritten and in this case the column order would
be optimised again. All plays where column names were displayed
without explicit ordering, e.g select * and in psql the catalog could
be queried to see which order these columns should be displayed in.
For reference I've attached the script I used for testing this.
I'd like to implement this as a project, but before I start any work
on it I'd just like to find out other people's thoughts on it.
Regards
David Rowley
I think the system should PHYSICALLY store the columns in the most space
efficient order, and have a facility for mapping to & from the LOGICAL
order - so that users & application developers only have worry about the
logical order. Even system programers would normally not have to be
concerned with the physical order. I am a little surprised that this is
not already done, to be honest.
Cheers,
Gavin