Hi,

I am new to the mailing list (as a writer ... reading since a long time), but 
am using PostgreSQL in our company for quite some time now.
Lately, we rolled out some new servers with vastly improved hardware (8 disk 
RAID 1+0, instead of old 2 disk RAID 1 etc. etc.), new OS (Ubuntu 10.04 server 
instead of Debian Lenny) and newer PostgreSQL version (8.4.4 instead of 8.3.8). 
And everything seemed to be much better - as expected - until, well until I 
have seen the first full backup run-times.
For example:

-          One of our old machines needed approximately 2 ½ hours to make a 
dump (with "-Fc -compress=9") of our approximately 100 GByte DB.

-          One of our new machines with the copy of the same 100 GByte DB 
needed 6 ½ to 7 hours to dump the same DB.

First check of our MRTG monitoring showed, that during the backup, the machine 
was not saturated in any way by load (IO Wait were below 5%), CPUs were almost 
idling so you got the feeling, that machine was sleeping ... But of course it 
did the backup, only that it was very slow.

No I further investigated. I also did some detailed monitoring with collect 
(http://collectl.sourceforge.net/), seeing, that the machine often was doing 
some slow reading and every 20-30 seconds wrote down a few Mbytes on the backup 
disk. So, almost everything suggests that we are very very far from some 
hardware limitations. I also checked some file write/ reads on the RAID, and 
could confirm, that plain DD of some multy GByte file only took a snap.

OK, then I checked what PG_DUMP was doing. In the process list of PostgreSQL 
(viewed via pgAdmin), I see how PG_DUMP works (didn't care much about it until 
now). In does a COPY to STDOUT. Now I wanted to narrow down the problem, and 
tried to reproduce that COPY statement of one of our 8 GByte DBs (that table 
only has INT, BIGINT, BOOLEAN fields ! So no potential BLOB problem!). And now 
comes the interesting part.

-          When I did "COPY public.bigtable (column1, column2, column3) TO 
'/var/tmp/test.dump';" it is FAST. It writes down the dump file of 3 GBytes in 
about 2 ½ minutes (as expected with the hardware)!

-          When I did "time sudo -u postgres /var/software/pgsql1/bin/psql -h 
/tmp/pgsql1 -p 4306 -d  database_name -f copy-test.sql > /var/tmp/test.dump" 
(and copy-test.sql has : "COPY public.cachereleaseartist (releasebyterritoryid, 
artistid, isdisplayartist, artistroleclusterid, artistroletypeid, 
iscontainedinrelease, iscontainedinresource) TO STDOUT;" ) ... I couldn't wait 
until it ended .. after 20 minutes the test.dump file was merely at 1 Gbyte 
(far from the target size of 3 Gbyte).
I also monitored both statements via collect, and could confirm that the direct 
COPY to file made some reasonable IO activity, while the COPY via STDOUT was 
idling like mad, with some occasional bursts and falling asleep again. This 
would also make clear, why PG_DUMP is so dog slow. If it gets its data via 
STDOUT at that speed, then it falls asleep too ....

And to make things even worse: I did the same test on our old hardware too. And 
BOTH, COPY to file directly AND COPY via STDOUT war equally fast (took 5 ½ 
minutes ... but OK, that's an old iron compared to the new one)!!! No 
difference between file and STDOUT (as expected)!

Now I ask, whats going on here ???? Why is COPY via STDOUT so much slower on 
out new machine?

-          Has anything changed between PostgreSQL 8.3.8 and PostgreSQL 8.4.4 
in this respect? Maybe some STDOUT buffer flushing from COPY? Buffer handling? 
Buffer size???

-          Is maybe the OS to blame? As I told, we changed from Debian Lenny 
(Kernel 2.6.26-2) to Ubuntu 10.04 Server (Kernel 2.6.32-22) (both 64-bit)? And 
if yes, would there be ways for PostgreSQL to adapt to the new OSs new behavior 
in STDOUT (if there is such thing)???

And for the Hardware:

-          Old machine was an 8 Core with 16 GByte RAM BUT only one RAID 1 
(software!) with two disks for everything (I know, that was a little bit 
meager).

-          New machine has 8 Core (with hyper-threading 16), 24 GByte RAM, and 
a RAID 1+0 with 8 disks for Data only, and XLOG/Backup/OS on  a second RAID 1 
with 2 disks, with hardware raid controller and battery backed cache  (so, 
obviously, this machine should be faster than the old one)

Can someone shed some light on this STDOUT madness? Or give me some 
directions/hints in which I could further research?

Thank you very much!

Andras Fabian
Atrada Trading Network AG

Reply via email to