Niblett, David A wrote:
I'm in serious need of help here. I'm about at my wits' end dealing with dbmail and trying to get it to work. I've had a couple of database crashes now, and I've found that if I run dbmail-util the load on my DB server skyrockets and the server becomes unusable.
This sounds like dbmail-util not coping well with a database that is already in trouble. Hopefully this can be found and fixed.
I'm hoping that maybe I'm just incapable of tuning PostgreSQL for performance. At this point I'd like to know if there is anyone out there with experience running dbmail-2.0.4 on PostgreSQL-7.4.7 with moderately large tables (9-10GB). We are very interested in paying for help in the form of consulting if need be. If we can't work out the bugs, we are going to scrap the entire thing and go back to our simple Windows-based NTMail system.
Don't do that! DBMail is quite nice once you have it running well. I run DBMail 2.0.4 against PostgreSQL 8.0.x; I used to run it against 7.4.x, and for DBMail's purposes they are mostly the same. My database size is close to yours, but it sounds like I have far fewer simultaneous users.
Some things that seem to happen are: 1) If I stop/reload the postgres database (a normal nice stop, which should allow all transactions to finish), the dbmail master daemon dies and we suddenly get a lot of unconnected messages in the database. We see these as no-user, no-subject messages in users' mailboxes.
It sounds to me like the problem is that DBMail does not use transactions to maintain database consistency. This is a holdover from its MySQL roots; personally, I think it is one of DBMail's biggest failings.
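To illustrate what I mean by transactions: if delivery wrapped all of its inserts in a single BEGIN/COMMIT, a postgres restart mid-delivery would roll the whole message back instead of leaving orphaned rows behind. This is only a rough sketch with made-up values and a guessed sequence name, not DBMail's actual code:

    BEGIN;
    -- the physical message itself
    INSERT INTO dbmail_physmessage (messagesize, internal_date)
         VALUES (2048, now());
    -- its body blocks, tied to the new physmessage row
    INSERT INTO dbmail_messageblks (physmessage_id, messageblk)
         VALUES (currval('dbmail_physmessage_id_seq'), 'message body here');
    -- the per-mailbox entry that makes it visible to the user
    INSERT INTO dbmail_messages (mailbox_idnr, physmessage_id)
         VALUES (42, currval('dbmail_physmessage_id_seq'));
    COMMIT;  -- nothing above is visible until this succeeds

With MySQL/MyISAM none of this is available, which is presumably why the code doesn't do it.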
2) When dbmail-util runs, the load on the db server just skyrockets. It seems to be related to the large join done to find the messageblks that are not connected.
Can you give more detail here? The load on the server goes up? There are more active postgres connections? What process is actually using the CPU, dbmail-util or postgres?
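One quick way to tell is to watch top on both boxes while dbmail-util is running, and to look at what the backends are doing in pg_stat_activity (if I remember right, on 7.4 you need stats_command_string = true in postgresql.conf for the query column to be filled in):

    -- in psql on the db server while dbmail-util is running
    SELECT procpid, usename, current_query FROM pg_stat_activity;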
3) When we vacuum the database, the load on the server goes sky high (like 160+). The last time we did this it took 4 days for the vacuum to finish. We believe we have this fixed by using the pg_autovacuum daemon.
What is the load on the server normally, and how many concurrent users do you actually have running? Vacuum can be costly, but it shouldn't be that bad, especially with a server like the one you have. Have you tried any of the vacuum delay settings? (BTW, I wrote pg_autovacuum :-)
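For reference, these are the settings I mean; as far as I know they only appeared in 8.0, so on 7.4 you would be stuck with pg_autovacuum's own knobs. The values here are just a starting point, not a recommendation:

    # postgresql.conf (8.0 and later), illustrative values
    vacuum_cost_delay = 10     # ms to sleep each time the cost limit is reached
    vacuum_cost_limit = 200    # accumulated page-cost before sleeping

The idea is that vacuum throttles itself instead of saturating the disks, so it takes longer but the rest of the system stays responsive.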
Our setup is: DB server - dual Xeon 3.06GHz, 1GB RAM, Ultra320 SCSI RAID drives. DBMail server - P4 3.06GHz, 1GB RAM, SATA drives. Running dbmail-2.0.4 and postgresql-7.4.7 on Gentoo with a Linux 2.6 kernel.
Dual Xeons tend not to be the best PostgreSQL platform; they have some heavy context-switching problems. Even so, I would expect that server to be just fine.
As far as tweaking goes, I've set sort_mem and vacuum_mem in postgres to 100MB each (102400) to try to stop swapping. I've also increased the shared memory limit from 32MB to 100MB.
100MB of sort_mem might be way too high. Remember that sort_mem is allocated per sort in each backend process, whereas shared memory is allocated once per cluster, so if you have 20 IMAP clients all doing a sort at the same time you could be asking PostgreSQL to allocate as much as 2GB of memory. You might try lowering that value and seeing what happens.
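Something along these lines is probably a saner starting point for a 1GB box; the numbers are illustrative and worth testing against your own workload:

    # postgresql.conf, illustrative values
    sort_mem = 8192           # 8MB per sort, per backend (called work_mem in 8.0)
    vacuum_mem = 65536        # vacuum is a single process, so this can stay bigger
    shared_buffers = 10000    # 8kB pages, roughly 80MB of shared buffers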
I'd also be very interested in knowing a better way to limit the database transaction logs, so that if I suffer a crash I'm not having to dump and restore the database. I never really had this issue with MSSQL. I expect to lose things like the message that is being delivered, but not to have the dbmail_users table and everything else corrupted.
If you are running on stable hardware, you should never have to dump and reload PostgreSQL. Again, I think the problem is that DBMail is not using transactions to ensure database consistency.
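On the size of the logs themselves: postgres recycles the WAL files in pg_xlog automatically, and their steady-state size is governed mostly by checkpoint_segments (roughly 2 * checkpoint_segments + 1 files of 16MB each), so you shouldn't need to manage them by hand. Illustrative values:

    # postgresql.conf, illustrative values
    checkpoint_segments = 8      # caps pg_xlog around (2*8+1) * 16MB, about 270MB
    checkpoint_timeout = 300     # seconds between forced checkpoints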
Matt