Hello David:

* I see a couple of issues. Would you consider a default postgresql.conf with minor tweaks?
Like:
shared_buffers = 600
sort_mem = 1024 (and not much higher than 8192 at most, as you seem to have many users and sort_mem is allocated per sort in each backend)

* Your comments about orphan message hunting on the multiple joins (dbmail-util) make me wonder if all your indices are intact.

I am running a big server on 7.3.6 and smaller setups on 8.0.3.
The former (7.3.6) is on Intel, not Xeons but a pair of 1.4 GHz Tualatins with 3 GB of memory, though mail doesn't get all of it because the machine does other memory-intensive admin work too. The server handles the "alerts" from hundreds of monitored servers on Micromuse Netcool and other enterprise management monitors. Some admins literally kick the crap out of it with an overabundance of senseless yellow alerts :o( that they've been asked a billion times to drop. Hundreds of accounts get thousands of little messages and hundreds of big ones every hour. It has never been a problem on psql; the thing is bulletproof stable. Cron runs a minor dbmail-util every 40 minutes and a full dbmail-util every 6 hours. It scoots through these pretty quickly, but I agree, the join on the message orphans is a little rough. At worst that means a few extra minutes though, not days.

Questions:
a) In your para (1) you say the Dbmail 'master daemon dies' ... which daemon do you mean by that (lmtpd, imapd, pop3d)? (Is PostgreSQL denying a connection because it is out of memory, I wonder?)
b) Have you traced the connection status between the two servers? Is it consistent? Are the permissions good?

Troubleshooting ideas:
1) After 'hiding' your ambitious conf and using a more tame, quasi-default conf, list your indices and see if they are all there, none missing. (I pasted a full set below so you can compare to a working server.)

2) You likely did this already: check your psql error log and see what, if anything, it says the problem is during the time of the dbmail-util run. Also syslog? Also see if you can run 'systat -vmstat' to get a picture of physical and swap memory status while things are haywire, if you have any CPU left at all. (That is the FreeBSD tool; on your Linux box plain 'vmstat 5' gives a similar view.) Also, if you like, go to trace level 5 for lmtpd and see how it is doing with its database connects (are they consistent?). It might tell you something you didn't know.

3) Another suggestion for database checking: using create_tables.sql from the latest DbMail 2.0.4 SVN snapshot (~/dbmail_snapshot/sql/postgresql/create_tables.sql), create another (empty) database on your server. Call it dbmail2 or something, compare its schema with your live one, and run some tests against it by changing dbmail.conf on the database server. (If you don't have dbmail installed on the database server, it would be a good idea to do so. That way you can run your tools on the DB server instead of across the LAN.)

4) Consider a fresh dev rebuild using PGSQL 8+ ... it's quite nice* and a good excuse for a step-by-step rebuild of your system, using more conservative approaches to configs until all is up and running well. *The nice stuff in 8+ includes savepoints, improved buffer management, checkpoint and vacuum improvements, and Point-In-Time Recovery, which address your last point: "I'd also be very interested in knowing a better way to... etc"

5) Hardware: (>DB Server - Dual Xeon 3.06GHz, 1GB RAM, SCSI RAID Ultra320 drives.<) You might not have enough memory for your aggressive configuration. Those have got to be 'HT Xeons'. (Is PostgreSQL spreading its work across all 4 CPUs? Are any backends going linear on account of a broken network connection or other issue? This could eat memory, push into swap and even cause broken sequences.)

Memory: Are there two memory banks on the board, one for each CPU? When only one bank is used, that should be CPU0's for most boards. Check the manual. I wonder about a memory issue with mismatched dual rank x4/x8 400 MHz memory or the wrong memory for the board. 'HT Xeon' boards can be picky. Run a memory test to make certain the sticks are paired. (If you are running a single 1 GB stick, it seldom goes in the first slot. Check your manual.) What can happen, say with a mismatched pair of 512s on an HT Xeon board, is that Linux will manage memory well until it must *reuse* physical memory past the first 512 ... it can then have troubles, with bizarre symptoms.
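
To put rough numbers on the "not enough memory for your aggressive configuration" worry in item 5, here is a back-of-the-envelope sketch in Python. It assumes the 7.x conventions (shared_buffers counted in 8 kB pages, sort_mem in kB per sort per backend). The backend count of 50 is purely a made-up figure for illustration, and shared_buffers = 600 is my suggested value, not necessarily what you are running.

```python
# Rough worst-case memory estimate for a PostgreSQL 7.x-style config.
# Assumptions: shared_buffers is a count of 8 kB pages, sort_mem is in kB
# and can be claimed once per sort in every backend.

PAGE_KB = 8  # default PostgreSQL block size in kB


def worst_case_mb(shared_buffers, sort_mem_kb, backends, sorts_per_backend=1):
    """Upper-bound estimate (in MB) if every backend sorts at the same time."""
    shared_kb = shared_buffers * PAGE_KB
    sort_kb = sort_mem_kb * backends * sorts_per_backend
    return (shared_kb + sort_kb) / 1024.0


# HYPOTHETICAL: 50 concurrent backends; the thread gives no real number.
BACKENDS = 50

# Tame settings suggested at the top of this mail.
tame = worst_case_mb(shared_buffers=600, sort_mem_kb=1024, backends=BACKENDS)

# sort_mem = 102400 as described in the original post.
aggressive = worst_case_mb(shared_buffers=600, sort_mem_kb=102400, backends=BACKENDS)

print("tame:       %.1f MB" % tame)
print("aggressive: %.1f MB" % aggressive)
```

With those assumed 50 backends, the 100M sort_mem setting could in the worst case ask for about 5 GB on a box with 1 GB of RAM, which is the kind of thing that pushes a server deep into swap. Real usage is normally far below the worst case, so treat this only as a sanity check.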

PSQL Indices:

dbmail_acl_pkey
dbmail_aliases_alias_idx
dbmail_aliases_alias_low_idx
dbmail_aliases_pkey
dbmail_auto_notifications_pkey
dbmail_auto_replies_pkey
dbmail_idx_ipnumber
dbmail_idx_since
dbmail_mailboxes_name_idx
dbmail_mailboxes_owner_idx
dbmail_mailboxes_owner_name_idx
dbmail_mailboxes_pkey
dbmail_messageblks_physmessage_idx
dbmail_messageblks_physmessage_is_header_idx
dbmail_messageblks_pkey
dbmail_messages_7
dbmail_messages_8
dbmail_messages_mailbox_idx
dbmail_messages_physmessage_idx
dbmail_messages_pkey
dbmail_messages_seen_flag_idx
dbmail_messages_status_idx
dbmail_messages_status_notdeleted_idx
dbmail_messages_unique_id_idx
dbmail_pbsp_pkey
dbmail_physmessage_pkey
dbmail_subscription_pkey
dbmail_users_name_idx
dbmail_users_pkey

Sequences:
dbmail_alias_idnr_seq
dbmail_mailbox_idnr_seq
dbmail_message_idnr_seq
dbmail_messageblk_idnr_seq
dbmail_physmessage_id_seq
dbmail_seq_pbsp_id
dbmail_user_idnr_seq
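
If comparing the lists above by eye gets tedious, something like the following Python sketch may help. It only does a set comparison on names you paste in. The comments mention the standard pg_indexes view and pg_class catalog as one way to get the names off your server, and the sample "on_server" list is invented for the example.

```python
# Compare a reference list of index (or sequence) names with the names
# found on another server and report what is missing or unexpected.
#
# One way to collect the names (standard PostgreSQL catalogs):
#   SELECT indexname FROM pg_indexes WHERE indexname LIKE 'dbmail%';
#   SELECT relname FROM pg_class WHERE relkind = 'S';


def compare_names(expected, actual):
    """Return (missing, extra) as sorted lists of names."""
    expected_set = set(name.strip() for name in expected if name.strip())
    actual_set = set(name.strip() for name in actual if name.strip())
    missing = sorted(expected_set - actual_set)
    extra = sorted(actual_set - expected_set)
    return missing, extra


# A few names taken from the reference list above.
reference = [
    "dbmail_messageblks_physmessage_idx",
    "dbmail_messageblks_pkey",
    "dbmail_physmessage_pkey",
]

# HYPOTHETICAL result from the server being checked (one index absent).
on_server = [
    "dbmail_messageblks_pkey",
    "dbmail_physmessage_pkey",
]

missing, extra = compare_names(reference, on_server)
print("missing:", missing)
print("extra:", extra)
```

A missing index on the messageblks or physmessage tables would be my first suspect for a slow orphan join, but that is a guess until you have the actual lists side by side.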

Hope this helps...
best...
Mike

----- Original Message -----
From: "Niblett, David A" <[EMAIL PROTECTED]>
To: <dbmail@dbmail.org>
Sent: Friday, July 15, 2005 9:07 AM
Subject: [Dbmail] DBMail + PostgreSQL Problem


Hello all,

I'm in serious need of help here.  I'm about at my wits'
end dealing with dbmail and getting it to work.  I've
had a couple of database crashes now and found things like
if I run dbmail-util my DB process load skyrockets and
the server becomes unusable.

I'm hoping that maybe I'm just incapable of tuning PostgreSQL
for performance.  At this point I'd like to know if there is
anyone out there that has experience with dbmail-2.0.4 on
PostgreSQL-7.4.7 in a moderately large table size (9-10GB).
We are very interested in paying for some help in the form of
consulting if need be.  At this point if we can't work out the
bugs then we are going to scrap the entire thing and go back
to our simple Windows based NTMail system.

Some things that seem to happen are:

1) If I stop/reload the postgres database (normal nice stop which
should allow all transactions to finish) the dbmail master daemon
dies and we seem to get a lot of unconnected messages suddenly in
the database.  We see these in the form of no user, no subject
messages in users mailboxes.

2) When dbmail-util runs, the process load just skyrockets on
the db server.  It seems to be related to the large join done
for finding the messageblks that are not connected.

3) When we vacuum the database the process load screams sky high
(like 160+) on the server.  The last time we did this it took 4
days for the vacuum to finish.  We believe we have this fixed by
using the pg_autovacuum daemon.

Our set up is:
DB Server - Dual Xeon 3.06GHz, 1GB RAM, SCSI RAID Ultra320 drives.
DBMail Server - P4 3.06GHz, 1GB RAM, SATA drives.
Running: dbmail-2.0.4, postgresql-7.4.7 on Gentoo with Linux 2.6
kernel

As far as tweaking goes, I've set sort_mem and vacuum_mem on postgres
each to 100M (102400) to help try and stop swaps.  I've also increased
the shared memory limit from 32M to 100M.

I'd also be very interested in knowing a better way to limit the
database transaction logs such that should I suffer a crash I'm not
having to dump the database and restore.  I never really had this
issue with MSSQL.  I expect to lose things like the message that is
being delivered, but not corrupt the dbmail_users table and everything
else.

HELP... TIA

--
David A. Niblett               | email: [EMAIL PROTECTED]
Network Administrator          | Phone: (352) 334-3400
Gainesville Regional Utilities | Web: http://www.gru.net/


