Mike,

I'm basically doing almost all of what you suggest. I'm going to start
over with a new build, moving to 8.0.3 on postgres and probably
2.0.4-svn.
We did find a duplex mismatch between the dbmail server and the
database, so that could be part of the issue. I don't think the Xeons
that I have are HT. The memory is set up as (2) 512M chips, one in each
bank.

--
David A. Niblett               | email: [EMAIL PROTECTED]
Network Administrator          | Phone: (352) 334-3400
Gainesville Regional Utilities | Web: http://www.gru.net/

-----Original Message-----
From: M. J. [Mike] O'Brien [mailto:[EMAIL PROTECTED]
Sent: Saturday, July 16, 2005 4:44 AM
To: DBMail mailinglist
Subject: [Dbmail] DBMail + PostgreSQL Problem

Hello David:

* I see a couple of issues. Would you consider a default
  postgresql.conf with minor tweaks? Like:

    shared_buffers = 600
    sort_mem = 1024   (not much higher than 8192, as you seem to have many users)

* Your comments about orphan message hunting on the multiple joins
  (dbmail-util) make me wonder if all your indices are intact.

I am running a big server on 7.3.6 and smaller setups on 8.0.3. The
former (7.3.6) is on Intel, not Xeons but a pair of 1.4G Tualatins,
with 3 gig of mem, but mail doesn't get it all cuz the machine does
other mem-intensive admin stuff too. The server handles the "alerts"
from hundreds of monitored servers on Micromuse Netcool and other
Entmgt monitors. Some Admins literally kick the crap out of it with an
overabundance of senseless yellow alerts :o( that they've been asked a
billion times to drop. Hundreds of accounts get thousands of little
messages and hundreds of big ones every hour. Never been a problem on
psql. Thing is bullet-proof-stable.

Cron runs a minor dbmail-util every 40 minutes and a full dbmail-util
every 6 hrs. It scoots through these pretty quick, but I agree, the
join on the message orphans is a little rough -- but at worst a few
extra minutes, not days.

Questions:

a) In your para (1) you say the Dbmail 'master daemon dies' ... which
   daemon do you mean by that (lmtpd, imapd, pop3d)? (Is PostgreSQL
   denying a connection because it is out of memory, I wonder?)
b) ...have you traced the connect status between the two servers -- is
   it consistent? Permissions good?

Troubleshooting ideas:

1) After 'hiding' your ambitious conf and using a more tame,
   quasi-default conf, list your indices and see if they are all there,
   none missing. (I pasted a full set below so you can compare to a
   working server.)

2) You likely did this: check your psql error log and see what, if
   anything, it says the problem is during the time of the dbmail-util
   run. Also syslog? Also see if you can run 'systat -vmstat' to get a
   picture of phys and swap mem status while things are haywire (if you
   have any CPU left at all). Also, if you like, go to trace level 5
   for lmtpd and see how it is doing with its database connects (are
   they consistent?) -- it might tell you something you didn't know.

3) Another suggestion for database checking: using the latest DbMail
   2.0.4 SVN ~/dbmail_snapshot/sql/postgresql/create_tables.sql, create
   another (empty) database on your server. Call it dbmail2 or
   something; compare the schema; and run some tests against it by
   changing dbmail.conf on the database server. (If you don't have
   dbmail installed on the database server, it would be a good idea to
   do so. That way you can run your tools on the DB server instead of
   across the LAN.)

4) Consider a fresh dev rebuild using PGSQL 8+ ... it's quite nice* and
   a good excuse for a step-by-step rebuild of your system, using more
   conservative approaches to configs until all is up and running well.

   *The nice stuff in 8+ includes Savepoints, Improved Buffer
   Management, Checkpoint, Vacuum, and Point-In-Time Recovery, which
   remedy your last point: "I'd also be very interested in knowing a
   better way to... etc"

5) Hardware: (>DB Server - Dual Xeon 3.06GHz, 1GB RAM, SCSI RAID
   Ultra320 drives.<) You might not have enough memory for your
   aggressive configuration. Gottabe 'HT Xeons'. (Is PostgreSQL
   threading across all 4 CPUs?
   Are any threads going linear on account of a broken network
   connection or other issue -- this could eat memory and push into
   swap and even cause broken sequences.)

   Memory: Are there two mem banks on the board, one for each CPU? When
   only one bank is used, that should be CPU0 for most boards. Check
   the manual. I wonder about a memory issue with mismatched Dual Rank
   x4/x8 400MHz memory or the wrong memory for the board. 'HT Xeon'
   boards can be picky. Run a mem test to make certain the mem sticks
   are paired. (If you are running a single stick of 1 gig, it seldom
   goes in the first slot. Check your manual.) What can happen, say,
   with a pair of mismatched 512s on an HT Xeon board is that Linux
   will manage memory well until it must *reuse* phys memory past the
   first 512 ... it can then have troubles... with bizarre symptoms.

PSQL Indices:

    dbmail_acl_pkey
    dbmail_aliases_alias_idx
    dbmail_aliases_alias_low_idx
    dbmail_aliases_pkey
    dbmail_auto_notifications_pkey
    dbmail_auto_replies_pkey
    dbmail_idx_ipnumber
    dbmail_idx_since
    dbmail_mailboxes_name_idx
    dbmail_mailboxes_owner_idx
    dbmail_mailboxes_owner_name_idx
    dbmail_mailboxes_pkey
    dbmail_messageblks_physmessage_idx
    dbmail_messageblks_physmessage_is_header_idx
    dbmail_messageblks_pkey
    dbmail_messages_7
    dbmail_messages_8
    dbmail_messages_mailbox_idx
    dbmail_messages_physmessage_idx
    dbmail_messages_pkey
    dbmail_messages_seen_flag_idx
    dbmail_messages_status_idx
    dbmail_messages_status_notdeleted_idx
    dbmail_messages_unique_id_idx
    dbmail_pbsp_pkey
    dbmail_physmessage_pkey
    dbmail_subscription_pkey
    dbmail_users_name_idx
    dbmail_users_pkey

Sequences:

    dbmail_alias_idnr_seq
    dbmail_mailbox_idnr_seq
    dbmail_message_idnr_seq
    dbmail_messageblk_idnr_seq
    dbmail_physmessage_id_seq
    dbmail_seq_pbsp_id
    dbmail_user_idnr_seq

Hope this helps... best...
Mike

----- Original Message -----
From: "Niblett, David A" <[EMAIL PROTECTED]>
To: <dbmail@dbmail.org>
Sent: Friday, July 15, 2005 9:07 AM
Subject: [Dbmail] DBMail + PostgreSQL Problem

> Hello all,
>
> I'm in serious need of help here. I'm about at my wits'
> end of dealing with dbmail and getting it to work. I've
> had a couple of database crashes now and found things like
> if I run dbmail-util my DB process load skyrockets and
> the server becomes unusable.
>
> I'm hoping that maybe I'm just incapable of tuning PostgreSQL for
> performance. At this point I'd like to know if there is anyone out
> there that has experience with dbmail-2.0.4 on PostgreSQL-7.4.7 in a
> moderately large table size (9-10GB). We are very interested in paying
> for some help in the form of consulting if need be. At this point if
> we can't work out the bugs then we are going to scrap the entire thing
> and go back to our simple Windows-based NTMail system.
>
> Some items that seem to happen are:
>
> 1) If I stop/reload the postgres database (normal nice stop which
> should allow all transactions to finish) the dbmail master daemon dies
> and we seem to get a lot of unconnected messages suddenly in the
> database. We see these in the form of no user, no subject messages in
> users' mailboxes.
>
> 2) When dbmail-util runs, the process load just skyrockets on the db
> server. It seems to be related to the large join done for finding the
> messageblks that are not connected.
>
> 3) When we vacuum the database the process load screams sky high (like
> 160+) on the server. The last time we did this it took 4 days for the
> vacuum to finish. We believe we have this fixed by using the
> pg_autovacuum daemon.
>
> Our set up is:
> DB Server - Dual Xeon 3.06GHz, 1GB RAM, SCSI RAID Ultra320 drives.
> DBMail Server - P4 3.06GHz, 1GB RAM, SATA drives.
> Running: dbmail-2.0.4, postgresql-7.4.7 on Gentoo with Linux 2.6
> kernel
>
> As far as tweaking goes, I've set sort_mem and vacuum_mem on postgres
> to each 100M (102400) to help try and stop swaps. I've also increased
> the shared memory limit from 32M to 100M.
>
> I'd also be very interested in knowing a better way to limit the
> database transaction logs such that should I suffer a crash I'm not
> having to dump the database and restore. I never really had this
> issue with MSSQL. I expect to lose things like the message that is
> being delivered, but not corrupt the dbmail_users table and everything
> else.
>
> HELP... TIA
>
> --
> David A. Niblett               | email: [EMAIL PROTECTED]
> Network Administrator          | Phone: (352) 334-3400
> Gainesville Regional Utilities | Web: http://www.gru.net/
>
> _______________________________________________
> Dbmail mailing list
> Dbmail@dbmail.org
> https://mailman.fastxs.nl/mailman/listinfo/dbmail
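
For anyone repeating Mike's troubleshooting idea 1 (list the indices on
the suspect server and compare them with the set from his working
server), the comparison itself can be sketched in a few lines of
Python. This is only a sketch under stated assumptions: the expected
names are copied from the "PSQL Indices" list in Mike's reply, the
helper names missing_indexes and extra_indexes are made up for this
illustration, and nothing here talks to the database; it only compares
two lists of names.

```python
# Index names copied from the "PSQL Indices" list in Mike's reply
# (taken from his working DBMail / PostgreSQL server).
EXPECTED_INDEXES = frozenset("""
dbmail_acl_pkey
dbmail_aliases_alias_idx
dbmail_aliases_alias_low_idx
dbmail_aliases_pkey
dbmail_auto_notifications_pkey
dbmail_auto_replies_pkey
dbmail_idx_ipnumber
dbmail_idx_since
dbmail_mailboxes_name_idx
dbmail_mailboxes_owner_idx
dbmail_mailboxes_owner_name_idx
dbmail_mailboxes_pkey
dbmail_messageblks_physmessage_idx
dbmail_messageblks_physmessage_is_header_idx
dbmail_messageblks_pkey
dbmail_messages_7
dbmail_messages_8
dbmail_messages_mailbox_idx
dbmail_messages_physmessage_idx
dbmail_messages_pkey
dbmail_messages_seen_flag_idx
dbmail_messages_status_idx
dbmail_messages_status_notdeleted_idx
dbmail_messages_unique_id_idx
dbmail_pbsp_pkey
dbmail_physmessage_pkey
dbmail_subscription_pkey
dbmail_users_name_idx
dbmail_users_pkey
""".split())


def _clean(names):
    """Strip whitespace/newlines and drop blank entries."""
    return {name.strip() for name in names if name.strip()}


def missing_indexes(found_names, expected=EXPECTED_INDEXES):
    """Return expected index names that are absent from found_names, sorted."""
    return sorted(set(expected) - _clean(found_names))


def extra_indexes(found_names, expected=EXPECTED_INDEXES):
    """Return names in found_names that are not in the expected set, sorted."""
    return sorted(_clean(found_names) - set(expected))
```

One way to get the list of names to feed in, assuming the database is
called dbmail and the tables live in the public schema (both are
assumptions, adjust to your setup), is to dump them with psql:

    psql -At -c "SELECT indexname FROM pg_indexes WHERE schemaname = 'public'" dbmail > indexes.txt

and then call missing_indexes(open("indexes.txt")). An empty result
means every index from Mike's list is present; it says nothing about
whether an index is corrupt or bloated.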