Problem: vpopmail authentications failing randomly
grep vchkpw /var/log/maillog | grep -v success | grep dlb May 24 11:45:03 mail01 vpopmail[40833]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 11:50:03 mail01 vpopmail[41401]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 11:55:04 mail01 vpopmail[42117]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 12:00:04 mail01 vpopmail[42735]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 12:50:06 mail01 vpopmail[51623]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 12:55:07 mail01 vpopmail[52208]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 13:00:06 mail01 vpopmail[52799]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 13:20:16 mail01 vpopmail[55953]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 May 24 13:48:23 mail01 vpopmail[13650]: vchkpw-pop3: vpopmail user not found [EMAIL PROTECTED]:67.92.111.22 These are happening for various accounts, on a seemingly random basis. Out about 4,000 authentications today, it failed 100 times. qmail-pop3d is run as follows: exec softlimit -m 4096000 tcpserver -H -R -c50 0 pop3 qmail-popup mail01.example.net vchkpw qmail-pop3d Maildir 2>&1 System Specs: Pentium III 1.0GHz - 1.0GB RAM 36GB ATA disk FreeBSD 4.10-stable MySQL 4.0.24 with linuxthreads, running locally Vpopmail 5.4.2-5.4.10 ~400 users. Previous versions of MySQL were not compiled with linuxthreads but this problem existed then as well. The my.cnf file is based on my- medium.cnf, with two tweaks appropriate for this system (skip-innodb, bin-log disabled). I have used vpopmail versions 5.4.2, 5.4.8 and 5.4.10 and all exhibit this problem. I also see this error in the maillogs on occasion: delivery 14578: failure: vmysql:_sql_error [1]:_Can't_create_database_'vpopmail'._Database_exists/ vmysql:_sql_error[3]:_No_Database_Selected/ Sorry,_no_mailbox_here_by_that_name._vpopmail_(#5.1.1)/ delivery 14641: failure: vmysql:_sql_error [1]:_Can't_create_database_'vpopmail'._Database_exists/ vmysql:_sql_error[3]:_No_Database_Selected/ Sorry,_no_mailbox_here_by_that_name._vpopmail_(#5.1.1)/ There are no errors reported in MySQL's .err log. There are no other related errors reported in the system logs. The MySQL load is quite light, with less than one query executed every few seconds. The system load is also fairly light, mostly hovering between 0.4 and 0.5. All the symptoms are indicative of vpopmail having an issue with MySQL, so I set up a little perl script to test for me while I tried to replicate the problem. #!/usr/bin/perl my $email = $1 || '[EMAIL PROTECTED]'; my $limit = $2 || 1000; print "checking for $email $limit times (each . = success) "; for ( my $i = 0; $i < $limit; $i++){ my $dir =`~vpopmail/bin/vuserinfo -d $email`; chomp $dir; -d $dir ? print "." : print "$i fail\n"; sleep 1; }; So, I run this script in one terminal window while trying to trigger the problem in another. As near as I can tell, the problem is always during times when the system is busy. I cannot replicate this problem on any of my own servers. To help narrow down the problem, I put the system under sustained heavy load (make buildworld). As expected, I get frequent authentication and test (~45%) failures. So, there is certainly a problem with vpopmail and it's MySQL interaction. Again, this is with versions including 5.4.2, 5.4.8, and 5.4.10. I've seen random cases of this but I've never had an instance where it was repeatable. In such a case, shouldn't the vpopmail programs be returning a error indicative of the problem instead of a "user not found" error? A "user not found" error is not even close to accurately describing what the problem is. During this heavy load testing, I wanted to see if the issue was MySQL or vpopmail. To test this I ran another MySQL client and see if it too has problems interacting with the MySQL server. This is another C program, also compiled to access the vpopmail database and do a query. I also ran this program simultaneously with the test script and the "make buildworld". While the vpopmail test script was failing every other authentication, the other program succeeded, every single time, never failing once. So, I turned to vmysql.c and noticed a timeout setting there that affects the mysql connection timeout. I bumped it up from 2 to 5, and it has reduced the failure frequency but it's still happening fairly regularly. There are some slow queries in MySQL, but only 15 from the last day, so that doesn't even closely correspond with 300 "no found" errors. So, anyone got ideas on how to debug this issue further? Matt `````````````````````````````````````````````````````````````````` Show me a piano falling down a mineshaft and I'll show you A-flat minor. ```````````````````````````````````````````````````````````````````````` ````````````````````````