Problem: vpopmail authentications failing randomly
I made the following post to the vpopmail users list
grep vchkpw /var/log/maillog | grep -v success | grep dlb
These are happening for various accounts, on a seemingly random basis. Out about 4,000 authentications today, it failed 100 times.
qmail-pop3d is run as follows:
Previous versions of MySQL were not compiled with linuxthreads but this problem existed then as well. The my.cnf file is based on my-medium.cnf, with two tweaks appropriate for this system (skip-innodb, bin-log disabled).
I have used vpopmail versions 5.4.2, 5.4.8 and 5.4.10 and all exhibit this problem. I also see this error in the maillogs on occasion:
There are no errors reported in MySQL's .err log. There are no other related errors reported in the system logs. The MySQL load is quite light, with less than one query executed every few seconds. The system load is also fairly light, mostly hovering between 0.4 and 0.5.
All the symptoms are indicative of vpopmail having an issue with MySQL, so I set up a little perl script to test for me while I tried to replicate the problem.
So, I run this script in one terminal window while trying to trigger the problem in another. As near as I can tell, the problem is always during times when the system is busy. I cannot replicate this problem on any of my own servers. To help narrow down the problem, I put the system under sustained heavy load (make buildworld). As expected, I get frequent authentication and test (~45%) failures.
So, there is certainly a problem with vpopmail and it's MySQL interaction. Again, this is with versions including 5.4.2, 5.4.8, and 5.4.10. I've seen random cases of this but I've never had an instance where it was repeatable.
In such a case, shouldn't the vpopmail programs be returning a error indicative of the problem instead of a "user not found" error? A "user not found" error is not even close to accurately describing what the problem is.
During this heavy load testing, I wanted to see if the issue was MySQL or vpopmail. To test this I ran another MySQL client and see if it too has problems interacting with the MySQL server. This is another C program, also compiled to access the vpopmail database and do a query. I also ran this program simultaneously with the test script and the "make buildworld". While the vpopmail test script was failing every other authentication, the other program succeeded, every single time, never failing once.
So, I turned to vmysql.c and noticed a timeout setting there that affects the mysql connection timeout. I bumped it up from 2 to 5, and it has reduced the failure frequency but it's still happening fairly regularly. There are some slow queries in MySQL, but only 15 from the last day, so that doesn't even closely correspond with 300 "no found" errors.
Here was one somewhat helpful response:
Hi, Matt. I have previously had this problem. You are indeed correct. It is a problem with vpopmail talking with MySQL. Unfortunately, MySQL, as you said, reports nothing, regardless of the fact that vpopmail is returning an error. The series of events you're seeing in the vpopmail logs, is vpopmail receiving a 'Read error, MySQL server has gone away' during a database read. vpopmail then thinks the database may not exist, and it tries to create the database as if it was starting from scratch. I spent *hours* looking into *WHY* this occurs with no result other than the fact that it is indeed on MySQL's end. This is what I have yet so far figured out:
1) vpopmail connects and succesfully authenticates, and switches to the database.
2) vpopmail tries to read and receives the 'Server gone away' message.
3) The MySQL server does not document via error or debugging messages the actual cause of the error, but if it did, we might be able to determine the root cause.
In my trials, I tried adjusting packet sizes, timeouts on both vpopmail, and MySQL, turning on debugging, recompiling with various settings, etc. The interesting thing about the problem is that it just simply crops out of out no where, and simply will not go away. I would love to get this documented on qmailwiki.org, but as of yet, I've had no luck determining the cause, or finding a solution. You may wish to try upgrading or downgrading MySQL.
If you find anything else, please let me know.
The final solution was to recompile the MySQL server without linux threads.
Last modified on 5/26/05.