Hi everybody.
I've been hit by a bug in vdelivermail (in maildirquota.c, to be precise), and I
want to share the fix I've found.
I recently upgraded to vpopmail 5.4.33, and since then vdelivermail occasionally
(once or twice a day) starts looping, eating all the CPU.
Running strace on the offending instance showed:
read(5, 0xf5e4317, 2963227893) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4316, 2963227894) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4315, 2963227895) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4314, 2963227896) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4313, 2963227897) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4312, 2963227898) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4311, 2963227899) = -1 EINVAL (Invalid argument)
read(5, 0xf5e4310, 2963227900) = -1 EINVAL (Invalid argument)
read(5, 0xf5e430f, 2963227901) = -1 EINVAL (Invalid argument)
read(5, 0xf5e430e, 2963227902) = -1 EINVAL (Invalid argument)
read(5, 0xf5e430d, 2963227903) = -1 EINVAL (Invalid argument)
read(5, 0xf5e430c, 2963227904) = -1 EINVAL (Invalid argument)
...
...
File descriptor 5 of that process points at the maildirsize file of the
mailbox, and it is marked as "deleted" in /proc/<procid>/fd.
We store the mail on a NetApp NFS share, without locking for performance
reasons, and I think I've found the problem:
the file is deleted by someone else while vdelivermail is reading it.
In maildirquota.c, function maildirsize_read, there is a while loop that reads:
while (l)
{
n=read(f, p, l);
if (n < 0)
{
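But n is declared as storage_t, an unsigned 64-bit integer, so even when "read"
returns a negative value (an error) the "if" is never triggered.
The -1 is treated as a huge positive number instead, so on every iteration p
moves back by one byte and l grows by one, which is exactly the pattern in the
strace output above.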
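To illustrate the pitfall in isolation, here is a minimal sketch. The type demo_storage_t and the three function names are hypothetical stand-ins of mine, not vpopmail code; I am only assuming that storage_t behaves like an unsigned 64-bit integer, as described above.

```c
#include <stdint.h>
#include <sys/types.h>

/* Hypothetical stand-in for vpopmail's storage_t, assumed to be an
   unsigned 64-bit integer. */
typedef uint64_t demo_storage_t;

/* Mimics the original check: the result of read() is stored in an
   unsigned variable before being compared with 0. */
int unsigned_check_fires(ssize_t read_result)
{
    demo_storage_t n = (demo_storage_t)read_result; /* -1 becomes UINT64_MAX */
    return n < 0; /* can never be true for an unsigned type */
}

/* Mimics the patched check: the result stays in a signed ssize_t. */
int signed_check_fires(ssize_t read_result)
{
    ssize_t nr = read_result;
    return nr < 0;
}

/* Shows what "l -= n" does to the remaining byte count after a failed
   read: subtracting UINT64_MAX wraps around to adding one. */
unsigned remaining_after_failed_read(unsigned l)
{
    demo_storage_t n = (demo_storage_t)(ssize_t)-1;
    l -= n;
    return l;
}
```

A compiler with warnings enabled may point out that the comparison in unsigned_check_fires is always false, which is a cheap way to catch this class of bug.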
So I've made this patch:
--- maildirquota.c.orig 2014-01-31 12:21:22.000000000 +0100
+++ maildirquota.c 2014-01-31 12:08:47.000000000 +0100
@@ -337,7 +337,6 @@
int f;
char *p;
unsigned l;
- storage_t n;
int first;
int ret = 0;
@@ -360,15 +359,16 @@
while (l)
{
- n=read(f, p, l);
- if (n < 0)
+ ssize_t nr;
+ nr=read(f, p, l);
+ if (nr < 0)
{
close(f);
return (-1);
}
- if (n == 0) break;
- p += n;
- l -= n;
+ if (nr == 0) break;
+ p += nr;
+ l -= nr;
}
if (l == 0 || ret) /* maildir too big */
{
which fixes the problem.
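For reference, the same pattern as a standalone function might look like the sketch below. read_fully is a hypothetical helper name, not something in vpopmail, and the retry on EINTR is my own addition that is not part of the patch.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to len bytes from fd into buf.
   Returns the number of bytes read (possibly short at end of file),
   or -1 if read() fails. */
ssize_t read_fully(int fd, char *buf, size_t len)
{
    size_t total = 0;

    while (total < len) {
        /* Keep the result in a signed type so errors stay negative. */
        ssize_t nr = read(fd, buf + total, len - total);

        if (nr < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: retry (my addition) */
            return -1;      /* real error, e.g. EINVAL or EBADF */
        }
        if (nr == 0)
            break;          /* end of file */
        total += (size_t)nr;
    }
    return (ssize_t)total;
}
```

The caller closes the descriptor on failure, as the patched loop does before returning -1.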
Is there any chance of incorporating the fix in the next version?
Thanks
--
Simone Lazzaris
QCom S.p.A.