Dovecot (v1.1.rc8) died tonight, with an error about time moving
backwards by 4398 seconds. I can see from the logs that this has happened a
few times before with the imap processes, without me noticing. I sure
noticed the master process missing, though :-).

I was puzzled that it was always 4398 seconds, in particular because
this server runs an NTP daemon. A little searching for this problem
shows that it is an issue with the Linux kernel gettimeofday(), see
e.g. http://lkml.org/lkml/2007/8/23/96

Below is a patch (untested) to work around this issue. Do you see
something wrong with this approach, apart from the ugliness?

I picked the 4395-4400 values more or less arbitrarily. Can you figure out how
big the window should be?
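
One guess on the window, which I have not verified: 4398 seconds is very
close to 2^42 nanoseconds (4398.046511104 s), which would fit some overflow
in the kernel's nanosecond arithmetic. If that assumption holds, the window
could be derived from that constant instead of hard-coded. A minimal
standalone sketch (the names SUSPECT_JUMP_NSECS, suspect_jump_secs() and
is_suspect_jump() are made up for illustration, they are not part of
Dovecot):

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* ASSUMPTION: the bogus forward jump is 2^42 ns (about 4398.05 s).
 * This is inferred from the observed 4398 seconds, not confirmed. */
#define SUSPECT_JUMP_NSECS (UINT64_C(1) << 42)

/* The suspected jump, truncated to whole seconds. */
static time_t suspect_jump_secs(void)
{
	return (time_t)(SUSPECT_JUMP_NSECS / UINT64_C(1000000000));
}

/* Hypothetical helper: returns 1 if the step from prev to now lies
 * within slack seconds of the suspected jump, 0 otherwise. */
static int is_suspect_jump(time_t prev, time_t now, time_t slack)
{
	time_t diff = now - prev;
	time_t jump = suspect_jump_secs();

	assert(slack >= 0);
	return diff >= jump - slack && diff <= jump + slack;
}
```

With a slack of 2 seconds this accepts differences of 4396 to 4400
seconds, so the loop condition in the patch would become something like
is_suspect_jump(ioloop_time, ioloop_timeval.tv_sec, 2).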


Thanks,
Anders.


--- ./src/lib/ioloop.c-orig     2008-06-20 10:45:54.000000000 +0200
+++ ./src/lib/ioloop.c  2008-06-20 10:47:36.000000000 +0200
@@ -230,8 +230,13 @@
        struct timeval tv, tv_call;
         unsigned int t_id;
 
-       if (gettimeofday(&ioloop_timeval, &ioloop_timezone) < 0)
-               i_fatal("gettimeofday(): %m");
+       /* The Linux gettimeofday() will sometimes jump forward
+        * by approximately 4398 seconds. Ignore that reading. */
+       do {
+               if (gettimeofday(&ioloop_timeval, &ioloop_timezone) < 0)
+                       i_fatal("gettimeofday(): %m");
+       } while (4395 < (ioloop_timeval.tv_sec - ioloop_time)
+                    && (ioloop_timeval.tv_sec - ioloop_time) < 4400);
 
        /* Don't bother comparing usecs. */
        if (ioloop_time > ioloop_timeval.tv_sec) {
