On Fri, 2013-11-29 at 23:13 +0100, Gino Semseo wrote:
> 2013-11-29 20:29:01 1VmTkL-0002F8-5Q spam acl condition: cannot parse spamd 
> output


> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: ordered 38942 to accept
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: sysread(7) not ready, wait 
> max 300 secs

Why does sysread() fail?

That's the parent process, so reading from the child failed in
M::SA::SpamdForkScaling sysread_with_timeout(), and we're falling back
to blocking I/O.

> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: child 38942: entering 
> state 2
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: new lowest idle kid: 38944

The child we just ordered to process a message now is busy (state 2).
Following lines are logged by the child.

> Nov 29 16:00:55 hyperserver spamd[38942]: spamd: connection from localhost 
> [127.0.0.1] at port 44846
> Nov 29 16:00:55 hyperserver spamd[38942]: spamd: setuid to diversia succeeded
> Nov 29 16:00:55 hyperserver spamd[38942]: config: read_scoreonly_config: 
>  cannot open "/home/diversia/.spamassassin/user_prefs": No such file or 
> directory
> Nov 29 16:00:55 hyperserver spamd[38942]: info: user has changed

> Nov 29 16:00:55 hyperserver spamd[38942]: bayes: DB journal sync: last sync: 
> 1385736975

Last Bayes journal sync is just 5 minutes ago, Nov 29 15:56:15. That's
not necessarily bad, but seems a little odd given your configuration of
learning to journal:

  bayes_learn_to_journal 1
  bayes_journal_max_size 100000

The journal max size probably is not what you expect it to be. That
custom setting is hardly sufficiently different from the default 102400,
to even care about it.

> Nov 29 16:00:55 hyperserver spamd[38942]: config: score set 3 chosen.
> Nov 29 16:00:55 hyperserver spamd[38942]: spamd: running as uid 574
> Nov 29 16:00:55 hyperserver spamd[38942]: config: time limit 300.0 s
> Nov 29 16:00:55 hyperserver spamd[38942]: message: main message type: 
> multipart/mixed

The child obviously is processing a message, and the child's last few
logged debug messages do not indicate any issue. Next is the parent
noticing its child's death.

> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: child closed connection
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: child states: BIIIII
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: child 38942: just exited
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: child 38942: entering 
> state 4

State 4, parent received SIGCHL.

> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: new lowest idle kid: 38944
> Nov 29 16:00:55 hyperserver spamd[38925]: spamd: handled cleanup of child pid 
> [38942] due to SIGCHLD: exit 1
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: new lowest idle kid: 38944
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: select returned err 
>  Interrupted system call, probably signalled

That would be due to the above sysread() failing. We backed down to
using select() and timeouts.

Wait, the code in question does ring some bells with the full thread in
mind. That debug message is unique in the SA code, and comes with a nice
explanation in M::SA::SpamdForkScaling:

    if (exists &Errno::EINTR && $selerr == &Errno::EINTR)
    {
      # this happens if the process is signalled during the select(),
      # for example if someone sends SIGHUP to reload the configuration.
      # just return inmmediately
      dbg("prefork: select returned err $selerr, probably signalled");
      return;
    }

Indeed, we did have to revert to select(). Might we also have received
SIGHUP to reload the configuration? Like, say, a massive whitelist being
updated, and us being requested to immediately pull it in? (Thanks
Martin Gregorie for pointing out that game of "whitelist changed, reload
daemon" procedure.)

Gino, are there any processes or cron jobs, that recreate the whitelist
more frequently than once a day? How frequently? Do they signal spamd
about the configuration change?

If so, you owe both me and Martin a beer. ;)

> Nov 29 16:00:55 hyperserver spamd[38925]: spamd: server successfully spawned 
> child process, pid 39240
> Nov 29 16:00:55 hyperserver spamd[38925]: prefork: child 39240: entering 
> state 0

State 0, a new child is born. Off to business as usual...


> If I just execute spamassassin -t -D from shell, most times there is no 
> error and the message is fully scanned...

"MOST times there is no error". Emphasis mine. So you cannot reliably
reproduce the issue, and there is no test-case message. It just happens,
at random -- or rather at times other processes meddle with your spamd
daemon?


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Reply via email to