On 11/30/10 08:33, Rick Macklem wrote:
I've been running dovecot 1.1 on FreeBSD 7.x for a while with a bare
minimum of NFS problems, but it got worse with 8.x. I have 2-4 servers
(usually just 2) accessing mail on a Netapp over NFSv3 via imapd.
Delivery is via procmail, which doesn't touch the dovecot metadata, and
webmail uses imapd. Client connections to imapd go to random servers,
and I don't yet have a solid means of keeping certain users on certain
servers. I upgraded some of the servers to 8.x and dovecot 1.2 and ran
into Stale NFS file handles causing index/uidlist corruption, which made
inboxes appear empty when they were not. In some situations their
corrupt index had to be deleted manually. I first suspected dovecot 1.2,
since it was upgraded at the same time, but I downgraded to 1.1 and it's
doing the same thing. I don't really have a wealth of details to go on
yet, and I usually stay quiet until I do. Half the time it is difficult
to reproduce myself, so I've had to put it in production to get a feel
for progress. This only happens a dozen or so times per weekday, but I
feel the need to start taking bigger steps. I'll probably do what I can
to get IMAP back on a stable base (7.x?) and also try to debug 8.x on
the remaining servers. A binary search is within possibility if I can
reproduce the symptoms often enough, even if I have to put a test
server in production for a few hours.
Any tips on where we could start looking, or alterations I could try
making, such as sysctls to return to older behavior? It might be worth
noting that I've seen a considerable increase in traffic from my mail
servers since the 8.x upgrade, on the order of 5-10x as much traffic
to the NFS server. dovecot tries its hardest to flush out the access
cache when needed, and that has worked well enough since about 1.0.16
(years ago). It seems like FreeBSD is what regressed in this scenario.
dovecot 2.x is going in a different direction from my situation, and
I'm not ready to start testing it immediately if I can avoid it, as it
will involve some restructuring.
Thanks for any input. For now the following errors are about all I have
to go on:
Nov 29 11:07:54 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 13:19:51 server1 dovecot: IMAP(user1): o_stream_send(/home/user1/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 14:35:41 server1 dovecot: IMAP(user2): o_stream_send(/home/user2/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 15:07:05 server1 dovecot: IMAP(user3): read(mail, uid=128990) failed: Stale NFS file handle
Nov 29 11:57:22 server2 dovecot: IMAP(user4): open(/egr/mail/shared/vprgs/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 14:04:22 server2 dovecot: IMAP(user5): o_stream_send(/home/user5/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 14:27:21 server2 dovecot: IMAP(user6): o_stream_send(/home/user6/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 15:44:38 server2 dovecot: IMAP(user7): open(/egr/mail/shared/decs/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 19:04:54 server2 dovecot: IMAP(user8): o_stream_send(/home/user8/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Nov 29 06:32:11 server3 dovecot: IMAP(user9): open(/egr/mail/shared/cmsc/dovecot-acl-list) failed: Stale NFS file handle
Nov 29 10:03:58 server3 dovecot: IMAP(user10): o_stream_send(/home/user10/Maildir/dovecot/private/control/.INBOX/dovecot-uidlist) failed: Stale NFS file handle
Others have made good suggestions. One more you could try is disabling
negative name caching by setting the mount option "negnametimeo=0".
Negative name caching also exists in FreeBSD7, but it is a fairly recent
change, so your FreeBSD7 boxes may not have had it. I also think
dot-locking, and running without statd and lockd (you can mount with the
"nolock" option), would be worth trying. And, of course, disabling
attribute caching is mentioned on the web page others cited.
Good luck with it, rick
ps: Unfortunately the NFS protocol cannot support POSIX file system
semantics, so some apps can never run correctly on NFS-mounted volumes.
NFSv4 comes closer, but it still can't provide full POSIX semantics.
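For readers unfamiliar with "dot-locking": instead of fcntl() locks
forwarded through lockd/statd, the lock is a separate file whose
existence means "held". In dovecot 1.x the index locking method is, as
far as I know, selected with the lock_method setting (fcntl, flock or
dotlock). Below is a minimal C sketch of the classic NFS-tolerant
variant of the technique, not dovecot's actual code: create a uniquely
named file, hard-link it to the lock name, and trust the resulting link
count rather than link()'s return value, since an NFS reply can be lost
after the server has already created the link. The helper names
dotlock_acquire()/dotlock_release() and the temp-file naming scheme are
hypothetical.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: try once to take a dot-lock named lock_path.
 * Returns 0 if we now hold the lock, -1 if someone else holds it or on error. */
static int dotlock_acquire(const char *lock_path)
{
    char host[256];
    char tmp_path[1024];
    struct stat st;
    int fd, held;

    if (gethostname(host, sizeof host) != 0)
        return -1;
    host[sizeof host - 1] = '\0';

    /* Unique per host and process, in the same directory as the lock. */
    snprintf(tmp_path, sizeof tmp_path, "%s.%s.%ld",
             lock_path, host, (long)getpid());

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);

    /* link() is atomic on the server, but over NFS its reply can be lost,
     * so ignore the return value and check the link count instead. */
    (void)link(tmp_path, lock_path);
    held = (stat(tmp_path, &st) == 0 && st.st_nlink == 2);

    /* The temporary name is no longer needed either way. */
    unlink(tmp_path);
    return held ? 0 : -1;
}

/* Release a lock previously taken with dotlock_acquire(). */
static void dotlock_release(const char *lock_path)
{
    unlink(lock_path);
}
```

A second dotlock_acquire() on the same path fails until
dotlock_release() removes the lock file. A real implementation would
also need a policy for stale locks left behind by crashed processes,
which this sketch deliberately omits.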
I'll give negnametimeo=0 a try on one server starting tonight; I'll be
busy tomorrow and don't want to risk making anything worse than it
already is. I can't figure out how to disable the attribute cache in
FreeBSD. Neither suggestion seems to be valid, and years ago when I
looked into it I got the impression that you can't, but I'd love to be
proven wrong. I'll try dotlock when I can. Would disabling statd and
lockd be the same as using nolock on all mounts? The vacation binary is
the only thing I can think of that might use them; I'm not sure how well
it would cope without them, since that is how I discovered I needed
lockd in the first place. Also, if disabling lockd shows an improvement,
could it lead to further investigation, or is it just a workaround? I'm
just trying to understand the possibilities better. I know ESTALE means
the file vanished, but for the files I had errors on, it is expected
that multiple systems will spontaneously replace the file. Thanks.
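Because these files are replaced (typically by writing a new copy and
rename()ing it over the old name) rather than modified in place, a
descriptor opened before the replacement can start failing with ESTALE.
One common way for an application to cope is to close the descriptor
and reopen the file by name, which gives the client a chance to look up
the new file handle. The C sketch below shows that pattern under those
assumptions; it is not dovecot's code, and the helper name
read_file_retry_estale() and the retry budget are hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define ESTALE_RETRIES 3 /* arbitrary, hypothetical retry budget */

/* Hypothetical helper: do a single read() of up to bufsize bytes from path.
 * If open() or read() fails with ESTALE (the file was replaced, e.g. by
 * another NFS client's rename()), close and reopen it by name and try again. */
static ssize_t read_file_retry_estale(const char *path, char *buf, size_t bufsize)
{
    for (int attempt = 0; attempt <= ESTALE_RETRIES; attempt++) {
        int fd = open(path, O_RDONLY);
        if (fd < 0) {
            if (errno == ESTALE)
                continue; /* name resolved to a dead handle; look it up again */
            return -1;
        }
        ssize_t n = read(fd, buf, bufsize);
        int saved_errno = errno;
        close(fd);
        if (n >= 0)
            return n; /* may be a short read; the caller decides what to do */
        if (saved_errno != ESTALE) {
            errno = saved_errno;
            return -1;
        }
    }
    errno = ESTALE; /* still stale after every retry */
    return -1;
}
```

Retrying only helps if the reopen actually resolves to the new file; a
client that keeps serving an old cached name-to-handle mapping can
return ESTALE on every attempt, which is why the retry count is bounded.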
_______________________________________________
freebsd-stable@freebsd.org mailing list