On Mon, 2010-08-09 at 11:24 +0200, Lukas Kolbe wrote:
> So, testing begins.
>
> First conclusion: not all traffic patterns produce the page allocation
> failure. rdiff-backup only writing to an nfs-share does no harm;
> rdiff-backup reading and writing (incremental backup) leads to (nearly
> immediate) error.
>
> The nfs-share is always mounted with proto=tcp and nfsv3; /proc/mount says:
> fileserver.backup...:/export/backup/lbork /.cbackup-mp nfs
> rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,addr=x.x.x.x
> 0 0
[...]

I've seen some recent discussion of a bug in the Linux NFS client that
can cause it to stop working entirely in case of some packet loss events
<https://bugzilla.kernel.org/show_bug.cgi?id=16494>. It is possible that
you are running into that bug. I haven't yet seen an agreement on the
fix for it.

I also wonder whether the extremely large request sizes (rsize and
wsize) you have selected are more likely to trigger the allocation
failure in virtio_net. Please can you test whether reducing them helps?

Ben.

-- 
Ben Hutchings
Once a job is fouled up, anything done to improve it makes it worse.
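
For the rsize/wsize test suggested above, here is a rough sketch of how the quoted option string could be rewritten with smaller request sizes. The function name reduce_io_sizes and the 32768-byte value are only assumptions for illustration; nothing in this thread says which size is appropriate.

```python
# Sketch: cap the rsize/wsize values in an NFS mount option string.
# reduce_io_sizes and the default of 32768 bytes are hypothetical choices
# made for this example, not values taken from the thread.

def reduce_io_sizes(options: str, size: int = 32768) -> str:
    """Return the option string with rsize and wsize capped at `size`."""
    out = []
    for opt in options.split(","):
        key, sep, value = opt.partition("=")
        # Only touch rsize/wsize entries that carry a numeric value.
        if key in ("rsize", "wsize") and sep and value.isdigit():
            value = str(min(int(value), size))
        out.append(f"{key}{sep}{value}")
    return ",".join(out)


# Option string as quoted from the mount table earlier in this message.
opts = (
    "rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,"
    "proto=tcp,port=65535,timeo=600,retrans=2,sec=sys,mountport=65535,"
    "addr=x.x.x.x"
)
print(reduce_io_sizes(opts))
```

The resulting string would go to mount via -o. Unmounting and mounting the share again is probably the safer way to apply it, since changed request sizes may not take effect on a plain remount.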