Hello,

We've recently set up a FreeBSD 9.0-RELEASE (x64) system to test as an NFS server for "live" network homes for Mac clients (mostly 10.5 and 10.6 clients).

We're a public school district and normally have around 150-200 users logged in at a time with network homes. Currently, we're using netatalk (AFP) on a Linux box, after migrating from an aging Mac OS X server. Unfortunately, netatalk has some serious performance issues under the load we're putting on it, and we'd like to migrate to NFS.

We've tried several Linux distributions and various kernels and we're now testing FreeBSD (and tested FreeNAS) with similar setups. Unfortunately, they all suffer the same issue.

As a test, I have a series of scripts to simulate user activity on the clients (e.g. opening Word, opening a browser, doing some read/writes with dd, etc). After a while, NFS on the server runs into an issue where (what I think happens) rpc.statd can't talk to rpc.lockd. Being Mac clients, they all get a rather ugly dialog box stating that their connection to the server has been lost.
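To give an idea of the dd-style part of that load, here is a minimal Python sketch. It is an illustration only, not the actual test scripts: the function name `dd_roundtrip`, the scratch file name, and the block size and count are all assumptions, and a temporary directory stands in for a network home.

```python
import os
import tempfile


def dd_roundtrip(directory, block_size=64 * 1024, count=16):
    """Write `count` blocks of zeros to a scratch file, sync, and read them back.

    Returns the number of bytes read back. Names and sizes are hypothetical.
    """
    path = os.path.join(directory, "ddtest.bin")
    block = b"\0" * block_size

    # Write phase, roughly like: dd if=/dev/zero of=ddtest.bin bs=... count=...
    with open(path, "wb") as f:
        for _ in range(count):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to stable storage

    # Read phase: read the file back one block at a time.
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            total += len(chunk)

    os.remove(path)
    return total


if __name__ == "__main__":
    # A temporary directory stands in for a user's network home here.
    with tempfile.TemporaryDirectory() as d:
        print(dd_roundtrip(d))
```

On a real client the directory would be a path under the automounted home, and many of these would run at once across the test accounts.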

It's worth mentioning that this server is a KVM 'guest' on a Linux server. I'm aware of some I/O issues there, but I don't have a decent piece of hardware to really test this on. I allocated 4 CPUs to it and 10GB of RAM. I've tested with the virtio net drivers and without. Considering I've seen the same symptoms on around 6 Linux distributions, with various kernels, FreeNAS, and FreeBSD, I wouldn't be surprised to get the same results if I weren't virtualized.

I haven't really done any tuning on the FreeBSD server; it's fairly vanilla.

We have around 2600 machines throughout our campus, with limited remote management capabilities (that's on the big agenda to tackle), so changing NFS mount options there would be rather difficult. These are LDAP accounts with the NFS mounts in LDAP as well, for what it's worth. The clients mount it pretty vanilla (output of 'mount' on client):
freenas.dsdk12.schoollocal:/mnt/homes on /net/freenas.dsdk12.schoollocal/mnt/homes (nfs, nodev, nosuid, automounted, nobrowse)

On the server, my /etc/exports looks like this:
/srv/homes    -alldirs -network    172.30.0.0/16

This export doesn't have a lot of data - it's 150 small home directories of test accounts. No other activity is being done on this server. The filesystem is UFS.

/etc/rc.conf on the server:
rpcbind_enable="YES"
nfs_server_enable="YES"
mountd_flags="-r -l"
nfsd_enable="YES"
mountd_enable="YES"
rpc_lockd_enable="YES"
rpc_statd_enable="YES"
nfs_server_flags="-t -n 128"

When this occurs, /var/log/messages starts to fill up with this:

Mar 30 16:35:18 freefs kernel: Failed to contact local NSM - rpc error 5
Mar 30 16:35:20 freefs rpc.statd: unmon request from localhost, no matching monitor
Mar 30 16:35:44 freefs rpc.statd: unmon request from localhost, no matching monitor
-- repeated a few times every few seconds --
Mar 30 16:54:50 freefs rpc.statd: Unsolicited notification from host hs00508s4434.dsdk12.schoollocal
Mar 30 16:55:01 freefs rpc.statd: Unsolicited notification from host hs00520s4539.dsdk12.schoollocal
Mar 30 16:55:10 freefs rpc.statd: Failed to call rpc.statd client at host localhost
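
To see how often each kind of message shows up over time, the log can be tallied by message type. The following is a rough Python sketch; the category names and the `tally` function are made up for illustration, and the patterns are taken only from the messages quoted above.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt above; host names are wildcarded.
# The category names on the left are hypothetical labels.
PATTERNS = {
    "nsm_contact_failed": re.compile(r"Failed to contact local NSM"),
    "unmon_no_monitor": re.compile(r"unmon request from \S+, no matching monitor"),
    "unsolicited_notify": re.compile(r"Unsolicited notification from host \S+"),
    "statd_call_failed": re.compile(r"Failed to call rpc\.statd client at host \S+"),
}


def tally(lines):
    """Count occurrences of each known message type in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for name, pattern in PATTERNS.items():
            # findall counts every occurrence, even if two messages share a line.
            counts[name] += len(pattern.findall(line))
    return counts


if __name__ == "__main__":
    with open("/var/log/messages", errors="replace") as log:
        for name, n in tally(log).most_common():
            print(f"{n:8d}  {name}")
```

Running it against a copy of /var/log/messages from around a failure shows which message appears first and how quickly the others pile up.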

nfsstat shortly after a failure:
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
        0         0         0         0      1208
Cache Info:
Attr Hits    Misses Lkup Hits    Misses BioR Hits    Misses BioW Hits    Misses
      177       951       226        28         3         6         0         2
BioRLHits    Misses BioD Hits    Misses DirE Hits    Misses Accs Hits    Misses
       49         3        13         5         9         0       148         9

Server Info:
  Getattr   Setattr    Lookup  Readlink      Read     Write    Create    Remove
   262698    101012   1575347        29   1924761   2172712         0     43792
   Rename      Link   Symlink     Mkdir     Rmdir   Readdir  RdirPlus    Access
    27447         0        21      5596      1691    118073         0   2596146
    Mknod    Fsstat    Fsinfo  PathConf    Commit
        0     83638       108       108    183632
Server Ret-Failed
                0
Server Faults
            0
Server Cache Stats:
   Inprog      Idem  Non-idem    Misses
        0         0         0   9172982
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
  2172712   2172712         0

rpcinfo shortly after a failure:
   program version netid     address                service    owner
    100000    4    tcp       0.0.0.0.0.111          rpcbind    superuser
    100000    3    tcp       0.0.0.0.0.111          rpcbind    superuser
    100000    2    tcp       0.0.0.0.0.111          rpcbind    superuser
    100000    4    udp       0.0.0.0.0.111          rpcbind    superuser
    100000    3    udp       0.0.0.0.0.111          rpcbind    superuser
    100000    2    udp       0.0.0.0.0.111          rpcbind    superuser
    100000    4    tcp6      ::.0.111               rpcbind    superuser
    100000    3    tcp6      ::.0.111               rpcbind    superuser
    100000    4    udp6      ::.0.111               rpcbind    superuser
    100000    3    udp6      ::.0.111               rpcbind    superuser
    100000    4    local     /var/run/rpcbind.sock  rpcbind    superuser
    100000    3    local     /var/run/rpcbind.sock  rpcbind    superuser
    100000    2    local     /var/run/rpcbind.sock  rpcbind    superuser
    100005    1    udp6      ::.2.119               mountd     superuser
    100005    3    udp6      ::.2.119               mountd     superuser
    100005    1    tcp6      ::.2.119               mountd     superuser
    100005    3    tcp6      ::.2.119               mountd     superuser
    100005    1    udp       0.0.0.0.2.119          mountd     superuser
    100005    3    udp       0.0.0.0.2.119          mountd     superuser
    100005    1    tcp       0.0.0.0.2.119          mountd     superuser
    100005    3    tcp       0.0.0.0.2.119          mountd     superuser
    100024    1    udp6      ::.3.191               status     superuser
    100024    1    tcp6      ::.3.191               status     superuser
    100024    1    udp       0.0.0.0.3.191          status     superuser
    100024    1    tcp       0.0.0.0.3.191          status     superuser
    100003    2    tcp       0.0.0.0.8.1            nfs        superuser
    100003    3    tcp       0.0.0.0.8.1            nfs        superuser
    100003    2    tcp6      ::.8.1                 nfs        superuser
    100003    3    tcp6      ::.8.1                 nfs        superuser
    100021    0    udp6      ::.3.248               nlockmgr   superuser
    100021    0    tcp6      ::.2.220               nlockmgr   superuser
    100021    0    udp       0.0.0.0.3.202          nlockmgr   superuser
    100021    0    tcp       0.0.0.0.2.255          nlockmgr   superuser
    100021    1    udp6      ::.3.248               nlockmgr   superuser
    100021    1    tcp6      ::.2.220               nlockmgr   superuser
    100021    1    udp       0.0.0.0.3.202          nlockmgr   superuser
    100021    1    tcp       0.0.0.0.2.255          nlockmgr   superuser
    100021    3    udp6      ::.3.248               nlockmgr   superuser
    100021    3    tcp6      ::.2.220               nlockmgr   superuser
    100021    3    udp       0.0.0.0.3.202          nlockmgr   superuser
    100021    3    tcp       0.0.0.0.2.255          nlockmgr   superuser
    100021    4    udp6      ::.3.248               nlockmgr   superuser
    100021    4    tcp6      ::.2.220               nlockmgr   superuser
    100021    4    udp       0.0.0.0.3.202          nlockmgr   superuser
    100021    4    tcp       0.0.0.0.2.255          nlockmgr   superuser
    300019    1    tcp       0.0.0.0.2.185          amd        superuser
    300019    1    udp       0.0.0.0.2.162          amd        superuser
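
For reading the address column: rpcinfo prints "universal addresses", where the last two dot-separated numbers encode the port as high byte and low byte. A small Python sketch to decode them follows; the function names `port_from_uaddr` and `listening_ports` are made up for illustration, and the parser assumes the six-column layout shown above.

```python
def port_from_uaddr(uaddr):
    """Decode the port from an RPC universal address such as '0.0.0.0.8.1'."""
    parts = uaddr.split(".")
    high, low = int(parts[-2]), int(parts[-1])
    return high * 256 + low


def listening_ports(rpcinfo_lines):
    """Map each service name to the set of (netid, port) pairs it registered."""
    ports = {}
    for line in rpcinfo_lines:
        fields = line.split()
        # Skip the header and anything that is not a six-column data row.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        _program, _version, netid, address, service, _owner = fields
        if netid == "local":
            continue  # UNIX-domain socket path, no port to decode
        ports.setdefault(service, set()).add((netid, port_from_uaddr(address)))
    return ports


if __name__ == "__main__":
    import sys

    # Usage (assumed): rpcinfo | python decode_rpcinfo.py
    for service, entries in sorted(listening_ports(sys.stdin).items()):
        print(service, sorted(entries))
```

With the output above, nfs at 0.0.0.0.8.1 decodes to port 2049, status to 959, and nlockmgr to 767 (tcp) and 970 (udp) on IPv4.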

The load can get fairly high during my 'stress' tests, but not *that* high. I'm surprised to see symptoms that hit every connected user at the same time; I would have expected slowdowns instead.

Any ideas or nudges in the right direction are most welcome. This is severely plaguing us and our students :\

Thanks,
Josh

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
