More detailed description of the readdir test, with a conclusion at the end.

Roch asked me:

> Is this an NFS V3 or V4 test, or don't care?
I am running NFS V3, but a short test with NFS V4 showed that the problem is there as well.

Then Roch asked:

> I've run rdir on a few of my large directories. However my
> large directories are not much larger than ncsize; maybe
> yours are. Do I understand that you hit the issue only upon
> the first large rdir after reboot?

Only after a reboot of the NFS client (see below).

Then Roch added:

> If so, it might be that we get a speedup from the part of
> the run in which we are initially filling the DNLC cache.
> That could explain the increase in sys time. But the real
> time increase seems too much to be due to this.
>
> Anyway, I'm interested in the directory size rdir reports and
> the ncsize/D from mdb -k. Also a third pass through might
> yield a lead.
>
> -r

ncsize has its default value; people told me "don't increase the DNLC size when running ZFS".

  # echo 'ncsize/D' | mdb -k
  ncsize:
  ncsize:         129675

Directory size? There are 160 ZFS file systems under zpool tank1; each is 202 MB, 31.5 GB in total, 1224000 files.

  # zpool list
  NAME     SIZE    USED    AVAIL   CAP   HEALTH   ALTROOT
  tank1    382G    31.5G   351G    8%    ONLINE   -

More detailed results:

ZFS local runs - "normal behavior":
  1. 2:33.406
  2. 2:25.353
  3. 2:27.033

NFS V3/ZFS runs - the first is ok, then the times jump up:
  1. 3:14.185
  2. 4:47.681
  3. 4:52.213
  4. 4:49.841
  5. 4:53.069
  6. 4:45.290

After a reboot of the NFS client:
  1. 2:56.760
  2. 4:43.397

After a reboot of both client and server:
  1. 3:12.841
  2. 4:50.869

After a reboot of the NFS server only:
  1. 5:15.048
  2. 4:54.686
  3. 4:48.713

This means the problem is on the NFS client: after a reboot of the client the first run is "ok", and all the following runs are "bad". Rebooting the server didn't help; the results stayed "bad".

Roch replied:

> I'd hypothesize that when the client doesn't know about a file he
> just gets the data and boom. But once he's got a cached copy
> he needs more time to figure out if the data is up to date.
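Converting the m:ss.mmm timings above to seconds makes the jump concrete. A quick sketch (Python; the values are copied from the NFS V3/ZFS runs listed above, the helper name is mine):

```python
# Parse the "m:ss.mmm" timings reported above and compare the first
# NFS V3/ZFS run after a client reboot with the later, slower runs.

def to_seconds(t: str) -> float:
    """Convert a 'minutes:seconds.millis' string like '3:14.185' to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)

first_run = to_seconds("3:14.185")  # first run after client reboot
later_runs = ["4:47.681", "4:52.213", "4:49.841", "4:53.069", "4:45.290"]
steady_state = sum(to_seconds(t) for t in later_runs) / len(later_runs)

print(f"first run:    {first_run:6.1f} s")
print(f"steady state: {steady_state:6.1f} s")
print(f"slowdown:     {steady_state / first_run:.2f}x")
```

On these numbers the later runs come out roughly 1.5x slower than the first post-reboot run. Note also that the tree holds 1224000 files against an ncsize of 129675, so the DNLC can hold only about a tenth of the name entries at once.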
> This seems to have been a tradeoff of metadata operations in favor of
> faster data ops (!?).
>
> Note also that SFS doesn't use the client's NFS code. It
> runs its own user-space client.

Since the described problem is entirely an NFS client problem, there is nothing to do in the ZFS code to improve the situation. And the SFS problem we observed (see the first message in this thread) has nothing in common with this one. Unfortunately, the abnormal behavior of NFS/ZFS during the SFS test didn't get much attention, so I don't have any clue yet. Anyway, I'll update this thread when I have more information on the problem.

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss