A more detailed description of the readdir test, with the conclusion at the end:

Roch asked me:
> Is this a NFS V3 or V4 test or don't care ?

I am running NFS V3, but a short test with NFS V4 showed that the
problem is there as well.

Then Roch asked:
> I've run rdir on a few of my large directories. However, my
> large directories are not much larger than ncsize; maybe
> yours are. Do I understand that you hit the issue only upon
> the first large rdir after reboot?

After reboot of the NFS client (see below).

Then Roch added:
> If so, it might be that we get a speedup from the part of
> the run in which we are initially filling the dnlc cache.
> That could explain the increase in sys time. But the real
> time increase seems too much to be due to this.
>
> Anyway I'm interested in the directory size rdir reports and
> the ncsize/D from mdb -k. Also a third pass through might
> yield a lead.
>
> -r

ncsize is at its default value; people told me not to increase the DNLC
size when running ZFS.
# echo 'ncsize/D' | mdb -k
ncsize:
ncsize:         129675

Directory size? There are 160 ZFS filesystems under the pool tank1; each
filesystem is 202 MB, for a total of 31.5 GB and 1,224,000 files.

# zpool list
NAME                    SIZE    USED   AVAIL    CAP  HEALTH     ALTROOT
tank1                   382G   31.5G    351G     8%  ONLINE     -
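For scale, a quick back-of-the-envelope check with the figures quoted above shows the working set dwarfs the DNLC: the full tree holds roughly nine times more entries than ncsize, so a complete readdir pass evicts earlier entries before they can be reused.

```shell
# Back-of-the-envelope check using the figures quoted above:
# can the DNLC (ncsize entries) hold the whole working set?
ncsize=129675        # from 'echo ncsize/D | mdb -k' above
filesystems=160      # ZFS filesystems under tank1
total_files=1224000  # files across all of them

echo "files per filesystem: $(( total_files / filesystems ))"   # 7650
echo "working set is $(( total_files / ncsize ))x+ the DNLC"    # 9x+
```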

More detailed results:
ZFS local runs - "normal behavior":

1.     2:33.406
2.     2:25.353
3.     2:27.033

NFS V3/ZFS runs - the first is OK, then the times jump up:

1.     3:14.185
2.     4:47.681
3.     4:52.213
4.     4:49.841
5.     4:53.069
6.     4:45.290

after reboot of the NFS client:

1.     2:56.760
2.     4:43.397

after reboot of both client and server:

1.     3:12.841
2.     4:50.869

after reboot of the NFS server only:

1.     5:15.048
2.     4:54.686
3.     4:48.713
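Each numbered run above is one timed pass over the whole tree. The original test used the rdir tool; a minimal stand-in (a sketch, not the actual tool) could look like this, where /mnt/tank1 is a hypothetical placeholder for the real NFS mount point:

```shell
#!/bin/sh
# Minimal stand-in for an rdir pass over an NFS mount, e.g.:
#   ./readdir_runs.sh /mnt/tank1     (hypothetical mount point)
# 'find -ls' stats every entry, which is what exercises the DNLC
# and the NFS client's attribute cache.
ROOT=${1:-.}
for run in 1 2 3; do
    echo "run $run:"
    time find "$ROOT" -ls > /dev/null
done
```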

This means the problem is on the NFS client: after a reboot of the client the
first run is "ok", then all the rest are "bad". Rebooting the server didn't
help; the results stayed "bad".

Roch replied :
> I'd hypothesize that when the client doesn't know about a file he
> just gets the data and boom. But once he's got a cached copy
> he needs more time to figure out if the data is up to date.
>
> This seems to have been a tradeoff of metadata operations in favor of
> faster data op (!?).
> 
> Note also that SFS doesn't use the client's NFS code. It
> runs its own user-space client.
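That hypothesis could be tested without a full reboot by flushing the client-side caches between runs; unmounting an NFS filesystem discards its cached pages, attributes, and DNLC entries. A sketch (the paths, server name, and options are assumptions, not from the original test):

```shell
# Hypothetical client-side cache flush between runs (no reboot needed).
# Unmounting discards cached data, attributes and DNLC entries for the
# filesystem, so the next pass starts "cold" like the first one.
umount /mnt/tank1
mount -F nfs -o vers=3 server:/tank1 /mnt/tank1   # Solaris mount syntax
```

If the first pass after a remount is fast again, that would support the cache-revalidation explanation.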

Since the described problem is entirely an NFS client problem, there is
nothing to change in the ZFS code to improve the situation.
And the SFS problem we observed (see the first message in this thread) has
nothing in common with this one. Unfortunately, the abnormal behavior of
NFS/ZFS during the SFS test didn't get much attention, so I don't have any
clue there. Anyway, I'll update this thread when I have more information on
the problem.
 
 
This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
