Every day we see pause times of up to 60 seconds to read 1K of a file, for 
local reads as well as reads over NFS, in a test setup.

We have an x4500 set up as a single pool of 4 x (raidz2 9+2) + 2 spares, with 
the file systems mounted over NFS with Kerberos v5 (krb5) and also accessed 
directly. The pool is 20TB. There are three filesystems: backup, test and 
home. Test has about 20 million files and uses 4TB; the files range from 100B 
to 200MB. A cron job takes snapshots of test every 15 minutes, starting at 1 
minute past the hour, and every 15 minutes, starting at 2 minutes past the 
hour, another cron batch job runs zfs send/recv to the backup filesystem. 
Home has only 100GB.
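
For concreteness, the snapshot and backup jobs are along these lines. This is 
a minimal sketch: the dataset names pool/test and pool/backup/test and the 
state-file path are placeholders, and the two steps are combined into one 
script here, whereas in practice they run as separate cron jobs a minute 
apart.

    #!/bin/sh
    # Quarter-hourly snapshot plus incremental send/recv to the backup
    # filesystem. Dataset names and state-file path are placeholders.
    NOW=`date '+%Y%m%d-%H%M'`
    STATE=/var/run/last_test_snap
    zfs snapshot pool/test@$NOW
    if [ -f $STATE ]; then
        PREV=`cat $STATE`
        # send only the delta between the previous snapshot and this one
        zfs send -i pool/test@$PREV pool/test@$NOW | zfs recv -F pool/backup/test
    fi
    echo $NOW > $STATE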

The test filesystem has 3 directories: 1 has 130,000 files and the other 2 
have 10,000,000 each. We run 4 reader processes, 2 over NFS and 2 local; 2 of 
them read the directory with 130,000 files, the other 2 read a directory with 
10,000,000. Every 35 seconds each process reads 1K at the 64th KB of 10 
files and records the latency, then reads for 1 second and records the 
throughput. At times of no other activity (outside the snapshot and send/recv 
windows) we see read latencies of up to 60 seconds, maybe once a day at 
random times.
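
Each reader process is essentially the loop below. This is a sketch under 
assumptions: the directory path is a placeholder, Solaris dd's iseek operand 
does the 64K seek, and the 1-second throughput read is omitted since it is 
measured the same way with a timed sequential dd.

    #!/bin/sh
    # Latency probe: every 35s, read 1K at the 64K offset of 10 files
    # and time the batch. DIR is a placeholder path.
    DIR=/pool/test/dir1
    while true; do
        START=`perl -MTime::HiRes=time -e 'print time'`
        for f in `ls $DIR | head -10`; do
            # bs=1k iseek=64 seeks 64 x 1K = 64K into the file, reads 1K
            dd if=$DIR/$f of=/dev/null bs=1k iseek=64 count=1 2>/dev/null
        done
        END=`perl -MTime::HiRes=time -e 'print time'`
        ELAPSED=`echo "$END $START" | awk '{printf "%.2f", $1 - $2}'`
        echo "`date`: 10-file read latency ${ELAPSED}s"
        sleep 35
    done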

We are using an unpatched Solaris 10 08/07 build.

Pause times this long can lead to timeouts and cause jobs to fail, which is 
problematic for us. Is this expected behaviour? Can anything be done to 
mitigate or diagnose the issue?
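
For what it's worth, this is the kind of probe we can leave running to catch 
one of these pauses in the act. It is a sketch, assuming the pauses show up 
as slow read(2) syscalls; the 1-second threshold is arbitrary, and we would 
run iostat -xn 5 alongside it to see whether the disks are actually busy at 
the time.

    #!/bin/sh
    # Flag any read(2) taking longer than 1 second and name the culprit.
    dtrace -n '
        syscall::read:entry  { self->ts = timestamp; }
        syscall::read:return /self->ts && timestamp - self->ts > 1000000000/ {
            printf("%s pid %d: read took %d ms",
                execname, pid, (timestamp - self->ts) / 1000000);
        }
        syscall::read:return { self->ts = 0; }'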
 
 