Every day in a test setup we see pause times of up to 60 seconds to read 1 KB of a file, for local reads as well as reads over NFS.
We have an x4500 set up as a single pool of 4 x raidz2 (9+2) vdevs plus 2 hot spares. The filesystems are mounted over NFS with Kerberos v5 (krb5) security and are also accessed directly on the server. The pool is 20 TB and holds three filesystems: backup, test and home. test has about 20 million files and uses 4 TB; the files range from 100 B to 200 MB. home holds only 100 GB.

A cron job snapshots test every 15 minutes, starting at 1 minute past the hour. Another cron batch job runs every 15 minutes, at 2 minutes past the hour, doing a zfs send/recv from test to the backup filesystem (a sketch of both jobs is at the end of this post).

test has 3 directories: one holds 130,000 files and the other two hold 10,000,000 each. We run 4 monitoring processes: 2 over NFS and 2 local, with 2 of them reading the directory of 130,000 files and the other 2 reading a directory of 10,000,000. Every 35 seconds each process reads 1 KB at the 64 KB offset in each of 10 files and records the latency, then reads for 1 second and records the throughput (a sketch of the probe is also at the end).

Even at times of no other activity (outside the snapshot and send/recv windows) we see read latencies of up to 60 seconds, maybe once a day at random times. We are running an unpatched Solaris 10 08/07 build.

Pause times this long can lead to timeouts and cause jobs to fail, which is a real problem for us. Is this expected behaviour? Can anything be done to mitigate or diagnose the issue?
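For reference, here is a minimal sketch of what the two cron jobs do. This is a rough Python reconstruction, not our actual script: the dataset names (tank/test, tank/backup) and the snapshot naming scheme are made up, and it assumes an incremental send between the two most recent snapshots.

    #!/usr/bin/env python
    # Sketch of the two cron jobs; dataset names are hypothetical.
    import subprocess
    import sys
    import time

    SRC = "tank/test"      # hypothetical source dataset
    DST = "tank/backup"    # hypothetical replication target

    def snapshot():
        # Runs at 1 minute past each quarter hour.
        name = "%s@auto-%s" % (SRC, time.strftime("%Y%m%d-%H%M"))
        subprocess.check_call(["zfs", "snapshot", name])

    def replicate():
        # Runs at 2 minutes past each quarter hour: find the two newest
        # snapshots of SRC, then pipe an incremental zfs send into
        # zfs recv on the backup filesystem.
        out = subprocess.Popen(
            ["zfs", "list", "-t", "snapshot", "-o", "name",
             "-s", "creation", "-H"],
            stdout=subprocess.PIPE).communicate()[0].decode()
        snaps = [s for s in out.splitlines() if s.startswith(SRC + "@")]
        prev, curr = snaps[-2], snaps[-1]
        send = subprocess.Popen(["zfs", "send", "-i", prev, curr],
                                stdout=subprocess.PIPE)
        subprocess.check_call(["zfs", "recv", "-F", DST],
                              stdin=send.stdout)
        send.stdout.close()
        if send.wait():
            raise RuntimeError("zfs send failed")

    if __name__ == "__main__":
        snapshot() if sys.argv[1] == "snap" else replicate()

cron would invoke this as "script.py snap" at 1 past and "script.py send" at 2 past each quarter hour.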
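And here is a minimal sketch of one of the four monitoring processes, again with hedges: the directory path is hypothetical, the 10 files are picked at random each cycle, and the (slow) listing of a 10-million-file directory is done once up front.

    #!/usr/bin/env python
    # Sketch of one probe process; PROBE_DIR is hypothetical.
    import os
    import random
    import time

    PROBE_DIR = "/tank/test/dir1"   # hypothetical monitored directory
    INTERVAL = 35                   # seconds between probe cycles
    READ_SIZE = 1024                # 1 KB latency probe
    OFFSET = 64 * 1024              # read at the 64 KB offset

    def probe_once(paths):
        # Read 1 KB from each file; return the worst per-read latency.
        worst = 0.0
        for path in paths:
            start = time.time()
            with open(path, "rb") as f:
                f.seek(OFFSET)
                f.read(READ_SIZE)
            worst = max(worst, time.time() - start)
        return worst

    def throughput_once(path):
        # Read sequentially for ~1 second; return bytes per second.
        total = 0
        start = time.time()
        with open(path, "rb") as f:
            while time.time() - start < 1.0:
                chunk = f.read(64 * 1024)
                if not chunk:
                    f.seek(0)   # wrap around on EOF
                    continue
                total += len(chunk)
        return total / (time.time() - start)

    names = os.listdir(PROBE_DIR)   # done once; slow on 10M files
    while True:
        sample = [os.path.join(PROBE_DIR, n)
                  for n in random.sample(names, 10)]
        latency = probe_once(sample)
        rate = throughput_once(sample[0])
        print("%s worst-read %.3fs throughput %.1f MB/s"
              % (time.ctime(), latency, rate / 1e6))
        time.sleep(INTERVAL)

It is the per-read latency from probe_once that intermittently spikes to 60 seconds.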