I am having some odd ZFS performance issues and looking for some assistance on 
where to look to figure out what the underlying problem is.

System Config
- Solaris 10 Update 3 (11/06) on a Sun Fire V440, kernel patch 118833-36
- 9-Bay JBOD with 180GB disks on two SCSI channels: 4 disks on one, 5 on the 
other.  I forget the exact type of SCSI connection.
- The 9-Bay is configured as a raidz2 with all 9 disks, no spare (creation 
command sketched below)
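
For reference, the pool was created with something like the following (the 
device names here are hypothetical, just to show the 4+5 layout across the 
two channels):

   zpool create tank raidz2 \
       c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
       c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0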

On Monday, I had to swap out one of the disks in the 9-Bay.  The disk wasn't 
bad; I just had to swap it out for other reasons.  I took the disk offline, 
pulled it, put in a new disk, and ran the zpool replace command.  2.5 days 
later, the resilver completed.  During the resilver, zpool status reported 
that some errors had occurred but that no data was lost (or something to that 
effect).  I did a zpool clear once the resilver completed and haven't seen 
any errors since (it's only been about half a day, though).
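
The sequence I used was roughly this (pool and device names are from memory 
and may not be exact):

   # take the old disk offline before pulling it
   zpool offline tank c1t3d0
   # (physically swapped the disk here)
   # start the resilver onto the new disk in the same slot
   zpool replace tank c1t3d0
   # watched resilver progress with
   zpool status tank
   # once the resilver finished, cleared the logged errors
   zpool clear tank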

Now when I use the storage pool, I see it pause.  I notice this by copying 
multiple large files at once to another disk while running 'zpool iostat 1', 
which shows multiple seconds of inactivity.  I also ran 'iostat -xv 1' and 
saw that the %w column for the disk that was replaced is at 100 all the time.  
On top of that, I am seeing some "scsi bus reset" messages in my 
/var/adm/messages log.  Going back, those messages were there before I 
replaced the disk.  Another odd thing I saw in the iostat output was that 
most of the I/O was on only one of the SCSI channels (the one with 5 disks), 
with not much activity on the channel with 4 disks, even though they are all 
in the same raidz2 pool.
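
For reference, this is roughly what I have been running to watch it (pool 
name hypothetical):

   # per-vdev I/O sampled every second; this is where I see the
   # multi-second pauses and the lopsided channel activity
   zpool iostat -v tank 1
   # extended per-device stats; %w is the percent of time transactions
   # were waiting in the queue, pegged at 100 for the replaced disk
   iostat -xn 1
   # checking the log for the bus resets
   grep -i reset /var/adm/messages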

So... any suggestions on where to start?  I'm not sure if I have a bad SCSI 
controller, a bad disk, or something else.  Any pointers on where to poke 
around would be great.

Thanks!
Chris
 
 