Lately the ZFS pool in my home server has degraded to the point where it's fair to say it doesn't work at all. Read speed is now slower than what I can pull from the internet over my slow DSL line... This is compared to just a short while ago, when I could read from it at over 50 MB/sec over the network.
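To take the network out of the picture, a purely local read can be timed directly (a rough sketch; /DATA/somefile below is only a placeholder for any large file in the pool):

    # With bs=1024k each record dd reports is 1 MB, so the
    # "records out" count divided by the elapsed time that
    # ptime prints gives MB/sec.
    ptime dd if=/DATA/somefile of=/dev/null bs=1024k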
My setup: running the latest Solaris 10:

# uname -a
SunOS solssd01 5.10 Generic_142901-02 i86pc i386 i86pc

# zpool status DATA
  pool: DATA
 state: ONLINE
config:

        NAME        STATE     READ WRITE CKSUM
        DATA        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            c2t5d0  ONLINE       0     0     0
            c2t4d0  ONLINE       0     0     0
            c2t3d0  ONLINE       0     0     0
            c2t2d0  ONLINE       0     0     0
        spares
          c0t2d0    AVAIL

errors: No known data errors

# zfs list -r DATA
NAME   USED  AVAIL  REFER  MOUNTPOINT
DATA  3,78T   229G  3,78T  /DATA

All of the drives in this pool are 1.5 TB Western Digital Green drives. I am not seeing any error messages in /var/adm/messages, and "fmdump -eV" shows no errors... However, I am seeing some soft faults in "iostat -eEn":

          ---- errors ---
  s/w h/w trn tot device
    2   0   0   2 c0t0d0
    1   0   0   1 c1t0d0
    2   0   0   2 c2t1d0
  151   0   0 151 c2t2d0
  151   0   0 151 c2t3d0
  153   0   0 153 c2t4d0
  153   0   0 153 c2t5d0
    2   0   0   2 c0t1d0
    3   0   0   3 c0t2d0
    0   0   0   0 solssd01:vold(pid531)

c0t0d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
Size: 31.87GB <31866224128 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0

c1t0d0           Soft Errors: 1 Hard Errors: 0 Transport Errors: 0
Vendor: _NEC     Product: DVD_RW ND-3500AG Revision: 2.16 Serial No:
Size: 0.00GB <0 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 1 Predictive Failure Analysis: 0

c2t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: SAMSUNG HD753LJ  Revision: 1113 Serial No:
Size: 750.16GB <750156373504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0

c2t2d0           Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB <1500301909504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 151 Predictive Failure Analysis: 0

c2t3d0           Soft Errors: 151 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB <1500301909504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 151 Predictive Failure Analysis: 0

c2t4d0           Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB <1500301909504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 153 Predictive Failure Analysis: 0

c2t5d0           Soft Errors: 153 Hard Errors: 0 Transport Errors: 0
Vendor: ATA      Product: WDC WD15EADS-00R Revision: 0A01 Serial No:
Size: 1500.30GB <1500301909504 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 153 Predictive Failure Analysis: 0

c0t1d0           Soft Errors: 2 Hard Errors: 0 Transport Errors: 0
Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
Size: 31.87GB <31866224128 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 2 Predictive Failure Analysis: 0

c0t2d0           Soft Errors: 3 Hard Errors: 0 Transport Errors: 0
Vendor: Sun      Product: STK RAID INT     Revision: V1.0 Serial No:
Size: 1497.86GB <1497859358208 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 3 Predictive Failure Analysis: 0

I am curious as to why the "Illegal Request" counter keeps climbing. The machine was rebooted ~11 hours ago, and the counter goes up every time I try to use the pool...
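Since a raidz1 read has to wait for the slowest member of the vdev, one suspect would be a single drive that responds far more slowly than its siblings without logging any hard errors. Watching per-disk service times while the pool is busy should show that (a sketch; the 5-second interval is arbitrary):

    # asvc_t is the average time in ms a disk takes to service a
    # request, %b how busy it is.  A dying drive typically shows
    # asvc_t in the hundreds of ms while its siblings stay low.
    iostat -xn 5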
The machine is quite a powerful one, and top shows no CPU load, no iowait, and plenty of free memory. It basically isn't doing anything at the moment, yet it can take several minutes to copy a 300 MB file from somewhere in the pool to /tmp/...

# top
last pid:  1383;  load avg:  0.01,  0.00,  0.00;  up 0+10:47:57   01:39:17
55 processes: 54 sleeping, 1 on cpu
CPU states: 99.0% idle, 0.0% user, 1.0% kernel, 0.0% iowait, 0.0% swap
Kernel: 193 ctxsw, 3 trap, 439 intr, 298 syscall, 3 flt
Memory: 8186M phys mem, 4699M free mem, 2048M total swap, 2048M free swap

I thought I might have run into the ARC fragmentation problems described here on the forums, but that doesn't seem to be the case:

# echo "::arc" | mdb -k
hits                      =    490044
misses                    =     37004
demand_data_hits          =    282392
demand_data_misses        =      2113
demand_metadata_hits      =    191757
demand_metadata_misses    =     21034
prefetch_data_hits        =       851
prefetch_data_misses      =     10265
prefetch_metadata_hits    =     15044
prefetch_metadata_misses  =      3592
mru_hits                  =     73416
mru_ghost_hits            =        16
mfu_hits                  =    401500
mfu_ghost_hits            =        24
deleted                   =      1555
recycle_miss              =         0
mutex_miss                =         0
evict_skip                =      1487
hash_elements             =     37032
hash_elements_max         =     37045
hash_collisions           =     10094
hash_chains               =      4365
hash_chain_max            =         4
p                         =      3576 MB
c                         =      7154 MB
c_min                     =       894 MB
c_max                     =      7154 MB
size                      =      1797 MB
hdr_size                  =   8002680
data_size                 = 1866272256
other_size                =  10519712
l2_hits                   =         0
l2_misses                 =         0
l2_feeds                  =         0
l2_rw_clash               =         0
l2_read_bytes             =         0
l2_write_bytes            =         0
l2_writes_sent            =         0
l2_writes_done            =         0
l2_writes_error           =         0
l2_writes_hdr_miss        =         0
l2_evict_lock_retry       =         0
l2_evict_reading          =         0
l2_free_on_write          =         0
l2_abort_lowmem           =         0
l2_cksum_bad              =         0
l2_io_error               =         0
l2_size                   =         0
l2_hdr_size               =         0
memory_throttle_count     =         0
arc_no_grow               =         0
arc_tempreserve           =         0 MB
arc_meta_used             =       372 MB
arc_meta_limit            =      1788 MB
arc_meta_max              =       372 MB

I then tried to start a scrub, and it looks like it will take forever... It used to finish in a few hours; now it says it will be done in almost 700 hours:

 scrub: scrub in progress for 4h43m, 0,68% done, 685h2m to go

Does anyone have any clue as to what is happening, and what I can do about it? If a disk is failing without the OS noticing, it would be nice to have a way to find out which drive it is, so I can get it exchanged before it's too late. All help is appreciated...

Yours sincerely,
Morten-Christian Bernson
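PS: To single out a slow drive, the kind of test I had in mind is timing raw sequential reads from each member in turn, along these lines (device names are from my pool above; the p0 node assumes x86 whole-disk access, so adjust to match your labels):

    for d in c2t2d0 c2t3d0 c2t4d0 c2t5d0
    do
            echo "=== $d ==="
            # Read 1 GB straight off the raw device; a failing drive
            # should stand out with far lower throughput than the rest.
            ptime dd if=/dev/rdsk/${d}p0 of=/dev/null bs=1024k count=1024
    done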