> Is your ZFS pool configured with redundancy (e.g mirrors, raidz) or is
> it non-redundant? If non-redundant, then there is not much that ZFS
> can really do if a device begins to fail.

It's RAID 10 (more info here: http://www.opensolaris.org/jive/thread.jspa?threadID=57425):

        NAME        STATE     READ WRITE CKSUM
        box5        ONLINE       0     0     4
          mirror    ONLINE       0     0     2
            c1d0    ONLINE       0     0     4
            c2d0    ONLINE       0     0     4
          mirror    ONLINE       0     0     2
            c2d1    ONLINE       0     0     4
            c1d1    ONLINE       0     0     4

Actually, there's no damaged data so far. I don't get any "unable to read/write" kind of errors. It's just very strange checksum errors synchronized over all disks.

> That's a bit harsh. ZFS is telling you that you have corrupted data
> based on the checksums. Other types of filesystems would likely simply
> pass the corrupted data on silently.

Checksums are good, no complaints about that.

> Do you have the panic messages? ZFS won't cause panics based on bad
> checksums. It will by default cause panic if it can't write data out to
> any device or if it completely loses access to non-redundant devices or
> loses both redundant devices at the same time.

A number of panic messages and a crash dump stack trace are attached to the original post (http://www.opensolaris.org/jive/thread.jspa?threadID=57425). Here is a short snippet:

    > ::status
    debugging crash dump vmcore.5 (64-bit) from core
    operating system: 5.10 Generic_127112-07 (i86pc)
    panic message: BAD TRAP: type=e (#pf Page fault) rp=fffffe800017f8d0 addr=238 occurred in module "unix" due to a NULL pointer dereference
    dump content: kernel pages only

    > ::stack
    mutex_enter+0xb()
    zio_buf_alloc+0x1a()
    zio_read+0xba()
    spa_scrub_io_start+0xf1()
    spa_scrub_cb+0x13d()
    traverse_callback+0x6a()
    traverse_segment+0x118()
    traverse_more+0x7b()
    spa_scrub_thread+0x147()
    thread_start+8()

> Since this seems to show the same number of checksum errors across 2
> different channels and 4 different drives. Given that, I'd assume that
> this is likely a dual-channel HBA of some sort. It would appear that
> you either have bad hardware or some sort of driver issue.

You're right, this is Intel's dual-channel ICH6 SATA controller.
10U4 has native support/drivers for this SATA controller (AHCI drivers afaik). The thing is that this hardware and ZFS have been in production for almost 2 years (ok, not the best argument), whereas this problem appeared only recently (20 days ago). It's even stranger because I didn't make any OS/driver upgrades or apply any patches during the last 2-3 months.

However, this is a good point. I've seen some new SATA/AHCI drivers available in 10U5. Maybe I should try to upgrade and see if it helps.

Thanks Phil.

--
Rustam

This message posted from opensolaris.org
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss