> > Hello Matthew,
> >
> > Tuesday, September 12, 2006, 7:57:45 PM, you wrote:
> >
> > MA> Ben Miller wrote:
> > >> I had a strange ZFS problem this morning.  The entire system would
> > >> hang when mounting the ZFS filesystems.  After trial and error I
> > >> determined that the problem was with one of the 2500 ZFS filesystems.
> > >> When mounting that user's home the system would hang and need to be
> > >> rebooted.  After I removed the snapshots (9 of them) for that
> > >> filesystem everything was fine.
> > >>
> > >> I don't know how to reproduce this and didn't get a crash dump.  I
> > >> don't remember seeing anything about this before, so I wanted to
> > >> report it and see if anyone has any ideas.
> >
> > MA> Hmm, that sounds pretty bizarre, since I don't think that mounting
> > MA> a filesystem really interacts with snapshots at all.
> > MA> Unfortunately, I don't think we'll be able to diagnose this without
> > MA> a crash dump or reproducibility.  If it happens again, force a crash
> > MA> dump while the system is hung and we can take a look at it.
> >
> > Maybe it wasn't hung after all.  I've seen similar behavior here
> > sometimes.  Were the disks used in the pool actually working?
>
> There was lots of activity on the disks (iostat and status LEDs) until
> it got to this one filesystem, and then everything stopped.  'zpool
> iostat 5' stopped running, the shell wouldn't respond, and activity on
> the disks stopped.  This fs is relatively small (175M used of a 512M
> quota).
>
> > Sometimes it takes a lot of time (30-50 minutes) to mount a file
> > system - it's rare, but it happens.  And during this ZFS reads from
> > those disks in the pool.  I did report it here some time ago.
>
> In my case the system crashed during the evening and it was left hung
> up when I came in during the morning, so it was hung for a good 9-10
> hours.
>
> The problem happened again last night, but for a different user's
> filesystem.  I took a crash dump with it hung and the back trace looks
> like this:
>
> > ::status
> debugging crash dump vmcore.0 (64-bit) from hostname
> operating system: 5.11 snv_40 (sun4u)
> panic message: sync initiated
> dump content: kernel pages only
>
> > ::stack
> 0xf0046a3c(f005a4d8, 2a100047818, 181d010, 18378a8, 1849000, f005a4d8)
> prom_enter_mon+0x24(2, 183c000, 18b7000, 2a100046c61, 1812158, 181b4c8)
> debug_enter+0x110(0, a, a, 180fc00, 0, 183e000)
> abort_seq_softintr+0x8c(180fc00, 18abc00, 180c000, 2a100047d98, 1, 1859800)
> intr_thread+0x170(600019de0e0, 0, 6000d7bfc98, 600019de110, 600019de110, 600019de110)
> zfs_delete_thread_target+8(600019de080, ffffffffffffffff, 0, 600019de080, 6000d791ae8, 60001aed428)
> zfs_delete_thread+0x164(600019de080, 6000d7bfc88, 1, 2a100c4faca, 2a100c4fac8, 600019de0e0)
> thread_start+4(600019de080, 0, 0, 0, 0, 0)
>
> In single user I set the mountpoint for that user to be none and then
> brought the system up fine.  Then I destroyed the snapshots for that
> user and their filesystem mounted fine.  In this case the quota was
> reached with the snapshots and 52% used without.
>
> Ben
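(For anyone finding this thread in the archives: the single-user workaround
described at the end of the quoted message amounts to roughly the following.
The pool and dataset names are made up for illustration; substitute your own.)

  # keep the problem filesystem from mounting while the rest of the pool
  # comes up, then let the boot continue
  zfs set mountpoint=none pool/home/user1

  # once the system is up, see which snapshots are holding space
  zfs list -t snapshot -r pool/home/user1

  # destroy them, then restore the mountpoint and mount normally
  zfs destroy pool/home/user1@nightly.0
  zfs set mountpoint=/export/home/user1 pool/home/user1
  zfs mount pool/home/user1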
Hate to re-open something from a year ago, but we just had this problem
happen again.  We have been running Solaris 10u3 on this system for a
while.  I searched the bug reports but couldn't find anything on this.
I also think I understand what happened a little better now.

We take snapshots at noon, and the system hung up during that time.  When
trying to reboot, the system would hang on the ZFS mounts.  After I boot
into single user and remove the snapshot from the filesystem causing the
problem, everything is fine.  The filesystem in question was at 100% use
with the snapshots in place.

Here's the back trace from the system while it was hung:

> ::stack
0xf0046a3c(f005a4d8, 2a10004f828, 0, 181c850, 1848400, f005a4d8)
prom_enter_mon+0x24(0, 0, 183b400, 1, 1812140, 181ae60)
debug_enter+0x118(0, a, a, 180fc00, 0, 183d400)
abort_seq_softintr+0x94(180fc00, 18a9800, 180c000, 2a10004fd98, 1, 1857c00)
intr_thread+0x170(2, 30007b64bc0, 0, c001ed9, 110, 60002400000)
0x985c8(300adca4c40, 0, 0, 0, 0, 30007b64bc0)
dbuf_hold_impl+0x28(60008cd02e8, 0, 0, 0, 7b648d73, 2a105bb57c8)
dbuf_hold_level+0x18(60008cd02e8, 0, 0, 7b648d73, 0, 0)
dmu_tx_check_ioerr+0x20(0, 60008cd02e8, 0, 0, 0, 7b648c00)
dmu_tx_hold_zap+0x84(60011fb2c40, 0, 0, 0, 30049b58008, 400)
zfs_rmnode+0xc8(3002410d210, 2a105bb5cc0, 0, 60011fb2c40, 30007b3ff58, 30007b56ac0)
zfs_delete_thread+0x168(30007b56ac0, 3002410d210, 600009a4778, 30007b56b28, 2a105bb5aca, 2a105bb5ac8)
thread_start+4(30007b56ac0, 0, 0, 489a4800000000, d83a10bf28, 5000000000386)

Has this been fixed in more recent code?  I can make the crash dump
available.

Ben
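P.S.  For what it's worth, both traces came from dumps forced while the box
was hung, which is why the first one shows "panic message: sync initiated".
On our sun4u systems that goes roughly like this, assuming the default
dumpadm/savecore setup (the crash directory and dump numbers will differ
per host):

  # on the hung console: send a break (Stop-A, or your terminal server's
  # break sequence), then at the OpenBoot prompt force a panic and dump:
  #   ok sync
  # after the reboot, savecore writes the dump under /var/crash/<hostname>
  cd /var/crash/`hostname`
  mdb unix.0 vmcore.0
  > ::status
  > ::stack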