Victor,

> Well, since we are talking about ZFS, any thread somewhere in the ZFS module
> is interesting, and there should not be too many of them. In this case,
> though, it is clear: it is trying to update the config object and waits for
> the update to sync. There should be another thread with a stack similar to this:
>
> genunix:cv_wait()
> zfs:zio_wait()
> zfs:dbuf_read()
> zfs:dmu_buf_will_dirty()
> zfs:dmu_write()
> zfs:spa_sync_nvlist()
> zfs:spa_sync_config_object()
> zfs:spa_sync()
> zfs:txg_sync_thread()
> unix:thread_start()
>

Ok, this would be the thread you're referring to:

ffffff000746ec80 fffffffffbc287b0                0   0  60 ffffff01b53ebd88
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ffffff000746ec80: ffffff000746e860
  [ ffffff000746e860 _resume_from_idle+0xf1() ]
    swtch+0x17f()
    cv_wait+0x61()
    zio_wait+0x5f()
    dbuf_read+0x1b5()
    dbuf_will_dirty+0x3d()
    dmu_write+0xcd()
    spa_sync_nvlist+0xa7()
    spa_sync_config_object+0x71()
    spa_sync+0x20b()
    txg_sync_thread+0x226()
    thread_start+8()

As you say, there are more zfs/spa/txg-threads that end up in the same
wait-state.
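
In case it helps anyone sifting through a similar dump, here is a minimal
Python sketch for picking out such threads. It assumes the thread list was
saved as plain text in the same layout as the excerpt above. The function
name threads_waiting_in and both regular expressions are my own illustration,
not part of mdb or ZFS, and I have only tried them against this layout.

```python
import re

# Assumption: a thread entry starts at column 0 with a 16-digit hex address.
THREAD_RE = re.compile(r"^([0-9a-f]{16})\s")
# Assumption: a stack frame looks like "name+0x5f()" or "name+8()".
FRAME_RE = re.compile(r"(\w+)\+(?:0x)?[0-9a-f]+\(\)")

# The thread quoted above, used as sample input.
SAMPLE = """\
ffffff000746ec80 fffffffffbc287b0                0   0  60 ffffff01b53ebd88
  PC: _resume_from_idle+0xf1    THREAD: txg_sync_thread()
  stack pointer for thread ffffff000746ec80: ffffff000746e860
  [ ffffff000746e860 _resume_from_idle+0xf1() ]
    swtch+0x17f()
    cv_wait+0x61()
    zio_wait+0x5f()
    dbuf_read+0x1b5()
    dbuf_will_dirty+0x3d()
    dmu_write+0xcd()
    spa_sync_nvlist+0xa7()
    spa_sync_config_object+0x71()
    spa_sync+0x20b()
    txg_sync_thread+0x226()
    thread_start+8()
"""


def threads_waiting_in(dump_text, frame="zio_wait"):
    """Return {thread_address: [frame names]} for stacks containing `frame`."""
    stacks = {}
    current = None
    for line in dump_text.splitlines():
        header = THREAD_RE.match(line)
        if header:
            # New thread entry: start collecting its frames.
            current = header.group(1)
            stacks[current] = []
            continue
        if current is None:
            continue  # skip anything before the first thread entry
        found = FRAME_RE.search(line)
        if found:
            stacks[current].append(found.group(1))
    return {addr: frames for addr, frames in stacks.items() if frame in frames}


if __name__ == "__main__":
    for addr, frames in threads_waiting_in(SAMPLE).items():
        print(addr, " <- ".join(frames))
```

Passing a different frame name, for example "spa_sync", would list the
threads that have that function anywhere on their stack.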

> It waits due to a checksum error detected while reading the old config object
> from disk (call to dbuf_read() above). It means that all ditto-blocks of the
> config object got corrupted. On Solaris 10 there's no

Now this is getting interesting. Is there any chance to recover from
this scenario?

>>> Btw, why does timestamp on your uberblock show July 1?
>>
>> Well, this is about the time when the crash happened. The clock on the
>> server is correct.
>
> Wow! Why did you wait almost two months?

There has been a lot of reading up on ZFS and searching for local
recovery experts. But yes, of course I should have posted to this
mailing list earlier.

Again, thanks Victor!

Regards
Erik
_______________________________________________
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
