Thanks for reporting this. I have fixed this bug (6822816) in build 127. Here is the evaluation from the bug report:

The problem is that the clone's dsobj does not appear in the origin's ds_next_clones_obj. The bug can occur under certain circumstances if there was a "botched upgrade" when doing a "zpool upgrade" from pool version 10 or earlier to version 11 or later while there was a clone in the pool.

The problem is caused by upgrade_clones_cb() failing to call dmu_buf_will_dirty(origin->ds_dbuf).

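For illustration only, here is a minimal sketch of the pattern involved; the helper name is mine, the usual ZFS kernel headers are assumed, and this is not the committed fix. The DMU rule is that any modification to a dataset's on-disk ds_phys structure must be preceded by dmu_buf_will_dirty() on the backing dbuf within the same transaction, otherwise the change is not guaranteed to be written out:

  #include <sys/dmu.h>
  #include <sys/zap.h>
  #include <sys/dsl_dataset.h>

  /*
   * Illustrative sketch, not the actual upgrade_clones_cb() change:
   * record a clone under its origin's next-clones ZAP object.
   */
  static void
  add_clone_to_origin_sketch(objset_t *mos, dsl_dataset_t *origin,
      dsl_dataset_t *clone, dmu_tx_t *tx)
  {
          /* The call that was missing: dirty the origin's dbuf first. */
          dmu_buf_will_dirty(origin->ds_dbuf, tx);

          if (origin->ds_phys->ds_next_clones_obj == 0) {
                  /* Create the ZAP object that tracks this origin's clones. */
                  origin->ds_phys->ds_next_clones_obj = zap_create(mos,
                      DMU_OT_NEXT_CLONES, DMU_OT_NONE, 0, tx);
          }
          /* Record the clone's dataset object number under its origin. */
          VERIFY(0 == zap_add_int(mos,
              origin->ds_phys->ds_next_clones_obj, clone->ds_object, tx));
  }
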
This bug can have several effects:

1. assertion failure from dsl_dataset_destroy_sync()
2. assertion failure from dsl_dataset_snapshot_sync()
3. assertion failure from dsl_dataset_promote_sync()
4. incomplete scrub or resilver, potentially leading to data loss

The fix addresses the root cause, and also works around all of these issues on pools that have already experienced the botched upgrade, whether or not they have encountered any of the above effects.
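
To give a rough idea of the workaround side (again an illustrative sketch, not the committed diff), the destroy path can treat a missing next-clones entry as non-fatal rather than asserting that zap_remove_int() succeeds, which is the assertion shown in the panic below:

  /*
   * Sketch: a pool that went through the botched upgrade may have no
   * entry for this clone in ds_next_clones_obj, so zap_remove_int()
   * can return ENOENT (0x2, exactly the value in the panic below).
   */
  int err = zap_remove_int(mos,
      ds_prev->ds_phys->ds_next_clones_obj, obj, tx);
  VERIFY(err == 0 || err == ENOENT);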

Anyone who may have a botched upgrade should run "zpool scrub" after upgrading to bits with the fix in place (build 127 or later).
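
For example, using the pool name from the report below ("tank"):

  # zpool scrub tank
  # zpool status tank

The second command reports scrub progress and whether any errors were found.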

--matt

Albert Chin wrote:
Running snv_114 on an X4100M2 connected to a 6140. Made two clones of a
snapshot a few days ago:
  # zfs snapshot a...@b
  # zfs clone a...@b tank/a
  # zfs clone a...@b tank/b

The system started panicking after I tried:
  # zfs snapshot tank/b...@backup

So, I destroyed tank/b:
  # zfs destroy tank/b
then tried to destroy tank/a:
  # zfs destroy tank/a

Now, the system is in an endless panic loop, unable to import the pool
at system startup or with "zpool import". The panic dump is:
  panic[cpu1]/thread=ffffff0010246c60: assertion failed:
  0 == zap_remove_int(mos, ds_prev->ds_phys->ds_next_clones_obj, obj, tx)
  (0x0 == 0x2), file: ../../common/fs/zfs/dsl_dataset.c, line: 1512

  ffffff00102468d0 genunix:assfail3+c1 ()
  ffffff0010246a50 zfs:dsl_dataset_destroy_sync+85a ()
  ffffff0010246aa0 zfs:dsl_sync_task_group_sync+eb ()
  ffffff0010246b10 zfs:dsl_pool_sync+196 ()
  ffffff0010246ba0 zfs:spa_sync+32a ()
  ffffff0010246c40 zfs:txg_sync_thread+265 ()
  ffffff0010246c50 unix:thread_start+8 ()

We really need to import this pool. Is there a way around this? We do
have snv_114 source on the system if we need to make changes to
usr/src/uts/common/fs/zfs/dsl_dataset.c. It seems like the "zfs
destroy" transaction never completed and it is being replayed, causing
the panic. This cycle continues endlessly.

