Hi,
Can anyone confirm whether this is a known issue (perhaps bug 6667208) and
whether the fix is going to be pushed out to Solaris 10 anytime soon? I'm
getting badly beaten up over this weekly, essentially anytime we drop a
packet between our twenty-odd iscsi-backed zones and the filer.
Chris was kind enough to provide his synopsis here (thanks Chris):
http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem
Also, I really need a workaround in the meantime. Is someone out
there handy enough with the undocumented stuff to recommend a zdb
command or something that will pound the delinquent pool into submission
without crashing everything? Surely there's a pool hard-reset command
somewhere for the QA guys, right?
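For context, the documented route I know of looks roughly like this once a
pool wedges ('tank' below is just a stand-in for the real pool name); per
Chris's writeup these tend to simply hang once failmode=wait has tripped,
which is why I'm fishing for something stronger:

   # find out which pool is unhappy
   zpool status -x
   # (restore the iSCSI path to the filer here, by whatever means)
   # ask ZFS to clear the errors and resume I/O
   zpool clear tank
   # failing that, try forcing the pool out and back in
   zpool export -f tank
   zpool import tank

If there's a bigger hammer than these, that's exactly what I'm after.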
thx
jake
Chris Siebenmann wrote:
You write:
| Now I'd asked about this some months ago, but didn't get an answer so
| forgive me for asking again: What's the difference between wait and
| continue in my scenario? Will this allow the one faulted pool to fully
| fail and accept that it's broken, thereby allowing me to frob the iscsi
| initiator, re-import the pool and restart the zone? [...]
Our experience here in a similar iscsi-based environment is that
neither 'wait' nor 'continue' will enable the pool to recover, and that
frequently the entire system will eventually hang in a state where no
ZFS pools can be used and the system can't even be rebooted cleanly.
My primary testing has been on Solaris 10 update 6, and I wrote
up the results here:
http://utcc.utoronto.ca/~cks/space/blog/solaris/ZFSFailmodeProblem
I have recently been able to do preliminary testing on Solaris 10
update 8, and it appears to behave more or less the same.
I wish I had better news for you.
- cks
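PS for anyone else following along: the 'wait' / 'continue' business above is
the pool-level failmode property. Roughly, again with 'tank' as a stand-in
pool name:

   # show the current setting (wait is the default)
   zpool get failmode tank
   # change it; the documented values are wait, continue and panic
   zpool set failmode=continue tank

Per Chris's testing, though, neither wait nor continue actually gets the pool
back on its own once the iSCSI path drops.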