This also looks to me like a bug I recently fixed in illumos for
SATA (illumos bug 896). The problem there is that a reset on the
SATA bus can cause bad things to happen.
I think you need to update your code.
- Garrett
On 04/28/11 09:25 AM, Jason wrote:
On Thu, Apr 28, 2011 at 11:00 AM, Jason Herring <jaherr...@usa.net> wrote:
More fun. I NFS-mounted the VM from a Linux box and went to start
the VM; this also panicked the *opensolaris* kernel! Maybe this
is something to do with the network stack? However, I see a lot of
references to SATA/SCSI and si3124 (the SATA controller card in the
machine) in the panic:
Apr 28 08:52:54 atlantis unix: [ID 836849 kern.notice]
Apr 28 08:52:54 atlantis panic[cpu0]/thread=ffffff00064cbc60:
Apr 28 08:52:54 atlantis genunix: [ID 103648 kern.notice] recursive mutex_enter, lp=ffffff019e1fc1e8 owner=ffffff00064cbc60 thread=ffffff00064cbc60
Apr 28 08:52:54 atlantis unix: [ID 100000 kern.notice]
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb560 unix:mutex_panic+73 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb5c0 unix:mutex_vector_enter+190 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb690 si3124:si_mop_commands+6e ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb700 si3124:si_reject_all_reset_pkts+7c ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb760 si3124:si_tran_reset_dport+9b ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb7d0 sata:sata_scsi_reset+ab ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb800 scsi:scsi_reset+52 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb870 sd:sd_sense_key_medium_or_hardware_error+fb ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb8d0 sd:sd_decode_sense+e5 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb930 sd:sd_handle_auto_request_sense+100 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb980 sd:sdintr+145 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cb9b0 scsi:scsi_hba_pkt_comp+15c ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cba00 sata:sata_txlt_rw_completion+1d3 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cbad0 si3124:si_mop_commands+401 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cbb40 si3124:si_intr_command_error+f7 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cbbb0 si3124:si_intr+227 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cbc00 unix:av_dispatch_autovect+7c ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff00064cbc40 unix:dispatch_hardint+33 ()
Apr 28 08:52:54 atlantis genunix: [ID 655072 kern.notice] ffffff0006405aa0 unix:switch_sp_and_call+13 ()
Apr 28 08:52:54 atlantis unix: [ID 100000 kern.notice]
Apr 28 08:52:54 atlantis genunix: [ID 672855 kern.notice] syncing file systems...
Apr 28 08:52:54 atlantis genunix: [ID 904073 kern.notice] done
Apr 28 08:52:55 atlantis genunix: [ID 111219 kern.notice] dumping to /dev/zvol/dsk/rpool/dump, offset 65536, content: kernel
Apr 28 08:53:03 atlantis genunix: [ID 100000 kern.notice]
Apr 28 08:53:03 atlantis genunix: [ID 665016 kern.notice] 100% done: 129259 pages dumped,
Apr 28 08:53:03 atlantis genunix: [ID 851671 kern.notice] dump succeeded
I need to get this VM up and running; any thoughts? I might have
to move this server to Linux if I can't figure this out, and I'd
rather not do that.
Based on the stack trace, it appears the si3124 driver is receiving a
command error from the HBA and handling it incorrectly: the interrupt
path (si_intr -> si_intr_command_error -> si_mop_commands) holds a
lock while it completes the failed command. The sd target driver then
decides the error warrants a reset and calls back into si3124
(scsi_reset -> si_tran_reset_dport -> si_mop_commands), which tries to
take the same lock again, causing the "recursive mutex_enter" panic.
This is probably one of 6358757, 6957964, or 6959541 (bugs.opensolaris.org
is down, so I cannot check the synopses to tell which one it is). Based
on inspecting the driver source, b145 or later appears to fix the issue.
I don't know of any immediate workarounds. To get at your data, you
could try booting the latest OpenIndiana ISO, or whatever Oracle is
calling the latest Solaris 11 preview, as they should have fixed
drivers. If you are really desperate, you _might_ try booting one of
them and copying its si3124 driver over your installed one, but there
is no guarantee it will work; sometimes you get lucky, sometimes you
don't. I would suggest keeping a copy of the original driver handy.
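To make the lock-recursion mechanism concrete, here is a minimal user-space sketch in C, assuming POSIX threads. It is only an analogy, not the si3124 code: port_mutex, init_port_mutex, reset_port, complete_with_error, interrupt_buggy, and interrupt_fixed are hypothetical names that loosely mirror the frames in the trace. An error-checking mutex (PTHREAD_MUTEX_ERRORCHECK) makes the owning thread's second lock attempt return EDEADLK, where the kernel instead panics with "recursive mutex_enter". The "fixed" variant shows one common way to avoid the problem (dropping the lock before running completion callbacks); that is an assumption for illustration, not necessarily what the b145 driver does.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for a per-port driver lock. */
static pthread_mutex_t port_mutex;

/* Error-checking mutex: a relock by the owner returns EDEADLK
 * instead of deadlocking, so the recursion becomes observable. */
static void init_port_mutex(void)
{
    pthread_mutexattr_t attr;
    pthread_mutexattr_init(&attr);
    pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
    pthread_mutex_init(&port_mutex, &attr);
    pthread_mutexattr_destroy(&attr);
}

/* Hypothetical reset path (cf. si_tran_reset_dport -> si_mop_commands):
 * it takes the port lock itself. Returns 0 or the lock error. */
static int reset_port(void)
{
    int rc = pthread_mutex_lock(&port_mutex);
    if (rc != 0)
        return rc; /* EDEADLK: this thread already holds the lock */
    /* ... abort outstanding commands here ... */
    pthread_mutex_unlock(&port_mutex);
    return 0;
}

/* Hypothetical completion callback: the target driver sees the
 * error and asks the HBA driver to reset the device. */
static int complete_with_error(void)
{
    return reset_port();
}

/* Buggy pattern: the completion callback runs with the lock held,
 * so reset_port() tries to take the same lock again. */
static int interrupt_buggy(void)
{
    pthread_mutex_lock(&port_mutex);
    int rc = complete_with_error();
    pthread_mutex_unlock(&port_mutex);
    return rc;
}

/* Fixed pattern: gather failed commands under the lock, release it,
 * then run the completion callbacks. */
static int interrupt_fixed(void)
{
    pthread_mutex_lock(&port_mutex);
    /* ... move failed commands to a local list here ... */
    pthread_mutex_unlock(&port_mutex);
    return complete_with_error();
}
```

After calling init_port_mutex() once, interrupt_buggy() returns EDEADLK (the user-space equivalent of the panic), while interrupt_fixed() returns 0.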
_______________________________________________
opensolaris-code mailing list
opensolaris-code@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/opensolaris-code