(Konrad, this looks potentially swiotlb-related, what do you think? Full bug log is at http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=596419 )

On Sun, 2010-09-12 at 07:58 +0200, Artur Linhart - Linux communication wrote:
> Even after the downgrade of kernel and of the corresponding files to the
> version 2.6.32-18 and downgrade of mdadm the problem still persists, so it
> is not bound specifically to this package and to this version.

Thanks Artur, if possible can you reproduce with a serial console
connected so that you can capture precise logs? If not then it might be
worth posting a digital photograph of the stack trace somewhere -- there
is often a bunch of useful information preceding the actual stack trace,
and you can usually use Shift-PgUp to go back to earlier messages.

Also, can you look in /var/log/kern.log* and see if any of the errors
made it there? That is possible if your root partition isn't on the same
device that is failing.

The reference to aac_build_sgraw and the BUG at
drivers/scsi/aacraid/aachba.c:2825 both point to

        nseg = scsi_dma_map(scsicmd);
        BUG_ON(nseg < 0);

scsi_dma_map turns into dma_map_sg, which in turn probably goes via
SWIOTLB on Xen but possibly does not when running natively.

Perhaps your system is running out of swiotlb memory and adding
"swiotlb=<NN>" to the command line will help? You should see a log
message on boot telling you how big the swiotlb is at the moment,
perhaps try doubling it? I'm not sure but I think the default is 64M
which == 32768 slabs, so perhaps try swiotlb=65536?

I'm not aware of any swiotlb related fixes going into xen.git since
e73f4955a821f850f5b88c32d12a81714523a95f, which is what package
2.6.32-21 contains.

I'm not sure why any of this would tie in with shutting down domains
though.

Ian.

> I have identified now (after the downgrades to 2.6.30-18) the following
> initial stack trace (some lines are missing from the top, I think, they were
> no longer on the screen):
>
> [<....>] ? bio_alloc_bioset+0x45/0xb7
> [<....>] ? submit_bio+0xd6/0xf2
> [<....>] ? md_super_write+0x84/0xb2 [md_mod]
> [<....>] ? xen_restore_fl_direct_end+0x0/0x1
> [<....>] ? md_update_sb+0x268/0x31e
> [<....>] ? md_check_recovery+0x1e2/0x4b9 [md_mod]
> [<....>] ? raid1d+0x42/0xe0b [raid1]
> [<....>] ? finish_task_switch+0x44/0xaf
> [<....>] ? schedule_timeout+0x2e/0xdd
> [<....>] ? xen_restore_fl_direct_end+0x0/0x1
> [<....>] ? xen_force_evtchn_callback+0x9/0xa
> [<....>] ? check_events+0x12/0x20
> [<....>] ? xen_restore_fl_direct_end+0x0/0x1
> [<....>] ? md_thread+0xf1/0x10f [md_mod]
> [<....>] ? autoremove_wake_function+0x0/0x2e
> [<....>] ? md_thread+0x0/0x10f [md_mod]
> [<....>] ? kthread+0x79/0x01
> [<....>] ? child_rip+0xa/0x20
> [<....>] ? int_ret_from_szs_call+0x7/0x1b
> [<....>] ? retinit_restore_args+0x5/0x6
> [<....>] ? xen-restore-fl-direct-end+0x0/0x1
> [<....>] ? xen-restore-fl-direct-end+0x0/0x1
> [<....>] ? child_rip+0x0/0x20
> Code: 00 00 c7 46 0c 00 00 00 00 c7 46 10 00 00 00 00 c7 46 14 00
> 00 00 00 c7 46 18 00 00 00 00 e8 10 63 fa ff 83 f8 00 41 89 c6 7d 04 <0f> 0b eb
> fe 75 08 45 31 e4 e9 9c 00 00 00 49 8b 7f 58 48 89 eb
> RIP [<....>] aac_build_sgraw+0x51/0x10a [aacraid]
> RSP <ffff88003cd998e0>
> --- [ end trace .... ] ---
>
> Now also this stack trace stays on the screen and nothing happens also after
> very long time (1 hour)

-- 
Ian Campbell
Current Noise: Raise Hell - Rising

Love is sentimental measles.
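
For reference, the slab arithmetic behind the swiotlb=<NN> suggestion in this mail can be sketched roughly as below. This is only an illustration, not kernel code: it assumes each swiotlb slab is 2 KiB (1 << 11 bytes), which is what the "64M == 32768 slabs" figure implies, and the helper names slabs_to_mib and mib_to_slabs are made up for this example. The size the kernel reports at boot remains the authoritative value.

```python
# Illustrative sketch only, not kernel code.
# Assumption: one swiotlb slab is 2 KiB (1 << 11 bytes), consistent with
# the "64M == 32768 slabs" figure quoted in the mail.
SLAB_SHIFT = 11
SLAB_BYTES = 1 << SLAB_SHIFT
MIB = 1024 * 1024


def slabs_to_mib(nslabs):
    """Size in MiB of a bounce buffer made of `nslabs` slabs (hypothetical helper)."""
    return nslabs * SLAB_BYTES / MIB


def mib_to_slabs(mib):
    """Number of slabs needed for a buffer of `mib` MiB (hypothetical helper)."""
    return (mib * MIB) // SLAB_BYTES


if __name__ == "__main__":
    # Compare the presumed default with the doubled value suggested in the mail.
    for nslabs in (32768, 65536):
        print(f"swiotlb={nslabs} -> {slabs_to_mib(nslabs):.0f} MiB")
```

Under that assumption, swiotlb=65536 corresponds to a 128 MiB bounce buffer, i.e. double the presumed 64 MiB default.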