> -----Original Message-----
> From: Rich, Jason
> Sent: Tuesday, September 10, 2013 1:04 PM
> To: 'Willy Tarreau'
> Cc: linux-kernel@vger.kernel.org
> Subject: RE: Panic at _blk_run_queue on 2.6.32
>
> > -----Original Message-----
> > From: linux-kernel-ow...@vger.kernel.org
> > [mailto:linux-kernel-ow...@vger.kernel.org] On Behalf Of Willy Tarreau
> > Sent: Wednesday, July 10, 2013 3:27 PM
> > To: Rich, Jason
> > Cc: linux-kernel@vger.kernel.org
> > Subject: Re: Panic at _blk_run_queue on 2.6.32
> >
> > Hi Jason,
> >
> > On Tue, Jul 09, 2013 at 05:42:29PM +0000, Rich, Jason wrote:
> > > Greetings,
> > > I've recently encountered an issue where multiple hosts are failing
> > > to boot up about 1/5 of the time. So far I have confirmed this
> > > issue on three separate host machines. The issue presents itself
> > > after updating 2.6.32.39 to patch 50 and patch 61.
> > > Both patch levels result in the failure described below. Since this
> > > occurs on multiple hosts, I feel I can safely rule out hardware.
> >
> > First, thank you for your very detailed report. Do you think you could
> > narrow this down to a specific kernel version ? Given that there are
> > exactly 10 versions between .39 and .50, I think that a version-level
> > bisect would take 3 or 4 builds (so probably around 20 reboots).
> >
> > It would help us spot the faulty patch. Right now, there are 546
> > patches between .39 and .50 so it's quite hard to find the culprit,
> > even with your full trace. That does not mean we'll immediately spot
> > it, maybe a deeper bisect will be needed, but it should be easier.
> >
> > > It is also of note that I have not seen this behavior on the 3.4.26
> > > kernel, or on any of my 32bit hosts.
> >
> > This is good news, because we're probably missing one fix from a
> > more recent version that addressed a similar regression and that we
> > might backport into 2.6.32.62.
> >
> > > That said, I have to support this software release (which runs on
> > > the 2.6 kernel) for at least another two years.
> >
> > Be careful on this point, 2.6.32 is planned for EOL next year :
> >
> > https://www.kernel.org/category/releases.html
> >
> > You might want to consider migrating to a supported distro kernel or
> > to 3.2 instead. That said, if you follow carefully the updates from
> > later kernels, you might prefer to maintain your own backports of the
> > patches that are relevant to your usage.
> >
> > Best regards,
> > Willy
> >
>
> Greetings Willy,
> You helped me out with this particular issue about 2 months ago. What we
> found is that my particular panic appears to be addressed by a specific
> commit you referred me to:
> b485462 [SCSI] Stop accepting SCSI requests before removing a device
>
> Without going into too much detail, I'm not able to jump directly to that
> hash because I have about 7 different drivers failing to compile due to
> other changes between 2.6.32.61 and that hash. In particular, some header
> files were renamed, others deleted and replaced by newer features. To go
> through and update my proprietary drivers is as big of a headache as just
> getting this scsi panic fixed on top of patch 61.
>
> I've spent the last couple of weeks playing with getting the scsi fix
> applied on top of patch 61 and am having a very difficult time. There are
> so many dependencies from prior commits to the scsi code it is making it
> quite difficult to determine what exactly I need.
>
> I'm hoping you might be able to help me out with some advice or perhaps
> you are familiar enough with the scsi code as to help me apply the concept
> of the fix to the top of patch 61. I have attached the patch I've come up
> with so far, but this is obviously missing other dependencies as I keep
> ending up with panics. It goes without saying that this code is very
> foreign to me and I don't completely understand what it is doing.
>
> I know your time is valuable so I've attached the patch I've been working
> on so far, however, this code causes its own kernel panic and should not
> be run on a live system. That said, perhaps it will give you a baseline as
> to what I'm trying to do. Again, this patch is based on the official
> 2.6.32.61 tag.
>
> Thanks for any help,
> Jason Rich

Apologies, I had been tweaking that patch file and didn't realize I
corrupted it. I deleted a line in the scsi_sysfs.c area of the diff and
forgot to update the line numbers. Should be +912,24 (not 25):

+++ linux-2.6.32.new/drivers/scsi/scsi_sysfs.c 2013-09-09 14:01:38.249104690 -0500
@@ -912,16 +912,24 @@

I have attached the corrected patch file. Don't want to waste your time
with the old one. Again, apologies.

>
> > --
> > To unsubscribe from this list: send the line "unsubscribe
> > linux-kernel" in the body of a message to majord...@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at http://www.tux.org/lkml/
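
For anyone else hand-editing a diff: the kind of mismatch described above (a hunk header claiming 25 new-side lines when the body only has 24) can be caught mechanically before sending. Below is a minimal sketch in Python, not part of the attached patch. The function name check_hunks is made up for illustration, and it assumes a plain unified diff in which a "--- " line followed directly by a "+++ " line starts a new file header.

```python
import re

# Unified-diff hunk header: @@ -old_start[,old_count] +new_start[,new_count] @@
# A missing count means 1, per the unified diff format.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def check_hunks(patch_text):
    """Return (header, declared, actual) for every hunk whose counts are off.

    declared and actual are (old_count, new_count) tuples.
    """
    lines = patch_text.splitlines()
    problems = []
    i = 0
    while i < len(lines):
        m = HUNK_RE.match(lines[i])
        if not m:
            i += 1
            continue
        header = lines[i]
        declared = (int(m.group(2) or 1), int(m.group(4) or 1))
        old = new = 0
        i += 1
        # Count body lines until the next hunk or the next file header.
        while i < len(lines):
            line = lines[i]
            file_header = (
                line.startswith("--- ")
                and i + 1 < len(lines)
                and lines[i + 1].startswith("+++ ")
            )
            if line.startswith(("@@ ", "diff ")) or file_header:
                break
            if line.startswith(" "):      # context line: counts on both sides
                old += 1
                new += 1
            elif line.startswith("-"):    # removed line: old side only
                old += 1
            elif line.startswith("+"):    # added line: new side only
                new += 1
            # anything else (e.g. "\ No newline at end of file") is ignored
            i += 1
        if (old, new) != declared:
            problems.append((header, declared, (old, new)))
    return problems
```

Calling check_hunks() on the text of a patch file returns an empty list when every header matches its body. It is only a sanity check on the counts; it does not tell you whether the hunk applies, and a context line whose leading space was stripped by a mailer would be miscounted.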
0001-scsi_panic.patch