Hello Francesco,

On Wed, Nov 13, 2024 at 11:15:03PM +0100, Francesco Poli wrote:
> On Mon, 11 Nov 2024 11:22:26 +0100 Uwe Kleine-König wrote:
>
> [...]
> > Hello,
>
> Hi Uwe, thanks for your followup.
>
> >
> > On Thu, Oct 31, 2024 at 07:53:52PM +0100, Francesco Poli (wintermute) wrote:
> [...]
> > > I filed this bug report against the Debian Linux kernel, in order
> > > to warn other users about this issue, and in order to ask the Debian
> > > Kernel Team to investigate the issue and/or to forward the bug report
> > > to the relevant upstream Linux kernel maintainers.
> > >
> > > Please do not reassign to package opensm with the intention of
> > > merging with bug [#1085300], unless you know for sure that the
> > > issue is in opensm and you know how to fix it.
> >
> > Please do not report multiple bugs for the same issue. The right(er)
> > thing to do is to make use of "affects". Now there are three bug reports
> > (2 for Debian and one upstream) and someone being aware of only one (or
> > two) of them, might miss some action which results in duplicate work.
>
> You are right, the "affects" field is the most appropriate means to
> show that a bug report against a given package also affects other
> packages.
>
> However, in this case, the lack of replies from opensm maintainers made
> me doubtful about the best possible course of action. Sorry about that.
>
> >
> > > Please help, I would very much like to run the head node with
> > > an up-to-date kernel!
> >
> > This is hard to act on without further input. Some questions to debug
> > this:
> >
> > I guess the kernel provides a directory "/sys/class/infiniband_mad". Do
> > its contents look different on 6.10.x and 6.11.x?
>
> I will look into this as soon as I can reboot the cluster head node.
>
> >
> > Can you please bisect the problem?
> [...]
>
> I have to find a time window where I can perform multiple reboots,
> which can result in a non-working InfiniBand network... It won't be
> easy, since the cluster has entered production and users keep launching
> jobs.
>
> Anyway, what I have done so far is: I have tried and rebuilt a Linux
> kernel image Debian package, following your instructions.
> After some failed attempts (due to missing dependencies and/or required
> tools), I think I succeeded, but I had to reply to a number of
> questions during the procedure: I have always replied with the default
> answer (by hitting [Enter]), I hope that was the right thing to do!

Yes, that sounds right. I wouldn't have expected questions to be
asked, but that's a problem with my expectations, not with how you
followed my instructions.

	yes '' | make localmodconfig

would be the better recommendation, it seems.

> Before I go on and try to install the resulting Debian package, could
> you please review the transcript of what I did (see the attached file)?

Looks good. Probably the individual answers don't matter much and the
default should be fine. Just continue with my instructions, and if the
resulting kernels boot and behave like the respective versions packaged
by Debian, everything is fine. Iff that fails, a more detailed review is
needed.

Best regards
Uwe