On Thu, Nov 29, 2018 at 06:07:05PM +0000, Mark Cave-Ayland wrote:
> On 29/11/2018 17:38, Guenter Roeck wrote:
>
> >> This patch is very interesting, as I have a long-running regression
> >> trying to boot NextSTEP 3.3 on qemu-system-sparc which I eventually
> >> bisected down to the commit that turned on iothread by default in
> >> QEMU.
> >>
> >> The symptom is that ESP SCSI requests hang/timeout before the kernel
> >> is able to get to the userspace installer: however if you launch
> >> QEMU with "taskset --cpu-list 1 qemu-system-sparc ..." then it works
> >> and you can complete the installation.
> >>
> >> So certainly this suggests that there is a race condition still
> >> present in ESP somewhere. I've given this patch a spin, and in a few
> >> quick tests here I was able to consistently get further in kernel
> >> boot, but it still doesn't completely solve issue for me :/
> >>
> >
> > Can you try the attached patch ? It is a bit cleaner than the first
> > version, and works for me as well.
> >
> > Note that this isn't perfect. Specifically, I see differences in
> > handling STAT_TC. The controller specification is a bit ambiguous in
> > that regard, but comparing the qemu code with real controller
> > behavior shows that the real controller does not reset STAT_TC when
> > reading the interrupt status register. That doesn't seem to matter
> > for Linux, but it may influence other guests.
>
> Hi Guenter,
>
> Thanks for the patch. I just gave it a quick test, and unfortunately my
> NextSTEP ISO still hangs in the same place on boot :(
>

Too bad. Is it "same place" as with the first version of the patch, or
"same place" as in upstream qemu ? That might be important, as the two
patch versions behave differently (one caches RSTAT/RINTR/RSEQ, one
defers command complete handling).

> Not sure if it helps, but attached is a simple trace backend log from
> "-trace 'esp*'" from startup all the way to the point where the kernel
> hangs on boot whilst enumerating the SCSI bus (it does seem to hang at
> random points in the bus enumeration process).
>

This is interesting; yours seems to be a different problem. I don't see
any command_complete_deferred traces in your log. I also don't see any
suspicious activity between esp_raise_irq and esp_lower_irq.

Can you try tracing in single-threaded mode ? Maybe that can help us
find the difference.

Thanks,
Guenter