It is possible that the change I MFCed today (r321207 in head, r321415 in 
stable/11) is related, but Mark will have to boot his machine with the fix to 
see if it makes any difference.

What happened in my case on one particular machine (not on most machines in our 
lab running the same code) was that mps_wait_command() / mpr_wait_command() 
would not wait the full 60 seconds for a write to the DPM (Driver 
Persistent Mapping) table in the controller.  So it reported a timeout 
even though the full wait had not elapsed.
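
To make the failure mode concrete, here is a minimal userspace sketch of a 
polled wait that gives up early.  It is not the driver code, and this message 
does not say why the real wait ended early; the sketch assumes, purely for 
illustration, that elapsed time is computed from a clock that can jump.  The 
names wait_polled(), steady and stepped are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum wait_result { WAIT_DONE, WAIT_TIMEOUT };

/*
 * Hypothetical clock readings in seconds.  samples[0] is taken when the
 * wait starts; each later entry is what the clock reads at one poll.
 */
static const int64_t steady[]  = { 100, 101, 102, 103, 104 };
static const int64_t stepped[] = { 100, 101, 170, 171, 172 }; /* clock jumps */

/*
 * Poll until the command completes (at poll number done_at) or until the
 * clock says timeout_sec seconds have passed since the wait started.
 */
static enum wait_result
wait_polled(const int64_t *samples, size_t nsamples, size_t done_at,
    int64_t timeout_sec)
{
	int64_t start;
	size_t i;

	assert(nsamples > 0 && timeout_sec > 0);
	start = samples[0];
	for (i = 1; i < nsamples; i++) {
		if (i >= done_at)
			return (WAIT_DONE);
		if (samples[i] - start >= timeout_sec)
			return (WAIT_TIMEOUT);
	}
	/* Ran out of samples without seeing the command complete. */
	return (WAIT_TIMEOUT);
}
```

With the steady readings, a command that completes on the fourth poll is 
reported as done.  With the stepped readings, the same command is reported as 
timed out on the second poll, long before 60 seconds of real waiting.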

There is a secondary bug that is still in the mps(4) / mpr(4) drivers when a 
timeout does happen: the error recovery code in the wait_command() routine 
reinitializes the controller, which clears out all the commands.  When the 
wait_command() routine returns, the command passed in has been freed, but the 
caller doesn’t know that.  So the caller (it happens in a number of places) 
dereferences a pointer to freed memory and the kernel panics.
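
The pattern can be sketched in userspace C.  This is an illustration under 
assumptions, not the driver source and not necessarily how the bug will be 
fixed: struct command, reinit_controller(), wait_command() and 
issue_command() are hypothetical stand-ins.  The sketch shows one possible 
contract, where the wait routine takes a pointer to the caller's command 
pointer and clears it when recovery frees the command, so the caller can tell.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct command {
	int status;
};

/* Stand-in for controller reinit: it releases the outstanding command. */
static void
reinit_controller(struct command *cm)
{
	free(cm);
}

/*
 * Returns 0 on completion, 1 on timeout.  On timeout the recovery path
 * frees the command, so *cmp is set to NULL to tell the caller.
 */
static int
wait_command(struct command **cmp, bool simulate_timeout)
{
	assert(cmp != NULL && *cmp != NULL);
	if (simulate_timeout) {
		reinit_controller(*cmp);
		*cmp = NULL;
		return (1);
	}
	(*cmp)->status = 0;
	return (0);
}

/* A caller that checks the pointer before touching the command again. */
static int
issue_command(bool simulate_timeout)
{
	struct command *cm;
	int error, status;

	cm = malloc(sizeof(*cm));
	if (cm == NULL)
		return (-1);
	cm->status = -1;
	error = wait_command(&cm, simulate_timeout);
	if (cm == NULL)
		return (error);	/* freed by recovery; do not dereference */
	status = cm->status;
	free(cm);
	return (status);
}
```

Without the NULL check (or with a wait routine that takes the command by 
value), the caller would read cm->status from freed memory, which is the kind 
of access that panics in the kernel.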

I’m planning to fix that bug too, if slm@ doesn’t get to it first; I’ve just 
had other bugs to fix first.

Eliminating bogus timeouts will eliminate most of the sources of those 
panics anyway.

Ken
-- 
Ken Merry
k...@freebsd.org



> On Jul 24, 2017, at 12:10 PM, Steven Hartland <kill...@multiplay.co.uk> wrote:
> 
> Based on your boot info you're using mps, so this could be related to the 
> mps fix committed to stable/11 today by ken@
> https://svnweb.freebsd.org/changeset/base/321415 
> 
> re@ cc'ed as this could cause hangs for others too on 11.1-RELEASE if this is 
> the case.
> 
>     Regards
>     Steve
> 
> On 24/07/2017 15:55, Mark Martinec wrote:
>>> Thanks! Tried it, and the message (or a backtrace) does not show 
>>> during a boot of a generic (patched) kernel, at least not in 
>>> the last 40-lines screen before the hang occurs. 
>>> (It also does not show during a "Safe mode" successful boot.) 
>> 
>> Btw (may or may not be relevant): after the above experiment 
>> I have rebooted the machine in "Safe mode" (generic kernel, 
>> EARLY_AP_STARTUP enabled by default) - and spent some time 
>> doing non-intensive interactive work on this host (web browsing, 
>> editor, shell, all under KDE) - and after about an hour the 
>> machine froze (clock display not updating, keyboard unresponsive, 
>> console virtual terminals inaccessible) - so had to reboot. 
>> According to the fan speed the machine was idle. 
>> The /var/log/messages does not show anything of interest 
>> before the freeze. All disks are under ZFS. 
>> 
>> Can EARLY_AP_STARTUP have an effect also _after_ booting? 
>> This host never hung during normal work when EARLY_AP_STARTUP 
>> was disabled (or with 11.0 and earlier). 
>> 
>>   Mark 
>> _______________________________________________ 
>> freebsd-stable@freebsd.org <mailto:freebsd-stable@freebsd.org> mailing list 
>> https://lists.freebsd.org/mailman/listinfo/freebsd-stable 
>> <https://lists.freebsd.org/mailman/listinfo/freebsd-stable> 
>> To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org" 
>> <mailto:freebsd-stable-unsubscr...@freebsd.org> 
> 

