On Aug 15, 2013, at 2:39 PM, Rick Macklem <rmack...@uoguelph.ca> wrote:

> Michael Tratz wrote:
>> 
>> On Jul 27, 2013, at 11:25 PM, Konstantin Belousov
>> <kostik...@gmail.com> wrote:
>> 
>>> On Sat, Jul 27, 2013 at 03:13:05PM -0700, Michael Tratz wrote:
>>>> Let's assume the pid which started the deadlock is 14001 (it will
>>>> be a different pid when we get the results, because the machine
>>>> has been restarted)
>>>> 
>>>> I type:
>>>> 
>>>> show proc 14001
>>>> 
>>>> I get the thread numbers from that output and type:
>>>> 
>>>> show thread xxxxx
>>>> 
>>>> for each one.
>>>> 
>>>> And a trace for each thread with the command?
>>>> 
>>>> tr xxxx
>>>> 
>>>> Anything else I should try to get or do? Or is that not the data
>>>> at all you are looking for?
>>>> 
>>> Yes, everything else which is listed in the 'debugging deadlocks'
>>> page
>>> must be provided, otherwise the deadlock cannot be tracked.
>>> 
>>> The investigator should be able to see the whole deadlock chain
>>> (loop)
>>> to make any useful advance.
>> 
>> Ok, I have made some excellent progress in debugging the NFS
>> deadlock.
>> 
>> Rick! You are a genius. :-) You found the right commit: r250907 (dated
>> May 22) is definitely the problem.
>> 
>> Here is how I did the testing: One machine received a kernel before
>> r250907, the second machine received a kernel after r250907. Sure
>> enough within a few hours the machine with r250907 went into the
>> usual deadlock state. The machine without that commit kept on
>> working fine. Then I went back to the latest revision (r253726), but
>> left r250907 out. The machines have been running happily and rock
>> solid without any deadlocks. I have expanded the testing to 3
>> machines now and no reports of any issues.
>> 
>> I guess now Konstantin has to figure out why that commit is causing
>> the deadlock. Lovely! :-) I will get that information as soon as
>> possible. I'm a little behind with normal work load, but I expect to
>> have the data by Tuesday evening or Wednesday.
>> 
> Have you been able to pass the debugging info on to Kostik?
> 
> It would be really nice to get this fixed for FreeBSD 9.2.
> 
> Thanks for your help with this, rick

Sorry Rick, I wasn't able to get you guys that info quickly enough. I thought I 
would have enough time, before my own wedding and honeymoon came along, but 
everything went a little crazy and stressful. I didn't think it would be this 
nuts. :-)

I'm caught up with everything now, and from what I can see in the discussions,
we now know what the problem is.

I can report that the machines I have been running without r250907 have had
no problems for 27+ days.

If you need me to test any new patches, please let me know. If I should test 
with the partial merge of r253927 I'll be happy to do so.

Thanks,

Michael




_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "freebsd-stable-unsubscr...@freebsd.org"
