On Sun, Nov 25, 2012 at 3:39 PM, Andriy Gapon <a...@freebsd.org> wrote:
> on 25/11/2012 16:01 Attilio Rao said the following:
>> On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon <a...@freebsd.org> wrote:
>>> on 25/11/2012 14:29 Attilio Rao said the following:
>>>> I think the patch you propose makes such effects even worse, because
>>>> it disables interrupts in generic_stop_cpus().
>>>> What I suggest to do, is the following:
>>>> - The CPU which wins the race for generic_stop_cpus also signals the
>>>> CPUs it is willing to stop on a global mask
>>>> - Another CPU entering generic_stop_cpus() and loosing the race,
>>>> checks the mask of cpus which might be stopped and stops itself if
>>>> necessary (ie. not yet done). We must be careful with races here, but
>>>> I'm confindent this can be done easily enough.
>>>
>>> I think that you either misunderstood my patch or I misunderstand your
>>> suggestion, because my patch does exactly what you wrote above.
>>
>> The patch is someway incomplete:
>> - I don't think that we need specific checks in cpustop_handler() (and
>> if you have added them to prevent races, I don't think they are
>> enough, see below)
>
> The check is to cover the following scenario:
> - two CPUs race in hard-stop scenario, and both are in NMI contexts

Please consider also another type of (similar race): two CPUs race in
(normal) stop scenario and the loser is with interrupt disabled. I
don't this will be completely fatal, but I'm sure it can lead to
issues (double stop, etc.).

> - one CPU wins and the other does spinning right in generic_stop_cpus
> - NMI for the spinning CPU is blocked and remains pending
> - the winning CPU releases all other CPUs
> - the spinning CPU exits the NMI context and get the NMI which was pending
>
>> - setting of "stopping_cpus" map must happen atomically/before the
>> stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL
>> mask in the check
>
> Not NULL, but empty or stale.  But a very good point, I agree.
> The logic must be redone.
>
>> - Did you consider the races about when a stop and restart request
>> happen just after the CPU_ISSET() check? I think CPUs can deadlock
>> there.
>
> Yeah, good point again.  This seems to be a different side of the issue above.
> stopping_cpus is probably a bad idea.
>
>> - I'm very doubious about the spinlock_enter() stuff, I think I can
>> just make the problem worse atm.
>
> Well, this is where I disagree.  I think that cpu_stop must already be called 
> in
> context which effectively disable pre-emption and interrupts.
> The added spinlock_enter() stuff is kind of a safety belt to make things more
> explicit, but it could be changed into some sort of an assert if that's 
> possible.

Maybe it can be come really safe once we take care of all the races
involved and reported above. I reserve to suspend my judgement until
we don't care of the other races.

>
>> However you are right, the concept of your patch is the same I really
>> wanted to get, we maybe need to just lift it up a bit.
>
> I agree.
>
> To add some gas to the fire.  Do you recall my wish to drop the mask parameter
> from the stop calls?  If that was done then it would be simpler to handle 
> these
> things.  In that case only "stopper_cpu" ID (master/winner) would be needed.

If you really want to do something like that please rename
s/generic_stop_cpus/generic_stop_butself() or similar convention and I
may be not opposed to it.

Thanks,
Attilio


-- 
Peace can only be achieved by understanding - A. Einstein
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"

Reply via email to