On Sun, Nov 25, 2012 at 3:39 PM, Andriy Gapon <a...@freebsd.org> wrote: > on 25/11/2012 16:01 Attilio Rao said the following: >> On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon <a...@freebsd.org> wrote: >>> on 25/11/2012 14:29 Attilio Rao said the following: >>>> I think the patch you propose makes such effects even worse, because >>>> it disables interrupts in generic_stop_cpus(). >>>> What I suggest to do, is the following: >>>> - The CPU which wins the race for generic_stop_cpus also signals the >>>> CPUs it is willing to stop on a global mask >>>> - Another CPU entering generic_stop_cpus() and loosing the race, >>>> checks the mask of cpus which might be stopped and stops itself if >>>> necessary (ie. not yet done). We must be careful with races here, but >>>> I'm confindent this can be done easily enough. >>> >>> I think that you either misunderstood my patch or I misunderstand your >>> suggestion, because my patch does exactly what you wrote above. >> >> The patch is someway incomplete: >> - I don't think that we need specific checks in cpustop_handler() (and >> if you have added them to prevent races, I don't think they are >> enough, see below) > > The check is to cover the following scenario: > - two CPUs race in hard-stop scenario, and both are in NMI contexts
Please consider also another type of (similar race): two CPUs race in (normal) stop scenario and the loser is with interrupt disabled. I don't this will be completely fatal, but I'm sure it can lead to issues (double stop, etc.). > - one CPU wins and the other does spinning right in generic_stop_cpus > - NMI for the spinning CPU is blocked and remains pending > - the winning CPU releases all other CPUs > - the spinning CPU exits the NMI context and get the NMI which was pending > >> - setting of "stopping_cpus" map must happen atomically/before the >> stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL >> mask in the check > > Not NULL, but empty or stale. But a very good point, I agree. > The logic must be redone. > >> - Did you consider the races about when a stop and restart request >> happen just after the CPU_ISSET() check? I think CPUs can deadlock >> there. > > Yeah, good point again. This seems to be a different side of the issue above. > stopping_cpus is probably a bad idea. > >> - I'm very doubious about the spinlock_enter() stuff, I think I can >> just make the problem worse atm. > > Well, this is where I disagree. I think that cpu_stop must already be called > in > context which effectively disable pre-emption and interrupts. > The added spinlock_enter() stuff is kind of a safety belt to make things more > explicit, but it could be changed into some sort of an assert if that's > possible. Maybe it can be come really safe once we take care of all the races involved and reported above. I reserve to suspend my judgement until we don't care of the other races. > >> However you are right, the concept of your patch is the same I really >> wanted to get, we maybe need to just lift it up a bit. > > I agree. > > To add some gas to the fire. Do you recall my wish to drop the mask parameter > from the stop calls? If that was done then it would be simpler to handle > these > things. In that case only "stopper_cpu" ID (master/winner) would be needed. If you really want to do something like that please rename s/generic_stop_cpus/generic_stop_butself() or similar convention and I may be not opposed to it. Thanks, Attilio -- Peace can only be achieved by understanding - A. Einstein _______________________________________________ freebsd-hackers@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-hackers To unsubscribe, send any mail to "freebsd-hackers-unsubscr...@freebsd.org"