On 12/10/2012 02:43 AM, Oleg Nesterov wrote:
> Damn, sorry for noise. I missed this part...
> 
> On 12/10, Srivatsa S. Bhat wrote:
>>
>> On 12/10/2012 12:44 AM, Oleg Nesterov wrote:
>>> the latency. And I guess something like kick_all_cpus_sync() is "too heavy".
>>
>> I hadn't considered that. Thinking of it, I don't think it would help us..
>> It won't get rid of the currently running preempt_disable() sections no?
> 
> Sure. But (again, this is only my feeling so far) given that 
> get_online_cpus_atomic()
> does cli/sti,

Ah, that one! Actually, the only reason I do that cli/sti is because, 
potentially
interrupt handlers can be hotplug readers too. So we need to protect the portion
of the code of get_online_cpus_atomic() which is not re-entrant.
(Which reminds me to try and reduce the length of cli/sti in that code, if 
possible).

> this can help to implement ensure-the-readers-must-see-the-pending-writer.
> IOW this might help to implement sync-with-readers.
> 

2 problems:

1. It won't help with cases like this:

   preempt_disable()
        ...
        preempt_disable()
                ...
                        <------- Here
                ...
        preempt_enable()
        ...
   preempt_enable()

If the IPI hits at the point marked above, the IPI is useless, because, at
that point, since we are already in a nested read-side critical section, we 
can't
switch the synchronization protocol. We need to wait till we start a fresh
non-nested read-side critical section, in order to switch to global rwlock.
The reason is that preempt_enable() or put_online_cpus_atomic() can only undo
what its predecessor (preempt_disable()/get_online_cpus_atomic()) did.

2. Part of the reason we want to get rid of stop_machine() is to avoid the
latency it induces on _all_ CPUs just to take *one* CPU offline. If we use
kick_all_cpus_sync(), we get into that territory again : we unfairly interrupt
every CPU, _even when_ that CPU's existing preempt_disabled() sections might
not actually be hotplug readers! (ie., not bothered about CPU Hotplug).

So, I think whatever synchronization scheme we develop, must not induce the very
same problems that stop_machine() had. Otherwise, we can end up going a 
full-circle
and coming back to the same stop_machine() scenario that we intended to get rid 
of.

(That's part of the reason why I initially tried to provide that _light() 
variant
of the reader APIs in the previous version of the patchset, so that light 
readers
can remain as undisturbed from cpu hotplug as possible, thereby avoiding 
indirectly
inducing the "stop_machine effect", like I mentioned here:
https://lkml.org/lkml/2012/12/5/369)

Regards,
Srivatsa S. Bhat

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Reply via email to