Hi Pierre,

Thanks for the good explanation, will try to respond inline below:

On 7 Dec 2015, at 23:10, Pierre Habouzit <phabou...@apple.com> wrote:

Hi Joakim, Kevin,

[ Full disclosure, I made that reply in rdar://problem/16436943 and your use 
case was slightly different IIRC but you’re right it’s a close enough problem ]

Dispatch internally has a notion of something that does almost that, called 
_dispatch_barrier_trysync_f[1]. However, it is used internally to serialize 
state changes on sources and queues such as setting the target queue or event 
handlers.

The problem is that this call bypasses the target queue hierarchy in its 
fastpath. While that is correct when changing the state of a given source or 
queue, it is generally the wrong thing to do. Let's consider this code, 
assuming a public dispatch_barrier_trysync() that works the same way:


    dispatch_queue_t outer = dispatch_queue_create("outer", NULL);
    dispatch_queue_t inner = dispatch_queue_create("inner", NULL);
    dispatch_set_target_queue(outer, inner);

    dispatch_async(inner, ^{
        // write global state protected by inner
    });
    dispatch_barrier_trysync(outer, ^{
        // write global state protected by inner
    });


If it worked like the internal version we have today, the code above would 
have a data race, which we'll all agree is bad.
Alternatively, we could make an API version that always goes through async 
whenever the queue you trysync on is not targeted at a global root queue. But 
that is weird: the performance characteristics would depend entirely on the 
target queue hierarchy, which, once layering and frameworks come into play, is 
a bad characteristic for a good API.

Yes, for our use case we could assume the queue targeted only a root queue, so 
our implementation has this limitation (meaning it is not a valid general 
solution, as you say).

It would perhaps be a bit strange to have different performance characteristics 
depending on the target queue hierarchy, as you say, but there are already 
performance differences in actual behavior between, e.g., an overcommit queue 
and a non-overcommit one. So perhaps another option would be an optional queue 
attribute instead of an additional generic API: a queue attribute meaning 
'steal the calling thread for inline processing of requests if the queue was 
empty when dispatching'?


Or we don’t give up right away when the hierarchy is deep, but then 
dispatch_trysync would need to be able to unwind all the locks it took, and 
you get ordering issues: the block that couldn’t run synchronously may end up 
enqueued after another one, breaking the FIFO ordering of queues. Respecting 
FIFO ordering, a desired property of our API, and getting an efficient 
implementation are somewhat at odds.

Yes, agree it is a desirable property of the API to retain the ordering.

The other argument against trysync in that form is that during testing it 
would almost always take the non-contended codepath, leading developers to not 
realize that they should have taken copies of variables and the like (less of 
a problem on Darwin with Objective-C and ARC), because trysync running on the 
same thread hides the mistake. Then, once the queue starts being contended in 
production, it bites you hard with memory corruption everywhere.

Less of an issue for us as we depend on the _f interfaces throughout due to 
portability concerns, but fair point.

Technically what you’re after is this: bringing up a new thread is very 
costly, so you’d rather use the thread that’s asyncing the request, because it 
will soon give up control. The wake-up of a queue isn’t that expensive; the 
overhead of dispatch_sync() in terms of memory barriers and locking is more or 
less comparable. What’s expensive is creating a thread to satisfy the enqueue.

Yes, in fact, bringing up a new thread is so costly that we keep a pool around 
in the libpwq implementation. Unfortunately we would often see double-digit 
microsecond latency incurred by this, which is unacceptable for us, so for 
some configurations/special deployments we had to add a dedicated spin thread 
that grabs the next queue to work on (that cut the latency by a factor of 10 
or so); the next thread woken from the thread pool then takes over as the 
spinner…

In my opinion, to get the win you’re after, you’d rather want an async() 
version such that, when it wakes up the target queue hierarchy all the way to 
the root, the thread pool applies more resistance before bringing up a new 
thread to satisfy that request. Fortunately, the overcommit property of queues 
could be used by a thread pool to decide when to apply that resistance. There 
are various parts of the thread pool handling (especially without kernel 
workqueue support) that could get some love to deliver these exact benefits 
without changing the API.

That would indeed be a very interesting idea. The problem is that the thread 
using ‘dispatch_barrier_trysync’ does not return to the pthread_workqueue pool 
to grab the next dispatch queue for processing; instead it goes back to block 
on a syscall (e.g. read() from a socket). And even the latency to wake up a 
thread (as is commonly done now) with mutex/condition signaling is way too 
slow for our use case (hence the very ugly workaround with a spin thread for 
some deployments).

Essentially, for these kinds of operations we really want to avoid all context 
switches as long as we can keep up with the rate of inbound data, and in 
general such dynamics would be a nice property to have. If the thread 
performing the async call were known to always return to the global pwq thread 
pool, applying resistance as you suggest would solve this nicely; the problem 
is what to do when that thread blocks and you thus get stuck.

Perhaps for practical purposes we have to live with the limited implementation 
we have, but I have a feeling the behavior we are after would be useful for 
other use cases too; perhaps the queue attribute suggested above could be 
another way of expressing it without introducing new dispatch API.

Cheers,

Joakim


_______________________________________________
swift-corelibs-dev mailing list
swift-corelibs-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-corelibs-dev
