Do you have any more specific information on mutex contention? We have in fact already looked at doing this; I think it was back in 2015, with Dan Xu. The goal there was to have queues with the mutexes to avoid rescheduling. As a precursor, Dan did some profiling, and the only significant lock contention he could find was in the cache. That led to the partial object caching work setting up queues for the hot locks, but it was decided that, given the lack of contention elsewhere, it wasn't worth the complexity.
I think thread affinity is a better choice, because no lock contention will always beat even the most optimized lock contention resolution. If the Continuations related to the same constellation of data objects are on the same thread, then the locks are never contested, which makes it as fast as possible. (A rough C++ sketch of the queue-based mutex and direct-lock idea from the quoted thread is appended at the bottom of this message.)

On Mon, Sep 30, 2019 at 3:45 PM Walt Karas <wka...@verizonmedia.com.invalid> wrote:

> From the longer-term TSers I've heard comments about seeing profiling results that show that waiting on mutexes is a significant performance issue with TS. But I'm not aware of any write-ups of such results. Unfortunately, I'm relatively new to TS and Linux, so I'm not currently familiar with the best approaches to profiling TS.
>
> For better performance, I think that having a single to-run Continuation queue, or one per core, with a queue feeding multiple event threads is the main thing. It's more resilient to Continuations that block. There doesn't seem to be enthusiasm for getting hard-core about not having blocking Continuations (see https://github.com/apache/trafficserver/pull/5412 ). I'm not sure changing to queue-based mutexes would have a significant performance impact, but it seems a cleaner design, making sure Continuations in the to-run list(s) are actually ready to run. A different mutex implementation is not strictly necessary, though, in order to consolidate to-run Continuation queues.
>
> On Mon, Sep 30, 2019 at 2:39 PM Kees Spoelstra <kspoels...@we-amp.com> wrote:
>
> > Sounds very interesting.
> >
> > But what is the problem we're trying to solve here? I like the thread affinity because it gives us headache-free concurrency in some cases, and I'll bet that there is some code which doesn't have the proper continuation mutexes because we know it runs on the same thread.
> >
> > Are we seeing a lot of imbalanced threads (too much processing causing long queues of continuations, which I can imagine in some cases)? And shouldn't we balance based on transactions or connections, move those around when we see imbalance, and aim for embarrassingly parallel processing :) Come to think of it, this might introduce another set of problems, such as how to know which continuations are part of the life cycle of a connection :/
> >
> > Jumping threads within one transaction is not always ideal either; this can really hurt performance. But your proposed model seems to handle that somewhat better than the current implementation.
> >
> > Very interested, and wondering what this would mean for plugin developers.
> >
> > On Mon, 30 Sep 2019, 19:20 Walt Karas, <wka...@verizonmedia.com.invalid> wrote:
> >
> > > If a Continuation is scheduled but its mutex is locked, it's put in a queue specific to that mutex. The release function for the mutex (called when a Continuation holding the mutex exits) would put the Continuation at the front of the mutex's queue (if not empty) into the ready-to-run queue (transferring the lock to that Continuation). A drawback is that the queue would itself need a mutex (spinlock?), but the critical section would be very short.
> > >
> > > There would be a function to lock a mutex directly. It would create a Continuation that had two condition variables. It would assign the mutex to this Continuation and schedule it. (In this case, it might make sense to put this Continuation at the front of the mutex's queue, since it would be blocking an entire event thread.) The direct-lock function would then block on the first condition variable. When the Continuation ran, it would trigger the first condition variable and then block on the second condition variable. The direct-lock function would then exit, allowing the calling code to enter its critical section. At the end of the critical section, another function to release the direct lock would be called. It would trigger the second condition variable, which would cause the function of the Continuation created for the direct lock to exit (thus releasing the mutex).
> > >
> > > With this approach, I'm not sure thread affinities would be of any value. I think perhaps each core should have its own list of ready-to-run Continuations, and a pool of event threads with affinity to that core. Not having per-event-thread ready-to-run lists means that a Continuation function that blocks is less likely to block other ready-to-run Continuations. If Continuations had core affinities to some degree, this might reduce evictions in per-core memory cache. (Multiple Continuations having the same function should have the same core affinity.)
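
Purely as an illustration of the scheme described in the quoted messages above, here is a rough C++ sketch of the per-mutex waiter queue with lock transfer on release, plus the direct-lock helper built on two condition variables. All of the names (QueueMutex, Continuation, enqueue_ready_to_run, DirectLock) are hypothetical stand-ins, not the actual Traffic Server classes, and the event loop that would run handlers and then call release() is left out.

// Rough sketch only: QueueMutex, Continuation, enqueue_ready_to_run and DirectLock are
// made-up names for illustration, not the real Traffic Server types or API.
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

struct Continuation {
  std::function<void()> handler;  // run by an event thread while the Continuation holds its mutex
};

// Stand-in for the event system's ready-to-run list.  A real event loop (not shown) would
// pop a Continuation, run handler(), and then call release() on that Continuation's mutex.
static std::mutex ready_guard;
static std::deque<Continuation *> ready_to_run;
static void enqueue_ready_to_run(Continuation *c) {
  std::lock_guard<std::mutex> g(ready_guard);
  ready_to_run.push_back(c);
}

struct QueueMutex {
  std::mutex guard;  // protects the waiter queue; the proposal suggests a spinlock here
  bool held = false;
  std::deque<Continuation *> waiters;

  // Called when a Continuation is scheduled.  Returns true if the lock was free (the caller
  // should then put c on the ready-to-run list); otherwise c is parked on this mutex instead
  // of on the ready list.  at_front models the direct-lock special case.
  bool acquire_or_enqueue(Continuation *c, bool at_front = false) {
    std::lock_guard<std::mutex> g(guard);
    if (!held) {
      held = true;
      return true;
    }
    if (at_front)
      waiters.push_front(c);
    else
      waiters.push_back(c);
    return false;
  }

  // Called when the Continuation holding the mutex exits: ownership is handed directly to
  // the next waiter, which becomes ready to run already holding the lock.
  void release() {
    Continuation *next = nullptr;
    {
      std::lock_guard<std::mutex> g(guard);
      if (waiters.empty()) {
        held = false;
        return;
      }
      next = waiters.front();
      waiters.pop_front();  // 'held' stays true; ownership transfers to next
    }
    enqueue_ready_to_run(next);
  }
};

// Sketch of the direct-lock helper: a Continuation with two condition variables that lets
// ordinary (non-event) code block until it effectively holds a QueueMutex.
struct DirectLock {
  std::mutex m;
  std::condition_variable cv_locked, cv_done;
  bool locked = false, done = false;
  Continuation cont;

  void lock(QueueMutex &qm) {
    cont.handler = [this] {
      {  // runs on an event thread once the mutex has been granted to 'cont'
        std::lock_guard<std::mutex> g(m);
        locked = true;
      }
      cv_locked.notify_one();  // first condition variable: wake the direct-lock caller
      std::unique_lock<std::mutex> ul(m);
      cv_done.wait(ul, [this] { return done; });  // second: wait out the critical section
      // Returning here lets the event loop call qm.release(), handing the mutex on.
    };
    if (qm.acquire_or_enqueue(&cont, /*at_front=*/true))
      enqueue_ready_to_run(&cont);
    std::unique_lock<std::mutex> ul(m);
    cv_locked.wait(ul, [this] { return locked; });  // block until the handler has the lock
  }

  void unlock() {
    {
      std::lock_guard<std::mutex> g(m);
      done = true;
    }
    cv_done.notify_one();
  }
};

The intended usage (under the same assumptions) is that the scheduler calls acquire_or_enqueue() when a Continuation is scheduled and, only if that returns true, puts it on the ready-to-run list; after an event thread runs a handler it calls release(), which wakes the next waiter already holding the lock. The per-mutex guard is the short critical section mentioned in the quoted proposal, and whether it should be a spinlock is left open.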