From longer-term TSers I've heard comments about seeing profiling results that show that waiting on mutexes is a significant performance issue in TS, but I'm not aware of any write-ups of such results. Unfortunately, I'm relatively new to TS and Linux, so I'm not currently familiar with the best approaches to profiling TS.
For better performance, I think the main thing is having a single to-run Continuation queue, or one per core, with each queue feeding multiple event threads. That arrangement is more resilient to Continuations that block, and there doesn't seem to be much enthusiasm for getting hard-core about preventing blocking Continuations (see https://github.com/apache/trafficserver/pull/5412 ). I'm not sure that changing to queue-based mutexes would have a significant performance impact, but it seems like a cleaner design, since it ensures that Continuations in the to-run list(s) are actually ready to run (rough sketches follow at the end of this message, after the quoted text). A different mutex implementation is not strictly necessary, though, in order to consolidate the to-run Continuation queues.

On Mon, Sep 30, 2019 at 2:39 PM Kees Spoelstra <kspoels...@we-amp.com> wrote:

> Sounds very interesting.
> But what is the problem we're trying to solve here? I like the thread
> affinity because it gives us headache-free concurrency in some cases, and
> I'll bet that there is some code which doesn't have the proper
> continuation mutexes because we know it runs on the same thread.
>
> Are we seeing a lot of imbalanced threads (too much processing causing
> long queues of continuations, which I can imagine in some cases)? And
> shouldn't we balance based on transactions or connections, move those
> around when we see imbalance, and aim for embarrassingly parallel
> processing :) Come to think of it, this might introduce another set of
> problems: how to know which continuations are part of the life cycle of
> a connection :/
>
> Jumping threads in one transaction is not always ideal either; this can
> really hurt performance. But your proposed model seems to handle that
> somewhat better than the current implementation.
>
> Very interested, and wondering what this would mean for plugin developers.
>
> On Mon, 30 Sep 2019, 19:20 Walt Karas, <wka...@verizonmedia.com.invalid>
> wrote:
>
> > If a Continuation is scheduled, but its mutex is locked, it's put in a
> > queue specific to that mutex. The release function for the mutex
> > (called when a Continuation holding the mutex exits) would move the
> > Continuation at the front of the mutex's queue (if not empty) into the
> > ready-to-run queue (transferring the lock to that Continuation). A
> > drawback is that the queue would itself need a mutex (spinlock?), but
> > the critical section would be very short.
> >
> > There would be a function to lock a mutex directly. It would create a
> > Continuation that had two condition variables. It would assign the
> > mutex to this Continuation and schedule it. (In this case, it might
> > make sense to put this Continuation at the front of the mutex's queue,
> > since it would be blocking an entire event thread.) The direct-lock
> > function would then block on the first condition variable. When the
> > Continuation ran, it would trigger the first condition variable, and
> > then block on the second condition variable. The direct-lock function
> > would then exit, allowing the calling code to enter its critical
> > section. At the end of the critical section, another function to
> > release the direct lock would be called. It would trigger the second
> > condition variable, which would cause the function of the Continuation
> > created for the direct lock to exit (thus releasing the mutex).
> >
> > With this approach, I'm not sure thread affinities would be of any
> > value. I think perhaps each core should have its own list of
> > ready-to-run Continuations, and a pool of event threads with affinity
> > to that core. Not having per-event-thread ready-to-run lists means
> > that a Continuation function that blocks is less likely to block other
> > ready-to-run Continuations. If Continuations had core affinities to
> > some degree, this might reduce evictions in the per-core memory cache.
> > (Multiple Continuations having the same function should have the same
> > core affinity.)
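
For concreteness, here is a rough sketch of the queue-based mutex idea quoted above. This is not actual ATS code: the Continuation stand-in, the QueueMutex class, and the names try_acquire_or_park, enqueue_ready_to_run, and release are all hypothetical, and the short-critical-section spinlock is approximated with std::mutex.

// Hypothetical sketch of a queue-based Continuation mutex (not ATS code):
// a Continuation scheduled while the mutex is held gets parked on the
// mutex, and release() hands the lock straight to the front waiter and
// moves it to the ready-to-run list.
#include <deque>
#include <iostream>
#include <mutex>
#include <string>

struct Continuation {               // stand-in for the ATS Continuation
  std::string name;
};

// Stand-in for the global (or per-core) ready-to-run list.
std::deque<Continuation *> ready_to_run;
void enqueue_ready_to_run(Continuation *c) { ready_to_run.push_back(c); }

class QueueMutex {
public:
  // Called when a Continuation is scheduled.  Returns true if the lock was
  // free (the Continuation can go straight to the ready-to-run list);
  // otherwise the Continuation is parked on this mutex's wait queue.
  bool try_acquire_or_park(Continuation *c) {
    std::lock_guard<std::mutex> g(q_lock_);  // very short critical section
    if (owner_ == nullptr) {
      owner_ = c;
      return true;
    }
    waiters_.push_back(c);
    return false;
  }

  // Called when the owning Continuation's handler returns.  If anyone is
  // waiting, ownership is transferred to the front waiter, which is then
  // guaranteed to be ready to run.
  void release() {
    Continuation *next = nullptr;
    {
      std::lock_guard<std::mutex> g(q_lock_);
      if (waiters_.empty()) {
        owner_ = nullptr;
        return;
      }
      next = waiters_.front();
      waiters_.pop_front();
      owner_ = next;                 // lock is never observably free here
    }
    enqueue_ready_to_run(next);
  }

private:
  std::mutex q_lock_;                // guards owner_ and waiters_ only
  Continuation *owner_ = nullptr;
  std::deque<Continuation *> waiters_;
};

int main() {
  Continuation a{"a"}, b{"b"};
  QueueMutex m;

  if (m.try_acquire_or_park(&a))     // lock was free, so "a" is ready to run
    enqueue_ready_to_run(&a);
  m.try_acquire_or_park(&b);         // lock held by "a", so "b" is parked

  // An event thread would now pop and run "a"; when its handler returns:
  ready_to_run.pop_front();
  m.release();                       // lock handed to "b"; "b" becomes ready

  std::cout << ready_to_run.front()->name << '\n';  // prints: b
}

The point of the sketch is that release() never leaves the mutex observably unlocked while waiters exist; ownership is handed directly to the next Continuation, so anything on the ready-to-run list really is ready to run.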
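And a similarly hypothetical sketch of the direct-lock idea, using two condition variables as described in the quoted message. The std::thread here stands in for the event thread that would run the lock-holding Continuation; none of these names exist in ATS.

// Hypothetical sketch of the "direct lock" idea (not ATS code): a blocking
// lock for non-event threads built on a scheduled Continuation plus two
// condition variables.
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <thread>

struct DirectLockState {
  std::mutex m;
  std::condition_variable cv_locked;   // signaled when the Continuation runs (lock held)
  std::condition_variable cv_release;  // signaled when the caller ends its critical section
  bool locked   = false;
  bool released = false;
};

// Handler the scheduler would run once the mutex has been acquired for the
// Continuation (being run implies the lock is held).
void direct_lock_continuation(DirectLockState &s) {
  std::unique_lock<std::mutex> g(s.m);
  s.locked = true;
  s.cv_locked.notify_one();                          // wake the blocked direct_lock() caller
  s.cv_release.wait(g, [&] { return s.released; });  // hold the event thread until release
  // Returning here is what would release the Continuation mutex in the real design.
}

// Blocks the calling (non-event) thread until the Continuation above has run.
void direct_lock(DirectLockState &s) {
  std::unique_lock<std::mutex> g(s.m);
  s.cv_locked.wait(g, [&] { return s.locked; });
}

// Ends the critical section: lets the Continuation's handler return.
void direct_unlock(DirectLockState &s) {
  {
    std::lock_guard<std::mutex> g(s.m);
    s.released = true;
  }
  s.cv_release.notify_one();
}

int main() {
  DirectLockState s;
  // Stand-in for scheduling the Continuation on an event thread.
  std::thread event_thread(direct_lock_continuation, std::ref(s));

  direct_lock(s);
  std::cout << "critical section: mutex effectively held\n";
  direct_unlock(s);

  event_thread.join();
}

Here direct_lock() returning corresponds to the caller holding the Continuation mutex, and direct_unlock() lets the Continuation's handler return, which is what would actually release the mutex in the real scheduler. It also shows the cost noted above: the event thread is tied up for the whole critical section, which is why the quoted message suggests putting such a Continuation at the front of the mutex's queue.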