I put most of my discussion on the PR 6577, but discussing this with Sudheer and Fei, I think the consensus is
1) Accept the PR for ATS 9. 2) For ATS 10 (or maybe 9.1?), deprecate TSContSchedule. Make it explicit to schedule on a pool or thread. On Thu, Mar 26, 2020 at 2:51 PM Walt Karas <wka...@verizonmedia.com.invalid> wrote: > I was having problems with this too in a new plugin I was writing. I had > to make this small change to get it to work: > > https://github.com/apache/trafficserver/pull/6489/files#diff-8c09c39a3b13e183ba1bfd1e590cfb21 > > I found that I had to use TSContScheduleOnPool() instead of > TSContSchedule(). > > On Thu, Mar 26, 2020 at 12:55 PM Sudheer Vinukonda > <sudheervinuko...@yahoo.com.invalid> wrote: > > > While investigating a core dump that we ran into during our ATS9 > > migration, we discovered a couple of problems with the thread affinity > > changes and the TSContSchedule API. > > 1) It looks like the API TSContSchedule() and TSContScheduleEvery() now > > don't work unless TSContSetThreadAffinity() was called prior to calling > > them. This sounds unreasonable, so, I'm proposing to change the API to > set > > the Continuation's thread affinity to the calling thread of the API, if > not > > already set. > > 2) We have a use case where one of our plugins calls TSContSchedule* API > > from a non-ATS thread (created using pthread_create). This borks in a > weird > > fashion because the FORCE_PLUGIN_SCOPED_MUTEX ends up not acquiring the > > mutex (because the calling thread isn't an EThread, so, this_ethread() is > > NULL and the holding thread for the cont's mutex is also NULL, which > means > > in Mutex_Lock(), the if (m->thread_holding != t) { evaluates to false > > bypassing the mutex acquire call. However, the subsequent mutex release > > does get called via Mutex_unlock() and this makes bad things to happen > (as > > the __nusers of the mutex object goes below 0 (it's an unsigned int)). > > > > > https://github.com/apache/trafficserver/blob/4a503c817766fcad3b9eab3ff3a8e9e73d483ae7/iocore/eventsystem/I_Lock.h#L320 > > > > So, we could either > > - prevent calling TSContSchedule*() API from non-ATS threads by asserting > > if this_ethread() is NULL before trying to acquire the mutex. ie. > > basically, fail hard and fast and in more obvious way (than an obscure > > random side-effect that happens much later because a release() was called > > on a lock without calling an acquire()) > > OR > > - With the thread affinity changes, we could check if the continuation > has > > a thread affinity set to an EThread and still allow calling > > TSContSchedule*() API from non-ATS Threads, as long as the thread > affinity > > is set to an ATS thread. This requires to acquire the mutex on the > affinity > > thread than the calling thread. > > > > After discussing with Bryan, we prefer option (1) which is consistent > with > > the expectation that Continuation Schedule API should only be called from > > ATS threads. > > > > Note: Both the proposed changes here should not be breaking any > > compatibility or expectations, but rather retain compatibility with (1) > > above and make the API a bit more predictable/robust with (2) > > > > Let me know if there are any comments/concerns or questions. > > > > Thanks, > > Sudheer > > > > >