While investigating a core dump that we ran into during our ATS9 migration, we 
discovered a couple of problems with the thread affinity changes and the 
TSContSchedule API.
1) It looks like the API TSContSchedule() and TSContScheduleEvery() now don't 
work unless TSContSetThreadAffinity() was called prior to calling them. This 
sounds unreasonable, so, I'm proposing to change the API to set the 
Continuation's thread affinity to the calling thread of the API, if not already 
set.
2) We have a use case where one of our plugins calls TSContSchedule* API from a 
non-ATS thread (created using pthread_create). This borks in a weird fashion 
because the FORCE_PLUGIN_SCOPED_MUTEX ends up not acquiring the mutex (because 
the calling thread isn't an EThread, so, this_ethread() is NULL and the holding 
thread for the cont's mutex is also NULL, which means in Mutex_Lock(), the if 
(m->thread_holding != t) { evaluates to false bypassing the mutex acquire call. 
However, the subsequent mutex release does get called via Mutex_unlock() and 
this makes bad things to happen (as the __nusers of the mutex object goes below 
0 (it's an unsigned int)). 
https://github.com/apache/trafficserver/blob/4a503c817766fcad3b9eab3ff3a8e9e73d483ae7/iocore/eventsystem/I_Lock.h#L320

So, we could either 
- prevent calling TSContSchedule*() API from non-ATS threads by asserting if 
this_ethread() is NULL before trying to acquire the mutex. ie. basically, fail 
hard and fast and in more obvious way (than an obscure random side-effect that 
happens much later because a release() was called on a lock without calling an 
acquire())
OR 
- With the thread affinity changes, we could check if the continuation has a 
thread affinity set to an EThread and still allow calling TSContSchedule*() API 
from non-ATS Threads, as long as the thread affinity is set to an ATS thread. 
This requires to acquire the mutex on the affinity thread than the calling 
thread.

After discussing with Bryan, we prefer option (1) which is consistent with the 
expectation that Continuation Schedule API should only be called from ATS 
threads.

Note: Both the proposed changes here should not be breaking any compatibility 
or expectations, but rather retain compatibility with (1) above and make the 
API a bit more predictable/robust with (2)

Let me know if there are any comments/concerns or questions.

Thanks,
Sudheer

Reply via email to