I put most of my discussion on the PR 6577, but discussing this with
Sudheer and Fei, I think the consensus is

1) Accept the PR for ATS 9.
2) For ATS 10 (or maybe 9.1?), deprecate TSContSchedule. Make it explicit
to schedule on a pool or thread.

On Thu, Mar 26, 2020 at 2:51 PM Walt Karas <wka...@verizonmedia.com.invalid>
wrote:

> I was having problems with this too in a new plugin I was writing.  I had
> to make this small change to get it to work:
>
> https://github.com/apache/trafficserver/pull/6489/files#diff-8c09c39a3b13e183ba1bfd1e590cfb21
>
> I found that I had to use TSContScheduleOnPool() instead of
> TSContSchedule().
>
> On Thu, Mar 26, 2020 at 12:55 PM Sudheer Vinukonda
> <sudheervinuko...@yahoo.com.invalid> wrote:
>
> > While investigating a core dump that we ran into during our ATS9
> > migration, we discovered a couple of problems with the thread affinity
> > changes and the TSContSchedule API.
> > 1) It looks like the API TSContSchedule() and TSContScheduleEvery() now
> > don't work unless TSContSetThreadAffinity() was called prior to calling
> > them. This sounds unreasonable, so, I'm proposing to change the API to
> set
> > the Continuation's thread affinity to the calling thread of the API, if
> not
> > already set.
> > 2) We have a use case where one of our plugins calls TSContSchedule* API
> > from a non-ATS thread (created using pthread_create). This borks in a
> weird
> > fashion because the FORCE_PLUGIN_SCOPED_MUTEX ends up not acquiring the
> > mutex (because the calling thread isn't an EThread, so, this_ethread() is
> > NULL and the holding thread for the cont's mutex is also NULL, which
> means
> > in Mutex_Lock(), the if (m->thread_holding != t) { evaluates to false
> > bypassing the mutex acquire call. However, the subsequent mutex release
> > does get called via Mutex_unlock() and this makes bad things to happen
> (as
> > the __nusers of the mutex object goes below 0 (it's an unsigned int)).
> >
> >
> https://github.com/apache/trafficserver/blob/4a503c817766fcad3b9eab3ff3a8e9e73d483ae7/iocore/eventsystem/I_Lock.h#L320
> >
> > So, we could either
> > - prevent calling TSContSchedule*() API from non-ATS threads by asserting
> > if this_ethread() is NULL before trying to acquire the mutex. ie.
> > basically, fail hard and fast and in more obvious way (than an obscure
> > random side-effect that happens much later because a release() was called
> > on a lock without calling an acquire())
> > OR
> > - With the thread affinity changes, we could check if the continuation
> has
> > a thread affinity set to an EThread and still allow calling
> > TSContSchedule*() API from non-ATS Threads, as long as the thread
> affinity
> > is set to an ATS thread. This requires to acquire the mutex on the
> affinity
> > thread than the calling thread.
> >
> > After discussing with Bryan, we prefer option (1) which is consistent
> with
> > the expectation that Continuation Schedule API should only be called from
> > ATS threads.
> >
> > Note: Both the proposed changes here should not be breaking any
> > compatibility or expectations, but rather retain compatibility with (1)
> > above and make the API a bit more predictable/robust with (2)
> >
> > Let me know if there are any comments/concerns or questions.
> >
> > Thanks,
> > Sudheer
> >
> >
>

Reply via email to