Re: Thread affinity on Continuations vs TSContSchedule API

Sudheer Vinukonda Fri, 27 Mar 2020 11:01:09 -0700

 Generally speaking, we schedule Continuations for 2 main use cases. 
1) Make async sideways calls
2) Some offloading of heavy CPU work (such as encrypting/decrypting a token etc)
For (1) we don't use separate thread pools and just schedule them on the Net 
threads. As these involve i/o, it's more predictable (and IMO recommended) to 
stick to Net threads.
For (2) it's totally dependent on your use case. We pretty much run with the 
default 2 task threads and that seems to work reasonably for us.
But, given that this is clearly use case specific, I wonder if it may be more 
appropriate to be put in Best practices/FAQ sections of the docs than the API 
docs itself.
Thanks,
Sudheer



    On Friday, March 27, 2020, 10:45:46 AM PDT, Alan Carroll 
<solidwallofc...@verizonmedia.com.invalid> wrote:  
 
 Yes, that's a problem. One of the many things on my list.

On Fri, Mar 27, 2020 at 12:42 PM David Calavera <david.calav...@gmail.com>
wrote:

> Something I find challenging about TSContScheduleOnPool and others is that
> there is no much documentation about good practices to configure thread
> pools. For example the only documentation I found about task threads only
> says that you must have at least 1 task thread:
>
>
> https://docs.trafficserver.apache.org/en/8.0.x/admin-guide/files/records.config.en.html?highlight=records%20config#proxy-config-task-threads
>
> This option obviously depends on how much work you're doing in task
> threads, but it'd be very good to have some suggestions from people's
> experience changing that setting in real production environments.
>
> On Fri, Mar 27, 2020 at 9:03 AM Alan Carroll
> <solidwallofc...@verizonmedia.com.invalid> wrote:
>
> > I put most of my discussion on the PR 6577, but discussing this with
> > Sudheer and Fei, I think the consensus is
> >
> > 1) Accept the PR for ATS 9.
> > 2) For ATS 10 (or maybe 9.1?), deprecate TSContSchedule. Make it explicit
> > to schedule on a pool or thread.
> >
> > On Thu, Mar 26, 2020 at 2:51 PM Walt Karas <wka...@verizonmedia.com
> > .invalid>
> > wrote:
> >
> > > I was having problems with this too in a new plugin I was writing.  I
> had
> > > to make this small change to get it to work:
> > >
> > >
> >
> https://github.com/apache/trafficserver/pull/6489/files#diff-8c09c39a3b13e183ba1bfd1e590cfb21
> > >
> > > I found that I had to use TSContScheduleOnPool() instead of
> > > TSContSchedule().
> > >
> > > On Thu, Mar 26, 2020 at 12:55 PM Sudheer Vinukonda
> > > <sudheervinuko...@yahoo.com.invalid> wrote:
> > >
> > > > While investigating a core dump that we ran into during our ATS9
> > > > migration, we discovered a couple of problems with the thread
> affinity
> > > > changes and the TSContSchedule API.
> > > > 1) It looks like the API TSContSchedule() and TSContScheduleEvery()
> now
> > > > don't work unless TSContSetThreadAffinity() was called prior to
> calling
> > > > them. This sounds unreasonable, so, I'm proposing to change the API
> to
> > > set
> > > > the Continuation's thread affinity to the calling thread of the API,
> if
> > > not
> > > > already set.
> > > > 2) We have a use case where one of our plugins calls TSContSchedule*
> > API
> > > > from a non-ATS thread (created using pthread_create). This borks in a
> > > weird
> > > > fashion because the FORCE_PLUGIN_SCOPED_MUTEX ends up not acquiring
> the
> > > > mutex (because the calling thread isn't an EThread, so,
> this_ethread()
> > is
> > > > NULL and the holding thread for the cont's mutex is also NULL, which
> > > means
> > > > in Mutex_Lock(), the if (m->thread_holding != t) { evaluates to false
> > > > bypassing the mutex acquire call. However, the subsequent mutex
> release
> > > > does get called via Mutex_unlock() and this makes bad things to
> happen
> > > (as
> > > > the __nusers of the mutex object goes below 0 (it's an unsigned
> int)).
> > > >
> > > >
> > >
> >
> https://github.com/apache/trafficserver/blob/4a503c817766fcad3b9eab3ff3a8e9e73d483ae7/iocore/eventsystem/I_Lock.h#L320
> > > >
> > > > So, we could either
> > > > - prevent calling TSContSchedule*() API from non-ATS threads by
> > asserting
> > > > if this_ethread() is NULL before trying to acquire the mutex. ie.
> > > > basically, fail hard and fast and in more obvious way (than an
> obscure
> > > > random side-effect that happens much later because a release() was
> > called
> > > > on a lock without calling an acquire())
> > > > OR
> > > > - With the thread affinity changes, we could check if the
> continuation
> > > has
> > > > a thread affinity set to an EThread and still allow calling
> > > > TSContSchedule*() API from non-ATS Threads, as long as the thread
> > > affinity
> > > > is set to an ATS thread. This requires to acquire the mutex on the
> > > affinity
> > > > thread than the calling thread.
> > > >
> > > > After discussing with Bryan, we prefer option (1) which is consistent
> > > with
> > > > the expectation that Continuation Schedule API should only be called
> > from
> > > > ATS threads.
> > > >
> > > > Note: Both the proposed changes here should not be breaking any
> > > > compatibility or expectations, but rather retain compatibility with
> (1)
> > > > above and make the API a bit more predictable/robust with (2)
> > > >
> > > > Let me know if there are any comments/concerns or questions.
> > > >
> > > > Thanks,
> > > > Sudheer
> > > >
> > > >
> > >
> >
>

Re: Thread affinity on Continuations vs TSContSchedule API

Reply via email to