Graham Dumpleton <graham.dumple...@gmail.com> added the comment:

Just to make a few things clear. It isn't mod_wsgi itself that relies on daemon 
threads, it is users' WSGI applications (or the things they need) that do.

Concrete examples of things that would stop working are monitoring systems 
such as New Relic, DataDog, Elastic APM, etc. These all fire off a background 
thread to handle aggregation of data collected from the application, with that 
data then being sent off once a minute to the backend servers.

It isn't just these though. Over the years I have seen many instances of people 
using background threads to offload small tasks to be done in process rather 
than using a full blown queuing system such as Celery. So I don't believe it 
is a rare use case. Monitoring systems are a big use case though.

These would all usually use a daemon thread so they can be started and 
effectively forgotten, with no need to do anything to shut them down when the 
process is exiting.

Some (such as New Relic, which I wrote, so I know how it works) will register 
an atexit callback in order to flush data out before a process stops, but it 
may not actually exit the thread. Even if it does exit the thread, you can't 
just switch it to use a non daemon thread as that will not work.
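
As a rough illustration of the pattern described above (this is not New 
Relic's actual code; the Reporter class, its method names and the in-memory 
"sent" list standing in for the backend servers are all hypothetical), a 
background reporter of this kind might look something like:

```python
import atexit
import threading


class Reporter:
    """Hypothetical in-process metrics reporter (illustrative only)."""

    def __init__(self, interval=60.0):
        self.interval = interval
        self._lock = threading.Lock()
        self._pending = []
        self.sent = []  # stands in for the backend servers
        self._stop = threading.Event()
        # Daemon thread: started and then effectively forgotten.
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()
        # Flush whatever is still pending when the process exits.
        # Note that this does not stop the background thread.
        atexit.register(self.flush)

    def record(self, value):
        with self._lock:
            self._pending.append(value)

    def flush(self):
        with self._lock:
            batch, self._pending = self._pending, []
        if batch:
            self.sent.append(batch)

    def _run(self):
        # Event.wait() returns False on timeout, so this flushes once
        # per interval until the stop event is set (which never happens
        # here, as the thread relies on being a daemon thread).
        while not self._stop.wait(self.interval):
            self.flush()
```

The atexit callback only flushes data. Nothing ever asks the thread to 
exit, which is fine only because it is a daemon thread.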

The problem here is that atexit callbacks are only called after the 
(sub)interpreter shutdown code has waited on non daemon threads. Thus there is 
no current standard way I know of to notify a non daemon thread to shut down. 
The result would be that if these were switched to non daemon threads, the 
process would hang on shutdown at the point of waiting for non daemon threads.
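
The ordering can be observed with a small experiment. This is only a sketch: 
it runs a child Python process (the script text and messages are made up for 
illustration) in which the non daemon thread finishes on its own, so nothing 
hangs. If the thread instead looped until the atexit callback told it to 
stop, the child process would never get as far as running that callback.

```python
import subprocess
import sys
import textwrap

# Script run in a child process so interpreter shutdown can be observed.
CHILD = textwrap.dedent("""
    import atexit
    import threading
    import time

    def worker():
        time.sleep(0.2)
        print("non daemon thread finished", flush=True)

    atexit.register(lambda: print("atexit callback ran", flush=True))
    threading.Thread(target=worker, daemon=False).start()
    print("main thread returning", flush=True)
""")

result = subprocess.run(
    [sys.executable, "-c", CHILD],
    capture_output=True, text=True, timeout=30,
)
lines = result.stdout.splitlines()
print(lines)
```

The atexit message is printed last, after the non daemon thread has been 
waited on.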

So if you are going to eliminate daemon threads (even if only in sub 
interpreters at this point), you are going to have to introduce a way to 
register something similar to an atexit callback which would be invoked before 
waiting on non daemon threads, so an attempt can be made to notify them that 
they need to shut down. Use of this mechanism is going to have to be added to 
any code out there currently using daemon threads if it is going to be 
forced to use non daemon threads. This includes stuff in the stdlib such as the 
multiprocessing thread pools. They can't just switch to non daemon threads, 
they have to add the capability to register for and be notified of 
(sub)interpreter shutdown so they can exit the thread, otherwise process hangs 
will occur.
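
To make the shape of such a mechanism concrete, here is a minimal sketch. 
Everything in it is hypothetical: register_pre_shutdown() and 
simulate_interpreter_shutdown() are invented names, not a CPython API, and 
the second function merely stands in for what the interpreter would do 
itself during shutdown.

```python
import threading

# Hypothetical registry; not a CPython API.
_pre_shutdown_callbacks = []


def register_pre_shutdown(callback):
    """Register a callback to run before non daemon threads are joined."""
    _pre_shutdown_callbacks.append(callback)
    return callback


def simulate_interpreter_shutdown(threads, timeout=5.0):
    """Stand-in for the (sub)interpreter shutdown sequence."""
    # Step 1 (the proposed addition): notify threads before waiting.
    for callback in reversed(_pre_shutdown_callbacks):
        callback()
    # Step 2 (what happens today): wait on the non daemon threads.
    for thread in threads:
        thread.join(timeout)


stop = threading.Event()


def worker():
    # Periodic background work; loops until asked to stop.
    while not stop.wait(0.05):
        pass


thread = threading.Thread(target=worker, daemon=False)
thread.start()

# Without this registration the join in step 2 would only return
# because of its timeout, and a real interpreter would hang.
register_pre_shutdown(stop.set)
simulate_interpreter_shutdown([thread])
```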

Now a few other things about history and usage of mod_wsgi to give context.

Once upon a time mod_wsgi did try to delete sub interpreters and replace them 
during the life of a process. This, as you can probably imagine, was very buggy 
because of issues in CPython sub interpreter support. As a result mod_wsgi 
discarded that ability and so a sub interpreter always persisted and was used 
for the life of the process. That way problems with clean up of sub 
interpreters weren't a big issue.

During cleanup of (sub)interpreters on process shutdown, although crashes could 
sometimes occur (usually quite rarely), what usually happened was that a Python 
exception would occur. The reason for this was that in cleaning up a 
(sub)interpreter, sys.modules was cleared with everything appearing to be 
set to None. You would therefore get a Python exception because some code 
trying to access a class instance found the instance replaced by None and so it 
failed. Even this was rare and not a big deal.

Now although a crash or Python exception could in rare cases occur, for 
mod_wsgi it didn't really matter since we were talking about a sub process of 
the Apache master process, and the master process didn't care. If Apache was 
stopping anyway, it just stopped normally. If Apache was doing a restart and 
child processes were told to stop because of that, or if a maximum request 
threshold was reached and so a process was being recycled, then Apache was 
going to replace the process anyway, so everything just carried on normally and 
a new process started in its place.

In the case where a process lockup managed to occur on process shutdown, for 
example if non daemon threads were used explicitly, then process shutdown 
timeouts applied by mod_wsgi on daemon processes would kick in and the process 
would be force killed anyway. So all up it was quite resilient and kept 
working. If embedded mode of mod_wsgi was used, though, it would lock up the 
Apache process indefinitely if something used non daemon threads explicitly.

On the issue of non daemon threads, usually these would never arise. This is 
because usually people don't explicitly say a thread is non daemon. Where 
nothing is done to say that, a thread actually inherits the mode of the thread 
it was created in. Since all request handler threads in mod_wsgi are actually 
externally created threads which call into Python, they get assigned the 
DummyThread object to track them. These are treated as daemon threads. As a 
result any new threads created from them which don't explicitly say they are 
non daemon get marked as daemon threads anyway. The consequence of this is 
that they never get waited upon on shutdown and everything works.
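
The inheritance can be checked with a short script. This is a sketch under 
one assumption: _thread.start_new_thread() is used here to stand in for a 
thread created outside the threading module, on the basis that the 
threading module tracks it with a dummy thread object in the same way as a 
thread created by Apache. The results dict and function names are made up 
for the example.

```python
import _thread
import threading

results = {}
foreign_done = threading.Event()


def noop():
    pass


def foreign_thread_body():
    # This thread was not created via threading.Thread, so
    # current_thread() hands back a dummy thread object for it.
    me = threading.current_thread()
    results["foreign_daemon"] = me.daemon
    # No daemon argument: the mode is inherited from the creator.
    child = threading.Thread(target=noop)
    results["child_of_foreign_daemon"] = child.daemon
    child.start()
    child.join()
    foreign_done.set()


def regular_thread_body():
    child = threading.Thread(target=noop)
    results["child_of_regular_daemon"] = child.daemon


_thread.start_new_thread(foreign_thread_body, ())
foreign_done.wait(timeout=5)

# For comparison, a child created from an explicit non daemon thread.
parent = threading.Thread(target=regular_thread_body, daemon=False)
parent.start()
parent.join()

print(results)
```

The child created from the externally created thread comes out as a daemon 
thread, while the child created from the explicit non daemon thread does 
not.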

Anyway, going forward, if use of daemon threads is blocked in sub interpreters 
to satisfy the new envisioned use case for them, of using them to run sub tasks 
out of another interpreter, then the first thing would be that mod_wsgi would 
deprecate use of sub interpreters, disabling them by default and requiring 
people to explicitly enable the ability to use them. This would be just an 
interim measure in a transition period.

In a followup version mod_wsgi would then discard support for sub interpreters, 
as well as likely also disable embedded mode (except for the specific case of 
Apache access control hooks, although that would still mean Windows support 
being dropped). Thus people would be forced to use daemon mode of mod_wsgi, 
where processes separate from the Apache child processes are used to run a WSGI 
application. This is actually what mod_wsgi-express effectively enforces 
already. If separation were needed for separate applications, or if a single 
application needed to be split, separate groups of daemon processes would be 
used with requests redirected to the appropriate instance in a daemon process 
group.

This recommendation to use daemon processes, and to use the main interpreter 
context rather than sub interpreters, has existed for many years because many 
third party packages do not work in sub interpreters anyway due to simplified 
GIL API restrictions. So this isn't new, it just hasn't been required and 
enforced (except in mod_wsgi-express). So these steps just force people in the 
direction which has been recommended for a long time.

As to the question of why disable/discard sub interpreter support in mod_wsgi, 
that comes down to support burden. This is not support burden in mod_wsgi, but 
the effort it will take to deal with all those people out there whose 
applications will stop working if run in sub interpreters were daemon thread 
usage prevented. I don't have time to be hand holding all these people and 
educating or helping them to fix their applications, or telling them how some 
third party package they use needs to be changed. It will be easier, and have 
less impact on me as the only person who supports mod_wsgi, to discard sub 
interpreter support and document how people need to move to use of daemon mode 
and the main interpreter. That has been recommended for a long time anyway, but 
couldn't be made the default purely because of the history of how mod_wsgi was 
developed and features added over time.

Now later on if someone decides to eliminate daemon threads for the main 
interpreter context, or changes stdlib so everything uses non daemon threads, 
then at that point I would stop supporting mod_wsgi in those Python versions. I 
just feel that is going to have a huge impact on user code at that point and 
create lots of problems, so I don't even want to go there. The impact of 
dropping daemon threads from the main interpreter will likely have effects way 
beyond mod_wsgi as well, so right now I can't see how you could even make that 
decision.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue40234>
_______________________________________