[Python-Dev] What is the purpose of the _PyThreadState_Current symbol in Python 3?

2018-09-28 Thread Gabriele
Hi Victor,

> I understand that you are writing a debugger and you can only *read*
> memory, not execute code, right?

I'm working on a frame stack sampler that runs independently from the
Python process. The project is "Austin"
(https://github.com/P403n1x87/austin). Whilst I could, in principle,
execute code with other system calls, I prefer not to in this case.

> In the master branch, it's now _PyRuntime.gilstate.tstate_current. If
> you run time.sleep(3600) and look into
> _PyRuntime.gilstate.tstate_current using gdb, you can see a NULL pointer
> (tstate_current=0) because Python releases the GIL.

I would like my application to make as few assumptions as possible.
The _PyRuntime symbol might not be available if all the symbols have
been stripped out of the binaries. That's why I was trying to rely on
_PyThreadState_Current, which is in the .dynsym section. Judging by
the output of nm -D `which python3` (I'm on Python 3.6.6 at the
moment) I cannot see anything more useful than that.

My current strategy is to try and make something out of this symbol
and then fall back to a brute force approach to scan the .bss section
for valid PyInterpreterState instances (which works reliably well and
is quite fast too, but a bit ugly).

> There is also _PyGILState_GetInterpreterStateUnsafe() which gives
> access to the current Python interpreter:
> _PyRuntime.gilstate.autoInterpreterState. From the interpreter, you
> can use the linked list of thread states from interp->tstate_head.
>
> I hope that I helped :-)

Yes, thanks! Your comment made me realise why I can use
_PyThreadState_Current at the very beginning: Python is going through
its intensive startup process, which involves, among other things, the
loading of frozen modules (I can clearly see most if not all of the
steps in the output of Austin, as mentioned in the repo's README).
During this phase, the main (and only) thread holds the GIL and is
quite busy. The long-running applications that I was trying to attach
to have very long wait periods where they sit idle, waiting for a timer
to trigger the next operations, which fire very quickly and put the
threads back to sleep again.

If this is what the _PyThreadState_Current is designed for, then I
guess I cannot really rely on it, especially when attaching Austin to
another process.
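
To illustrate the point from inside a process (this is not how Austin
works, since Austin reads memory from outside), here is a minimal
sketch using sys._current_frames(). As far as I understand, CPython
builds that mapping from the per-interpreter list of thread states, so
a thread that is blocked and does not hold the GIL still shows up. The
names sleeper and sample_stacks are made up for the example.

```python
import sys
import threading


def sleeper(started, stop):
    """Worker that signals it is running, then blocks without the GIL."""
    started.set()
    stop.wait()  # blocking wait releases the GIL, much like time.sleep()


def sample_stacks():
    """Return {thread_id: [function names, innermost first]} for all threads."""
    stacks = {}
    for thread_id, frame in sys._current_frames().items():
        names = []
        while frame is not None:
            names.append(frame.f_code.co_name)
            frame = frame.f_back
        stacks[thread_id] = names
    return stacks


started = threading.Event()
stop = threading.Event()
worker = threading.Thread(target=sleeper, args=(started, stop))
worker.start()
started.wait()             # the worker is now somewhere inside sleeper()

snapshot = sample_stacks()  # taken from the main thread

stop.set()
worker.join()
```

The snapshot contains an entry for the worker even though it is not
the "current" thread state at the time of sampling.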

Best regards,
Gabriele
___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Documenting the private C API (was Re: Questions about signal handling.)

2018-09-28 Thread Nick Coghlan
On Wed, 26 Sep 2018 at 00:33, Barry Warsaw  wrote:
>
> On Sep 25, 2018, at 10:18, Antoine Pitrou  wrote:
> >
> > Not really.  Many are just like "static" (i.e. module-private)
> > functions, except that they need to be shared by two or three different
> > C modules.  It's definitely the case for _PyEval_SignalReceived().
>
> Purely static functions which appear only in the file they are defined in are 
> probably fine not to document, although I do still think we should take care 
> to comment on their semantics and external behaviors (i.e. reference 
> counting).  But if they’re used in multiple C files, then I think they *can* 
> deserve placement within the documentation.

We run into this problem with the test.support helpers as well (we
have more helpers than just those in the docs, but the others tend to
rely on contributors and/or PR reviewers having looked at other tests
that already use them).

Fleshing out the "internals" docs idea that some folks have mentioned:

1. Call it "Doc/_internals" and keep the leading underscore in the
published docs
2. Use it to cover both C internals and Python internals (such as test.support)
3. Permit use of autodoc tools that we don't allow in the main docs
(as these docs would be for CPython contributors, so the intended
audience for the docs is the same as the audience for the code)
4. Potentially pull in some specific files and sections from the
source code as literal include blocks (as per
http://docutils.sourceforge.net/docs/ref/rst/directives.html#include)
rather than rewriting them

Cheers,
Nick.

P.S. While it wouldn't be usable directly,
https://github.com/jnikula/hawkmoth at least demonstrates the
principle of extracting Sphinx API docs from C source files.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


[Python-Dev] Summary of Python tracker Issues

2018-09-28 Thread Python tracker

ACTIVITY SUMMARY (2018-09-21 - 2018-09-28)
Python tracker at https://bugs.python.org/

To view or respond to any of the issues listed below, click on the issue.
Do NOT respond to this message.

Issues counts and deltas:
  open    6781 (-14)
  closed 39803 (+80)
  total  46584 (+66)

Open issues with patches: 2703 


Issues opened (54)
==================

#12782: Multiple context expressions do not support parentheses for co
https://bugs.python.org/issue12782  reopened by lukasz.langa

#28655: Tests altered the execution environment in isolated mode
https://bugs.python.org/issue28655  reopened by vstinner

#32528: Change base class for futures.CancelledError
https://bugs.python.org/issue32528  reopened by yselivanov

#34768: Add documentation explaining __init__.py in packages
https://bugs.python.org/issue34768  opened by bkestelman

#34769: _asyncgen_finalizer_hook running in wrong thread
https://bugs.python.org/issue34769  opened by twisteroid ambassador

#34771: test_ctypes failing on Linux SPARC64
https://bugs.python.org/issue34771  opened by kelledin-3

#34773: sqlite3 module inconsistently returning only some rows from a 
https://bugs.python.org/issue34773  opened by shankargopal

#34774: IDLE: use theme colors for help viewer
https://bugs.python.org/issue34774  opened by terry.reedy

#34775: pathlib.PurePath division raises TypeError instead of returnin
https://bugs.python.org/issue34775  opened by Roger Aiudi

#34776: Postponed annotations break inspection of dataclasses
https://bugs.python.org/issue34776  opened by drhagen

#34778: Memoryview for column-major (f_contiguous) arrays from bytes i
https://bugs.python.org/issue34778  opened by lgautier

#34779: IDLE internals show up in tracebacks when returning objects th
https://bugs.python.org/issue34779  opened by ppperry

#34780: Hang on startup if stdin refers to a pipe with an outstanding 
https://bugs.python.org/issue34780  opened by izbyshev

#34781: infinite waiting in multiprocessing.Pool
https://bugs.python.org/issue34781  opened by coells

#34782: Pdb crashes when code is executed in a mapping that does not d
https://bugs.python.org/issue34782  opened by ppperry

#34784: Heap-allocated StructSequences
https://bugs.python.org/issue34784  opened by eelizondo

#34785: pty.spawn -- auto-termination after child process is dead (a z
https://bugs.python.org/issue34785  opened by jarryshaw

#34788: ipaddress module fails on rfc4007 scoped IPv6 addresses
https://bugs.python.org/issue34788  opened by Jeremy McMillan

#34789: Make xml.sax.make_parser accept iterables
https://bugs.python.org/issue34789  opened by adelfino

#34790: Deprecate passing coroutine objects to asyncio.wait()
https://bugs.python.org/issue34790  opened by yselivanov

#34791: xml package does not obey sys.flags.ignore_environment
https://bugs.python.org/issue34791  opened by christian.heimes

#34792: Tutorial doesn't discuss / and * function arguments
https://bugs.python.org/issue34792  opened by diekhans

#34793: Remove support for "with (await asyncio.lock):"
https://bugs.python.org/issue34793  opened by yselivanov

#34794: memory leak in TkApp:_createbytearray
https://bugs.python.org/issue34794  opened by dtalkin

#34795: loop.sock_recv failure because of delayed callback handling
https://bugs.python.org/issue34795  opened by kyuupichan

#34796: Tkinter scrollbar issues on Mac.
https://bugs.python.org/issue34796  opened by terry.reedy

#34797: Convert heapq to the argument clinic
https://bugs.python.org/issue34797  opened by pablogsal

#34798: pprint ignores the compact parameter for dicts
https://bugs.python.org/issue34798  opened by Nicolas Hug

#34799: When function in tracing returns None, tracing continues.
https://bugs.python.org/issue34799  opened by fabioz

#34800: email.contentmanager raises error when policy.max_line_length=
https://bugs.python.org/issue34800  opened by silane

#34801: codecs.getreader() splits lines containing control characters
https://bugs.python.org/issue34801  opened by nascheme

#34804: Repetition of 'for example' in documentation
https://bugs.python.org/issue34804  opened by rarblack

#34805: Explicitly specify `MyClass.__subclasses__()` returns classes 
https://bugs.python.org/issue34805  opened by pekka.klarck

#34806: distutils tests fail with recent 3.7 branch
https://bugs.python.org/issue34806  opened by doko

#34807: pathlib.[r]glob fails when the toplevel directory is not reada
https://bugs.python.org/issue34807  opened by Antony.Lee

#34810: Maximum and minimum value of C types integers from Python
https://bugs.python.org/issue34810  opened by scls

#34811: test_gdb fails with latest gdb
https://bugs.python.org/issue34811  opened by cstratak

#34812: support.args_from_interpreter_flags() doesn't inherit -I (isol
https://bugs.python.org/issue34812  opened by vstinner

#34814: makesetup: must link C extensions to libpython when compiled i
https://bugs.python.org/issue34814  opened by vstinner

#34816: ctypes + hasattr
https://bugs.pyth

[Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Sean Harrington
I am proposing an extension to the multiprocessing.Pool API that allows for
an alternative way to pass data to Pool worker processes, *without* using
globals.

A PR has been opened; extensive test coverage is also included, with all
tests & CI passing on GitHub.

Please see my blog post for details, motivation, and use cases of the API
extension before reading on.

In *short*, the implementation of the feature works as follows:

   1. Exposes a kwarg on Pool.__init__ called `expect_initret`, that
      defaults to False. When set to True:
      1. Capture the return value of the initializer kwarg of Pool
      2. Pass this value to the function being applied, as a kwarg.

Again, in *short*, the motivation of the feature is to provide an explicit
"flow of data" from parent process to worker process, and to avoid being
*forced* to use the *global* keyword in the initializer, or being *forced*
to create global variables in the parent process.

The interface is 100% backwards compatible through Python 3.x (and perhaps
beyond).


Re: [Python-Dev] What is the purpose of the _PyThreadState_Current symbol in Python 3?

2018-09-28 Thread Nathaniel Smith
What information do you wish the interpreter provided, that would make your
program simpler and more reliable?



Re: [Python-Dev] What is the purpose of the _PyThreadState_Current symbol in Python 3?

2018-09-28 Thread Gabriele
On Fri, 28 Sep 2018 at 23:12, Nathaniel Smith  wrote:
> What information do you wish the interpreter provided, that would make your 
> program simpler and more reliable?

An exported global variable that points to the head of the
PyInterpreterState linked list (i.e. the return value of
PyInterpreterState_Head). This way my program could just look this up
from the dynsym section instead of scanning a dump of the bss section
in memory to find a possible candidate. It would be grand if also the
string in the rodata section that gives the Python version could be
dereferenced from dynsym, but that's a different question.
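
As a rough sketch of the dynsym lookup described above, the output of
`nm -D` can be parsed into a name-to-address table. The sample text
and its addresses below are invented for illustration, and the helper
names parse_nm_output and dynamic_symbols are hypothetical; the sketch
assumes the usual three-column "address type name" layout for defined
symbols.

```python
import subprocess


def parse_nm_output(text):
    """Map symbol name -> (address, type letter) for defined symbols."""
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # undefined symbols carry no address column
        address, kind, name = parts
        try:
            symbols[name] = (int(address, 16), kind)
        except ValueError:
            continue  # not an address; skip the line
    return symbols


def dynamic_symbols(binary_path):
    """Run `nm -D` on a binary and parse its dynamic symbol table."""
    result = subprocess.run(
        ["nm", "-D", binary_path],
        check=True, capture_output=True, text=True,
    )
    return parse_nm_output(result.stdout)


# Invented sample, shaped like `nm -D` output; the addresses are not real.
SAMPLE = """\
                 U abort
00000000009b2a50 B _PyThreadState_Current
00000000004f1c20 T PyInterpreterState_Head
"""

table = parse_nm_output(SAMPLE)
```

With an exported pointer to the interpreter list head, a sampler could
read one address from such a table instead of scanning the bss dump.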


Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Antoine Pitrou


Hi,

On Fri, 28 Sep 2018 17:07:33 -0400
Sean Harrington  wrote:
> 
> In *short*, the implementation of the feature works as follows:
> 
>    1. Exposes a kwarg on Pool.__init__ called `expect_initret`, that
>       defaults to False. When set to True:
>       1. Capture the return value of the initializer kwarg of Pool
>       2. Pass this value to the function being applied, as a kwarg.
> 
> Again, in *short,* the motivation of the feature is to provide an explicit
> "flow of data" from parent process to worker process, and to avoid being
> *forced* to using the *global* keyword in initializer, or being *forced* to
> create global variables in the parent process.

Thanks for taking the time to explain your use case and write a
proposal.

My reactions to this are:

1. The proposed API is ugly.  This basically allows you to pass an
argument which changes with which arguments another function is later
called...
2. A global variable seems like the adequate way to represent a
process-global object (which is exactly your use case).
3. If you don't like globals, you could probably do something like
lazily-initialize the resource when a function needing it is executed;
this also avoids creating the resource if the child doesn't use it at
all.  Would that work for you?
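
One possible reading of point 3, as a minimal sketch: cache the
resource behind a getter so that each process builds it at most once,
on first use. functools.lru_cache is used here only as a convenient
memoizer; get_resource and work are made-up names, and the dict stands
in for whatever is expensive to create.

```python
import functools
import os


@functools.lru_cache(maxsize=None)
def get_resource():
    """Build the (stand-in) expensive resource on first call only."""
    return {
        "pid": os.getpid(),
        "table": {n: n * n for n in range(1000)},
    }


def work(n):
    """Function applied by the pool; initializes the resource lazily."""
    return get_resource()["table"][n]
```

One caveat: with the fork start method, a resource already created in
the parent before the workers start is inherited by the children rather
than rebuilt.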

As a more general remark, I understand the desire to make the Pool
object more flexible, but we also cannot pile up features until it
satisfies all use cases.

As another general remark, concurrent.futures is IMHO the preferred API
for the future, and where feature work should probably concentrate.

Regards

Antoine.




Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Sean Harrington
Hi Antoine - see inline below for my response...thanks for your time!

On Fri, Sep 28, 2018 at 6:45 PM Antoine Pitrou  wrote:

>
> Hi,
>
> On Fri, 28 Sep 2018 17:07:33 -0400
> Sean Harrington  wrote:
> >
> > In *short*, the implementation of the feature works as follows:
> >
> >    1. Exposes a kwarg on Pool.__init__ called `expect_initret`, that
> >       defaults to False. When set to True:
> >       1. Capture the return value of the initializer kwarg of Pool
> >       2. Pass this value to the function being applied, as a kwarg.
> >
> > Again, in *short,* the motivation of the feature is to provide an explicit
> > "flow of data" from parent process to worker process, and to avoid being
> > *forced* to using the *global* keyword in initializer, or being *forced* to
> > create global variables in the parent process.
>
> Thanks for taking the time to explain your use case and write a
> proposal.
>
> My reactions to this are:
>
> 1. The proposed API is ugly.  This basically allows you to pass an
> argument which changes with which arguments another function is later
> called...

Yes, I agree that this is an imperfect contract, but isn't this also a
concern with the current implementation? And isn't this pattern arguably
more explicit than "the function being applied relying on the initializer
to create a global variable from within its lexical scope"?



> 2. A global variable seems like the adequate way to represent a
> process-global object (which is exactly your use case).

There is nothing wrong with using a global variable, especially in nearly
every toy example found on the internet of using multiprocessing.Pool (i.e.
optimizing a simple script). But what happens when you have lots of nested
function calls in your applied function? My simple argument is that the
developer should not be constrained to make the objects passed globally
available in the process, as this MAY break encapsulation for large
projects.



> 3. If you don't like globals, you could probably do something like
> lazily-initialize the resource when a function needing it is executed;
> this also avoids creating the resource if the child doesn't use it at
> all.  Would that work for you?

I have nothing against globals; my gripe is with being forced to use
them for every Pool use case. Further, if initializing the resource is
expensive, we only want to do this ONE time per worker process. So no, this
will not ~always~ work.


> As a more general remark, I understand the desire to make the Pool
> object more flexible, but we can also not pile up features until it
> satisfies all use cases.

I understand that this is a legitimate concern, but this is about API
approachability.  Python end-users of Pool are forced to declare a global
from a lexical scope. Most Python end-users probably don't even know this
is possible. Sure, this is adding a feature for a use case that I outlined,
but really this is one of the two major use cases of "initializer" and
"initargs" (see my blog post for the 2 generalized use cases), not some
obscure use case. This is making that *very common* use case more
approachable.


> As another general remark, concurrent.futures is IMHO the preferred API
> for the future, and where feature work should probably concentrate.

This is good to hear and know, and I will keep this in mind moving forward!


> Regards
>
> Antoine.


Re: [Python-Dev] bpo-34837: Multiprocessing.Pool API Extension - Pass Data to Workers w/o Globals

2018-09-28 Thread Michael Selik
On Fri, Sep 28, 2018 at 2:11 PM Sean Harrington  wrote:
> kwarg on Pool.__init__ called `expect_initret`, that defaults to False. When 
> set to True:
> Capture the return value of the initializer kwarg of Pool
> Pass this value to the function being applied, as a kwarg.

The parameter name you chose, "initret", is awkward, because nowhere
else in Python does an initializer return a value. Initializers mutate
an encapsulated scope. For a class __init__, that scope is an
instance's attributes. For a subprocess managed by Pool, that
encapsulated scope is its "globals". I'm using quotes to emphasize
that these "globals" aren't shared.


On Fri, Sep 28, 2018 at 4:39 PM Sean Harrington  wrote:
> On Fri, Sep 28, 2018 at 6:45 PM Antoine Pitrou  wrote:
>> 3. If you don't like globals, you could probably do something like
>> lazily-initialize the resource when a function needing it is executed
>
> if initializing the resource is expensive, we only want to do this ONE time 
> per worker process.

We must have a different concept of "lazily-initialize". I understood
Antoine's suggestion to be a one-time initialize per worker process.


On Fri, Sep 28, 2018 at 4:39 PM Sean Harrington  wrote:
> My simple argument is that the developer should not be constrained to make 
> the objects passed globally available in the process, as this MAY break 
> encapsulation for large projects.

I could imagine someone switching from Pool to ThreadPool and getting
into trouble, but in my mind using threads is caveat emptor. Are you
worried about breaking encapsulation in a different scenario?