[Python-Dev] Can we stop adding to the C API, please?

2020-06-03 Thread Mark Shannon

Hi,

The size of the C API, as measured by `git grep PyAPI_FUNC | wc -l`, has 
been steadily increasing over the last few releases.


3.5 1237
3.6 1304
3.7 1408
3.8 1478
3.9 1518


For reference, the 2.7 branch has "only" 973 functions.

I've heard many criticisms of Python 2 over the years, but that it 
needed a bigger C API wasn't one of them ;)


Why are these functions being added? Wasn't 1000 C functions enough?

Every one of these functions represents a maintenance burden.
Removing them is painful and takes a lot of effort, but adding them is 
done casually, without a PEP or, in many cases, even a review.


We need to address what to do about the C API in the long term, but for 
now can we just stop making it larger? Please.


Also, can we remove all the new API functions added in 3.9 before the 
release, after which it will be too late?


Cheers,
Mark.
___
Python-Dev mailing list -- python-dev@python.org
To unsubscribe send an email to python-dev-leave@python.org
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/6CE75BIJC2GSQBO2MUJHW3MA6Q2MAWCB/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Victor Stinner
Hi,

In Python 3.9, I *removed* dozens of functions from the *public* C
API, or moved them to the "internal" C API:
https://docs.python.org/dev/whatsnew/3.9.html#id3

For a few internal C API functions, I replaced PyAPI_FUNC() with extern
to ensure that they cannot be used outside the CPython code base: Python
3.9 is now built with -fvisibility=hidden on compilers that support it
(like GCC and clang).
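
To make the export mechanism concrete, here is a minimal C sketch of an export macro in the spirit of PyAPI_FUNC() next to a plain `extern` declaration. The names DEMO_EXPORTED, DEMOAPI_FUNC, demo_public_add and demo_internal_helper are hypothetical; this is not CPython's actual header, and it assumes GCC or clang. When such a file is built into a shared library with `-fvisibility=hidden`, only the function carrying the default-visibility attribute is exported.

```c
#include <assert.h>

/* Hypothetical export macro, similar in spirit to PyAPI_FUNC(). With
   GCC/clang, symbols marked "default" stay exported from a shared library
   even when the library is built with -fvisibility=hidden. */
#if defined(__GNUC__) || defined(__clang__)
#  define DEMO_EXPORTED __attribute__((visibility("default")))
#else
#  define DEMO_EXPORTED
#endif
#define DEMOAPI_FUNC(RTYPE) DEMO_EXPORTED RTYPE

/* Public: exported from the shared library. */
DEMOAPI_FUNC(int) demo_public_add(int a, int b);

/* Internal: plain extern. Other files of the same library can call it,
   but under -fvisibility=hidden it is not exported to extension modules. */
extern int demo_internal_helper(int x);

int demo_internal_helper(int x)
{
    return 2 * x;
}

DEMOAPI_FUNC(int) demo_public_add(int a, int b)
{
    return demo_internal_helper(a) + b;
}
```

Both functions link normally inside the library itself; the difference only shows up in the shared library's dynamic symbol table (for example, in the output of `nm -D`).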

I also *added* a bunch of *new* "getter" or "setter" functions to the
public C API for my project of hiding implementation details, like
making structures opaque:
https://docs.python.org/dev/whatsnew/3.9.html#id1

For example, I added PyThreadState_GetInterpreter() which replaces
"tstate->interp", to prepare C extensions for an opaque PyThreadState
structure.
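
A rough sketch of why a getter such as PyThreadState_GetInterpreter() helps: once callers go through an accessor, the field can move without breaking them. ThreadState, Interp, thread_state_get_interp and extension_interp_id are hypothetical stand-ins, not CPython definitions, and the "moved field" is an assumed refactor for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for PyInterpreterState / PyThreadState. */
typedef struct Interp {
    int id;
} Interp;

/* Assume 'interp' used to be a top-level member (ts->interp). After an
   internal refactor it lives in a nested struct, so extension code that
   wrote ts->interp would no longer compile. */
typedef struct ThreadState {
    struct {
        Interp *interp;
        long thread_id;
    } runtime;
} ThreadState;

/* Accessor in the spirit of PyThreadState_GetInterpreter(): the only
   code that needs to know where the field lives. */
Interp *thread_state_get_interp(const ThreadState *ts)
{
    assert(ts != NULL);
    return ts->runtime.interp;
}

/* "Extension" code written against the accessor: unaffected by the move. */
int extension_interp_id(const ThreadState *ts)
{
    return thread_state_get_interp(ts)->id;
}
```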

The other 4 new Python 3.9 functions:

* PyObject_CallNoArgs(): "most efficient way to call a callable Python
object without any argument"
* PyModule_AddType(): "adding a type to a module". I hate the
PyModule_AddObject() function, which steals a reference on success.
* PyObject_GC_IsTracked() and PyObject_GC_IsFinalized(): "query if
Python objects are being currently tracked or have been already
finalized by the garbage collector respectively": functions requested
in bpo-40241.
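
The reference-stealing complaint is easiest to see with a toy refcount model. In this hedged sketch (all names hypothetical, no Python objects involved), add_steal_on_success follows the convention described above for PyModule_AddObject(), stealing the caller's reference only when it succeeds, while add_no_steal always takes its own reference. The correct caller code differs accordingly.

```c
#include <assert.h>
#include <stddef.h>

/* Toy reference-counted object and module; all names are hypothetical. */
typedef struct Obj { int refcnt; } Obj;
typedef struct Module { Obj *attr; } Module;

static void incref(Obj *o) { o->refcnt++; }
static void decref(Obj *o) { assert(o->refcnt > 0); o->refcnt--; }

/* Steals the caller's reference only on success. 'fail' simulates an
   error such as a failed dictionary insert. */
static int add_steal_on_success(Module *m, Obj *o, int fail)
{
    if (fail)
        return -1;      /* caller still owns its reference */
    m->attr = o;        /* module now owns the caller's reference */
    return 0;
}

/* Never steals: the module takes its own, new reference. */
static int add_no_steal(Module *m, Obj *o, int fail)
{
    if (fail)
        return -1;
    incref(o);
    m->attr = o;
    return 0;
}

/* Correct caller for the stealing variant: cleanup depends on the result. */
int caller_steal(Module *m, Obj *o, int fail)
{
    if (add_steal_on_success(m, o, fail) < 0) {
        decref(o);      /* release only on failure */
        return -1;
    }
    return 0;           /* must NOT decref: the reference belongs to m */
}

/* Correct caller for the non-stealing variant: a single cleanup path. */
int caller_no_steal(Module *m, Obj *o, int fail)
{
    int rc = add_no_steal(m, o, fail);
    decref(o);          /* always release our own reference */
    return rc;
}
```

With the stealing convention the caller needs two cleanup paths depending on the result, which is an easy place to leak or double-decref; the non-stealing variant needs just one.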

Would you mind elaborating on why you consider that these functions must
not be added to Python 3.9?


> Every one of these functions represents a maintenance burden.
> Removing them is painful and takes a lot of effort, but adding them is
> done casually, without a PEP or, in many cases, even a review.

For the new functions related to hiding implementation details, I have
a draft PEP:
https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst

But it seems like this PEP is trying to solve too many problems in a
single document, and that I have to split it into multiple PEPs.


> Why are these functions being added? Wasn't 1000 C functions enough?

My PEP lists flaws of the existing C API functions. Sadly, fixing
flaws requires adding new functions and deprecating old ones in a slow
migration.

I'm open to ideas on how to fix these flaws differently (without adding
new functions?).

As written in my PEP, another approach is to design a new C API on top
of the existing one. That's exactly what the HPy project does. But my
PEP also explains why I consider that it only fixes a subset of the
issues that I listed. ;-)
https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst#hpy-project

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/2FSMLZ22XJXGSQQHXDSZHOFOVPETPVWS/


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Simon Cross
Maybe we can have a two-for-one special? You can add a new function to the
API if you deprecate two.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/XE6ZO5Z4LKTJBVE3P77AMLR5SDQ2RQXA/


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Mark Shannon

Hi Victor,

On 03/06/2020 2:42 pm, Victor Stinner wrote:

> Hi,
>
> In Python 3.9, I *removed* dozens of functions from the *public* C
> API, or moved them to the "internal" C API:
> https://docs.python.org/dev/whatsnew/3.9.html#id3
>
> For a few internal C API, I replaced PyAPI_FUNC() with extern to
> ensure that they cannot be used outside CPython code base: Python 3.9
> is now built with -fvisibility=hidden on compilers supporting it (like
> GCC and clang).
>
> I also *added* a bunch of *new* "getter" or "setter" functions to the
> public C API for my project of hiding implementation details, like
> making structures opaque:
> https://docs.python.org/dev/whatsnew/3.9.html#id1


Adding "setters" is generally a bad idea.
"getters" can be computed if the underlying field disappears, but the 
same may not be true for setters if the relation is not one-to-one.
I don't think there are any new setters in 3.9, so it's not an immediate 
problem.




> For example, I added PyThreadState_GetInterpreter() which replaces
> "tstate->interp", to prepare C extensions for an opaque PyThreadState
> structure.


`PyThreadState_GetInterpreter()` can't replace `tstate->interp` for two 
reasons.
1. There is no way to stop third party C code accessing the internals of 
data structures. We can warn them not to, but that's all.
2. The internal layout of C structures has never been part of the API, 
with arguably two exceptions: the PyTypeObject struct and the 
`ob_refcnt` field of PyObject.

> The other 4 new Python 3.9 functions:
>
> * PyObject_CallNoArgs(): "most efficient way to call a callable Python
> object without any argument"
> * PyModule_AddType(): "adding a type to a module". I hate the
> PyModule_AddObject() function which steals a reference on success.
> * PyObject_GC_IsTracked() and PyObject_GC_IsFinalized(): "query if
> Python objects are being currently tracked or have been already
> finalized by the garbage collector respectively": functions requested
> in bpo-40241.
>
> Would you mind to elaborate why you consider that these functions must
> not be added to Python 3.9?


I'm not saying that no C functions should be added to the API. I am 
saying that none should be added without a PEP or proper review.


Addressing the four functions you list.

PyObject_CallNoArgs() seems harmless.
Rationalizing the call API has merit, but PyObject_CallNoArgs()
leads to PyObject_CallOneArg(), PyObject_CallTwoArgs(), etc. and an even 
larger API.
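
A sketch of how per-arity call helpers layer over one general "array plus count" entry point, roughly the shape of vectorcall. Callable, call_vector, call_no_args, call_one_arg and sum_args are hypothetical names that use plain `long` values instead of Python objects; the point is only that each new arity adds one more public function without adding any capability.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callable: receives a C array of arguments and its length. */
typedef long (*Callable)(const long *args, size_t nargs);

/* The one general entry point. */
long call_vector(Callable fn, const long *args, size_t nargs)
{
    assert(fn != NULL);
    assert(nargs == 0 || args != NULL);
    return fn(args, nargs);
}

/* Convenience wrappers: each arity is one more public function, which is
   how the API grows (CallNoArgs, CallOneArg, CallTwoArgs, ...). */
long call_no_args(Callable fn)
{
    return call_vector(fn, NULL, 0);
}

long call_one_arg(Callable fn, long arg)
{
    return call_vector(fn, &arg, 1);
}

/* Example callable: sums its arguments. */
long sum_args(const long *args, size_t nargs)
{
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```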


PyModule_AddType(). This seems perfectly reasonable, although if it is a 
straight replacement for another function, that other function should be 
deprecated.


PyObject_GC_IsTracked(). I don't like this.
Shouldn't GC track *all* objects?
Even if it were named PyObject_Cycle_GC_IsTracked() it would be exposing 
internal implementation details for no good reason. A cycle GC that 
doesn't "track" individual objects, but treats all objects the same 
could be more efficient. In which case, what would this mean?


What is the purpose of PyObject_GC_IsFinalized()?
Third party objects can easily tell if they have been finalized.
Why they would ever need this information is a mystery to me.





>> Every one of these functions represents a maintenance burden.
>> Removing them is painful and takes a lot of effort, but adding them is
>> done casually, without a PEP or, in many cases, even a review.
>
> For the new functions related to hiding implementation details, I have
> a draft PEP:
> https://github.com/vstinner/misc/blob/master/cpython/pep-opaque-c-api.rst
>
> But it seems like this PEP is trying to solve too many problems in a
> single document, and that I have to split it into multiple PEPs.



It does need splitting up, I agree.




>> Why are these functions being added? Wasn't 1000 C functions enough?
>
> My PEP lists flaws of the existing C API functions. Sadly, fixing
> flaws requires adding new functions and deprecating old ones in a slow
> migration.


IMO, at least one function should be deprecated for each new function 
added. That way the API won't get any bigger.


Cheers,
Mark.





___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/5WLOHBCSFVMZJEFSJSKQQANZASU2WFV3/


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Pablo Galindo Salgado
Just some comments on the GC stuff as I added them myself.

> Shouldn't GC track *all* objects?
No, extension types need to opt in to the garbage collector and, if they do,
implement the interface.

> Even if it were named PyObject_Cycle_GC_IsTracked() it would be exposing
> internal implementation details for no good reason.

In Python, gc.is_tracked() has existed since Python 3.1, and the gc module
has exposed a lot of GC functionality for many versions. This just allows
the same calls that you can make in Python to be made from the C API.
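
A toy model of the opt-in described above: only objects whose type sets a GC flag are linked into the collector's list, and an is-tracked query simply reports that. TYPE_HAVE_GC (loosely analogous to Py_TPFLAGS_HAVE_GC), Collector, obj_init and obj_is_tracked are hypothetical. In CPython, tracking is done by explicit PyObject_GC_Track() calls and some objects are untracked later; the toy simplifies this to tracking at initialisation. From Python, the real gc module shows the same split: gc.is_tracked(0) is False while gc.is_tracked([]) is True.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical type flag, analogous in spirit to Py_TPFLAGS_HAVE_GC. */
#define TYPE_HAVE_GC 0x1u

typedef struct Type { const char *name; unsigned flags; } Type;

typedef struct Obj {
    const Type *type;
    bool tracked;
    struct Obj *gc_next;    /* link in the collector's list */
} Obj;

typedef struct Collector { Obj *head; } Collector;

/* Only objects whose type opted in are linked into the collector's list. */
void obj_init(Collector *gc, Obj *o, const Type *type)
{
    assert(gc != NULL && o != NULL && type != NULL);
    o->type = type;
    o->tracked = false;
    o->gc_next = NULL;
    if (type->flags & TYPE_HAVE_GC) {
        o->gc_next = gc->head;
        gc->head = o;
        o->tracked = true;
    }
}

/* Query in the spirit of gc.is_tracked() / PyObject_GC_IsTracked(). */
bool obj_is_tracked(const Obj *o)
{
    return o->tracked;
}
```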

> What is the purpose of PyObject_GC_IsFinalized()?

Because some objects may have been resurrected, and this allows you to know
whether a given object has already been finalized. This can help to gather
advanced GC stats, to control some tricky situations with finalizers and
the GC in C extensions, or just to know all objects that are being
resurrected. Note that an equivalent gc.is_finalized() was added in 3.9 as
well to query this information from Python in the gc module, and this call
just allows you to do the same from the C API.
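
A toy model of why an "already finalized" bit matters once resurrection is possible. It follows the PEP 442 rule that a finalizer runs at most once, even if the object is resurrected. The Obj layout, collect and obj_is_finalized are hypothetical; CPython keeps the equivalent flag in its own GC bookkeeping rather than in a field like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy object; all names are hypothetical. */
typedef struct Obj {
    int refcnt;
    bool finalized;          /* set once the finalizer has run */
    int finalizer_calls;
    struct Obj **rescue;     /* if non-NULL, the finalizer resurrects here */
} Obj;

/* Query in the spirit of gc.is_finalized() / PyObject_GC_IsFinalized(). */
bool obj_is_finalized(const Obj *o) { return o->finalized; }

static void run_finalizer(Obj *o)
{
    o->finalizer_calls++;
    if (o->rescue != NULL) {  /* resurrection: store a new strong reference */
        *o->rescue = o;
        o->refcnt++;
    }
}

/* Collect an object found unreachable (refcnt == 0). The finalizer runs at
   most once. Returns true if the object may be freed, false if it was
   resurrected. */
bool collect(Obj *o)
{
    assert(o->refcnt == 0);
    if (!o->finalized) {
        o->finalized = true;
        run_finalizer(o);
    }
    return o->refcnt == 0;
}
```

On the second collection the finalizer is skipped because the bit is already set, which is exactly the information the new query exposes.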

Cheers,
Pablo


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Victor Stinner
On Wed, 3 Jun 2020 at 19:17, Mark Shannon wrote:
> > I also *added* a bunch of *new* "getter" or "setter" functions to the
> > public C API for my project of hiding implementation details, like
> > making structures opaque:
> > https://docs.python.org/dev/whatsnew/3.9.html#id1
>
> Adding "setters" is generally a bad idea.
> "getters" can be computed if the underlying field disappears, but the
> same may not be true for setters if the relation is not one-to-one.
> I don't think there are any new setters in 3.9, so it's not an immediate
> problem.

You're making the assumption that the member can be set directly. But
my plan is to make the structure opaque. In that case, you need
getters and setters for all fields you would like to access. No member
would be accessible directly anymore.
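
A minimal sketch of the opaque-handle pattern, with hypothetical names (`Frame` here is unrelated to PyFrameObject): callers only see an incomplete type plus accessor functions, so every read or write goes through a getter or a setter. In a real project the two halves would live in a public header and a private .c file; they share one file here only so the sketch compiles on its own.

```c
#include <assert.h>
#include <stdlib.h>

/* ---- What a public header would expose (hypothetical API) ---- */
typedef struct Frame Frame;                  /* incomplete type: no fields */
Frame *frame_new(int lineno);
int frame_get_lineno(const Frame *f);
int frame_set_lineno(Frame *f, int lineno);  /* 0 on success, -1 on error */
void frame_free(Frame *f);

/* ---- Private implementation: free to change the layout ---- */
struct Frame {
    int lineno;
};

Frame *frame_new(int lineno)
{
    Frame *f = malloc(sizeof *f);
    if (f != NULL)
        f->lineno = lineno;
    return f;
}

int frame_get_lineno(const Frame *f)
{
    assert(f != NULL);
    return f->lineno;
}

/* A setter can validate its input; a direct member write cannot. */
int frame_set_lineno(Frame *f, int lineno)
{
    assert(f != NULL);
    if (lineno < 1)
        return -1;
    f->lineno = lineno;
    return 0;
}

void frame_free(Frame *f)
{
    free(f);
}
```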

> `PyThreadState_GetInterpreter()` can't replace `tstate->interp` for two
> reasons.
> 1. There is no way to stop third party C code accessing the internals of
> data structures. We can warn them not to, but that's all.
> 2. The internal layout of C structures has never been part of the API,
> with arguably two exceptions; the PyTypeObject struct and the
> `ob_refcnt` field of PyObject.

My long-term plan is to make all structures opaque :-) So far, the
PyInterpreterState structure was made opaque in Python 3.8. It helped the
development of Python 3.8 and 3.9 *a lot*, especially for
subinterpreters. And I made PyGC_Head opaque in Python 3.9.

Examples of issues to make structures opaque:

PyGC_Head: https://bugs.python.org/issue40241 (done in Python 3.9)
PyObject: https://bugs.python.org/issue39573
PyTypeObject: https://bugs.python.org/issue40170
PyThreadState: https://bugs.python.org/issue39947
PyInterpreterState: https://bugs.python.org/issue35886 (done in Python 3.8)

For the short term, my plan is to make structures opaque in the limited
C API, before breaking more stuff in the public C API :-)


> PyObject_CallNoArgs() seems harmless.
> Rationalizing the call API has merit, but PyObject_CallNoArgs()
> leads to PyObject_CallOneArg(), PyObject_CallTwoArgs(), etc. and an even
> larger API.

PyObject_CallOneArg() also exists:
https://docs.python.org/dev/c-api/call.html#c.PyObject_CallOneArg

It was added as a private function in https://bugs.python.org/issue37483
and made public in commit 3f563cea567fbfed9db539ecbbacfee2d86f7735
"bpo-39245: Make Vectorcall C API public (GH-17893)".

But it's missing from What's New in Python 3.9.

There is no plan for two or more arguments.


> PyObject_GC_IsTracked(). I don't like this.
> Shouldn't GC track *all* objects?
> Even if it were named PyObject_Cycle_GC_IsTracked() it would be exposing
> internal implementation details for no good reason. A cycle GC that
> doesn't "track" individual objects, but treats all objects the same
> could be more efficient. In which case, what would this mean?
>
> What is the purpose of PyObject_GC_IsFinalized()?
> Third party objects can easily tell if they have been finalized.
> Why they would ever need this information is a mystery to me.

Did you read the issues which added these functions to see the
rationale? https://bugs.python.org/issue40241

I like the "(Contributed by xxx in bpo-xxx.)" in What's New in Python
3.9: it became trivial to find such rationale.

Victor
-- 
Night gathers, and now my watch begins. It shall not end until my death.
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/QZ2Q7ELTDZUQLVS54T53CPEINWNQB6HF/


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Nathaniel Smith
On Wed, Jun 3, 2020 at 2:10 PM Victor Stinner  wrote:
> For the short term, my plan is to make structures opaque in the limited
> C API, before breaking more stuff in the public C API :-)

But you're also breaking the public C API:
https://github.com/MagicStack/immutables/issues/46
https://github.com/pycurl/pycurl/pull/636

I'm not saying you're wrong to do so, I'm just confused about whether
your plan is to break stuff or not and on which timescale.

-n

-- 
Nathaniel J. Smith -- https://vorpus.org
___
Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/UMGH7AOPW25IXZ7IWD73EKSVYY6ROCLC/


[Python-Dev] Re: Can we stop adding to the C API, please?

2020-06-03 Thread Gregory P. Smith
On Wed, Jun 3, 2020 at 2:13 PM Victor Stinner  wrote:

> On Wed, 3 Jun 2020 at 19:17, Mark Shannon wrote:
> > > I also *added* a bunch of *new* "getter" or "setter" functions to the
> > > public C API for my project of hiding implementation details, like
> > > making structures opaque:
> > > https://docs.python.org/dev/whatsnew/3.9.html#id1
> >
> > Adding "setters" is generally a bad idea.
> > "getters" can be computed if the underlying field disappears, but the
> > same may not be true for setters if the relation is not one-to-one.
> > I don't think there are any new setters in 3.9, so it's not an immediate
> > problem.
>
> You're making the assumption that the member can be set directly. But
> my plan is to make the structure opaque. In that case, you need
> getters and setters for all fields you would like to access. No member
> would be accessible directly anymore.
>
> > `PyThreadState_GetInterpreter()` can't replace `tstate->interp` for two
> > reasons.
> > 1. There is no way to stop third party C code accessing the internals of
> > data structures. We can warn them not to, but that's all.
> > 2. The internal layout of C structures has never been part of the API,
> > with arguably two exceptions; the PyTypeObject struct and the
> > `ob_refcnt` field of PyObject.
>
> My long term plan is to make all structures opaque :-) So far,
> PyInterpreterState structure was made opaque in Python 3.8. It helped
> *a lot* the development of Python 3.8 and 3.9, especially for
> subinterpreters. And I made PyGC_Head opaque in Python 3.9.
>
> Examples of issues to make structures opaque:
>
> PyGC_Head: https://bugs.python.org/issue40241 (done in Python 3.9)
> PyObject: https://bugs.python.org/issue39573
> PyTypeObject: https://bugs.python.org/issue40170
> PyThreadState: https://bugs.python.org/issue39947
> PyInterpreterState: https://bugs.python.org/issue35886 (done in Python 3.8)
>
> For the short term, my plan is to make structures opaque in the limited
> C API, before breaking more stuff in the public C API :-)
>

Indeed, your plan, and the work you've been doing and discussing with other
core devs about this (including at multiple sprints and summits) over the
past 4+ years, is the right one.  Our reliance on structs and related cpp
macros that are, unfortunately, exposed publicly is a burden that freezes
out reasonable options for evolving the CPython VM implementation.  This
work moves us away from that into a better place one step at a time
without mass disruption.
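
To illustrate why exposed structs and field-access macros freeze the layout, a hedged sketch with two hypothetical object layouts, ObjV1 and ObjV2: a macro resolves the member offset when the extension is compiled, while an accessor function compiled into the runtime always uses the runtime's current layout. All names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Layout an old extension was compiled against (hypothetical "v1"). */
typedef struct { long refcnt; void *type; } ObjV1;

/* Layout after the runtime adds a field in front (hypothetical "v2"). */
typedef struct { unsigned flags; long refcnt; void *type; } ObjV2;

/* Macro access: the member offset is fixed when the *extension* is
   compiled, so its binary keeps using offsetof(ObjV1, refcnt). */
#define OBJ_REFCNT_V1(o) (((const ObjV1 *)(o))->refcnt)

/* Function access: compiled into the *runtime*, so it always uses the
   runtime's current layout. Callers need no rebuild when it changes. */
long runtime_obj_refcnt(const void *o)
{
    assert(o != NULL);
    return ((const ObjV2 *)o)->refcnt;
}

/* How far the baked-in offset is from the real one after the change. */
size_t refcnt_offset_shift(void)
{
    return offsetof(ObjV2, refcnt) - offsetof(ObjV1, refcnt);
}
```

The macro is deliberately only applied to ObjV1 objects: applying it to an ObjV2 object would read whatever now sits at the old offset, which is the kind of breakage a layout change causes for already-compiled extensions.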

More prior references related to this work are critical reading and should
not be overlooked:

[2017 "Keeping Python Competitive"]
https://lwn.net/Articles/723949/
[2018 "Let's change the C API" thread]
https://mail.python.org/archives/list/[email protected]/thread/B67MYCAO4H4AJNMLSWVT3UVFTHSDGQRB/#B67MYCAO4H4AJNMLSWVT3UVFTHSDGQRB
[2019 "The C API"]
https://pyfound.blogspot.com/2019/06/python-language-summit-lightning-talks-part-2.html
[2020-04 "PEP: Modify the C API to hide implementation details" thread,
with a lot of links to much earlier references from 2017 and the like]
https://mail.python.org/archives/list/[email protected]/thread/HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF/#HKM774XKU7DPJNLUTYHUB5U6VR6EQMJF
[Victor's overall roadmap, referenced in a few places in those]
https://pythoncapi.readthedocs.io/roadmap.html

It is also worth paying attention to the
https://mail.python.org/archives/list/[email protected]/latest mailing
list for anyone with a CPython C API interest.

-gps

