[Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

Hi python-dev,

These ideas/questions come out of the Cython and NumPy developer lists.

What we want is a way to communicate things on the C level about the 
extension type instances we pass around. The solution today is often to 
rely on PyObject_TypeCheck. For instance, hundreds of handcrafted C 
extensions rely on the internal structure of NumPy arrays, and Cython 
will check whether objects are instances of a Cython class or not.


However, this creates one-to-many situations: there is only one 
implementor of an object API/ABI, but many consumers. What we would 
like is multiple implementors and multiple consumers of mutually 
agreed-upon standards. We essentially want more duck typing on the C level.


A similar situation was PEP 3118. But there are many more such things 
one might want to communicate at the C level, many of which are very 
domain-specific and not suitable for a PEP at all. Also, PEPs don't 
backport well to older versions of Python.


What we *think* we would like (but we want other suggestions!) is an 
arbitrarily extensible type object, without tying this into the type 
hierarchy. Say you have


typedef struct {
    unsigned long extension_id;
    void *data;
} PyTypeObjectExtensionEntry;

and then a type object can (somehow!) point to an array of these. The 
array is linearly scanned by consumers for IDs they recognize (most 
types would only have one or two entries). Cython could then get a 
reserved ID space to communicate whatever it wants, NumPy another one, 
and there could be "unofficial PEPs" where two or more projects get 
together to draft a spec for a particular type extension ID without 
having to bother python-dev about it.
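
Roughly, a consumer-side lookup could then be a sketch like the 
following, where the accessor PyCustomSlots_GetTable() and the 
zero-terminated layout are invented for illustration (one possible 
implementation of the accessor is sketched under option (c) below):

#include <Python.h>

/* hypothetical accessor into the type's extension array */
extern PyTypeObjectExtensionEntry *PyCustomSlots_GetTable(PyTypeObject *tp);

#define MYLIB_EXTENSION_ID 0x40001UL  /* ID from a hypothetical reserved space */

static void *
find_extension_data(PyTypeObject *tp)
{
    PyTypeObjectExtensionEntry *entry = PyCustomSlots_GetTable(tp);
    if (entry == NULL)
        return NULL;                 /* type does not participate */
    for (; entry->extension_id != 0; entry++) {
        if (entry->extension_id == MYLIB_EXTENSION_ID)
            return entry->data;      /* consumer-specific payload */
    }
    return NULL;                     /* ID unknown to this type */
}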


And, we want this to somehow work with existing Python; we still support 
users on Python 2.4.


Options we've thought of so far:

 a) Use dicts and capsules to get information across. But the 
performance of a dict lookup rules this out for what we want to use it 
for in Cython.


 b) Implement a metaclass which extends PyTypeObject in this way. 
However, that means a common runtime dependency for libraries that want 
to use this scheme, which is a big disadvantage to us. Today, Cython 
doesn't ship a runtime library but only creates standalone compilable C 
files, and there's no dependency from NumPy on Cython or the other way 
around.


 c) Hijack a free bit in tp_flags (bit 22?) to indicate that the 
PyTypeObject struct is immediately followed by a pointer to such an 
array.


The final approach is drafted in more detail at 
http://wiki.cython.org/enhancements/cep1001 . To us that looks very 
attractive both for the speed and for the lack of runtime dependencies, 
and it seems like it should work in existing versions of Python. But do 
please feel free to tell us we are misguided. Hijacking a flag bit 
certainly feels dirty.


Examples of how this would be used:

 - In Cython, we'd like to use this to annotate callable objects that 
happen to wrap a C function with their corresponding C function 
pointers. That way, callables that wrap a C function could be "unboxed", 
so that Cython could "cast" the Python object "scipy.special.gamma" to a 
function pointer at runtime and speed up the call by an order of 
magnitude. SciPy and Cython just need to agree on a spec.


 - Lots of C extensions rely on using PyObject_TypeCheck (or even do an 
exact check) before calling the NumPy C API with PyArrayObject* 
arguments. This means that new features all have to go into NumPy; it is 
rather difficult to create new experimental array libraries. An 
extensible PyTypeObject would open the way for other experimental array 
libraries; NumPy could set the standards, and others could implement them 
(without taking on NumPy as a runtime dependency, which is what 
subclassing implies). Of course, porting over the hundreds (thousands?) of 
extensions relying on the NumPy C API is a lot of work, but we can at 
least get started...


Ideas?

Dag Sverre Seljebotn


Re: [Python-Dev] [Python-checkins] devguide: Add VS2010 link and text, then restructure a few things

2012-05-16 Thread Martin v. Löwis

+All versions previous to 3.3 use Microsoft Visual Studio 2008, available at
+https://www.microsoft.com/visualstudio/en-us/products/2008-editions/express.


This isn't actually the case. 2.4 and 2.5 used Visual Studio 2003, 2.0 
to 2.3 used VC6, 1.4 and 1.5 used Visual C++ 1.5; versions before that 
were available only from Mark Hammond.

Regards,
Martin


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Nick Coghlan
Use PyObject_HasAttr, just as people use hasattr() for ducktyping in Python.

If you want something more structured, use Abstract Base Classes,
that's what they're for.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

On 05/16/2012 10:11 AM, Nick Coghlan wrote:

> Use PyObject_HasAttr, just as people use hasattr() for ducktyping in Python.


In the Cython wrap-function-pointers case we really want performance 
comparable to C, so we couldn't go through the full attribute protocol.


But I guess we could intern some char* (somehow), pass that to 
tp_getattr, and then cast the returned PyObject* (which would not be a 
PyObject*) to a custom struct. As long as the interned strings can never 
be reached from Python, that's almost safe. It's still a slight abuse of 
tp_getattr.


As I said, if we didn't worry about performance we'd just retrieve 
capsules through attributes.
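
For reference, the capsule route is roughly the following sketch (the 
"_c_api" attribute name and struct layout are made up, and PyCapsule 
would be PyCObject on the oldest Pythons we support):

#include <Python.h>

typedef struct {
    double (*gamma)(double);  /* example entry */
} MyCApi;

static MyCApi *
get_c_api(PyObject *obj)
{
    MyCApi *api;
    PyObject *capsule = PyObject_GetAttrString(obj, "_c_api");
    if (capsule == NULL)
        return NULL;              /* AttributeError already set */
    api = (MyCApi *)PyCapsule_GetPointer(capsule, "_c_api");
    Py_DECREF(capsule);
    return api;                   /* NULL with an error set on mismatch */
}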


Dag









Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Martin v. Löwis

> And, we want this to somehow work with existing Python; we still
> support users on Python 2.4.

This makes the question out-of-scope for python-dev - we only discuss
new versions of Python here. Old versions cannot be developed anymore
(as they are released already).


> typedef struct {
>     unsigned long extension_id;
>     void *data;
> } PyTypeObjectExtensionEntry;
>
> and then a type object can (somehow!) point to an array of these. The
> array is linearly scanned


It's unclear to me why you think that a linear scan is faster than
a dictionary lookup. The contrary will be the case - the dictionary
lookup (PyObject_GetAttr) will be much faster.

Just make sure to use interned strings, and to cache the interned strings.
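
In C, that caching amounts to something like this sketch (Python 2 API 
shown; the attribute name is illustrative):

#include <Python.h>

static PyObject *
lookup_c_api(PyObject *obj)
{
    static PyObject *name = NULL;  /* interned once, then reused */
    if (name == NULL) {
        name = PyString_InternFromString("_c_api");
        if (name == NULL)
            return NULL;
    }
    return PyObject_GetAttr(obj, name);  /* new reference, or NULL */
}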

Regards,
Martin



Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

On 05/16/2012 10:36 AM, "Martin v. Löwis" wrote:

>> And, we want this to somehow work with existing Python; we still
>> support users on Python 2.4.
>
> This makes the question out-of-scope for python-dev - we only discuss
> new versions of Python here. Old versions cannot be developed anymore
> (as they are released already).


Point taken. Sorry about that, and I appreciate your patience with me.

I guess my idea was that if some mechanism was approved for future 
Python versions, we would feel more comfortable hacking around it in 
older Python versions.


Of course, nothing is better than this not being a problem, as you seem 
to suggest. But:





>> typedef struct {
>>     unsigned long extension_id;
>>     void *data;
>> } PyTypeObjectExtensionEntry;
>>
>> and then a type object can (somehow!) point to an array of these. The
>> array is linearly scanned
>
> It's unclear to me why you think that a linear scan is faster than
> a dictionary lookup. The contrary will be the case - the dictionary
> lookup (PyObject_GetAttr) will be much faster.


I've benchmarked using a PyObject* as a function-pointer-capsule using 
the above mechanism; that added about 2-3 nanoseconds of overhead on 
what would be a 5 nanosecond call in pure C. (There will only be 1 or 2 
entries in that list...)


Dict lookups are about 18 nanoseconds for me, using interned string 
objects (see below). Perhaps that can be reduced somewhat, but I highly 
doubt you'll get down to 3-4 nanoseconds.


Cython benchmark (which does translate to what you'd do in C):

def hammer_dict(int n):
    cdef dict the_dict

    a = "hello"
    b = "there"
    the_dict = {a : a, b : a}
    for i in range(n):
        the_dict[b]

Dag


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Stefan Behnel
"Martin v. Löwis", 16.05.2012 10:36:
>> And, we want this to somehow work with existing Python; we still
>> support users on Python 2.4.
> 
> This makes the question out-of-scope for python-dev - we only discuss
> new versions of Python here. Old versions cannot be developed anymore
> (as they are released already).

Well, it's in scope because CPython would have to support this in a future
version, or at least have to make sure it knows about it so that it can
stay out of the way (depending on how the solution eventually ends up
working). We're also very much interested in input from the CPython core
developers regarding the design, because we think that this should become a
general feature of the Python platform (potentially also for PyPy etc.).

The fact that we need to support it in older CPython versions is also
relevant, because the solution we choose shouldn't conflict with older
versions. The fact that they are no longer developed actually helps,
because it will prevent them from interfering in the future any more than
they do now.


>> typedef struct {
>> unsigned long extension_id;
>> void *data;
>> } PyTypeObjectExtensionEntry;
>>
>> and then a type object can (somehow!) point to an array of these. The
>> array is linearly scanned
> 
> It's unclear to me why you think that a linear scan is faster than
> a dictionary lookup. The contrary will be the case - the dictionary
> lookup (PyObject_GetAttr) will be much faster.

Agreed in general, but in this case, it's really not that easy. A C
function call involves a certain overhead all by itself, so calling into
the C-API multiple times may be substantially more costly than, say,
calling through a function pointer once and then running over a returned C
array comparing numbers. And definitely way more costly than running over
an array that the type struct points to directly. We are not talking about
hundreds of entries here, just a few. A linear scan in 64 bit steps over
something like a hundred bytes in the L1 cache should hardly be measurable.

This might sound like a premature micro optimisation, but these things can
quickly add up, e.g. when running a user provided function over a large
array. (And, yes, we'd try to do caching and all that...)

Stefan



Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Martin v. Löwis

> Agreed in general, but in this case, it's really not that easy. A C
> function call involves a certain overhead all by itself, so calling into
> the C-API multiple times may be substantially more costly than, say,
> calling through a function pointer once and then running over a returned C
> array comparing numbers. And definitely way more costly than running over
> an array that the type struct points to directly. We are not talking about
> hundreds of entries here, just a few. A linear scan in 64 bit steps over
> something like a hundred bytes in the L1 cache should hardly be measurable.


I give up, then. I fail to understand the problem. Apparently, you want
to do something with the value you get from this lookup operation, but
that something won't involve function calls (or else the function call
overhead for the lookup wouldn't be relevant).

I still think this is out of scope for python-dev. If this is something
you want to be able to do for Python 2.4 as well, then you don't need
any change to Python - you can do whatever you come up with for all
Python versions, no need to (or point in) changing Python 3.4 (say).

As this is apparently only relevant to speed fanatics, too, I suggest 
that you check how fast PyPI works for you. Supposedly, they have very
efficient lookup procedures, supported by the JIT. If this doesn't work
for some reason, I suggest that you'll have to trade speed for
convenience: a compile-time fixed layout will beat any dynamic lookup
any time. Just define a common base class, and have all interesting
types inherit from it.

Regards,
Martin




Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Mark Shannon

Martin v. Löwis wrote:

>> And, we want this to somehow work with existing Python; we still
>> support users on Python 2.4.
>
> [...]
>
> It's unclear to me why you think that a linear scan is faster than
> a dictionary lookup. The contrary will be the case - the dictionary
> lookup (PyObject_GetAttr) will be much faster.


PyObject_GetAttr does a lot more than just a dictionary lookup.

Perhaps making _PyType_Lookup() public might provide what is needed?
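
If it were public, usage might look like this minimal sketch (note that 
_PyType_Lookup() is currently private, returns a borrowed reference, 
skips the instance dict and descriptor machinery, and sets no exception 
when nothing is found):

#include <Python.h>

static PyObject *
fast_type_lookup(PyObject *obj, PyObject *interned_name)
{
    /* borrowed reference; NULL means "not found" */
    return _PyType_Lookup(Py_TYPE(obj), interned_name);
}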






Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:

>> Agreed in general, but in this case, it's really not that easy. A C
>> function call involves a certain overhead all by itself, [...]
>
> I give up, then. I fail to understand the problem. Apparently, you want
> to do something with the value you get from this lookup operation, but
> that something won't involve function calls (or else the function call
> overhead for the lookup wouldn't be relevant).


In our specific case the value would be an offset added to the 
PyObject*, and there we would find a pointer to a C function (together 
with a 64-bit signature), and calling that C function (after checking 
the 64 bit signature) is our final objective.
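
Concretely, that call path might look like the following sketch, where 
the record layout and the signature encoding are invented for 
illustration:

#include <Python.h>
#include <stdint.h>

typedef struct {
    uint64_t signature;                 /* tag describing the C signature */
    double (*funcptr)(double, double);
} NativeCallRecord;

#define SIG_DDD UINT64_C(0x6464640a)    /* made-up tag for "(dd)->d" */

static double
call_native(PyObject *callable, Py_ssize_t offset, double a, double b)
{
    NativeCallRecord *rec = (NativeCallRecord *)((char *)callable + offset);
    if (rec->signature == SIG_DDD)
        return rec->funcptr(a, b);      /* direct C call, no boxing */
    else {
        /* signature mismatch: fall back to a boxed Python-level call */
        PyObject *res = PyObject_CallFunction(callable, "dd", a, b);
        double out = (res != NULL) ? PyFloat_AsDouble(res) : -1.0;
        Py_XDECREF(res);
        return out;
    }
}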



> I still think this is out of scope for python-dev. If this is something
> you want to be able to do for Python 2.4 as well, then you don't need
> any change to Python - you can do whatever you come up with for all
> Python versions, no need to (or point in) changing Python 3.4 (say).


We can go ahead and hijack tp_flags bit 22 to make things work in 
existing versions. But what if Python 3.8 then starts using that bit for 
something else?



> As this is apparently only relevant to speed fanatics, too, I suggest
> that you check how fast PyPI works for you. Supposedly, they have very
> efficient lookup procedures, supported by the JIT. If this doesn't work
> for some reason, I suggest that you'll have to trade speed for
> convenience: a compile-time fixed layout will beat any dynamic lookup
> any time. Just define a common base class, and have all interesting
> types inherit from it.


Did you mean PyPy? Stefan and I are Cython developers, so that's kind 
of our angle... And I'm a Cython developer because it solves a practical 
need (in my case in scientific computation), not because I think it's 
that beautiful. PyPy won't work for me (let's not go down that road now...)


Defining a common base class is what NumPy already does, and Cython 
would be forced to without this proposal. We just think it has 
significant disadvantages and were looking for something else.


Dag


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Stefan Behnel
Dag Sverre Seljebotn, 16.05.2012 12:48:
> On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:
>>> [...]
>>
>> I give up, then. I fail to understand the problem. Apparently, you want
>> to do something with the value you get from this lookup operation, but
>> that something won't involve function calls (or else the function call
>> overhead for the lookup wouldn't be relevant).
> 
> In our specific case the value would be an offset added to the PyObject*,
> and there we would find a pointer to a C function (together with a 64-bit
> signature), and calling that C function (after checking the 64 bit
> signature) is our final objective.

I think the use case hasn't been communicated all that clearly yet. Let's
give it another try.

Imagine we have two sides, one that provides a callable and the other side
that wants to call it. Both sides are implemented in C, so the callee has a
C signature and the caller has the arguments available as C data types. The
signature may or may not match the argument types exactly (float vs.
double, int vs. long, ...), because the caller and the callee know nothing
about each other initially, they just happen to appear in the same program
at runtime. All they know is that they could call each other through Python
space, but that would require data conversion, tuple packing, calling,
tuple unpacking, data unpacking, and then potentially the same thing on the
way back. They want to avoid that overhead.

Now, the caller needs to figure out if the callee has a compatible
signature. The callee may provide more than one signature (i.e. more than
one C call entry point), perhaps because it is implemented to deal with
different input data types efficiently, or perhaps because it can
efficiently convert them to its expected input. So, there is a signature on
the caller side given by the argument types it holds, and a couple of
signatures on the callee side that can accept different C data input. Then
the caller needs to find out which signatures there are and match them
against what it can efficiently call. It may even be a JIT compiler that
can generate an efficient call signature on the fly, given a suitable
signature on callee side.

An example for this is an algorithm that evaluates a user provided function
on a large NumPy array. The caller knows what array type it is operating
on, and the user provided function may be designed to efficiently operate
on arrays of int, float and double entries.
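
As a concrete sketch (plain C, every name invented), the callee could 
export a small table of entry points and the caller could scan it for a 
signature it knows how to call:

#include <string.h>

typedef struct {
    const char *signature;   /* e.g. "d(d)" for double f(double) */
    void *funcptr;
} CSignatureEntry;

/* callee side: two specializations of the same user function */
static double user_func_d(double x) { return x * x; }
static float  user_func_f(float x)  { return x * x; }

static const CSignatureEntry exported[] = {
    {"d(d)", (void *)user_func_d},
    {"f(f)", (void *)user_func_f},
    {NULL, NULL},
};

/* caller side: pick the entry point matching the argument types it holds */
static void *
match_signature(const CSignatureEntry *table, const char *wanted)
{
    for (; table->signature != NULL; table++)
        if (strcmp(table->signature, wanted) == 0)
            return table->funcptr;
    return NULL;             /* no match: fall back to a Python-level call */
}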

Does this use case make sense to everyone?

The reason why we are discussing this on python-dev is that we are looking
for a general way to expose these C level signatures within the Python
ecosystem. And Dag's idea was to expose them as part of the type object,
basically as an addition to the current Python level tp_call() slot.

Stefan



Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Stefan Behnel
Stefan Behnel, 16.05.2012 13:13:
> Dag Sverre Seljebotn, 16.05.2012 12:48:
> [...]
>
> The reason why we are discussing this on python-dev is that we are looking
> for a general way to expose these C level signatures within the Python
> ecosystem. And Dag's idea was to expose them as part of the type object,
> basically as an addition to the current Python level tp_call() slot.

... and to finish the loop that I started here (sorry for being verbose):

The proposal that Dag referenced describes a more generic way to make this
kind of extension to type objects from user code. Basically, it allows
implementers to say "my type object has capability X", in a C-ish kind of
way. And the above C signature protocol would be one of those capabilities.

Personally, I wouldn't mind making the specific signature extension a
proposal instead of asking for a general extension mechanism for arbitrary
capabilities (although that still sounds tempting).

Stefan



[Python-Dev] 64-bit Windows buildbots needed

2012-05-16 Thread Antoine Pitrou

Hello all,

We still need 64-bit Windows buildbots to test for regressions.
Otherwise we might let regressions slip through, since few people seem
to run the test suite under Windows at home.

Regards

Antoine.





Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Mark Shannon

Stefan Behnel wrote:

> [...]
>
> An example for this is an algorithm that evaluates a user provided function
> on a large NumPy array. The caller knows what array type it is operating
> on, and the user provided function may be designed to efficiently operate
> on arrays of int, float and double entries.


Given that use case, can I suggest the following:

Separate the discovery of the function from its use.
By this I mean first look up the function (outside of the loop),
then use the function (inside the loop).

It would then be possible to look up the function pointer, using the 
standard API, PyObject_GetAttr (or maybe _PyType_Lookup).
Then when it came to applying the function, the function pointer could 
be used directly.


To do this would require an extra builtin-function-like object, which
would wrap the C function pointer. Currently the builtin (C) function
type only supports a very limited range of types for the underlying 
function pointer.
For example, an extended builtin-function could support (among other 
types) the C function type double (*func)(double, double). The extended 
builtin-function would be a Python callable, but would allow C 
extensions such as NumPy to access the underlying C function directly.


The builtin-function declaration would consist of a pointer to the 
underlying function pointer and a type declaration which states which 
types it accepts. The VM would be responsible for any unboxing/boxing 
required.
E.g. float.__add__ could be constructed from a very simple C function 
(that adds two doubles and returns a double) and a type declaration:

(cdouble, cdouble)->cdouble.

Allowable types would be intptr_t, double or PyObject* (and maybe char*).
PyObject* types could be further qualified with their Python type.
Not allowing char, short, unsigned etc. may seem too restrictive, but it 
prevents an explosion of possible types.
Allowing only 3 C-level types and no more than 3 parameters (plus 
return) means that all 121 (3**4+3**3+3**2+3**1+3**0) permutations can 
be handled without resorting to ctypes/ffi.


Example usage:

typedef double (*ddd_func)(double, double);
ddd_func cfunc;
PyObject *func = PyType_Lookup(the_type, the_attribute);
if (Py_TYPE(func) == Py_ExtendedBuiltinFunction_Type &&
   
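
A hedged guess at how the example might continue, with every 
Py_ExtendedBuiltinFunction_* name being the hypothetical API proposed 
above rather than anything that exists:

typedef double (*ddd_func)(double, double);

/* the_type, the_attribute, x and y assumed to be in scope */
ddd_func cfunc;
double result;
PyObject *func = PyType_Lookup(the_type, the_attribute);
if (func != NULL &&
    Py_TYPE(func) == &Py_ExtendedBuiltinFunction_Type &&
    Py_ExtendedBuiltinFunction_CheckSignature(func, "(cdouble, cdouble)->cdouble")) {
    cfunc = (ddd_func)Py_ExtendedBuiltinFunction_GetPointer(func);
    result = cfunc(x, y);           /* direct C call inside the loop */
}
else {
    /* fall back to an ordinary boxed call */
    PyObject *res = PyObject_CallFunction(func, "dd", x, y);
    result = (res != NULL) ? PyFloat_AsDouble(res) : -1.0;
    Py_XDECREF(res);
}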

Re: [Python-Dev] [Python-checkins] devguide: Add VS2010 link and text, then restructure a few things

2012-05-16 Thread Brian Curtin
On Wed, May 16, 2012 at 2:52 AM, "Martin v. Löwis"  wrote:
>> +All versions previous to 3.3 use Microsoft Visual Studio 2008, available
>> at
>>
>> +https://www.microsoft.com/visualstudio/en-us/products/2008-editions/express.
>
>
> This isn't actually the case. 2.4 and 2.5 used Visual Studio 2003, 2.0 to
> 2.3 used VC6, 1.4 and 1.5 used Visual C++ 1.5; versions before that
> were available only from Mark Hammond.

I know *all* previous versions didn't use 2008 -- just all other
versions we still support and that people seem to be working on or
using. Anyway, I changed it from "all" to "most" in 0ac1d3863208.


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

On 05/16/2012 02:16 PM, Stefan Behnel wrote:

> Stefan Behnel, 16.05.2012 13:13:
> [...]
>
> ... and to finish the loop that I started here (sorry for being verbose):
>
> The proposal that Dag referenced describes a more generic way to make this
> kind of extension to type objects from user code. Basically, it allows
> implementers to say "my type object has capability X", in a C-ish kind of
> way. And the above C signature protocol would be one of those capabilities.
>
> Personally, I wouldn't mind making the specific signature extension a
> proposal instead of asking for a general extension mechanism for arbitrary
> capabilities (although that still sounds tempting).


Here's some reasons for the generic proposal:

a) Avoid pre-mature PEP-ing. Look at PEP 3118 for instance; that would 
almost certainly have been better if there had been a few years of 
beta-testing in the wild among Cython and NumPy users.


I think PEP-ing the "nativecall" proposal soon (even in the unlikely 
event that it would be accepted) is bound to give suboptimal results -- 
it needs to be tested in the wild by Cython and SciPy users for a few 
years first. (Still, we can't ask them to recompile their Python.)


My proposal is then about allowing people to play with their own slots, 
and deploy that to users, without having to create a PEP for their 
specific usecase.


b) There's more than "nativecall" that we'd use this for in Cython: 
compiled abstract base classes, compiled multiple inheritance, Go-style 
interfaces, for instance. Some of the things we'd like to use it for 
will certainly never be a PEP.


c) Get NumPy users off their PyObject_TypeCheck habit, whi

Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

On 05/16/2012 02:47 PM, Mark Shannon wrote:

> Stefan Behnel wrote:
>> [...]
>
> Given that use case, can I suggest the following:
>
> Separate the discovery of the function from its use.
> By this I mean first look up the function (outside of the loop),
> then use the function (inside the loop).


We would obviously do that when we can. But Cython is a compiler/code 
translator, and we don't control use cases. You can easily make up 
use cases (= Cython code people write) where you can't easily separate 
the two.


For instance, the Sage project has hundreds of thousands of lines of 
object-oriented Cython code (NOT just array-oriented, but also graphs 
and trees and stuff), which is all based on Cython's own fast vtable 
dispatches a la C++. They might want to clean up their code and use more 
generic callback objects in some places.


Other users currently pass around C pointers for callback functions, and 
we'd like to tell them "pass around these nicer Python callables 
instead, honestly, the penalty is only 2 ns per call". (*Regardless* of 
how you use them, like making sure you use them in a loop where we can 
statically pull out the function pointer acquisition. Saying "this is 
only non-sluggish if you do x, y, z" puts users off.)


I'm not asking you to consider the details of all that. Just to allow 
some kind of high-performance extensibility of PyTypeObject, so that we 
can *stop* bothering python-dev with specific requirements from our 
parallel universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)


Dag


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Mark Shannon

Dag Sverre Seljebotn wrote:

> [...]
>
> Other users currently pass around C pointers for callback functions, and
> we'd like to tell them "pass around these nicer Python callables
> instead, honestly, the penalty is only 2 ns per call". (*Regardless* of
> how you use them, like making sure you use them in a loop where we can
> statically pull out the function pointer acquisition. Saying "this is
> only non-sluggish if you do x, y, z" puts users off.)


Why not pass around a PyCFunction object, instead of a C function
pointer? It contains two fields: the function pointer and the object 
(self), which is exactly what you want.


Of course, the PyCFunction object only allows a limited range of
function types, which is why I am suggesting a variant which supports a
wider range of C function pointer types.

Is a single extra indirection in obj->func() rather than func(),
really that inefficient?
If you are passing around raw pointers, you have already given up on
dynamic type checking.




Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Robert Bradshaw
On Wed, May 16, 2012 at 8:40 AM, Mark Shannon  wrote:
> [...]
>
> Why not pass around a PyCFunction object, instead of a C function
> pointer? It contains two fields: the function pointer and the object
> (self), which is exactly what you want.

Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Mark Shannon

Robert Bradshaw wrote:

On Wed, May 16, 2012 at 8:40 AM, Mark Shannon  wrote:

Dag Sverre Seljebotn wrote:

On 05/16/2012 02:47 PM, Mark Shannon wrote:

Stefan Behnel wrote:

Dag Sverre Seljebotn, 16.05.2012 12:48:

On 05/16/2012 11:50 AM, "Martin v. Löwis" wrote:

Agreed in general, but in this case, it's really not that easy. A C
function call involves a certain overhead all by itself, so calling
into
the C-API multiple times may be substantially more costly than, say,
calling through a function pointer once and then running over a
returned C
array comparing numbers. And definitely way more costly than
running over
an array that the type struct points to directly. We are not talking
about
hundreds of entries here, just a few. A linear scan in 64 bit steps
over
something like a hundred bytes in the L1 cache should hardly be
measurable.

I give up, then. I fail to understand the problem. Apparently, you
want
to do something with the value you get from this lookup operation, but
that something won't involve function calls (or else the function call
overhead for the lookup wouldn't be relevant).

In our specific case the value would be an offset added to the
PyObject*,
and there we would find a pointer to a C function (together with a
64-bit
signature), and calling that C function (after checking the 64 bit
signature) is our final objective.


I think the use case hasn't been communicated all that clearly yet.
Let's
give it another try.

Imagine we have two sides, one that provides a callable and the other
side
that wants to call it. Both sides are implemented in C, so the callee
has a
C signature and the caller has the arguments available as C data
types. The
signature may or may not match the argument types exactly (float vs.
double, int vs. long, ...), because the caller and the callee know
nothing
about each other initially, they just happen to appear in the same
program
at runtime. All they know is that they could call each other through
Python
space, but that would require data conversion, tuple packing, calling,
tuple unpacking, data unpacking, and then potentially the same thing
on the
way back. They want to avoid that overhead.

Now, the caller needs to figure out if the callee has a compatible
signature. The callee may provide more than one signature (i.e. more
than
one C call entry point), perhaps because it is implemented to deal with
different input data types efficiently, or perhaps because it can
efficiently convert them to its expected input. So, there is a
signature on
the caller side given by the argument types it holds, and a couple of
signature on the callee side that can accept different C data input.
Then
the caller needs to find out which signatures there are and match them
against what it can efficiently call. It may even be a JIT compiler that
can generate an efficient call signature on the fly, given a suitable
signature on callee side.



An example for this is an algorithm that evaluates a user provided
function
on a large NumPy array. The caller knows what array type it is operating
on, and the user provided function may be designed to efficiently
operate
on arrays of int, float and double entries.


Given that use case, can I suggest the following:

Separate the discovery of the function from its use.
By this I mean first lookup the function (outside of the loop)
then use the function (inside the loop).


We would obviously do that when we can. But Cython is a compiler/code
translator, and we don't control usecases. You can easily make up usecases
(= Cython code people write) where you can't easily separate the two.

For instance, the Sage projects has hundreds of thousands of lines of
object-oriented Cython code (NOT just array-oriented, but also graphs and
trees and stuff), which is all based on Cython's own fast vtable dispatches
a la C++. They might want to clean up their code and more generic callback
objects some places.

Other users currently pass around C pointers for callback functions, and
we'd like to tell them "pass around these nicer Python callables instead,
honestly, the penalty is only 2 ns per call". (*Regardless* of how you use
them, like making sure you use them in a loop where we can statically pull
out the function pointer acquisition. Saying "this is only non-sluggish if
you do x, y, z puts users off.)


Why not pass around a PyCFunction object, instead of a C function
pointer. It contains two fields: the function pointer and the object (self),
which is exactly what you want.

Of course, the PyCFunction object only allows a limited range of
function types, which is why I am suggesting a variant which supports a
wider range of C function pointer types.

Is a single extra indirection in obj->func() rather than func(),
really that inefficient?
If you are passing around raw pointers, you have already given up on
dynamic type checking.
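
For illustration, such a widened variant might carry something like
the following; the struct and field names are made up here (the real
PyCFunctionObject pairs a PyMethodDef with a self object):

#include <Python.h>

/* Illustrative only: a PyCFunction-like object recording an arbitrary
 * C entry point plus a signature tag, rather than being limited to
 * the fixed PyCFunction calling conventions. */
typedef struct {
    PyObject_HEAD
    void *funcptr;           /* raw C entry point */
    const char *signature;   /* agreed-upon signature encoding */
    PyObject *self;          /* bound object, or NULL */
} NativeCallableObject;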



I'm not asking you to consider the details of all that. Just to allow
some kind of high-performance extensibility of PyTypeObject, so that
we can *stop* bothering python-dev with specific requirements from our
parallel universe of nearly-all-Cython-and-Fortran-and-C++ codebases
:-)

Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Martin v. Löwis

In our specific case the value would be an offset added to the
PyObject*, and there we would find a pointer to a C function (together
with a 64-bit signature), and calling that C function (after checking
the 64-bit signature) is our final objective.
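
Concretely, the layout being described might look like this sketch;
the names and the fallback are illustrative, and the 64-bit encoding
is whatever the spec defines:

#include <Python.h>
#include <stdint.h>

typedef struct {
    uint64_t signature;            /* 64-bit encoded C signature */
    double (*funcptr)(double);     /* one possible entry-point type */
} OffsetEntry;

/* The offset is obtained elsewhere; check the signature, then call
 * the C function directly, bypassing Python-level dispatch. */
static double
call_via_offset(PyObject *obj, size_t offset, uint64_t expected, double x)
{
    OffsetEntry *e = (OffsetEntry *)((char *)obj + offset);
    if (e->signature == expected)
        return e->funcptr(x);
    return 0.0;                    /* would fall back to a boxed call */
}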


And is what the C function does really faster than the lookup through
a dictionary? I find that hard to believe.


I still think this is out of scope for python-dev. If this is something
you want to be able to do for Python 2.4 as well, then you don't need
any change to Python - you can do whatever you come up with for all
Python versions; there is no need for (or point in) changing Python
3.4 (say).


We can go ahead and hijack tp_flags bit 22 to make things work in
existing versions. But what if Python 3.8 then starts using that bit for
something else?


Use flag bit 23 in Python 3.8. You know at compile time what Python
version you have.
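
That version check is an ordinary compile-time #if; for instance
(which bit is genuinely free in a given release is an assumption that
would have to be verified against that release's headers):

#include <Python.h>

#if PY_VERSION_HEX >= 0x03080000
#  define MYLIB_TPFLAG_HAS_EXT (1UL << 23)
#else
#  define MYLIB_TPFLAG_HAS_EXT (1UL << 22)
#endif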




As this is apparently only relevant to speed fanatics, too, I suggest
that you check how fast PyPI works for you.


Did you mean PyPy?


Oops, yes - Freudian slip :-)

Regards,
Martin


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Martin v. Löwis

Does this use case make sense to everyone?

The reason why we are discussing this on python-dev is that we are looking
for a general way to expose these C level signatures within the Python
ecosystem. And Dag's idea was to expose them as part of the type object,
basically as an addition to the current Python level tp_call() slot.


The use case makes sense, yet there is also a long-standing solution 
already to expose APIs and function pointers: the capsule objects.


If you want to avoid dictionary lookups on the server side, implement
tp_getattro, comparing addresses of interned strings.
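
A sketch of that tp_getattro approach; the attribute name, the capsule
name, and the wrapped function are all illustrative:

#include <Python.h>

static PyObject *interned_sig;   /* interned once at module init */

static double example_func(double x) { return x * x; }

/* Compare the attribute name against the interned string by pointer
 * before falling back to the generic (dict-based) lookup. */
static PyObject *
Example_getattro(PyObject *self, PyObject *name)
{
    if (name == interned_sig)
        return PyCapsule_New((void *)example_func, "example.d->d", NULL);
    return PyObject_GenericGetAttr(self, name);
}

/* At module init:
 *   interned_sig = PyUnicode_InternFromString("_c_signature");
 * (PyString_InternFromString on Python 2.) */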

Regards,
Martin



Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Robert Bradshaw
On Wed, May 16, 2012 at 11:33 AM, "Martin v. Löwis"  wrote:
>> Does this use case make sense to everyone?
>>
>> The reason why we are discussing this on python-dev is that we are looking
>> for a general way to expose these C level signatures within the Python
>> ecosystem. And Dag's idea was to expose them as part of the type object,
>> basically as an addition to the current Python level tp_call() slot.
>
> The use case makes sense, yet there is also a long-standing solution already
> to expose APIs and function pointers: the capsule objects.
>
> If you want to avoid dictionary lookups on the server side, implement
> tp_getattro, comparing addresses of interned strings.

Yes, that's an idea worth looking at. The point about implementing
tp_getattro to avoid dictionary lookup overhead is a good one, worth
trying at least. One drawback is that this approach does require the
GIL (as does _PyType_Lookup).

Regarding the C function being faster than the dictionary lookup (or
at least fast enough that the lookup takes a noticeable share of the
time), yes, this happens all the time. For example, one might be
solving differential equations where the "user input" is essentially a
set of (usually simple) functions of type double f(double) and their
derivatives.
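
In C terms, that "user input" might be as small as the following
illustrative typedefs:

typedef double (*scalar_fn)(double);

/* The unboxed form of a user-supplied right-hand side and its
 * derivative: tiny functions called millions of times, so per-call
 * lookup overhead matters. */
typedef struct {
    scalar_fn f;    /* f(x)  */
    scalar_fn df;   /* f'(x) */
} OdeProblem;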

- Robert


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Dag Sverre Seljebotn

On 05/16/2012 10:24 PM, Robert Bradshaw wrote:

On Wed, May 16, 2012 at 11:33 AM, "Martin v. Löwis"  wrote:

Does this use case make sense to everyone?

The reason why we are discussing this on python-dev is that we are looking
for a general way to expose these C level signatures within the Python
ecosystem. And Dag's idea was to expose them as part of the type object,
basically as an addition to the current Python level tp_call() slot.


The use case makes sense, yet there is also a long-standing solution already
to expose APIs and function pointers: the capsule objects.

If you want to avoid dictionary lookups on the server side, implement
tp_getattro, comparing addresses of interned strings.


Yes, that's an idea worth looking at. The point about implementing
tp_getattro to avoid dictionary lookup overhead is a good one, worth
trying at least. One drawback is that this approach does require the
GIL (as does _PyType_Lookup).

Regarding the C function being faster than the dictionary lookup (or
at least fast enough that the lookup takes a noticeable share of the
time), yes, this happens all the time. For example, one might be
solving differential equations where the "user input" is essentially a
set of (usually simple) functions of type double f(double) and their
derivatives.


To underline how performance-critical this is for us, perhaps a full 
Cython example is useful.


The following Cython code is a real-world use case. It is not too 
contrived in the essentials, although simplified a little bit. For 
instance, undergrad engineering students could pick up Cython just to 
play with simple scalar functions like this.


from numpy import sin
# assume sin is a Python callable and that NumPy decides to support
# our spec to also support getting a "double (*sinfuncptr)(double)".

# Our mission: avoid having the user manually import "sin" from C,
# but allow just using the NumPy object and still be fast.

# define a function to integrate
cpdef double f(double x):
    return sin(x * x)  # guess on signature and use "fastcall"!

# the integrator
def integrate(func, double a, double b, int n):
    cdef double s = 0
    cdef double dx = (b - a) / n
    for i in range(n):
        # This is also a fastcall, but can be cached so doesn't
        # matter...
        s += func(a + i * dx)
    return s * dx

integrate(f, 0, 1, 100)

There are two problems here:

 - The "sin" global can be reassigned (monkey-patched) between each 
call to "f", no way for "f" to know. Even "sin" could do the 
reassignment. So you'd need to check for reassignment to do caching...


 - The fastcall inside of "f" is separated from the loop in 
"integrate". And since "f" is often in another module, we can't rely 
on static whole-program analysis.


These problems with monkey-patching disappear if the lookup is negligible.
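
A sketch of the reassignment guard mentioned above; lookup_d2d is the
same hypothetical discovery step as earlier, and reference handling is
simplified:

#include <Python.h>

typedef double (*d2d)(double);
d2d lookup_d2d(PyObject *callable);    /* hypothetical */

/* Cache the unboxed pointer, guarded by object identity, so that a
 * monkey-patched "sin" is noticed and re-resolved. */
static PyObject *cached_obj = NULL;    /* borrowed reference; sketch only */
static d2d cached_fp = NULL;

static double
call_cached(PyObject *callable, double x)
{
    if (callable != cached_obj) {      /* reassignment guard */
        cached_fp = lookup_d2d(callable);
        cached_obj = callable;
    }
    return cached_fp(x);               /* assumes the lookup succeeded */
}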

Some rough numbers:

 - The tp_flags hack adds about 2 ns of overhead (a metaclass is 
similar; the problem there is more how to synchronize that metaclass 
across multiple 3rd-party libraries)


 - Dict lookup: about 20 ns

 - The sin function itself is about 35 ns, and "f" is probably only 
2-3 ns; there could very easily be multiple such functions, defined in 
different modules, chained together in order to build up a formula.


Dag


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Greg Ewing

Dag wrote:

I'm not asking you to consider the details of all that. 
Just to allow some kind of high-performance extensibility of 
PyTypeObject, so that we can *stop* bothering python-dev with 
specific requirements from our parallel universe of 
nearly-all-Cython-and-Fortran-and-C++ codebases :-)


Maybe you should ask for a bit more than that.

Tp_flags bits are a scarce resource, and they shouldn't
be handed out lightly to anyone who asks for one. Eventually
they're going to run out, and then something else will have to
be done to make further extensions of the type object possible.

So maybe it's worth thinking about making a general mechanism
available for third parties to extend the type object without
them all needing to have their own tp_flags bits and without
needing to collude with each other to avoid conflicts.
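
One shape such a general mechanism could take, close to what started
this thread, is sketched below from the consumer side; the struct, the
terminator convention, and the idea of a central ID registry are all
illustrative:

#include <Python.h>

/* A type carries an array of (extension_id, data) pairs; consumers
 * scan for IDs they recognize. ID ranges would be handed out by a
 * central registry instead of individual tp_flags bits. */
typedef struct {
    unsigned long extension_id;   /* 0 terminates the array */
    void *data;
} TypeExtEntry;

static void *
get_type_extension(const TypeExtEntry *entries, unsigned long wanted)
{
    for (; entries->extension_id != 0; entries++)
        if (entries->extension_id == wanted)
            return entries->data;
    return NULL;                  /* type doesn't implement this extension */
}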

--
Greg


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Robert Bradshaw
On Wed, May 16, 2012 at 5:03 PM, Greg Ewing  wrote:
> Dag wrote:
>
>> I'm not asking you to consider the details of all that. Just to allow some
>> kind of high-performance extensibility of PyTypeObject, so that we can
>> *stop* bothering python-dev with specific requirements from our parallel
>> universe of nearly-all-Cython-and-Fortran-and-C++ codebases :-)
>
>
> Maybe you should ask for a bit more than that.
>
> Tp_flags bits are a scarce resource, and they shouldn't
> be handed out lightly to anyone who asks for one. Eventually
> they're going to run out, and then something else will have to
> be done to make further extensions of the type object possible.
>
> So maybe it's worth thinking about making a general mechanism
> available for third parties to extend the type object without
> them all needing to have their own tp_flags bits and without
> needing to collude with each other to avoid conflicts.

This is exactly what was proposed to start this thread (with minimal
collusion to avoid conflicts, specifically partitioning up a global ID
space).

- Robert


Re: [Python-Dev] 64-bit Windows buildbots needed

2012-05-16 Thread Brian Curtin
On Wed, May 16, 2012 at 7:44 AM, Antoine Pitrou  wrote:
>
> Hello all,
>
> We still need 64-bit Windows buildbots to test for regressions.
> Otherwise we might let regressions slip through, since few people seem
> to run the test suite under Windows at home.

The machine that I used to run a Server 2008 x64 build slave is back
to being a Windows machine after a dance with Ubuntu. I'm going to
order more RAM and set it up to provide a 64-bit build slave,
probably Windows 8.

I will see about having it set up in the next few weeks.


Re: [Python-Dev] C-level duck typing

2012-05-16 Thread Greg Ewing

On 17/05/12 12:17, Robert Bradshaw wrote:


This is exactly what was proposed to start this thread (with minimal
collusion to avoid conflicts, specifically partitioning up a global ID
space).


Yes, but I think this part of the mechanism needs to be spelled out in
more detail, perhaps in the form of a draft PEP. Then there will be
something concrete to discuss in python-dev.

--
Greg