Re: [Python-Dev] Pre-PEP: Redesigning extension modules

2013-09-08 Thread Stefan Behnel
Eric Snow, 08.09.2013 00:22:
> On Mon, Sep 2, 2013 at 7:02 AM, Nick Coghlan wrote:
> 
>> The hook API I currently have in mind is a two step initialisation:
>>
>> PyImport_PrepareNAME (optional)
>> PyImport_ExecNAME
>>
> 
> Should we also look at an API change for the initfunc() of PyImport_Inittab
> entries?  Currently the function takes a module name, which doesn't jibe
> with loader.exec_module() taking a module.  I noticed this while adding an
> exec_module() to BuiltinImporter.  I suppose the same thing goes
> for PyImport_ImportFrozenModuleObject().

Is it still the case that the inittab mechanism only works for the
embedding case? It would be nice to have a declarative mechanism for
registering a set of modules from a running module init function.

Stefan


___
Python-Dev mailing list
[email protected]
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] metaclasses, classes, instances, and proper nomenclature

2013-09-08 Thread Ethan Furman

I've run across two different ways to think about this:

  1) the type of the first argument

  2) where the method/attribute lives

Since attributes don't take a first argument they default to 2:  an instance attribute lives in the instance, a class 
attribute lives in the class, and a metaclass attribute lives in the metaclass.


Methods, on the other hand, do take a first argument:  an instance method takes itself, a class method takes the class, 
and a metaclass method takes the metaclass.


Going by option 1 above, there is only one way to get an instance method and only one way to get a metaclass method -- 
calling with the instance (either directly or indirectly via the class), or calling a metaclass method that has been 
marked as a @classmethod.


Therein lies my confusion.

class Meta(type):

@classmethod
def meta_method(mcls):
print("I'm a metaclass method!")

def cls_method1(cls):
print("I'm a class method!  Aren't I?")

class Class(metaclass=Meta):

@classmethod
def cls_method2(cls):
print("I'm a class method for sure!")

def instance_method(self):
print("And I'm a regular ol' instance method")


So, is Meta.cls_method1 a class method?  On the one hand, it takes the class as its first parameter; on the other hand, 
it lives in the metaclass.  And on the third hand, you can't get to it from the instance Class().



If you're wondering why this is posted to PyDev, the related question is this:  What is the proper role of a metaclass? 
 Should it basically fiddle with the class creation process and then get out of the way?  The case in point is, of 
course, Enum.  Currently it has a custom __getattr__, but it lives in the metaclass, EnumMeta.  Now this is handy, 
because it means that Color.red.blue raises an AttributeError, whereas if __getattr__ lived in Enum itself that would 
work.  It also has the __members__ attribute living in EnumMeta, which means it's not accessible from Color.red.  In 
other words, EnumMeta is not getting out of the way; it is still very much involved.  On the one hand, that's cool; on 
the other hand, I had to think hard to figure out why Color.red.blue was not getting routed through EnumMeta's 
__getattr__, but was instead getting routed through object.__getattribute__.
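The puzzle above comes down to where __getattr__ participates in attribute lookup. A minimal, hedged sketch (illustrative names, not the actual EnumMeta code): a __getattr__ defined on a metaclass is consulted for failed *class* lookups, but never for instance lookups.

```python
class Meta(type):
    def __getattr__(cls, name):
        # Called only when normal lookup on the *class* fails.
        return "handled by the metaclass"

class C(metaclass=Meta):
    pass

print(C.missing)     # class lookup falls back to Meta.__getattr__
try:
    C().missing      # instance lookup searches only C's MRO
except AttributeError:
    print("the metaclass is never consulted for instances")
```

This is the same mechanism by which Color.red.blue can raise AttributeError while Color.blue works.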


--
~Ethan~


Re: [Python-Dev] metaclasses, classes, instances, and proper nomenclature

2013-09-08 Thread Nick Coghlan
On 8 Sep 2013 18:38, "Ethan Furman"  wrote:
>
> I've run across two different ways to think about this:
>
>   1) the type of the first argument
>
>   2) where the method/attribute lives
>
> Since attributes don't take a first argument they default to 2:  an
> instance attribute lives in the instance, a class attribute lives in the
> class, and a metaclass attribute lives in the metaclass.
>
> Methods, on the other hand, do take a first argument:  an instance method
> takes itself, a class method takes the class, and a metaclass method takes
> the metaclass.

No, there's no such thing as a "metaclass method".

Metaclass instance methods are equivalent to hidden class methods - they
don't appear in dir() and can't be accessed through instances of the class.

It's only class methods on the metaclass that receive the metaclass rather
than the class object (__new__ is technically a static method, but it still
accepts the metaclass as the first argument).
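Nick's "hidden class method" behaviour can be checked with a short sketch (names are illustrative):

```python
class Meta(type):
    def hidden(cls):
        # A plain instance method of the metaclass: callable on the
        # class like a classmethod, but invisible to instances.
        return "called with " + cls.__name__

class C(metaclass=Meta):
    pass

print(C.hidden())               # behaves like a class method
print('hidden' in dir(C))       # False: doesn't appear in dir()
print(hasattr(C(), 'hidden'))   # False: unreachable from instances
```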

> Going by option 1 above there is only one way to get an instance method,
> and only one way to get a metaclass method -- calling with the instance
> (either directly or indirectly via the class), or calling a metaclass
> method that has been marked as a @classmethod.
>
> Therein lies my confusion.
>
> class Meta(type):
>
> @classmethod
> def meta_method(mcls):
> print("I'm a metaclass method!")
>
> def cls_method1(cls):
> print("I'm a class method!  Aren't I?")
>
> class Class(metaclass=Meta):
>
> @classmethod
> def cls_method2(cls):
> print("I'm a class method for sure!")
>
> def instance_method(self):
> print("And I'm a regular ol' instance method")
>
>
> So, is Meta.cls_method1 a class method?  On the one hand, it takes the
> class as its first parameter; on the other hand it lives in the metaclass.
> And on the third hand you can't get to it from the instance Class().

It's a hidden class method.

> If you're wondering why this is posted to PyDev, the related question is
> this:  What is the proper role of a metaclass?  Should it basically fiddle
> with the class creation process and then get out of the way?  The case in
> point is, of course, Enum.  Currently it has a custom __getattr__, but it
> lives in the metaclass, EnumMeta.  Now this is handy, because it means that
> Color.red.blue raises an AttributeError, whereas if __getattr__ lived in
> Enum itself that would work.  It also has the __members__ attribute living
> in EnumMeta, which means it's not accessible from Color.red.  In other
> words, EnumMeta is not getting out of the way; it is still very much
> involved.  On the one hand, that's cool; on the other hand, I had to think
> hard to figure out why Color.red.blue was not getting routed through
> EnumMeta's __getattr__, but was instead routed through object.__getattribute__.

This is exactly how a metaclass is intended to be used - to affect the
behaviour of the class without affecting the behaviour of instances.

And yes, introspection does get a little interesting when a non-trivial
metaclass is in play :)

Cheers,
Nick.

>
> --
> ~Ethan~


Re: [Python-Dev] cpython: Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC

2013-09-08 Thread Antoine Pitrou
On Sun,  8 Sep 2013 11:54:00 +0200 (CEST)
victor.stinner  wrote:
> http://hg.python.org/cpython/rev/b7f6f6f59e91
> changeset:   85619:b7f6f6f59e91
> user:Victor Stinner 
> date:Sun Sep 08 11:53:09 2013 +0200
> summary:
>   Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC
> 
[...]
>  
> +if fcntl:
> +def test_get_inheritable_cloexec(self):

The right way to do this would be to skip the test if fcntl doesn't
exist.

Regards

Antoine.




Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Steven D'Aprano


I'd like to get some attention for this please.


On Sat, Aug 31, 2013 at 12:58:39PM +1000, Steven D'Aprano wrote:
> Hi all,
> 
> 
> I think that PEP 450 is now ready for a PEP dictator. There have been a 
> number of code reviews, and feedback has been taken into account. The test 
> suite passes. I'm not aware of any unanswered issues with the code. At 
> least two people other than myself think that the implementation is ready 
> for a dictator, and nobody has objected.
> 
> There is still ongoing work on speeding up the implementation of the 
> statistics.sum function, but that will not affect the interface or 
> substantially change the test suite.
> 
> http://bugs.python.org/issue18606
> http://www.python.org/dev/peps/pep-0450/
> 
> 
> 
> 
> -- 
> Steven


Re: [Python-Dev] cpython: Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC

2013-09-08 Thread Victor Stinner
2013/9/8 Antoine Pitrou :
> On Sun,  8 Sep 2013 11:54:00 +0200 (CEST)
> victor.stinner  wrote:
>> http://hg.python.org/cpython/rev/b7f6f6f59e91
>> changeset:   85619:b7f6f6f59e91
>> user:Victor Stinner 
>> date:Sun Sep 08 11:53:09 2013 +0200
>> summary:
>>   Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC
>>
> [...]
>>
>> +if fcntl:
>> +def test_get_inheritable_cloexec(self):
>
> The right way to do this would be to skip the test if fcntl doesn't
> exist.

Ok.

First, I added two methods to get and set the FD_CLOXEC flag. Later, I
inlined these methods because it makes the code more readable.

I modified the tests to use @unittest.skipIf decorator:

New changeset aea58e1cae75 by Victor Stinner in branch 'default':
Issue #18904: test_os and test_socket use unittest.skipIf() to check if fcntl
http://hg.python.org/cpython/rev/aea58e1cae75
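The skipIf pattern referred to above looks roughly like this (a sketch; the real test body checks the FD_CLOEXEC flag via fcntl):

```python
import unittest

try:
    import fcntl
except ImportError:
    fcntl = None

class InheritableTests(unittest.TestCase):
    # The test is always collected, and reported as skipped (rather
    # than silently absent) on platforms without fcntl.
    @unittest.skipIf(fcntl is None, "need fcntl module")
    def test_get_inheritable_cloexec(self):
        pass  # placeholder for the real FD_CLOEXEC check
```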

Victor


Re: [Python-Dev] RFC: PEP 454: Add a new tracemalloc module

2013-09-08 Thread Victor Stinner
2013/9/4 Victor Stinner :
> http://www.python.org/dev/peps/pep-0454/
>
> PEP: 454
> Title: Add a new tracemalloc module to trace Python memory allocations
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 3-September-2013
> Python-Version: 3.4

I added a function get_tracemalloc_size() to see how much memory is
used by the tracemalloc module itself. Result on the Python test
suite:

* 1 frame: +52% (+68%)
  Python=34 MiB; _tracemalloc=18 MiB, tracemalloc.py=5 MiB
* 10 frames: +155% (+170%)
  Python=34 MiB, _tracemalloc=53 MiB, tracemalloc.py=5 MiB
* 100 frames: +1273% (+1283%)
  Python=30 MiB, _tracemalloc=382 MiB, tracemalloc.py=6 MiB

On a small application and a computer with gigabytes of memory, it may
not matter. In a big application on an embedded device, it can be a
blocker for using tracemalloc. So I added filters (on the filename
and line number) directly in the C module:


``add_filter(include: bool, filename: str, lineno: int=None)`` function:

Add a filter. If *include* is ``True``, only trace memory blocks
allocated in a file with a name matching *filename*. If
*include* is ``False``, don't trace memory blocks allocated in a
file with a name matching *filename*.

The match is done using *filename* as a prefix. For example,
``'/usr/bin/'`` only matches files in the ``/usr/bin`` directory. The
``.pyc`` and ``.pyo`` suffixes are automatically replaced with
``.py`` when matching the filename.

*lineno* is a line number. If *lineno* is ``None`` or less than
``1``, it matches any line number.

``clear_filters()`` function:

Reset the filter list.

``get_filters()`` function:

Get the filters as list of
``(include: bool, filename: str, lineno: int)`` tuples.

If *lineno* is ``None``, a filter matches any line number.

   By default, the filename of the Python tracemalloc module
   (``tracemalloc.py``) is excluded.
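The matching rule described above can be sketched in pure Python (an illustration of the documented semantics, not the actual C implementation):

```python
def filename_match(pattern, filename):
    # Prefix match, with .pyc/.pyo suffixes normalized to .py first.
    for ext in ('.pyc', '.pyo'):
        if filename.endswith(ext):
            filename = filename[:-len(ext)] + '.py'
            break
    return filename.startswith(pattern)

print(filename_match('/usr/bin/', '/usr/bin/tool.pyc'))   # True
print(filename_match('/usr/bin/', '/home/user/tool.py'))  # False
```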


Right now, the match is done using PyUnicode_Tailmatch(), which is not
convenient. I will see if it is possible to implement the wildcard
character "*" matching any string, so the API would be closer to
Snapshot.filter_filenames() (which uses fnmatch.fnmatch).

Victor


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Ryan
...what's a PEP dictator?

Steven D'Aprano  wrote:

> I'd like to get some attention for this please.
>
> On Sat, Aug 31, 2013 at 12:58:39PM +1000, Steven D'Aprano wrote:
>> Hi all,
>>
>> I think that PEP 450 is now ready for a PEP dictator. There have been a
>> number of code reviews, and feedback has been taken into account. The test
>> suite passes. I'm not aware of any unanswered issues with the code. At
>> least two people other than myself think that the implementation is ready
>> for a dictator, and nobody has objected.
>>
>> There is still ongoing work on speeding up the implementation of the
>> statistics.sum function, but that will not affect the interface or
>> substantially change the test suite.
>>
>> http://bugs.python.org/issue18606
>> http://www.python.org/dev/peps/pep-0450/
>>
>> --
>> Steven

-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Antoine Pitrou
On Sat, 31 Aug 2013 12:58:39 +1000
Steven D'Aprano  wrote:

> Hi all,
> 
> 
> I think that PEP 450 is now ready for a PEP dictator.

Perhaps Mark would like to apply?

Regards

Antoine.




Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Ethan Furman

On 09/08/2013 06:52 AM, Ryan wrote:


...what's a PEP dictator?


The person tasked with deciding on the fate of an individual PEP.

--
~Ethan~


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
Going over the open issues:

- Parallel arrays or arrays of tuples? I think the API should require
an array of tuples. It is trivial to zip up parallel arrays to the
required format, while if you have an array of tuples, extracting the
parallel arrays is slightly more cumbersome. Also, for manipulating
the raw data, an array of tuples makes it easier to do insertions or
removals without worrying about losing the correspondence between the
arrays.

- Requiring concrete sequences as opposed to iterators sounds fine.
I'm guessing that good algorithms for doing certain calculations in a
single pass, assuming the full input doesn't fit in memory, are quite
different from good algorithms for doing the same calculations without
having that worry. (Just like you can't expect to use the same code to
do a good job of sorting in-memory and on-disk data.)

- Postponing some algorithms to Python 3.5  sounds fine.
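The single-pass case mentioned above can be illustrated with Welford's algorithm, which computes the sample variance from any iterable, including streams too large to hold in memory (a sketch for illustration, not part of the proposed module):

```python
def online_variance(data):
    # Welford's single-pass algorithm for the sample variance:
    # one pass, O(1) memory, no need to materialize the data.
    n = 0
    mean = 0.0
    m2 = 0.0
    for x in data:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return m2 / (n - 1) if n > 1 else 0.0

print(online_variance(iter([1, 2, 2, 2, 3])))  # ~0.5, computed from a plain iterator
```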

On Sun, Sep 8, 2013 at 9:06 AM, Ethan Furman  wrote:
> On 09/08/2013 06:52 AM, Ryan wrote:
>>
>>
>> ...what's a PEP dictator?
>
>
> The person tasked with deciding on the fate of an individual PEP.
>
> --
> ~Ethan~



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
Steven, I'd like to just approve the PEP, given the amount of
discussion that's happened already (though I didn't follow much of
it). I quickly glanced through the PEP and didn't find anything I'd
personally object to, but then I found your section of open issues,
and I realized that you don't actually specify the proposed API in the
PEP itself. It's highly unusual to approve a PEP that doesn't contain
a specification. What did I miss?

On Sun, Sep 8, 2013 at 5:37 AM, Steven D'Aprano  wrote:
> 
>
> I'd like to get some attention for this please.
>
>
> On Sat, Aug 31, 2013 at 12:58:39PM +1000, Steven D'Aprano wrote:
>> Hi all,
>>
>>
>> I think that PEP 450 is now ready for a PEP dictator. There have been a
>> number of code reviews, and feedback has been taken into account. The test
>> suite passes. I'm not aware of any unanswered issues with the code. At
>> least two people other than myself think that the implementation is ready
>> for a dictator, and nobody has objected.
>>
>> There is still on-going work on speeding up the implementation for the
>> statistics.sum function, but that will not effect the interface or the
>> substantially change the test suite.
>>
>> http://bugs.python.org/issue18606
>> http://www.python.org/dev/peps/pep-0450/
>>
>>
>>
>>
>> --
>> Steven



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Alexander Belopolsky
On Sun, Sep 8, 2013 at 9:32 PM, Guido van Rossum  wrote:

> - Parallel arrays or arrays of tuples? I think the API should require
> an array of tuples. It is trivial to zip up parallel arrays to the
> required format, while if you have an array of tuples, extracting the
> parallel arrays is slightly more cumbersome.
>

I agree with your conclusion but not with the rationale: converting between
array of tuples and parallel arrays is trivial both ways:

>>> at = [(1,2,3), (4,5,6), (7,8,9), (10, 11, 12)]
>>> zip(*at)
[(1, 4, 7, 10), (2, 5, 8, 11), (3, 6, 9, 12)]
>>> zip(*_)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]

(Note that zip(*x) is basically transpose(x).)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
Never mind, I found the patch and the issue. I really think that the
*PEP* is ready for inclusion after the open issues are changed into
something like Discussion or Future Work, and after adding a more
prominent link to the issue with the patch. Then the *patch* can be
reviewed some more until it is ready -- it looks very close already.

On Sun, Sep 8, 2013 at 10:32 AM, Guido van Rossum  wrote:
> Going over the open issues:
>
> - Parallel arrays or arrays of tuples? I think the API should require
> an array of tuples. It is trivial to zip up parallel arrays to the
> required format, while if you have an array of tuples, extracting the
> parallel arrays is slightly more cumbersome. Also for manipulating of
> the raw data, an array of tuples makes it easier to do insertions or
> removals without worrying about losing the correspondence between the
> arrays.
>
> - Requiring concrete sequences as opposed to iterators sounds fine.
> I'm guessing that good algorithms for doing certain calculations in a
> single pass, assuming the full input doesn't fit in memory, are quite
> different from good algorithms for doing the same calculations without
> having that worry. (Just like you can't expect to use the same code to
> do a good job of sorting in-memory and on-disk data.)
>
> - Postponing some algorithms to Python 3.5  sounds fine.
>
> On Sun, Sep 8, 2013 at 9:06 AM, Ethan Furman  wrote:
>> On 09/08/2013 06:52 AM, Ryan wrote:
>>>
>>>
>>> ...what's a PEP dictator?
>>
>>
>> The person tasked with deciding on the fate of an individual PEP.
>>
>> --
>> ~Ethan~
>
>
>
> --
> --Guido van Rossum (python.org/~guido)



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
Well, to me zip(*x) is unnatural, and it's inefficient when the arrays are long.

On Sun, Sep 8, 2013 at 10:45 AM, Alexander Belopolsky
 wrote:
>
> On Sun, Sep 8, 2013 at 9:32 PM, Guido van Rossum  wrote:
>>
>> - Parallel arrays or arrays of tuples? I think the API should require
>> an array of tuples. It is trivial to zip up parallel arrays to the
>> required format, while if you have an array of tuples, extracting the
>> parallel arrays is slightly more cumbersome.
>
>
> I agree with your conclusion but not with the rationale: converting between
> array of tuples and parallel arrays is trivial both ways:
>
 at = [(1,2,3), (4,5,6), (7,8,9), (10, 11, 12)]
 zip(*at)
> [(1, 4, 7, 10), (2, 5, 8, 11), (3, 6, 9, 12)]
 zip(*_)
> [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]
>
> (Note that zip(*x) is basically transpose(x).)



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Steven D'Aprano
On Sun, Sep 08, 2013 at 10:25:22AM -0700, Guido van Rossum wrote:
> Steven, I'd like to just approve the PEP, given the amount of
> discussion that's happened already (though I didn't follow much of
> it). I quickly glanced through the PEP and didn't find anything I'd
> personally object to, but then I found your section of open issues,
> and I realized that you don't actually specify the proposed API in the
> PEP itself. It's highly unusual to approve a PEP that doesn't contain
> a specification. What did I miss?

You didn't miss anything, but I may have.

Should the PEP go through each public function in the module (there are 
only 11)? That may be a little repetitive, since most have the same, or 
almost the same, signatures. Or is it acceptable to just include an 
overview? I've come up with this:


API

The initial version of the library will provide univariate (single
variable) statistics functions.  The general API will be based on a
functional model ``function(data, ...) -> result``, where ``data``
is a mandatory iterable of (usually) numeric data.

The author expects that lists will be the most common data type used,
but any iterable type should be acceptable.  Where necessary, functions
may convert to lists internally.  Where possible, functions are
expected to preserve the type of the data values; for example, the mean
of a list of Decimals should be a Decimal rather than a float.


Calculating the mean, median and mode

The ``mean``, ``median`` and ``mode`` functions take a single
mandatory argument and return the appropriate statistic, e.g.:

>>> mean([1, 2, 3])
2.0

``mode`` is the sole exception to the rule that the data argument
must be numeric.  It will also accept an iterable of nominal data,
such as strings.


Calculating variance and standard deviation

In order to be similar to scientific calculators, the statistics
module will include separate functions for population and sample
variance and standard deviation.  All four functions have similar
signatures, with a single mandatory argument, an iterable of
numeric data, e.g.:

>>> variance([1, 2, 2, 2, 3])
0.5

All four functions also accept a second, optional, argument, the
mean of the data.  This is modelled on a similar API provided by
the GNU Scientific Library[18].  There are three use-cases for
using this argument, in no particular order:

1)  The value of the mean is known *a priori*.

2)  You have already calculated the mean, and wish to avoid
calculating it again.

3)  You wish to (ab)use the variance functions to calculate
the second moment about some given point other than the
mean.

In each case, it is the caller's responsibility to ensure that the
given argument is meaningful.
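For a concrete flavour of the API sketched above (PEP 450's module shipped as ``statistics`` in Python 3.4, so the calls below run against the stdlib version; the second positional argument of ``variance`` is the known mean):

```python
from statistics import mean, variance

data = [1, 2, 2, 2, 3]

print(variance(data))  # 0.5, the sample variance (single mandatory argument)

# Use-case 2 above: reuse an already-computed mean to avoid a second pass.
m = mean(data)
print(variance(data, m))  # 0.5, without recomputing the mean internally
```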




Is this satisfactory or do I need to go into more detail?


-- 
Steven


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Paul Moore
On 8 September 2013 20:19, Steven D'Aprano  wrote:
[...]
> Is this satisfactory or do I need to go into more detail?

It describes only 7 functions, and yet you state there are 11. I'd
suggest you add a 1-line summary of each function, something like:

mean - calculate the (arithmetic) mean of the data
median - calculate the median value of the data
etc.

Paul


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Oscar Benjamin
On 8 September 2013 18:32, Guido van Rossum  wrote:
> Going over the open issues:
>
> - Parallel arrays or arrays of tuples? I think the API should require
> an array of tuples. It is trivial to zip up parallel arrays to the
> required format, while if you have an array of tuples, extracting the
> parallel arrays is slightly more cumbersome. Also for manipulating of
> the raw data, an array of tuples makes it easier to do insertions or
> removals without worrying about losing the correspondence between the
> arrays.

For something like this, where there are multiple obvious formats for
the input data, I think it's reasonable to just request whatever is
convenient for the implementation. Otherwise you're asking at least
some of your users to convert data from one format to another just so
that you can convert it back again. In any real problem you'll likely
have more than two variables, so you'll be writing some code to
prepare the data for the function anyway.

The most obvious alternative that isn't explicitly mentioned in the
PEP is to accept either:

def correlation(x, y=None):
    if y is None:
        xs = []
        ys = []
        for x, y in x:
            xs.append(x)
            ys.append(y)
    else:
        xs = list(x)
        ys = list(y)
    assert len(xs) == len(ys)
    # In reality a helper function does the above.
    # Now compute stuff

This avoids any unnecessary conversions and is as convenient as
possible for all users at the expense of having a slightly more
complicated API.


Oscar


Re: [Python-Dev] RFC: PEP 454: Add a new tracemalloc module

2013-09-08 Thread Janzert
It seems like most of this could live on PyPI for a while so the API can 
get hashed out in use. If that's not the case, is it because the PEP 445 
API isn't rich enough?


Janzert



Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin
 wrote:
> On 8 September 2013 18:32, Guido van Rossum  wrote:
>> Going over the open issues:
>>
>> - Parallel arrays or arrays of tuples? I think the API should require
>> an array of tuples. It is trivial to zip up parallel arrays to the
>> required format, while if you have an array of tuples, extracting the
>> parallel arrays is slightly more cumbersome. Also for manipulating of
>> the raw data, an array of tuples makes it easier to do insertions or
>> removals without worrying about losing the correspondence between the
>> arrays.
>
> For something like this, where there are multiple obvious formats for
> the input data, I think it's reasonable to just request whatever is
> convenient for the implementation.

Not really. The implementation may change, or its needs may not be
obvious to the caller. I would say the right thing to do is request
something easy to remember, which often means consistent. In general,
Python APIs definitely skew towards lists of tuples rather than
parallel arrays, and for good reasons -- that way you benefit most
from built-in operations like slices and insert/append.
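A quick illustration of that point (hypothetical data):

```python
# One list of tuples: a single insert keeps x and y in sync.
pairs = [(1.0, 2.0), (3.0, 4.0)]
pairs.insert(1, (2.0, 3.0))

# Parallel arrays: two inserts that must never drift apart.
xs = [1.0, 3.0]
ys = [2.0, 4.0]
xs.insert(1, 2.0)
ys.insert(1, 3.0)

print(pairs)                       # [(1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
print(list(zip(xs, ys)) == pairs)  # True: both forms hold the same data
```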

> Otherwise you're asking at least
> some of your users to convert data from one format to another just so
> that you can convert it back again. In any real problem you'll likely
> have more than two variables, so you'll be writing some code to
> prepare the data for the function anyway.

Yeah, so you might as well prepare it in the form that the API expects.

> The most obvious alternative that isn't explicitly mentioned in the
> PEP is to accept either:
>
> def correlation(x, y=None):
>     if y is None:
>         xs = []
>         ys = []
>         for x, y in x:
>             xs.append(x)
>             ys.append(y)
>     else:
>         xs = list(x)
>         ys = list(y)
>     assert len(xs) == len(ys)
>     # In reality a helper function does the above.
>     # Now compute stuff
>
> This avoids any unnecessary conversions and is as convenient as
> possible for all users at the expense of having a slightly more
> complicated API.

I don't think this is really more convenient -- it is more to learn,
and can cause surprises (e.g. when a user is only familiar with one
format and then sees an example using the other format, they may be
unable to understand the example).

The one argument I *haven't* heard yet which *might* sway me would be
something along the line "every other statistics package that users
might be familiar with does it this way" or "all the statistics
textbooks do it this way". (Because, frankly, when it comes to
statistics I'm a rank amateur and I really want Steven's new module to
educate me as much as help me compute specific statistical functions.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] RFC: PEP 454: Add a new tracemalloc module

2013-09-08 Thread Victor Stinner
2013/9/8 Janzert :
> It seems like most of this could live on PyPI for a while so the API can get
> hashed out in use?

pytracemalloc has been available on PyPI for six months. The only
feedback I got was from someone trying to compile it on Windows (which
is complex because of the dependency on glib; I don't think they
succeeded in installing it on Windows). I guess that I didn't get more
feedback because it requires patching and recompiling Python, which is
not trivial.

I expect more feedback on python-dev with a working implementation (on
hg.python.org) and a PEP.

The version available on PyPI works and should be enough for most use
cases, such as identifying a memory leak.

Gregory P. Smith asked me if it would be possible to get more frames
(filename and line number) of the Python traceback, instead of just
one frame (the most recent one). I implemented it, but that raised new
issues (memory usage of the tracemalloc module itself), so I'm working
on filters implemented directly in the C module (_tracemalloc). It was
already possible to filter traces in a snapshot read from disk.

I still have some tasks on my TODO list to finish the API and the
implementation. When I'm done, I will post a new version of the PEP to
python-dev.

> If that's not the case is it because the PEP 445 API
> isn't rich enough?

The PEP 445 API is only designed to make it possible to develop new
tools like failmalloc or tracemalloc, without adding overhead when no
such debug tool is in use.

The tracemalloc module reads the current Python traceback (filename
and line number), which is not directly accessible from
PyMem_Malloc(). I hope that existing tools like Heapy and Meliae will
benefit directly from tracemalloc instead of having to develop their
own memory allocator hooks to get the same information (the Python
traceback).

Victor


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Steven D'Aprano
On Sun, Sep 08, 2013 at 02:41:35PM -0700, Guido van Rossum wrote:
> On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin
>  wrote:

> > The most obvious alternative that isn't explicitly mentioned in the
> > PEP is to accept either:
> >
> > def correlation(x, y=None):
> >     if y is None:
> >         xs = []
> >         ys = []
> >         for x, y in x:
> >             xs.append(x)
> >             ys.append(y)
> >     else:
> >         xs = list(x)
> >         ys = list(y)
> >     assert len(xs) == len(ys)
> >     # In reality a helper function does the above.
> >     # Now compute stuff
> >
> > This avoids any unnecessary conversions and is as convenient as
> > possible for all users at the expense of having a slightly more
> > complicated API.

The PEP does mention that, as "some combination of the above".

The PEP also mentions that the decision of what API to use for 
multivariate stats is deferred until 3.5, so there's plenty of time for 
people to bike-shed this :-)

 
> I don't think this is really more convenient -- it is more to learn,
> and can cause surprises (e.g. when a user is only familiar with one
> format and then sees an example using the other format, they may be
> unable to understand the example).
> 
> The one argument I *haven't* heard yet which *might* sway me would be
> something along the line "every other statistics package that users
> might be familiar with does it this way" or "all the statistics
> textbooks do it this way". (Because, frankly, when it comes to
> statistics I'm a rank amateur and I really want Steven's new module to
> educate me as much as help me compute specific statistical functions.)

I don't think that there is one common API for multivariate stats 
packages. It partially depends on whether the package is aimed at basic 
use or advanced use. I haven't done a systematic comparison of the most 
common, but here are a few examples:

- The Casio Classpad graphing calculator has a spreadsheet-like 
interface, which I consider equivalent to func(xdata, ydata).

- The HP-48G series of calculators uses a fixed global variable holding 
a matrix, and a second global variable specifying which columns to use.

- The R "cor" (correlation coefficient) function takes either a pair of 
vectors (lists), and calculates a single value, or a matrix, in which 
case it calculates the correlation matrix.

- numpy.corrcoef takes one or two array arguments, and a third argument 
specifying whether to treat rows or columns as variables, and like R 
returns either a single value or the correlation matrix.

- Minitab expects two separate vector arguments, and returns the 
correlation coefficient between them.

- If I'm reading the below page correctly, the SAS corr procedure 
takes anything up to 27 arguments.

http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/procstat_corr_sect004.htm

I don't suggest we follow that API :-)


Quite frankly, I consider the majority of stats APIs to be confusing 
with a steep learning curve.



-- 
Steven


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Stephen J. Turnbull
Guido van Rossum writes:
 > On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin
 >  wrote:
 > > On 8 September 2013 18:32, Guido van Rossum  wrote:
 > >> Going over the open issues:
 > >>
 > >> - Parallel arrays or arrays of tuples? I think the API should require
 > >> an array of tuples. It is trivial to zip up parallel arrays to the
 > >> required format, while if you have an array of tuples, extracting the
 > >> parallel arrays is slightly more cumbersome.
 > >>
 > >> Also for manipulating of the raw data, an array of tuples makes
 > >> it easier to do insertions or removals without worrying about
 > >> losing the correspondence between the arrays.

I don't necessarily find this persuasive.  It's more common when
working with existing databases to add variables than to add
observations.  This is going to require attention to the
correspondence in any case.  Observations aren't added, and they're
"removed" temporarily for statistics on subsets by slicing.  If you
use the same slice for all variables, you're not going to make a
mistake.

 > Not really. The implementation may change, or its needs may not be
 > obvious to the caller. I would say the right thing to do is request
 > something easy to remember, which often means consistent. In general,
 > Python APIs definitely skew towards lists of tuples rather than
 > parallel arrays, and for good reasons -- that way you benefit most
 > from built-in operations like slices and insert/append.

However, it's common in economic statistics to have a rectangular
array, and extract both certain rows (tuples of observations on
variables) and certain columns (variables).  For example you might
have data on populations of American states from 1900 to 2012, and
extract the data on New England states from 1946 to 2012 for analysis.
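
That row-and-column extraction works just as well on a list of tuples; a quick sketch (the numbers below are made up purely for illustration):

```python
# Rows are observations: (year, state, population in millions).
# All figures here are illustrative, not real census data.
rows = [
    (1900, "Maine", 0.69),
    (1946, "Maine", 0.85),
    (2012, "Maine", 1.33),
    (1946, "Texas", 7.10),
    (2012, "Texas", 26.10),
]

# Row selection: Maine observations from 1946 onwards.
maine_postwar = [r for r in rows if r[1] == "Maine" and r[0] >= 1946]

# Column selection: the population variable across all observations.
populations = [pop for (_year, _state, pop) in rows]

assert maine_postwar == [(1946, "Maine", 0.85), (2012, "Maine", 1.33)]
assert populations == [0.69, 0.85, 1.33, 7.10, 26.10]
```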

 > The one argument I *haven't* heard yet which *might* sway me would be
 > something along the line "every other statistics package that users
 > might be familiar with does it this way" or "all the statistics
 > textbooks do it this way". (Because, frankly, when it comes to
 > statistics I'm a rank amateur and I really want Steven's new module to
 > educate me as much as help me compute specific statistical functions.)

In economic statistics, most software traditionally inputs variables
in column-major order (ie, parallel arrays).  That said, most software
nowadays allows input as spreadsheet tables.  You pays your money and
you takes your choice.

I think the example above of state population data shows that rows and
columns are pretty symmetric here.  Many databases will have "too many"
of both, and you'll want to "slice" both to get the sample and
variables relevant to your analysis.

This is all just for consideration; I am quite familiar with economic
statistics and software, but not so much for that used in sociology,
psychology, and medical applications.  In the end, I think it's best
to leave it up to Steven's judgment as to what is convenient for him
to maintain.


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Greg

On 9/09/2013 5:52 a.m., Guido van Rossum wrote:

Well, to me zip(*x) is unnatural, and it's inefficient when the arrays are long.


Would it be worth having a transpose() function in the stdlib
somewhere, that returns a view instead of copying the data?
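
A minimal view-like transpose is easy to sketch in pure Python (illustrative only: no such stdlib function exists, and `TransposeView` is a made-up name):

```python
class TransposeView:
    """Lazy column view over a rectangular sequence of rows.

    Illustrative sketch only: assumes all rows are indexable and of
    equal length; nothing is copied until a column is requested.
    """
    def __init__(self, rows):
        self._rows = rows

    def __len__(self):
        # Number of columns, i.e. rows of the transpose.
        return len(self._rows[0]) if self._rows else 0

    def __getitem__(self, j):
        # Build column j on demand from the underlying rows.
        return tuple(row[j] for row in self._rows)

pairs = [(1, 10), (2, 20), (3, 30)]
cols = TransposeView(pairs)
assert cols[0] == (1, 2, 3)     # the x values
assert cols[1] == (10, 20, 30)  # the y values
```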

--
Greg



Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
Yeah, so this and Steven's review of various other APIs suggests that the
field of statistics hasn't really reached the object-oriented age (or
perhaps the OO view isn't suitable for the field), and people really think
of their data as a matrix of some sort. We should respect that. Now, if
this was NumPy, it would *still* make sense to require a single argument,
to be interpreted in the usual fashion. So I'm using that as a kind of
leverage to still recommend taking a list of pairs instead of a pair of
lists. Also, it's quite likely that at least *some* of the users of the new
statistics module will be more familiar with OO programming (e.g. the
Python DB API, PEP 249) than they are with other statistics packages.


On Sun, Sep 8, 2013 at 7:57 PM, Stephen J. Turnbull wrote:

> Guido van Rossum writes:
>  > On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin
>  >  wrote:
>  > > On 8 September 2013 18:32, Guido van Rossum  wrote:
>  > >> Going over the open issues:
>  > >>
>  > >> - Parallel arrays or arrays of tuples? I think the API should require
>  > >> an array of tuples. It is trivial to zip up parallel arrays to the
>  > >> required format, while if you have an array of tuples, extracting the
>  > >> parallel arrays is slightly more cumbersome.
>  > >>
>  > >> Also for manipulating of the raw data, an array of tuples makes
>  > >> it easier to do insertions or removals without worrying about
>  > >> losing the correspondence between the arrays.
>
> I don't necessarily find this persuasive.  It's more common when
> working with existing databases to add variables than to add
> observations.  This is going to require attention to the
> correspondence in any case.  Observations aren't added, and they're
> "removed" temporarily for statistics on subsets by slicing.  If you
> use the same slice for all variables, you're not going to make a
> mistake.
>
>  > Not really. The implementation may change, or its needs may not be
>  > obvious to the caller. I would say the right thing to do is request
>  > something easy to remember, which often means consistent. In general,
>  > Python APIs definitely skew towards lists of tuples rather than
>  > parallel arrays, and for good reasons -- that way you benefit most
>  > from built-in operations like slices and insert/append.
>
> However, it's common in economic statistics to have a rectangular
> array, and extract both certain rows (tuples of observations on
> variables) and certain columns (variables).  For example you might
> have data on populations of American states from 1900 to 2012, and
> extract the data on New England states from 1946 to 2012 for analysis.
>
>  > The one argument I *haven't* heard yet which *might* sway me would be
>  > something along the line "every other statistics package that users
>  > might be familiar with does it this way" or "all the statistics
>  > textbooks do it this way". (Because, frankly, when it comes to
>  > statistics I'm a rank amateur and I really want Steven's new module to
>  > educate me as much as help me compute specific statistical functions.)
>
> In economic statistics, most software traditionally inputs variables
> in column-major order (ie, parallel arrays).  That said, most software
> nowadays allows input as spreadsheet tables.  You pays your money and
> you takes your choice.
>
> I think the example above of state population data shows that rows and
> columns are pretty symmetric here.  Many databases will have "too many"
> of both, and you'll want to "slice" both to get the sample and
> variables relevant to your analysis.
>
> This is all just for consideration; I am quite familiar with economic
> statistics and software, but not so much for that used in sociology,
> psychology, and medical applications.  In the end, I think it's best
> to leave it up to Steven's judgment as to what is convenient for him
> to maintain.
>



-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Guido van Rossum
On Sun, Sep 8, 2013 at 5:26 PM, Greg  wrote:

> On 9/09/2013 5:52 a.m., Guido van Rossum wrote:
>
>> Well, to me zip(*x) is unnatural, and it's inefficient when the arrays
>> are long.
>>
>
> Would it be worth having a transpose() function in the stdlib
> somewhere, that returns a view instead of copying the data?


I'd be hesitant to add just that one function, given that there's hardly
any support for multi-dimensional arrays in the stdlib. (NumPy of course
has a transpose(), and that's where it arguably belongs.)

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Steven D'Aprano
On Sun, Sep 08, 2013 at 09:14:39PM +0100, Paul Moore wrote:
> On 8 September 2013 20:19, Steven D'Aprano  wrote:
> [...]
> > Is this satisfactory or do I need to go into more detail?
> 
> It describes only 7 functions, and yet you state there are 11. I'd
> suggest you add a 1-line summary of each function, something like:
> 
> mean - calculate the (arithmetic) mean of the data
> median - calculate the median value of the data
> etc.

Thanks Paul, will do.

I think PEP 1 needs to be a bit clearer about this part of the process. 
For instance, if I had a module with 100 functions and methods, would I 
need to document all of them in the PEP? I expect not, but then I didn't 
expect I needed to document all 11 either :-)



-- 
Steven


Re: [Python-Dev] PEP 450 adding statistics module

2013-09-08 Thread Steven D'Aprano
On Mon, Sep 09, 2013 at 12:26:05PM +1200, Greg wrote:
> On 9/09/2013 5:52 a.m., Guido van Rossum wrote:
> >Well, to me zip(*x) is unnatural, and it's inefficient when the arrays are 
> >long.
> 
> Would it be worth having a transpose() function in the stdlib
> somewhere, that returns a view instead of copying the data?

I've intentionally left out multivariate statistics from the initial 
version of statistics.py so there will be plenty of time to get feedback 
from users before deciding on an API before 3.5.

If there was a transpose function in the std lib, the obvious place 
would be the statistics module itself. There is precedent: R includes a 
transpose function, and presumably the creators of R expect it to be 
used frequently because they've given it a single-letter name.

http://stat.ethz.ch/R-manual/R-devel/library/base/html/t.html


-- 
Steven