Re: [Python-Dev] Pre-PEP: Redesigning extension modules
Eric Snow, 08.09.2013 00:22:
> On Mon, Sep 2, 2013 at 7:02 AM, Nick Coghlan wrote:
>> The hook API I currently have in mind is a two step initialisation:
>>
>> PyImport_PrepareNAME (optional)
>> PyImport_ExecNAME
>
> Should we also look at an API change for the initfunc() of PyImport_Inittab
> entries? Currently the function takes a module name, which doesn't jive
> with loader.exec_module() taking a module. I noticed this while adding an
> exec_module() to BuiltinImporter. I suppose the same thing goes
> for PyImport_ImportFrozenModuleObject().

Is it still the case that the inittab mechanism only works for the
embedding case? It would be nice to have a declarative mechanism for
registering a set of modules from a running module init function.

Stefan
[Python-Dev] metaclasses, classes, instances, and proper nomenclature
I've run across two different ways to think about this:
1) the type of the first argument
2) where the method/attribute lives
Since attributes don't take a first argument they default to 2: an instance attribute lives in the instance, a class
attribute lives in the class, and a metaclass attribute lives in the metaclass.
Methods, on the other hand, do take a first argument: an instance method takes itself, a class method takes the class,
and a metaclass method takes the metaclass.
Going by option 1 above there is only one way to get an instance method, and only one way to get a metaclass method --
calling with the instance (either directly or indirectly via the class), or calling a metaclass method that has been
marked as a @classmethod.
Therein lies my confusion.
class Meta(type):

    @classmethod
    def meta_method(mcls):
        print("I'm a metaclass method!")

    def cls_method1(cls):
        print("I'm a class method! Aren't I?")

class Class(metaclass=Meta):

    @classmethod
    def cls_method2(cls):
        print("I'm a class method for sure!")

    def instance_method(self):
        print("And I'm a regular ol' instance method")
So, is Meta.cls_method1 a class method? On the one hand, it takes the class as its first parameter; on the other hand,
it lives in the metaclass. And on the third hand, you can't get to it from the instance Class().
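
For concreteness, here is roughly how that plays out at the interpreter (a sketch; the traceback is abbreviated):

    >>> Class.cls_method1()      # found on the metaclass, bound to the class
    I'm a class method! Aren't I?
    >>> Class().cls_method1()    # invisible from instances
    Traceback (most recent call last):
      ...
    AttributeError: 'Class' object has no attribute 'cls_method1'
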
If you're wondering why this is posted to PyDev, the related question is this: What is the proper role of a metaclass?
Should it basically fiddle with the class creation process and then get out of the way? The case in point is, of
course, Enum. Currently it has a custom __getattr__, but it lives in the metaclass, EnumMeta. Now this is handy,
because it means that Color.red.blue raises an AttributeError, where if __getattr__ lived in Enum itself that would
work. It also has the __members__ attribute living in EnumMeta, which means it's not accessible from Color.red. In
other words, EnumMeta is not getting out of the way; it is still very much involved. On the one hand, that's cool; on
the other hand, I had to think hard to figure out why Color.red.blue was not getting routed through EnumMeta's
__getattr__, but was instead getting routed through object.__getattr__.
--
~Ethan~
Re: [Python-Dev] metaclasses, classes, instances, and proper nomenclature
On 8 Sep 2013 18:38, "Ethan Furman" wrote:
>
> I've run across two different ways to think about this:
>
> 1) the type of the first argument
>
> 2) where the method/attribute lives
>
> Since attributes don't take a first argument they default to 2: an
> instance attribute lives in the instance, a class attribute lives in the
> class, and a metaclass attribute lives in the metaclass.
>
> Methods, on the other hand, do take a first argument: an instance method
> takes itself, a class method takes the class, and a metaclass method takes
> the metaclass.
No, there's no such thing as a "metaclass method".
Metaclass instance methods are equivalent to hidden class methods - they
don't appear in dir() and can't be accessed through instances of the class.
It's only class methods on the metaclass that receive the metaclass rather
than the class object (__new__ is technically a static method, but it still
accepts the metaclass as the first argument).
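
(A sketch with the classes above, output abbreviated, to make the binding
behaviour concrete:)

    >>> Class.meta_method()    # classmethod on Meta: receives Meta, not Class
    I'm a metaclass method!
    >>> Class.cls_method1      # plain method on Meta: bound to the class object
    <bound method Meta.cls_method1 of <class '__main__.Class'>>
    >>> 'cls_method1' in dir(Class)    # hidden: dir() doesn't report it
    False
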
> Going by option 1 above there is only one way to get an instance method,
> and only one way to get a metaclass method -- calling with the instance
> (either directly or indirectly via the class), or calling a metaclass
> method that has been marked as a @classmethod.
>
> Therein lies my confusion.
>
> class Meta(type):
>
>     @classmethod
>     def meta_method(mcls):
>         print("I'm a metaclass method!")
>
>     def cls_method1(cls):
>         print("I'm a class method! Aren't I?")
>
> class Class(metaclass=Meta):
>
>     @classmethod
>     def cls_method2(cls):
>         print("I'm a class method for sure!")
>
>     def instance_method(self):
>         print("And I'm a regular ol' instance method")
>
>
> So, is Meta.cls_method1 a class method? On the one hand, it takes the
> class as its first parameter; on the other hand, it lives in the metaclass.
> And on the third hand, you can't get to it from the instance Class().
It's a hidden class method.
> If you're wondering why this is posted to PyDev, the related question is
> this: What is the proper role of a metaclass? Should it basically fiddle
> with the class creation process and then get out of the way? The case in
> point is, of course, Enum. Currently it has a custom __getattr__, but it
> lives in the metaclass, EnumMeta. Now this is handy, because it means that
> Color.red.blue raises an AttributeError, where if __getattr__ lived in Enum
> itself that would work. It also has the __members__ attribute living in
> EnumMeta, which means it's not accessible from Color.red. In other words,
> EnumMeta is not getting out of the way; it is still very much involved. On
> the one hand, that's cool; on the other hand, I had to think hard to
> figure out why Color.red.blue was not getting routed through EnumMeta's
> __getattr__, but was instead getting routed through object.__getattr__.
This is exactly how a metaclass is intended to be used - to affect the
behaviour of the class without affecting the behaviour of instances.
And yes, introspection does get a little interesting when a non-trivial
metaclass is in play :)
Cheers,
Nick.
>
> --
> ~Ethan~
Re: [Python-Dev] cpython: Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC
On Sun, 8 Sep 2013 11:54:00 +0200 (CEST)
victor.stinner wrote:
> http://hg.python.org/cpython/rev/b7f6f6f59e91
> changeset: 85619:b7f6f6f59e91
> user: Victor Stinner
> date: Sun Sep 08 11:53:09 2013 +0200
> summary:
>   Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC
[...]
> +if fcntl:
> +    def test_get_inheritable_cloexec(self):

The right way to do this would be to skip the test if fcntl doesn't exist.

Regards

Antoine.
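
(A minimal, self-contained sketch of that pattern; the test body here is a
placeholder, not the real test:)

    import unittest

    try:
        import fcntl
    except ImportError:
        fcntl = None

    class InheritableTests(unittest.TestCase):

        @unittest.skipIf(fcntl is None, "test requires the fcntl module")
        def test_get_inheritable_cloexec(self):
            # Placeholder body; the real test inspects FD_CLOEXEC via fcntl.
            self.assertIsNotNone(fcntl)

    if __name__ == "__main__":
        unittest.main()
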
Re: [Python-Dev] PEP 450 adding statistics module
I'd like to get some attention for this please.

On Sat, Aug 31, 2013 at 12:58:39PM +1000, Steven D'Aprano wrote:
> Hi all,
>
> I think that PEP 450 is now ready for a PEP dictator. There have been a
> number of code reviews, and feedback has been taken into account. The test
> suite passes. I'm not aware of any unanswered issues with the code. At
> least two people other than myself think that the implementation is ready
> for a dictator, and nobody has objected.
>
> There is still on-going work on speeding up the implementation of the
> statistics.sum function, but that will not affect the interface or
> substantially change the test suite.
>
> http://bugs.python.org/issue18606
> http://www.python.org/dev/peps/pep-0450/
>
> --
> Steven
Re: [Python-Dev] cpython: Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC
2013/9/8 Antoine Pitrou:
> On Sun, 8 Sep 2013 11:54:00 +0200 (CEST)
> victor.stinner wrote:
>> http://hg.python.org/cpython/rev/b7f6f6f59e91
>> changeset: 85619:b7f6f6f59e91
>> user: Victor Stinner
>> date: Sun Sep 08 11:53:09 2013 +0200
>> summary:
>>   Issue #18904: test_socket: add inheritance tests using fcntl and FD_CLOEXEC
> [...]
>> +if fcntl:
>> +    def test_get_inheritable_cloexec(self):
>
> The right way to do this would be to skip the test if fcntl doesn't
> exist.

Ok. First, I added two methods to get and set the FD_CLOEXEC flag. Later,
I inlined these methods because it makes the code more readable.

I modified the tests to use the @unittest.skipIf decorator:

New changeset aea58e1cae75 by Victor Stinner in branch 'default':
Issue #18904: test_os and test_socket use unittest.skipIf() to check if fcntl
http://hg.python.org/cpython/rev/aea58e1cae75

Victor
Re: [Python-Dev] RFC: PEP 454: Add a new tracemalloc module
2013/9/4 Victor Stinner:
> http://www.python.org/dev/peps/pep-0454/
>
> PEP: 454
> Title: Add a new tracemalloc module to trace Python memory allocations
> Version: $Revision$
> Last-Modified: $Date$
> Author: Victor Stinner
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 3-September-2013
> Python-Version: 3.4

I added a function get_tracemalloc_size() to see how much memory is used
by the tracemalloc module itself. Result on the Python test suite:

* 1 frame: +52% (+68%)
  Python=34 MiB; _tracemalloc=18 MiB, tracemalloc.py=5 MiB
* 10 frames: +155% (+170%)
  Python=34 MiB, _tracemalloc=53 MiB, tracemalloc.py=5 MiB
* 100 frames: +1273% (+1283%)
  Python=30 MiB, _tracemalloc=382 MiB, tracemalloc.py=6 MiB

On a small application and a computer with gigabytes of memory, it may not
matter. In a big application on an embedded device, it can be a blocker for
using tracemalloc. So I added filters (on the filename and line number)
directly in the C module:

``add_filter(include: bool, filename: str, lineno: int=None)`` function:

   Add a filter. If *include* is ``True``, only trace memory blocks
   allocated in a file with a name matching *filename*. If *include* is
   ``False``, don't trace memory blocks allocated in a file with a name
   matching *filename*.

   The match is done using *filename* as a prefix. For example,
   ``'/usr/bin/'`` only matches files in the ``/usr/bin`` directory. The
   ``.pyc`` and ``.pyo`` suffixes are automatically replaced with ``.py``
   when matching the filename.

   *lineno* is a line number. If *lineno* is ``None`` or less than ``1``,
   it matches any line number.

``clear_filters()`` function:

   Reset the filter list.

``get_filters()`` function:

   Get the filters as a list of ``(include: bool, filename: str,
   lineno: int)`` tuples. If *lineno* is ``None``, a filter matches any
   line number.

By default, the filename of the Python tracemalloc module
(``tracemalloc.py``) is excluded.

Right now, the match is done using PyUnicode_Tailmatch(), which is not
convenient. I will see if it is possible to implement the joker character
"*" matching any string, so the API would be closer to
Snapshot.filter_filenames() (which uses fnmatch.fnmatch).

Victor
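
(For illustration, a sketch of how the proposed functions above might be
combined; the paths are made up and the final module may expose a
different API:)

    import tracemalloc

    # Only trace blocks allocated in the application's own files...
    tracemalloc.add_filter(True, "/home/user/myapp/")
    # ...and explicitly exclude the standard library.
    tracemalloc.add_filter(False, "/usr/lib/python3.4/")

    print(tracemalloc.get_filters())
    # e.g. [(True, '/home/user/myapp/', None),
    #       (False, '/usr/lib/python3.4/', None)]

    tracemalloc.clear_filters()    # back to the default filter list
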
Re: [Python-Dev] PEP 450 adding statistics module
...what's a PEP dictator?

Steven D'Aprano wrote:
> I'd like to get some attention for this please.
>
> On Sat, Aug 31, 2013 at 12:58:39PM +1000, Steven D'Aprano wrote:
> [...]

--
Sent from my Android phone with K-9 Mail. Please excuse my brevity.
Re: [Python-Dev] PEP 450 adding statistics module
On Sat, 31 Aug 2013 12:58:39 +1000
Steven D'Aprano wrote:
> Hi all,
>
> I think that PEP 450 is now ready for a PEP dictator.

Perhaps Mark would like to apply?

Regards

Antoine.
Re: [Python-Dev] PEP 450 adding statistics module
On 09/08/2013 06:52 AM, Ryan wrote:
> ...what's a PEP dictator?

The person tasked with deciding on the fate of an individual PEP.

--
~Ethan~
Re: [Python-Dev] PEP 450 adding statistics module
Going over the open issues:

- Parallel arrays or arrays of tuples? I think the API should require an
  array of tuples. It is trivial to zip up parallel arrays to the required
  format, while if you have an array of tuples, extracting the parallel
  arrays is slightly more cumbersome. Also for manipulating the raw data,
  an array of tuples makes it easier to do insertions or removals without
  worrying about losing the correspondence between the arrays.

- Requiring concrete sequences as opposed to iterators sounds fine. I'm
  guessing that good algorithms for doing certain calculations in a single
  pass, assuming the full input doesn't fit in memory, are quite different
  from good algorithms for doing the same calculations without having that
  worry. (Just like you can't expect to use the same code to do a good job
  of sorting in-memory and on-disk data.)

- Postponing some algorithms to Python 3.5 sounds fine.

On Sun, Sep 8, 2013 at 9:06 AM, Ethan Furman wrote:
> On 09/08/2013 06:52 AM, Ryan wrote:
>> ...what's a PEP dictator?
>
> The person tasked with deciding on the fate of an individual PEP.
>
> --
> ~Ethan~

--
--Guido van Rossum (python.org/~guido)
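
(A small sketch, with made-up numbers, of the insertion/removal point in
the first bullet:)

    # An array of tuples keeps each observation together, so edits cannot
    # desynchronize the variables:
    pairs = [(1.0, 2.1), (2.0, 3.9), (4.0, 8.2)]
    pairs.insert(1, (1.5, 3.0))    # one operation inserts a whole observation
    del pairs[0]                   # one operation removes a whole observation

    # With parallel arrays, every edit must be mirrored in both lists:
    xs = [1.0, 2.0, 4.0]
    ys = [2.1, 3.9, 8.2]
    xs.insert(1, 1.5)
    ys.insert(1, 3.0)              # forget this line and the data is corrupted
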
Re: [Python-Dev] PEP 450 adding statistics module
Steven, I'd like to just approve the PEP, given the amount of discussion
that's happened already (though I didn't follow much of it). I quickly
glanced through the PEP and didn't find anything I'd personally object to,
but then I found your section of open issues, and I realized that you don't
actually specify the proposed API in the PEP itself. It's highly unusual to
approve a PEP that doesn't contain a specification. What did I miss?

On Sun, Sep 8, 2013 at 5:37 AM, Steven D'Aprano wrote:
> I'd like to get some attention for this please.
>
> On Sat, Aug 31, 2013 at 12:58:39PM +1000, Steven D'Aprano wrote:
> [...]

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 450 adding statistics module
On Sun, Sep 8, 2013 at 9:32 PM, Guido van Rossum wrote:
> - Parallel arrays or arrays of tuples? I think the API should require
> an array of tuples. It is trivial to zip up parallel arrays to the
> required format, while if you have an array of tuples, extracting the
> parallel arrays is slightly more cumbersome.

I agree with your conclusion but not with the rationale: converting between
an array of tuples and parallel arrays is trivial both ways:

>>> at = [(1,2,3), (4,5,6), (7,8,9), (10, 11, 12)]
>>> zip(*at)
[(1, 4, 7, 10), (2, 5, 8, 11), (3, 6, 9, 12)]
>>> zip(*_)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]

(Note that zip(*x) is basically transpose(x).)
Re: [Python-Dev] PEP 450 adding statistics module
Never mind, I found the patch and the issue. I really think that the *PEP*
is ready for inclusion after the open issues are changed into something
like Discussion or Future Work, and after adding a more prominent link to
the issue with the patch. Then the *patch* can be reviewed some more until
it is ready -- it looks very close already.

On Sun, Sep 8, 2013 at 10:32 AM, Guido van Rossum wrote:
> Going over the open issues:
> [...]

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 450 adding statistics module
Well, to me zip(*x) is unnatural, and it's inefficient when the arrays are
long.

On Sun, Sep 8, 2013 at 10:45 AM, Alexander Belopolsky wrote:
> On Sun, Sep 8, 2013 at 9:32 PM, Guido van Rossum wrote:
>> - Parallel arrays or arrays of tuples? I think the API should require
>> an array of tuples. It is trivial to zip up parallel arrays to the
>> required format, while if you have an array of tuples, extracting the
>> parallel arrays is slightly more cumbersome.
>
> I agree with your conclusion but not with the rationale: converting
> between an array of tuples and parallel arrays is trivial both ways:
>
> >>> at = [(1,2,3), (4,5,6), (7,8,9), (10, 11, 12)]
> >>> zip(*at)
> [(1, 4, 7, 10), (2, 5, 8, 11), (3, 6, 9, 12)]
> >>> zip(*_)
> [(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12)]
>
> (Note that zip(*x) is basically transpose(x).)

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 450 adding statistics module
On Sun, Sep 08, 2013 at 10:25:22AM -0700, Guido van Rossum wrote:
> Steven, I'd like to just approve the PEP, given the amount of
> discussion that's happened already (though I didn't follow much of
> it). I quickly glanced through the PEP and didn't find anything I'd
> personally object to, but then I found your section of open issues,
> and I realized that you don't actually specify the proposed API in the
> PEP itself. It's highly unusual to approve a PEP that doesn't contain
> a specification. What did I miss?

You didn't miss anything, but I may have. Should the PEP go through each
public function in the module (there are only 11)? That may be a little
repetitive, since most have the same, or almost the same, signatures. Or
is it acceptable to just include an overview? I've come up with this:


API

The initial version of the library will provide univariate (single
variable) statistics functions. The general API will be based on a
functional model ``function(data, ...) -> result``, where ``data`` is a
mandatory iterable of (usually) numeric data.

The author expects that lists will be the most common data type used, but
any iterable type should be acceptable. Where necessary, functions may
convert to lists internally. Where possible, functions are expected to
conserve the type of the data values, for example, the mean of a list of
Decimals should be a Decimal rather than float.


Calculating the mean, median and mode

The ``mean``, ``median`` and ``mode`` functions take a single mandatory
argument and return the appropriate statistic, e.g.:

    >>> mean([1, 2, 3])
    2.0

``mode`` is the sole exception to the rule that the data argument must be
numeric. It will also accept an iterable of nominal data, such as strings.


Calculating variance and standard deviation

In order to be similar to scientific calculators, the statistics module
will include separate functions for population and sample variance and
standard deviation. All four functions have similar signatures, with a
single mandatory argument, an iterable of numeric data, e.g.:

    >>> variance([1, 2, 2, 2, 3])
    0.5

All four functions also accept a second, optional, argument, the mean of
the data. This is modelled on a similar API provided by the GNU Scientific
Library [18]. There are three use-cases for using this argument, in no
particular order:

1) The value of the mean is known *a priori*.

2) You have already calculated the mean, and wish to avoid calculating it
   again.

3) You wish to (ab)use the variance functions to calculate the second
   moment about some given point other than the mean.

In each case, it is the caller's responsibility to ensure that the given
argument is meaningful.


Is this satisfactory or do I need to go into more detail?

--
Steven
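
(A sketch of use-case 2, assuming the functions behave as described above;
the module itself is still only a proposal at this point:)

    from statistics import mean, variance

    data = [1, 2, 2, 2, 3]
    m = mean(data)            # 2.0
    v = variance(data, m)     # 0.5, without recomputing the mean
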
Re: [Python-Dev] PEP 450 adding statistics module
On 8 September 2013 20:19, Steven D'Aprano wrote:
[...]
> Is this satisfactory or do I need to go into more detail?

It describes only 7 functions, and yet you state there are 11. I'd suggest
you add a 1-line summary of each function, something like:

mean - calculate the (arithmetic) mean of the data
median - calculate the median value of the data
etc.

Paul
Re: [Python-Dev] PEP 450 adding statistics module
On 8 September 2013 18:32, Guido van Rossum wrote:
> Going over the open issues:
>
> - Parallel arrays or arrays of tuples? I think the API should require
> an array of tuples. It is trivial to zip up parallel arrays to the
> required format, while if you have an array of tuples, extracting the
> parallel arrays is slightly more cumbersome. Also for manipulating the
> raw data, an array of tuples makes it easier to do insertions or
> removals without worrying about losing the correspondence between the
> arrays.

For something like this, where there are multiple obvious formats for the
input data, I think it's reasonable to just request whatever is convenient
for the implementation. Otherwise you're asking at least some of your
users to convert data from one format to another just so that you can
convert it back again. In any real problem you'll likely have more than
two variables, so you'll be writing some code to prepare the data for the
function anyway.

The most obvious alternative that isn't explicitly mentioned in the PEP is
to accept either:

def correlation(x, y=None):
    if y is None:
        xs = []
        ys = []
        for x, y in x:
            xs.append(x)
            ys.append(y)
    else:
        xs = list(x)
        ys = list(y)
    assert len(xs) == len(ys)
    # In reality a helper function does the above.
    # Now compute stuff

This avoids any unnecessary conversions and is as convenient as possible
for all users at the expense of having a slightly more complicated API.

Oscar
Re: [Python-Dev] RFC: PEP 454: Add a new tracemalloc module
It seems like most of this could live on PyPI for a while so the API can
get hashed out in use? If that's not the case, is it because the PEP 445
API isn't rich enough?

Janzert
Re: [Python-Dev] PEP 450 adding statistics module
On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin wrote:
> On 8 September 2013 18:32, Guido van Rossum wrote:
>> Going over the open issues:
>>
>> - Parallel arrays or arrays of tuples? [...]
>
> For something like this, where there are multiple obvious formats for
> the input data, I think it's reasonable to just request whatever is
> convenient for the implementation.

Not really. The implementation may change, or its needs may not be obvious
to the caller. I would say the right thing to do is request something easy
to remember, which often means consistent. In general, Python APIs
definitely skew towards lists of tuples rather than parallel arrays, and
for good reasons -- that way you benefit most from built-in operations
like slices and insert/append.

> Otherwise you're asking at least some of your users to convert data
> from one format to another just so that you can convert it back again.
> In any real problem you'll likely have more than two variables, so
> you'll be writing some code to prepare the data for the function anyway.

Yeah, so you might as well prepare it in the form that the API expects.

> The most obvious alternative that isn't explicitly mentioned in the
> PEP is to accept either:
>
> def correlation(x, y=None):
>     if y is None:
>         xs = []
>         ys = []
>         for x, y in x:
>             xs.append(x)
>             ys.append(y)
>     else:
>         xs = list(x)
>         ys = list(y)
>     assert len(xs) == len(ys)
>     # In reality a helper function does the above.
>     # Now compute stuff
>
> This avoids any unnecessary conversions and is as convenient as
> possible for all users at the expense of having a slightly more
> complicated API.

I don't think this is really more convenient -- it is more to learn, and
can cause surprises (e.g. when a user is only familiar with one format and
then sees an example using the other format, they may be unable to
understand the example).

The one argument I *haven't* heard yet which *might* sway me would be
something along the lines of "every other statistics package that users
might be familiar with does it this way" or "all the statistics textbooks
do it this way". (Because, frankly, when it comes to statistics I'm a rank
amateur and I really want Steven's new module to educate me as much as
help me compute specific statistical functions.)

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] RFC: PEP 454: Add a new tracemalloc module
2013/9/8 Janzert:
> It seems like most of this could live on PyPI for a while so the API
> can get hashed out in use?

The pytracemalloc project has been available on PyPI for 6 months. The
only feedback I got was someone trying to compile it on Windows (which is
complex because of the dependency on glib; I don't think they succeeded in
installing it on Windows). I guess that I didn't get more feedback because
it requires patching and recompiling Python, which is not trivial.

I expect more feedback on python-dev with a working implementation (on
hg.python.org) and a PEP. The version available on PyPI works and should
be enough for most use cases to be able to identify a memory leak.

Gregory P. Smith asked me if it would be possible to get more frames
(filename and line number) of the Python traceback, instead of just one
frame (the last frame). I implemented it, but now I have new issues
(memory usage of the tracemalloc module itself), so I'm working on filters
implemented directly in the C module (_tracemalloc). It was already
possible to filter traces from a snapshot read from disk.

I still have some tasks in my TODO list to finish the API and the
implementation. When I am done, I will post a new version of the PEP on
python-dev.

> If that's not the case is it because the PEP 445 API isn't rich enough?

The PEP 445 API is only designed to allow developing new tools like
failmalloc or tracemalloc, without adding overhead when no such debug tool
is used. The tracemalloc module reads the current Python traceback
(filename and line number), which is "not directly" accessible from
PyMem_Malloc().

I hope that existing tools like Heapy and Meliae will benefit directly
from tracemalloc instead of having to develop their own memory allocator
hooks to get the same information (the Python traceback).

Victor
Re: [Python-Dev] PEP 450 adding statistics module
On Sun, Sep 08, 2013 at 02:41:35PM -0700, Guido van Rossum wrote:
> On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin wrote:
>> The most obvious alternative that isn't explicitly mentioned in the
>> PEP is to accept either:
>>
>> def correlation(x, y=None):
>>     [...]
>>
>> This avoids any unnecessary conversions and is as convenient as
>> possible for all users at the expense of having a slightly more
>> complicated API.

The PEP does mention that, as "some combination of the above". The PEP
also mentions that the decision of what API to use for multivariate stats
is deferred until 3.5, so there's plenty of time for people to bike-shed
this :-)

> I don't think this is really more convenient -- it is more to learn,
> and can cause surprises (e.g. when a user is only familiar with one
> format and then sees an example using the other format, they may be
> unable to understand the example).
>
> The one argument I *haven't* heard yet which *might* sway me would be
> something along the lines of "every other statistics package that users
> might be familiar with does it this way" or "all the statistics
> textbooks do it this way". (Because, frankly, when it comes to
> statistics I'm a rank amateur and I really want Steven's new module to
> educate me as much as help me compute specific statistical functions.)

I don't think that there is one common API for multivariate stats
packages. It partially depends on whether the package is aimed at basic
use or advanced use. I haven't done a systematic comparison of the most
common, but here are a few examples:

- The Casio ClassPad graphing calculator has a spreadsheet-like interface,
  which I consider equivalent to func(xdata, ydata).

- The HP-48G series of calculators uses a fixed global variable holding a
  matrix, and a second global variable specifying which columns to use.

- The R "cor" (correlation coefficient) function takes either a pair of
  vectors (lists), and calculates a single value, or a matrix, in which
  case it calculates the correlation matrix.

- numpy.corrcoef takes one or two array arguments, and a third argument
  specifying whether to treat rows or columns as variables, and like R
  returns either a single value or the correlation matrix.

- Minitab expects two separate vector arguments, and returns the
  correlation coefficient between them.

- If I'm reading the page below correctly, the SAS corr procedure takes
  anything up to 27 arguments:
  http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/procstat_corr_sect004.htm
  I don't suggest we follow that API :-)

Quite frankly, I consider the majority of stats APIs to be confusing with
a steep learning curve.

--
Steven
Re: [Python-Dev] PEP 450 adding statistics module
Guido van Rossum writes:
> On Sun, Sep 8, 2013 at 1:48 PM, Oscar Benjamin wrote:
>> On 8 September 2013 18:32, Guido van Rossum wrote:
>>> Going over the open issues:
>>>
>>> - Parallel arrays or arrays of tuples? I think the API should
>>> require an array of tuples. It is trivial to zip up parallel arrays
>>> to the required format, while if you have an array of tuples,
>>> extracting the parallel arrays is slightly more cumbersome.
>>>
>>> Also for manipulating the raw data, an array of tuples makes
>>> it easier to do insertions or removals without worrying about
>>> losing the correspondence between the arrays.

I don't necessarily find this persuasive. It's more common when working
with existing databases that you add variables than add observations. This
is going to require attention to the correspondence in any case.
Observations aren't added, and they're "removed" temporarily for
statistics on subsets by slicing. If you use the same slice for all
variables, you're not going to make a mistake.

> Not really. The implementation may change, or its needs may not be
> obvious to the caller. I would say the right thing to do is request
> something easy to remember, which often means consistent. In general,
> Python APIs definitely skew towards lists of tuples rather than
> parallel arrays, and for good reasons -- that way you benefit most
> from built-in operations like slices and insert/append.

However, it's common in economic statistics to have a rectangular array,
and extract both certain rows (tuples of observations on variables) and
certain columns (variables). For example you might have data on
populations of American states from 1900 to 2012, and extract the data on
New England states from 1946 to 2012 for analysis.

> The one argument I *haven't* heard yet which *might* sway me would be
> something along the lines of "every other statistics package that users
> might be familiar with does it this way" or "all the statistics
> textbooks do it this way". (Because, frankly, when it comes to
> statistics I'm a rank amateur and I really want Steven's new module to
> educate me as much as help me compute specific statistical functions.)

In economic statistics, most software traditionally inputs variables in
column-major order (i.e., parallel arrays). That said, most software
nowadays allows input as spreadsheet tables. You pays your money and you
takes your choice.

I think the example above of state population data shows that rows and
columns are pretty symmetric here. Many databases will have "too many" of
both, and you'll want to "slice" both to get the sample and variables
relevant to your analysis.

This is all just for consideration; I am quite familiar with economic
statistics and software, but not so much with that used in sociology,
psychology, and medical applications. In the end, I think it's best to
leave it up to Steven's judgment as to what is convenient for him to
maintain.
Re: [Python-Dev] PEP 450 adding statistics module
On 9/09/2013 5:52 a.m., Guido van Rossum wrote:
> Well, to me zip(*x) is unnatural, and it's inefficient when the arrays
> are long.

Would it be worth having a transpose() function in the stdlib somewhere,
that returns a view instead of copying the data?

--
Greg
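
(For the sake of discussion, a minimal sketch of what such a view might
look like; a hypothetical class, not an existing stdlib API:)

    # Hypothetical lazy "transpose" over a list of row tuples: columns are
    # computed on demand instead of being copied up front.
    class TransposedView:

        def __init__(self, rows):
            self._rows = rows    # no copy, just a reference

        def __len__(self):
            return len(self._rows[0]) if self._rows else 0

        def __getitem__(self, col):
            # Materialize a single column lazily.
            return [row[col] for row in self._rows]

    rows = [(1, 2, 3), (4, 5, 6)]
    cols = TransposedView(rows)
    print(cols[0])    # [1, 4]: the first "parallel array", built on demand
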
Re: [Python-Dev] PEP 450 adding statistics module
Yeah, so this and Steven's review of various other APIs suggests that the
field of statistics hasn't really reached the object-oriented age (or
perhaps the OO view isn't suitable for the field), and people really think
of their data as a matrix of some sort. We should respect that. Now, if
this was NumPy, it would *still* make sense to require a single argument,
to be interpreted in the usual fashion. So I'm using that as a kind of
leverage to still recommend taking a list of pairs instead of a pair of
lists. Also, it's quite likely that at least *some* of the users of the
new statistics module will be more familiar with OO programming (e.g. the
Python DB API, PEP 249) than they are with other statistics packages.

On Sun, Sep 8, 2013 at 7:57 PM, Stephen J. Turnbull wrote:
> [...]
>
> This is all just for consideration; I am quite familiar with economic
> statistics and software, but not so much with that used in sociology,
> psychology, and medical applications. In the end, I think it's best
> to leave it up to Steven's judgment as to what is convenient for him
> to maintain.

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 450 adding statistics module
On Sun, Sep 8, 2013 at 5:26 PM, Greg wrote:
> On 9/09/2013 5:52 a.m., Guido van Rossum wrote:
>> Well, to me zip(*x) is unnatural, and it's inefficient when the arrays
>> are long.
>
> Would it be worth having a transpose() function in the stdlib somewhere,
> that returns a view instead of copying the data?

I'd be hesitant to add just that one function, given that there's hardly
any support for multi-dimensional arrays in the stdlib. (NumPy of course
has a transpose(), and that's where it arguably belongs.)

--
--Guido van Rossum (python.org/~guido)
Re: [Python-Dev] PEP 450 adding statistics module
On Sun, Sep 08, 2013 at 09:14:39PM +0100, Paul Moore wrote:
> On 8 September 2013 20:19, Steven D'Aprano wrote:
> [...]
>> Is this satisfactory or do I need to go into more detail?
>
> It describes only 7 functions, and yet you state there are 11. I'd
> suggest you add a 1-line summary of each function, something like:
>
> mean - calculate the (arithmetic) mean of the data
> median - calculate the median value of the data
> etc.

Thanks Paul, will do.

I think PEP 1 needs to be a bit clearer about this part of the process.
For instance, if I had a module with 100 functions and methods, would I
need to document all of them in the PEP? I expect not, but then I didn't
expect I needed to document all 11 either :-)

--
Steven
Re: [Python-Dev] PEP 450 adding statistics module
On Mon, Sep 09, 2013 at 12:26:05PM +1200, Greg wrote:
> On 9/09/2013 5:52 a.m., Guido van Rossum wrote:
>> Well, to me zip(*x) is unnatural, and it's inefficient when the arrays
>> are long.
>
> Would it be worth having a transpose() function in the stdlib somewhere,
> that returns a view instead of copying the data?

I've intentionally left out multivariate statistics from the initial
version of statistics.py, so there will be plenty of time to get feedback
from users before deciding on an API for 3.5.

If there was a transpose function in the std lib, the obvious place would
be the statistics module itself. There is precedent: R includes a
transpose function, and presumably the creators of R expect it to be used
frequently, because they've given it a single-letter name:

http://stat.ethz.ch/R-manual/R-devel/library/base/html/t.html

--
Steven
