Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Josiah Carlson
On Fri, Jan 29, 2010 at 11:25 PM, Stephen J. Turnbull
 wrote:
> Josiah Carlson writes:
>
>  > Lisp lists are really stacks
>
> No, they're really (ie, concretely) singly-linked lists.
>
> Now, stacks are an abstract data type, and singly-linked lists provide
> an efficient implementation of stacks.  But that's not what linked
> lists "really are".  For example, singly-linked lists are also a
> reasonable way to implement inverted trees (ie, the node knows its
> parent, but not its children), which is surely not a stack.
>
> The Python use of "list" to denote what is concretely a dynamically
> extensible one-dimensional array confused me a bit.  But what the
> heck, Guido needed a four-letter word to denote a concrete type used
> to implement a mutable sequence ADT, and he wasn't going to borrow one
> from that French guy on the ramparts, right?  No big deal.  Ahem...
>
> So the confusion here is that in Python, "list" denotes a particular
> concrete data type, while Steve H. is using a more abstract idea of
> list as mutable sequence to suggest there's a reason for optimizing
> certain mutations that Python's data type isn't good at.  I don't
> think that's an effective way for him to make his point, unfortunately.
> But both usages are consistent with Python's usage; mutability is the
> usual way that lists are distinguished from tuples, for example, and
> the underlying dynamic array implementation is rarely mentioned.

My experience with Lisp is limited to mzScheme and DrScheme, but
AFAIR, neither of them had mutable lists.  Both had list semantics
that were equivalent (in terms of limitations and functionality) to
the structure I described using tuples in my earlier post.  If other
Lisp implementations have mutable lists, I'd be surprised to learn
that.
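
The earlier post is not reproduced here, so the following is only a guess at the kind of tuple-based structure meant: an immutable, Lisp-style list built from nested 2-tuples. The names cons, car, cdr, EMPTY, from_iterable and to_list are chosen for illustration.

```python
EMPTY = None  # marker for the empty list (an assumption of this sketch)


def cons(head, tail):
    """Build a new cell; existing cells are never modified."""
    return (head, tail)


def car(cell):
    """First element of the list."""
    return cell[0]


def cdr(cell):
    """The rest of the list after the first element."""
    return cell[1]


def from_iterable(items):
    """Build a nested-tuple list, consing from the last item backwards."""
    result = EMPTY
    for item in reversed(list(items)):
        result = cons(item, result)
    return result


def to_list(cell):
    """Walk the cells front to back and collect the elements."""
    out = []
    while cell is not EMPTY:
        out.append(car(cell))
        cell = cdr(cell)
    return out


numbers = from_iterable([1, 2, 3])
longer = cons(0, numbers)  # shares the cells of `numbers`; nothing is copied
```

Pushing and popping at the front are cheap here, which is the stack-like behaviour under discussion; since tuples are immutable, no cell can be changed in place.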

However, now we are well into the weeds, far off the track of whether
or not Steve H's feature is worth saddling Python lists with cruft.

 - Josiah
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Josiah Carlson
On Fri, Jan 29, 2010 at 11:31 PM, Stephen J. Turnbull
 wrote:
> Josiah Carlson writes:
>  > On Thu, Jan 28, 2010 at 8:57 PM, Steve Howell  wrote:
>
>  > > What do you think of LISP, and "car" in particular (apart from
>  > > the stupidly cryptic name)?
>
>  > Apples and oranges.
>
> True, but speaking of Lisp lists, here's some possibly relevant
> experience.  About 10 years ago, XEmacs converted its cons type from a
> special immediate representation (ie, cons == (car, cdr)) to a generic
> record representation (ie, cons == (pointer to type descriptor, car,
> cdr)).  This resulted in a perceptible increase in VM usage and disk
> usage.  A typical running XEmacs instance for me contains about 0.75
> million conses and uses 200MB of VM, so with 32-bit pointers that's
> about 3MB extra, or 1.5%, and with 64-bit pointers it's 6MB extra,
> about 3%.  However, I tend to have several big buffers (20-50MB) of
> pure character data; people who work with smaller buffers on 64-bit
> machines have reported as much as 10% extra overhead.  On disk, the
> binary is typically about 9MB stripped.  That contains about 50,000
> conses, or an extra 200KB/400KB with the new structure, somewhat more
> than my experience (2% or 4%).
>
> Some people complained, but we considered this well worthwhile (moving
> one "type bit" from the car to the header allowed Lisp integers to
> cover the range -1G to +1G, and there are a surprising number of
> people who would like to use XEmacs on files >512MB).  I suppose that
> Steve's proposal probably has similar impact on binaries and running
> instances of Python, but he hasn't given any use cases for list.pop(0)
> to compare to doubling the size of usable buffers.
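
The overhead figures quoted above can be checked with a few lines of arithmetic. This is only a sketch: it assumes one extra pointer per cons and decimal megabytes, and the function name extra_overhead is made up for the illustration.

```python
MB = 10 ** 6  # decimal megabytes, an assumption of this sketch


def extra_overhead(n_conses, pointer_bytes, total_bytes):
    """Return (extra bytes, extra as a percentage of total_bytes)."""
    extra = n_conses * pointer_bytes
    return extra, 100 * extra / total_bytes


# Running instance: 0.75 million conses in 200MB of VM.
print(extra_overhead(750_000, 4, 200 * MB))  # 32-bit pointers
print(extra_overhead(750_000, 8, 200 * MB))  # 64-bit pointers

# Stripped binary on disk: about 50,000 conses in about 9MB.
print(extra_overhead(50_000, 4, 9 * MB))
print(extra_overhead(50_000, 8, 9 * MB))
```

The first two calls give 3MB (1.5%) and 6MB (3%), and the on-disk figures come out at roughly 200KB (2.2%) and 400KB (4.4%), in line with the numbers in the message.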

The choice that emacs made is great for emacs; as you stated, it
allowed emacs to do something it was previously unable to do.  Steve
H's proposed change would not allow Python to do anything it wasn't
able to do before, and would (as TJR stated in this and other threads)
saddle Python with overhead so as to make more convenient the use of a
structure for which it was not intended (paraphrased, of course).
Again: no good use-case means no problem, which means no reason to try
to solve the perceived "problem".

It's great that you support Steve H's proposal, but can we keep the
discussion on why this would be good for Python, rather than why
changing a structure that is identical in name (but only similar in
functionality) to Python's list was good for another language?

 - Josiah


Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-01-30 Thread Henning von Bargen

From: Stefan Behnel 
To: [email protected]
Subject: Re: [Python-Dev] Forking and Multithreading - enemy brothers

Pascal Chambon, 29.01.2010 22:58:

I've just recently realized the huge problems surrounding the mix of
multithreading and fork() - i.e that only the main thread actually
survived the fork(), and that process data (in particular,
synchronization primitives) could be left in a dangerously broken state
because of such forks, in multithreaded programs.


I would *never* have even tried that, 


Why not? Actually there are some real-world use-cases.

For example, in 2005 I developed a multi-threaded report generator 
application which supports different engines. Some of these engines
allow you to generate a report by submitting an HTTP request, while others
require you to start an executable.

The application worked fine, but the multi-threading/forking problem
led to some very-hard-to-detect bugs.

See http://mail.python.org/pipermail/python-dev/2007-June/073745.html
or just search the web for "unwanted handle inheritance".

Even seemingly fool-proof things like using the logging module (when 
internally deleting old log files) in the parent process would cause 
problems.


I remember Martin von Löwis mentioned that the special case of file 
handles on Windows should be solved by the new I/O implementation of 
Python 3, but I didn't check this: that particular application is still 
running on Python 2.4 (I developed a workaround that required 
replacing all calls to open and patching socket.py or whatever).



I would *never* have even tried that, but it doesn't surprise me that it
works basically as expected. I found this as a quick intro:

http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2003-09/0672.html


... and another interesting link that also describes exec() usage in this
context.

http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

Stefan



That is indeed very interesting.
Unfortunately Python has to support older Linux kernels as well.

Henning





Re: [Python-Dev] Forking and Multithreading - enemy brothers

2010-01-30 Thread Pascal Chambon


> [...]
> What dangers do you refer to specifically? Something reproducible?
> -L


Since it's a race condition issue, it's not easily reproducible with 
normal libraries, which only hold threading locks for brief moments.
But it can appear if your threads make heavy use of the threading module. 
By forking at an arbitrary moment, there is a chance that the main locks of 
the logging module are frozen in an "acquired" state (even though their owner 
threads do not exist in the child process), and your next attempt to 
use logging will result in a deadlock (on some *nix platforms, at 
least). This issue led to the creation of python-atfork, by the way.
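
The frozen-lock situation can be made deterministic with a small POSIX-only sketch. It uses a plain threading.Lock as a stand-in for the logging module's internal locks (an assumption for illustration; newer Python versions take extra care of logging's own locks around fork). A worker thread holds the lock while the main thread forks, and the child then probes the lock without blocking. The name child_sees_lock_held is made up for the example.

```python
import os
import threading


def child_sees_lock_held():
    """Fork while another thread holds a lock; report what the child sees."""
    lock = threading.Lock()
    holding = threading.Event()
    release = threading.Event()

    def worker():
        with lock:
            holding.set()   # tell the main thread the lock is now held
            release.wait()  # keep holding it until after the fork

    thread = threading.Thread(target=worker)
    thread.start()
    holding.wait()

    pid = os.fork()
    if pid == 0:
        # Child: the worker thread does not exist here, but the lock state
        # was copied. A non-blocking acquire avoids hanging forever.
        got_it = lock.acquire(False)
        os._exit(0 if got_it else 1)

    # Parent: let the worker finish, then collect the child's verdict.
    release.set()
    thread.join()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("lock stuck in child:", child_sees_lock_held())
```

A blocking acquire() in the child would never return, because the only thread that could release the lock was not carried over by fork(). Recent Python versions may also print a DeprecationWarning about forking a multi-threaded process.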



Stefan Behnel wrote:

Stefan Behnel, 30.01.2010 07:36:
  

Pascal Chambon, 29.01.2010 22:58:


I've just recently realized the huge problems surrounding the mix of
multithreading and fork() - i.e that only the main thread actually
survived the fork(), and that process data (in particular,
synchronization primitives) could be left in a dangerously broken state
because of such forks, in multithreaded programs.
  

I would *never* have even tried that, but it doesn't surprise me that it
works basically as expected. I found this as a quick intro:

http://unix.derkeiler.com/Newsgroups/comp.unix.programmer/2003-09/0672.html



... and another interesting link that also describes exec() usage in this
context.

http://www.linuxprogrammingblog.com/threads-and-fork-think-twice-before-using-them

Stefan

  

Yep, these links sum it up quite well.
But to me it's not a matter of "trying" to mix threads and fork - most 
people won't on purpose seek trouble.
It's simply the fact that, in a multithreaded program (i.e, any program 
of some importance), multiprocessing modules will be impossible to use 
safely without a complex synchronization of all threads to prepare the 
underlying forking (and we know that using multiprocessing can be a 
serious benefit, for GIL/performance reasons).
Solutions to fork() issues clearly exist - just add a "use_forking=yes" 
attribute to subprocess functions, and users will be free to use the 
spawnl() semantics, which are already implemented on win32 platforms, and 
which give full control over both threads and subprocesses. Honestly, I 
don't see how it will complicate things, except slightly for the 
programmer who will have to edit the code to add spawnl() support (I 
might help with that).
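
The "use_forking" attribute above is a proposal, not an existing parameter. As a rough illustration of the spawn-style alternative, here is a sketch that starts a brand-new interpreter through the existing subprocess module instead of continuing in a forked copy of the parent; the helper name run_in_fresh_interpreter is made up.

```python
import subprocess
import sys


def run_in_fresh_interpreter(source):
    """Run Python source in a newly started interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", source],  # a new program, not a copy of this one
        capture_output=True,
        text=True,
        check=True,                      # raise if the child exits with an error
    )
    return result.stdout.strip()


if __name__ == "__main__":
    print(run_in_fresh_interpreter("print(6 * 7)"))
```

Because the child is a fresh program, it starts with its own locks and does not inherit whatever state the parent's threads were in. The price is that nothing is shared implicitly, so arguments and results have to be passed explicitly (here, through the command line and stdout).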


Regards,
Pascal




Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Stephen J. Turnbull
Josiah Carlson writes:
 > On Fri, Jan 29, 2010 at 11:31 PM, Stephen J. Turnbull
 >  wrote:

 > > Some people complained, but we considered this well worthwhile (moving
 > > one "type bit" from the car to the header allowed Lisp integers to
 > > cover the range -1G to +1G, and there are a surprising number of
 > > people who would like to use XEmacs on files >512MB).  I suppose that
 > > Steve's proposal probably has similar impact on binaries and running
 > > instances of Python, but he hasn't given any use cases for list.pop(0)
 > > to compare to doubling the size of usable buffers.

 > The choice that emacs made is great for emacs;

Emacs hasn't made that choice, XEmacs did.  I believe Emacs is still
"restricted" to 128MB, or maybe 256MB, buffers.  They recently had an
opportunity to increase integer size, and thus maximum buffer size,
but refused it.  It's not a no-brainer.

 > It's great that you support Steve H's proposal, but can we keep the
 > discussion on why this would be good for Python,

I don't support it or oppose it (I wouldn't notice the increased
overhead myself, but I have no use case for O(1) list.pop(0)).  I'm
giving some figures on a similar change (adding a single pointer to a
previously low-overhead structure used in large numbers in some
applications), and pointing out that this was good for XEmacs only
because there was a rather big increase in capability in a use-case
that people can sympathize with even if they don't need it themselves.

I hope that this example will help Steve H understand why he needs to
give real use-cases, or if he doesn't know of any, give up.



Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-30 Thread Paul Moore
On 29 January 2010 23:45, "Martin v. Löwis"  wrote:
>> On Windows, would a C extension author be able to distribute a single
>> binary (bdist_wininst/bdist_msi) which would be compatible with
>> with-LLVM and without-LLVM builds of Python?
>
> When PEP 384 gets implemented, you not only get that, but you will also
> be able to use the same extension module for 3.2, 3.3, 3.4, etc, with
> or without U-S.

Ah! That's the point behind PEP 384! Sorry, I'd only skimmed that PEP
when it came up, and completely missed the implications.

In which case a HUGE +1 from me for PEP 384.

Paul.


Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Stephen J. Turnbull
Minor erratum:

Stephen J. Turnbull writes:

 > Emacs hasn't made that choice, XEmacs did.  I believe Emacs is still
 > "restricted" to 128MB, or maybe 256MB, buffers.  They recently had an
 > opportunity to increase integer size, and thus maximum buffer size,
 > but refused it.  It's not a no-brainer.

I stand corrected.  Emacs did make some changes which increased
integer size from 28 bits to 30, allowing a maximum signed value of
512M, but refused the tradeoff I described of making the cons type be
indicated by a pointer to a type description record rather than a type
bit in one of the pointers.  That would have allowed 31 bits for
integers, as in XEmacs.  The basic thrust of my argument was correct,
though.
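
The buffer limits mentioned in this thread follow from the integer widths. A small sketch of the arithmetic, assuming signed two's-complement integers and counting in units of 2**20 (the name max_magnitude is made up):

```python
MIB = 2 ** 20


def max_magnitude(bits):
    """Approximate largest magnitude of a signed integer with `bits` bits."""
    return 2 ** (bits - 1)


for bits in (28, 30, 31):
    print(f"{bits}-bit integers reach about {max_magnitude(bits) // MIB}M")
```

This gives about 128M for 28 bits, 512M for 30 bits, and 1024M (1G) for 31 bits, matching the figures quoted for Emacs and XEmacs.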



Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread Steve Howell
--- On Fri, 1/29/10, Stephen J. Turnbull  wrote:

> 
>  > Lisp lists are really stacks
> 
> No, they're really (ie, concretely) singly-linked
> lists.  
> 
> Now, stacks are an abstract data type, and singly-linked
> lists provide
> an efficient implementation of stacks.  But that's not
> what linked
> lists "really are".  For example, singly-linked lists
> are also a
> reasonable way to implement inverted trees (ie, the node
> knows its
> parent, but not its children), which is surely not a
> stack.

I like your distinction between abstract data types and concrete 
implementations.
 
From a mutability perspective, the concrete implementation of Python lists 
shares a performance characteristic with most concrete implementations of 
stacks, in that inserts/pops at the top are cheap.

Unlike most stacks, Python lists do at least semantically allow queue-like 
behavior for removing elements from the bottom, but I don't think it's unfair 
of me to say that removes from the bottom are discouraged under the current 
implementation.  (I can cite the tutorial, for example).  So, to the extent 
that removes from the bottom are frowned upon, Python again has the same 
mutability characteristics as a stack.

The abstract data type "stack" does not allow for random access of elements 
AFAIK, so Python lists are definitely more than a stack, especially since 
random accesses are not only possible, they are quite efficient.

So I guess they are an array.

I don't know whether "arrays" are considered to be an abstract data type, 
but my de facto concept of an array is something that supports fast 
random access, cheap mutation at the top, and no guarantees at the bottom.  I 
am guessing that from a big-O perspective, Python lists have the exact same 
performance characteristics as the data structures that Perl, Ruby, and 
Javascript all call "array."  Also, Python lists are built on top of a C array, 
and while it would be a bit of an overstatement to say that lists are just a 
nicely sugared encapsulation of C arrays, I think it would be a fair statement 
to say that Python lists only give O(1) performance for the same operations as 
the underlying C array; all the other operations are there just for convenience 
where performance is not a driving concern.
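
For concreteness, a short sketch of the operations being compared, with collections.deque shown as the usual standard-library alternative when removal from the front matters. The cost comments describe the current CPython list implementation as characterised in this thread.

```python
from collections import deque

# Stack-like use: both operations work at the top (the end) of the list.
stack = [1, 2, 3]
stack.append(4)    # cheap: amortized O(1)
top = stack.pop()  # cheap: O(1)

# Queue-like use of a list: allowed, but every remaining item shifts down.
queue_as_list = [1, 2, 3]
first = queue_as_list.pop(0)  # O(n) in the length of the list

# A deque supports cheap removal at the front.
queue = deque([1, 2, 3])
first_from_deque = queue.popleft()  # O(1)
```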

Also, we can go back to the example of LISP, the one language that I know of 
that shares the term "list."  Whatever a "list" denotes from an abstract 
perspective, Python and LISP do not agree upon the definition.  Python lists 
are more like right-side-up stacks with fast random access, while LISP lists 
are more like an upside-down stack without iteration.

> The Python use of "list" to denote what is concretely a
> dynamically
> extensible one-dimensional array confused me a bit. 
> But what the
> heck, Guido needed a four-letter word to denote a concrete
> type used
> to implement a mutable sequence ADT, and he wasn't going to
> borrow one
> from that French guy on the ramparts, right?  No big
> deal.  Ahem...

Probably the same reason he didn't call dictionaries "hashes", right? :)




Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-30 Thread Cesare Di Mauro
I'm back with some tests that I made with the U-S test suite.

2010/1/30 Scott Dial

>

> Cesare, just FYI, your Hg repository has lost the execute bits on some
> files (namely "./configure" and "./Parser/asdl_c.py"), so it does not
> quite build out-of-the-box.
>

Unfortunately, I haven't found a solution to this problem. If somebody
working with Windows and Mercurial (I use TortoiseHg graphical client) can
give help on this issue, I'll release wpython 1.1 final.


> I took the liberty of cloning your repo into my laptop's VirtualBox
> instance of Ubuntu. I ran the default performance tests from the U-S
> repo, with VirtualBox at highest priority. As a sanity check, I ran it
> against the U-S trunk. I think the numbers speak for themselves.
>
>  --
> Scott Dial
> [email protected]
> [email protected]
>

I downloaded the U-S test suite and ran some benchmarks on my machine.
The Django and Spambayes tests didn't run:

Running django...
INFO:root:Running D:\Projects\wpython\wpython10_test\PCbuild\python
performance/bm_django.py -n 100
Traceback (most recent call last):
  File "perf.py", line 1938, in <module>
    main(sys.argv[1:])
  File "perf.py", line 1918, in main
    options)))
  File "perf.py", line 1193, in BM_Django
    return SimpleBenchmark(MeasureDjango, *args, **kwargs)
  File "perf.py", line 590, in SimpleBenchmark
    *args, **kwargs)
  File "perf.py", line 1189, in MeasureDjango
    return MeasureGeneric(python, options, bm_path, bm_env)
  File "perf.py", line 960, in MeasureGeneric
    inherit_env=options.inherit_env)
  File "perf.py", line 916, in CallAndCaptureOutput
    raise RuntimeError("Benchmark died: " + err)
RuntimeError: Benchmark died: Traceback (most recent call last):
  File "performance/bm_django.py", line 25, in <module>
    from django.template import Context, Template
ImportError: No module named template

Running spambayes...
INFO:root:Running D:\Projects\wpython\wpython10_test\PCbuild\python
performance/bm_spambayes.py -n 50
Traceback (most recent call last):
  File "perf.py", line 1938, in <module>
    main(sys.argv[1:])
  File "perf.py", line 1918, in main
    options)))
  File "perf.py", line 1666, in BM_spambayes
    return SimpleBenchmark(MeasureSpamBayes, *args, **kwargs)
  File "perf.py", line 590, in SimpleBenchmark
    *args, **kwargs)
  File "perf.py", line 1662, in MeasureSpamBayes
    return MeasureGeneric(python, options, bm_path, bm_env)
  File "perf.py", line 960, in MeasureGeneric
    inherit_env=options.inherit_env)
  File "perf.py", line 916, in CallAndCaptureOutput
    raise RuntimeError("Benchmark died: " + err)
RuntimeError: Benchmark died: Traceback (most recent call last):
  File "performance/bm_spambayes.py", line 18, in <module>
    from spambayes import hammie, mboxutils
ImportError: No module named spambayes

Anyway, I ran all the others with wpython 1.0 final:

C:\Temp\unladen-swallow-tests>C:\temp\Python-2.6.4\PCbuild\python perf.py -r
-b default,-django,-spambayes C:\temp\Python-2.6.4\PCbuild\python
D:\Projects\wpython\wpython10_test\PCbuild\python

Report on Windows Conan post2008Server 6.1.7600 x86 AMD64 Family 15 Model 12
Stepping 0, AuthenticAMD
Total CPU cores: 1

### 2to3 ###
Min: 43.408000 -> 38.528000: 1.1267x faster
Avg: 44.448600 -> 39.391000: 1.1284x faster
Significant (t=10.582185)
Stddev: 0.84415 -> 0.65538: 1.2880x smaller
Timeline: http://tinyurl.com/ybdwese

### nbody ###
Min: 1.124000 -> 1.109000: 1.0135x faster
Avg: 1.167630 -> 1.148190: 1.0169x faster
Not significant
Stddev: 0.09607 -> 0.09544: 1.0065x smaller
Timeline: http://tinyurl.com/yex7dfv

### slowpickle ###
Min: 1.237000 -> 1.067000: 1.1593x faster
Avg: 1.283800 -> 1.109070: 1.1575x faster
Significant (t=11.393574)
Stddev: 0.11086 -> 0.10596: 1.0462x smaller
Timeline: http://tinyurl.com/y8t5ess

### slowspitfire ###
Min: 2.079000 -> 1.928000: 1.0783x faster
Avg: 2.148920 -> 1.987540: 1.0812x faster
Significant (t=7.731224)
Stddev: 0.15384 -> 0.14108: 1.0904x smaller
Timeline: http://tinyurl.com/yzexcqa

### slowunpickle ###
Min: 0.617000 -> 0.568000: 1.0863x faster
Avg: 0.645420 -> 0.590790: 1.0925x faster
Significant (t=7.087322)
Stddev: 0.05478 -> 0.05422: 1.0103x smaller
Timeline: http://tinyurl.com/ycsoouq


I also ran some tests with wpython 1.1, leaving the bytecode peepholer enabled:

C:\Temp\unladen-swallow-tests>C:\temp\Python-2.6.4\PCbuild\python perf.py -r
-b default,-django,-spambayes C:\temp\Python-2.6.4\PCbuild\python
D:\Projects\wpython\wpython_test\PCbuild\python

Report on Windows Conan post2008Server 6.1.7600 x86 AMD64 Family 15 Model 12
Stepping 0, AuthenticAMD
Total CPU cores: 1

### 2to3 ###
Min: 43.454000 -> 39.912000: 1.0887x faster
Avg: 44.301000 -> 40.766800: 1.0867x faster
Significant (t=8.188533)
Stddev: 0.65325 -> 0.71041: 1.0875x larger
Timeline: http://tinyurl.com/ya5z9mg

### nbody ###
Min: 1.125000 -> 1.070000: 1.0514x faster
Avg: 1.169270 -> 1.105530: 1.0577x faster
Significant (t=4.774702)
Stddev: 0.09655 -> 0.09219: 1.0473x smaller
Timeline: http://tinyurl.com/y8udjmk

### slowpickle ###
Min: 1.235000 -> 1.094000: 1.1289x faster
Avg: 1.275860 -> 1.132740: 1.1263

Re: [Python-Dev] patch to make list.pop(0) work in O(1) time

2010-01-30 Thread geremy condra
On Fri, Jan 29, 2010 at 12:48 AM, Terry Reedy  wrote:
> On 1/28/2010 6:30 PM, Josiah Carlson wrote:
>
>> I would also point out that the way these things are typically done is
>> that programmers/engineers have use-cases that are not satisfied by
>> existing structures, they explain the issues they have with existing
>> structures, and they propose modifications.  So far, Steve has not
>> offered any use-cases for why his proposed change is necessary; merely
>
> Use of a list as a queue rather than as a stack, as in breadth-first search,
> where one only needs to pop off the front but never push to the front. That
> is not to say that this is common or that a deque or other options may not be
> pretty satisfactory. But it would certainly be easier, when presenting such
> algorithms, to just be able to use a list, which has already been taught,
> than to introduce another structure. Currently a deque is not a drop-in
> replacement for a list in that one cannot use all list methods with a deque.
>
> As I understand it, his proposal is simpler than the one rejected a couple
> of years ago in that it does not include intentional over-allocation at the
> front of the list, as would be needed for guaranteed O(1) behavior for
> deque-like insertion at the front. I may consider a Python version of his
> idea for one of my needs, where speed is not an issue.
>
> I agree that the discussion has gone on too long here and that some of
> Steve's rhetoric has been unnecessarily abrasive and off-putting. He has
> been told this and acknowledged it once on Python-list, but habits die hard.
> For both reasons, I suggested a few days ago that further discussion should
> focus on the patch and be moved to the issue on the tracker. So I will not
> say more here.
>
> Terry Jan Reedy

Excellently put.

Geremy Condra
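
The breadth-first-search use case in the quoted message, together with the idea of a pure-Python version, can be sketched as follows. This is not the patch under discussion: it is a list-backed queue that pops from the front by advancing an index and only occasionally compacts, with no over-allocation at the front. The names FrontPopQueue and bfs_order and the compaction threshold are made up for the illustration.

```python
class FrontPopQueue:
    """List-backed queue: pop from the front by advancing an index."""

    def __init__(self, items=()):
        self._items = list(items)
        self._head = 0  # index of the first live element

    def __len__(self):
        return len(self._items) - self._head

    def append(self, item):
        self._items.append(item)

    def pop_front(self):
        if self._head >= len(self._items):
            raise IndexError("pop from empty queue")
        item = self._items[self._head]
        self._items[self._head] = None  # drop the reference to the popped item
        self._head += 1
        # Compact only when more than half of the slots are dead, so the
        # cost of the copy is spread over many pops.
        if self._head > 16 and self._head * 2 > len(self._items):
            del self._items[:self._head]
            self._head = 0
        return item


def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    seen = {start}
    order = []
    queue = FrontPopQueue([start])
    while queue:  # an empty queue has length 0 and is therefore falsy
        node = queue.pop_front()
        order.append(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return order


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_order(graph, "a"))
```

The search only ever pops from the front and appends at the back, which is exactly the access pattern described above. With collections.deque the same loop would use popleft() and append().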


Re: [Python-Dev] PEP 3146: Merge Unladen Swallow into CPython

2010-01-30 Thread Brett Cannon
On Fri, Jan 29, 2010 at 15:04,   wrote:
> On 10:47 pm, [email protected] wrote:
>>
>> On 1/29/2010 4:19 PM, Collin Winter wrote:
>>>
>>> On Fri, Jan 29, 2010 at 7:22 AM, Nick Coghlan wrote:
>>
>>> Agreed. We originally switched Unladen Swallow to wordcode in our
>>> 2009Q1 release, and saw a performance improvement from this across the
>>> board. We switched back to bytecode for the JIT compiler to make
>>> upstream merger easier. The Unladen Swallow benchmark suite should
>>> provide a thorough assessment of the impact of the wordcode ->
>>> bytecode switch. This would be complementary to a JIT compiler, rather
>>> than a replacement for it.
>>>
>>> I would note that the switch will introduce incompatibilities with
>>> libraries like Twisted. IIRC, Twisted has a traceback prettifier that
>>> removes its trampoline functions from the traceback, parsing CPython's
>>> bytecode in the process. If running under CPython, it assumes that the
>>> bytecode is as it expects. We broke this in Unladen's wordcode switch.
>>> I think parsing bytecode is a bad idea, but any switch to wordcode
>>> should be advertised widely.
>>
>> Several years ago, there was serious consideration of switching to a
>> register-based vm, which would have been even more of a change. Since I
>> learned 1.4, Guido has consistently insisted that the CPython vm is not part
>> of the language definition and, as far as I know, he has rejected any
>> bytecode hackery in the stdlib. While he is not one to, say, randomly permute
>> the codes just to frustrate such hacks, I believe he has always considered
>> vm details private and subject to change and any usage thereof 'at one's own
>> risk'.
>
> Language to such effect might be a useful addition to this page (amongst
> others, perhaps):
>
>  http://docs.python.org/library/dis.html
>
> which very clearly and helpfully lays out quite a number of APIs which can
> be used to get pretty deep into the bytecode.  If all of this is subject to
> be discarded at the first sign that doing so might be beneficial for some
> reason, don't keep it a secret that people need to join python-dev to learn.
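
To make the concern concrete, here is a small sketch of the kind of inspection the dis module allows. It uses dis.get_instructions, which exists in later Python 3 releases than the ones discussed in this thread; the helper name opnames is made up. No expected output is shown because the instruction names are CPython implementation details and differ between versions, which is exactly the point being made.

```python
import dis


def add(a, b):
    return a + b


def opnames(func):
    """List the names of the bytecode instructions compiled for func."""
    return [instruction.opname for instruction in dis.get_instructions(func)]


print(opnames(add))
```

Code that hard-codes a particular sequence of such names works only on the interpreter versions it was written against.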
>

Can you file a bug and assign it to me?

-Brett


[Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Barry Warsaw
PEP: 3147
Title: PYC Repository Directories
Version: $Revision$
Last-Modified: $Date$
Author: Barry Warsaw 
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 2009-12-16
Python-Version: 3.2
Post-History:


Abstract
========

This PEP describes an extension to Python's import mechanism which
improves sharing of Python source code files among multiple installed
different versions of the Python interpreter.  It does this by
allowing many different byte compilation files (.pyc files) to be
co-located with the Python source file (.py file).  The extension
described here can also be used to support different Python
compilation caches, such as JIT output that may be produced by an
Unladen Swallow [1]_ enabled CPython.


Rationale
=========

Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
than one Python version at the same time to their users.  For example,
Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
Python 2.6 being the default.

In order to ease the burden on operating system packagers for these
distributions, the distribution packages do not contain Python version
numbers [4]_; they are shared across all Python versions installed on
the system.  Putting Python version numbers in the packages would be a
maintenance nightmare, since all the packages - *and their
dependencies* - would have to be updated every time a new Python
release was added or removed from the distribution.  Because of the
sheer number of packages available, this amount of work is infeasible.

For pure Python modules, sharing is possible because upstream
maintainers typically support multiple versions of Python in a source
compatible way.  In practice though, it is well known that pyc files
are not compatible across Python major releases.  A reading of
import.c [5]_ in the Python source code proves that within recent
memory, every new CPython major release has bumped the pyc magic
number.

Even C extensions can be source compatible across multiple versions of
Python.  Compiled extension modules are usually not compatible though,
and PEP 384 [6]_ has been proposed to address this by defining a
stable ABI for extension modules.

Because the distributions cannot share pyc files, elaborate mechanisms
have been developed to put the resulting pyc files in non-shared
locations while the source code is still shared.  Examples include the
symlink-based Debian regimes python-support [7]_ and python-central
[8]_.  These approaches make for much more complicated, fragile,
inscrutable, and fragmented policies for delivering Python
applications to a wide range of users.  Arguably more users get Python
from their operating system vendor than from upstream tarballs.  Thus,
solving this pyc sharing problem for CPython is a high priority for
such vendors.

This PEP proposes a solution to this problem.


Proposal
========

Python's import machinery is extended to search for byte code cache
files in a directory co-located with the source file, but with an
extension 'pyr'.  The pyr directory contains individual files with the
cached byte compilation of the source code, identical to current pyc
and pyo files.  The files inside the pyr directory retain their file
extensions, but the base name is replaced by the hexlified [10]_ magic
number of the Python version the byte code is compatible with.

The file extension pyr was chosen because 'r' is a mnemonic for
'repository', and there appears to be no prior uses of the extension
[9]_.

For example, a module `foo` with source code in `foo.py` and byte
compiled with Python 2.5, Python 2.6, Python 2.6 `-O`, Python 2.6
`-U`, and Python 3.1 would have the following file system layout::

    foo.py
    foo.pyr/
        f2b30a0d.pyc # Python 2.5
        f2d10a0d.pyc # Python 2.6
        f2d10a0d.pyo # Python 2.6 -O
        f2d20a0d.pyc # Python 2.6 -U
        0c4f0a0d.pyc # Python 3.1
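
As a rough illustration of the naming rule described above (a sketch only, not the PEP's reference implementation), the cache path for a source file could be derived as follows. The function name `pyr_cache_path` and its parameters are invented for this example; `importlib.util.MAGIC_NUMBER` is how current Python 3 interpreters expose their own magic number.

```python
import binascii
import importlib.util
import os


def pyr_cache_path(source_path, magic=importlib.util.MAGIC_NUMBER, ext=".pyc"):
    """Return the hypothetical foo.pyr/<hexlified magic><ext> path for source_path."""
    base, _ = os.path.splitext(source_path)
    # hexlify() returns bytes, so decode it to get a usable file name.
    name = binascii.hexlify(magic).decode("ascii") + ext
    return os.path.join(base + ".pyr", name)


# The magic bytes below are chosen to reproduce the "Python 2.6" row of the
# layout shown above; by default the running interpreter's magic is used.
print(pyr_cache_path("foo.py", magic=bytes.fromhex("f2d10a0d")))
```

Passing `ext=".pyo"` gives the corresponding optimized file name under the same assumptions.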


Python behavior
===============

When Python searches for a module to import (say `foo`), it may find
one of several situations.  As per current Python rules, the term
"matching pyc" means that the magic number matches the current
interpreter's magic number, and the source file is not newer than the
`pyc` file.
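
The "matching pyc" test can be sketched roughly as below. This is a simplification under stated assumptions: the helper name `is_matching_pyc` is hypothetical, and it compares file modification times, whereas CPython records the source timestamp inside the pyc file itself.

```python
import importlib.util
import os


def is_matching_pyc(pyc_path, source_path, magic=importlib.util.MAGIC_NUMBER):
    """Approximate the 'matching pyc' rule: same magic, source not newer."""
    try:
        with open(pyc_path, "rb") as f:
            header = f.read(len(magic))
    except OSError:
        return False  # a missing or unreadable pyc never matches
    if header != magic:
        return False  # written by an incompatible interpreter
    # Simplification: compare mtimes of the two files on disk.
    return os.path.getmtime(source_path) <= os.path.getmtime(pyc_path)
```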

When Python finds a `foo.py` file for which no `foo.pyc` file or
`foo.pyr` directory exists, Python will by default load the `foo.py`
file and write a `foo.pyc` file next to the source file.  This is
unchanged from current behavior.

When the Python executable is given a `-R` flag, or the environment
variable `$PYTHONPYR` is set, then Python will create a `foo.pyr`
directory and write a `pyc` file to that directory with the hexlified
magic number as the base name.

If during import, Python finds an existing `pyc` file but no `pyr`
directory, and the `$PYTHONPYR` environment variable is not set, then
the `pyc` file is loaded as normal and no `pyr` directory is created.
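
The write-side rules described so far might be summarized as in the sketch below. It covers only the cases stated above; `bytecode_write_target` and its parameters are hypothetical names, and `r_flag` stands in for the proposed `-R` command line option.

```python
import os


def bytecode_write_target(source_path, magic_hex, r_flag=False, environ=None):
    """Pick where freshly compiled byte code would be written (sketch)."""
    if environ is None:
        environ = os.environ
    base, _ = os.path.splitext(source_path)
    # -R on the command line, or $PYTHONPYR in the environment, opts in to pyr.
    if r_flag or "PYTHONPYR" in environ:
        return os.path.join(base + ".pyr", magic_hex + ".pyc")
    # Otherwise the traditional sibling pyc file is used, unchanged from today.
    return base + ".pyc"
```

With neither the flag nor the variable, the function returns the sibling `foo.pyc` path, which mirrors the "unchanged from current behavior" case.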

If during import, Python finds a `pyr` directory with a matching `pyc`
file, *regardless of whether `$P

Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Vitor Bosshard
2010/1/30 Barry Warsaw :
>
> Multiple file extensions
> ------------------------
>
> The PEP author also considered an approach where multiple thin byte
> compiled files lived in the same place, but used different file
> extensions to designate the Python version.  E.g. foo.pyc25,
> foo.pyc26, foo.pyc31 etc.  This was rejected because of the clutter
> involved in writing so many different files.  The multiple extension
> approach makes it more difficult (and an ongoing task) to update any
> tools that are dependent on the file extension.
>


Why not:

foo.py
foo.pyc # < 2.7 or < 3.2
foo.27.pyc
foo.32.pyc
etc.


This is simpler and more logical than the current subfolder proposal,
as it is clear which version each file corresponds to. Python can use
all the magic values it wants, but please don't spill them over into
the filesystem. Readability counts.

Putting the files into a separate dir also makes it much harder to
work with external tools; e.g. VCSes already ignore .pyc and .pyo
files, but not unknown directories.

I'd rather have a folder cluttered with files I know I can ignore (and
can easily run a selective rm over) than one that is cluttered with
subfolders.


Vitor
___
Python-Dev mailing list
[email protected]
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Ben Finney
Vitor Bosshard  writes:

> foo.py
> foo.pyc # < 2.7 or < 3.2
> foo.27.pyc
> foo.32.pyc
> etc.
>
>
> This is simpler and more logical than the current subfolder proposal,
> as it is clear which version each file corresponds to. Python can use
> all the magic values it wants, but please don't spill them over into
> the filesystem. Readability counts.

+1. From a UI perspective, this is superior to creating new
subdirectories for each module, and also superior to opaque magic-number
filenames.

-- 
 \ “Beware of and eschew pompous prolixity.” —Charles A. Beardsley |
  `\   |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Daniel Stutzbach
On Sat, Jan 30, 2010 at 8:21 PM, Vitor Bosshard  wrote:

> Putting the files into a separate dir also makes it much harder to
> work with external tools; e.g. VCSes already ignore .pyc and .pyo
> files, but not unknown directories.
>

Can't a VCS be configured to ignore a .pyr directory just as easily as it
can be configured to ignore a .pyc file?


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread MRAB

Ben Finney wrote:
> Vitor Bosshard writes:
>
>> foo.py
>> foo.pyc # < 2.7 or < 3.2
>> foo.27.pyc
>> foo.32.pyc
>> etc.
>>
>> This is simpler and more logical than the current subfolder proposal,
>> as it is clear which version each file corresponds to. Python can use
>> all the magic values it wants, but please don't spill them over into
>> the filesystem. Readability counts.
>
> +1. From a UI perspective, this is superior to creating new
> subdirectories for each module, and also superior to opaque magic-number
> filenames.


Will there be a guarantee that if there are n digits then the first n-1
are the major version number and the last is the minor version number?

Well, I suppose there is because that's what happens with Python's home
directory, eg. "Python27".

One thing that puzzles me is that the PEP shows that "Python -U" has a
different magic number from "Python", but I can't find any reference to
"-U" (the options appear to be case-sensitive, so presumably it's not
the same as "-u").

If that's not an issue, then +1.


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Nick Coghlan
MRAB wrote:
> One thing that puzzles me is that the PEP shows that "Python -U" has a
> different magic number from "Python", but I can't find any reference to
> "-U" (the options appear to be case-sensitive, so presumably it's not
> the same as "-u").

We deliberately don't document -U because its typical effect is "break the
world" - it makes all strings unicode in 2.x.

You really don't want a -U pyc in a non -U interpreter and vice-versa,
hence the different magic numbers.

This is also the reason why using the version number in the filename
isn't adequate - the magic number can change due to more than just the
Python version changing. Also, if we don't change the bytecode for a
given release, then multiple versions can use the same magic number.

Hiding these away in an appropriately named subfolder is the nicest way
to handle it without cluttering the source code directory.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Nick Coghlan
Daniel Stutzbach wrote:
> On Sat, Jan 30, 2010 at 8:21 PM, Vitor Bosshard wrote:
>
>> Putting the files into a separate dir also makes it much harder to
>> work with external tools; e.g. VCSes already ignore .pyc and .pyo
>> files, but not unknown directories.
>
> Can't a VCS be configured to ignore a .pyr directory just as easily as
> it can be configured to ignore a .pyc file?

Yes they can.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Nick Coghlan
Vitor Bosshard wrote:
> Why not:
> 
> foo.py
> foo.pyc # < 2.7 or < 3.2
> foo.27.pyc
> foo.32.pyc
> etc.
> 
> 
> This is simpler and more logical than the current subfolder proposal,
> as it is clear which version each file corresponds to. Python can use
> all the magic values it wants, but please don't spill them over into
> the filesystem. Readability counts.

There is no one-to-one correspondence between Python version and pyc
magic numbers. Different runtime options may change the magic number and
different versions may reuse a magic number.

> Putting the files into a separate dir also makes it much harder to
> work with external tools; e.g. VCSes already ignore .pyc and .pyo
> files, but not unknown directories.
> 
> I'd rather have a folder cluttered with files I know I can ignore (and
> can easily run a selective rm over) than one that is cluttered with
> subfolders.

It won't be cluttered with subfolders - you will have at most one .pyr
per source .py file. Even more conveniently, many file browsers already
list subfolders and files separately, so if you never run a version of
Python that uses the old .pyc format you won't even get that level of
clutter.

That is significantly better than accumulating an unbounded number of
different .pyc files interspersed amongst the actual source files I care
about.

The PEP gets a +1 from me.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Vitor Bosshard
2010/1/31 Nick Coghlan :
> Vitor Bosshard wrote:
>> Why not:
>>
>> foo.py
>> foo.pyc # < 2.7 or < 3.2
>> foo.27.pyc
>> foo.32.pyc
>> etc.
>>
>>
>> This is simpler and more logical than the current subfolder proposal,
>> as it is clear which version each file corresponds to. Python can use
>> all the magic values it wants, but please don't spill them over into
>> the filesystem. Readability counts.
>
> There is no one-to-one correspondence between Python version and pyc
> magic numbers. Different runtime options may change the magic number and
> different versions may reuse a magic number

Good point. Runtime options would need to change the version (e.g.
foo.25U.pyc), and versions that reuse magic numbers would be
redundantly written to disk. However, the underlying issue as I see it
is that the magic value is an implementation detail that should not be
exposed.

>
>> Putting the files into a separate dir also makes it much harder to
>> work with external tools; e.g. VCSes already ignore .pyc and .pyo
>> files, but not unknown directories.
>>
>> I'd rather have a folder cluttered with files I know I can ignore (and
>> can easily run a selective rm over) than one that is cluttered with
>> subfolders.
>
> It won't be cluttered with subfolders - you will have at most one .pyr
> per source .py file. Even more conveniently, many file browsers already
> list subfolders and files separately, so if you never run a version of
> Python that uses the old .pyc format you won't even get that level of
> clutter.

Since those folders would start with arbitrary characters, they'd be
intermingled with "real" subfolders, which is a terrible mess. At
least the current .pyc files are always clustered nicely right next to
their source file.


>
> That is significantly better than accumulating an unbounded number of
> different .pyc files interspersed amongst the actual source files I care
> about.

I do see your point. How about creating a single pyr folder, in which
*all* compiled files of the parent folder go, e.g.
pyr/foo.0c4f0a0d.pyc


Vitor


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Ben Finney
Nick Coghlan  writes:

> This is also the reason why using the version number in the filename
> isn't adequate - the magic number can change due to more than just the
> Python version changing. Also, if we don't change the bytecode for a
> given release, then multiple versions can use the same magic number.
>
> Hiding these away in an appropriately named subfolder is the nicest
> way to handle it without cluttering the source code directory.

Thanks for the explanation. I eagerly await further information to the
contrary, but you've shown that it is at least likely to be better to
hide these files inside a subdirectory.


If a subdirectory is indeed the better solution, can we please ensure
that only *one* such subdirectory is created for the whole tree of
packages, instead of a new subdirectory per module?

In other words, my understanding is that the current PEP would have the
following tree for an example project::

    foo/
        __init__.py
        __init__.pyr/
            deadbeef.pyc
            decafbad.pyc
        lorem.py
        lorem.pyr/
            deadbeef.pyc
            decafbad.pyc
        ipsum/
            __init__.py
            __init__.pyr/
                deadbeef.pyc
                decafbad.pyc
            dolor.py
            dolor.pyr/
                deadbeef.pyc
                decafbad.pyc
            sit.py
            sit.pyr/
                deadbeef.pyc
                decafbad.pyc
            amet/
                __init__.py
                __init__.pyr/
                    deadbeef.pyc
                    decafbad.pyc
                malor.py
                malor.pyr/
                    deadbeef.pyc
                    decafbad.pyc
                wobble.py
                wobble.pyr/
                    deadbeef.pyc
                    decafbad.pyc
    bar/
        __init__.py
        __init__.pyr/
            deadbeef.pyc
            decafbad.pyc
        wibble.py
        wibble.pyr/
            deadbeef.pyc
            decafbad.pyc
        warble/
            __init__.py
            __init__.pyr/
                deadbeef.pyc
                decafbad.pyc
            wubble.py
            wubble.pyr/
                deadbeef.pyc
                decafbad.pyc
            wobble.py
            wobble.pyr/
                deadbeef.pyc
                decafbad.pyc

That's a nightmarish mess of compiled files swamping the source files,
as has been pointed out several times.

Could we instead have a single subdirectory for each tree of module
packages, keeping them tidily out of the way of the source files, while
making them located just as deterministically::

    foo/
        .pyr/
            __init__/
                deadbeef.pyc
                decafbad.pyc
            lorem/
                deadbeef.pyc
                decafbad.pyc
            ipsum/
                __init__/
                    deadbeef.pyc
                    decafbad.pyc
                dolor/
                    deadbeef.pyc
                    decafbad.pyc
                sit/
                    deadbeef.pyc
                    decafbad.pyc
                amet/
                    __init__/
                        deadbeef.pyc
                        decafbad.pyc
                    spam/
                        deadbeef.pyc
                        decafbad.pyc
                    malor/
                        deadbeef.pyc
                        decafbad.pyc
                    wobble/
                        deadbeef.pyc
                        decafbad.pyc
        __init__.py
        lorem.py
        ipsum/
            __init__.py
            dolor.py
            sit.py
            amet/
                __init__.py
                spam.py
                malor.py
                wobble.py
    bar/
        .pyr/
            __init__/
                deadbeef.pyc
                decafbad.pyc
            wibble/
                deadbeef.pyc
                decafbad.pyc
            warble/
                __init__/
                    deadbeef.pyc
                    decafbad.pyc
                wubble/
                    deadbeef.pyc
                    decafbad.pyc
                wobble/
                    deadbeef.pyc
                    decafbad.pyc
        __init__.py
        wibble.py
        warble/
            __init__.py
            wubble.py
            wobble.py
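
To show that the location stays deterministic, the mapping from a source file to its cache file in this layout might look like the sketch below. The name `tree_cache_path` is hypothetical, and the function takes the root of the package tree as given; how the import system would discover that root is left open here.

```python
import os


def tree_cache_path(tree_root, source_path, magic_hex):
    """Map <root>/a/b.py to <root>/.pyr/a/b/<magic>.pyc (sketch)."""
    rel = os.path.relpath(source_path, tree_root)
    rel_no_ext, _ = os.path.splitext(rel)
    # One directory per module inside the single .pyr tree, holding one
    # file per magic number.
    return os.path.join(tree_root, ".pyr", rel_no_ext, magic_hex + ".pyc")


print(tree_cache_path("foo", os.path.join("foo", "ipsum", "dolor.py"), "deadbeef"))
```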

-- 
 \“It is seldom that liberty of any kind is lost all at once.” |
  `\   —David Hume |
_o__)  |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Ben Finney
Nick Coghlan  writes:

> It won't be cluttered with subfolders - you will have at most one .pyr
> per source .py file.

If that doesn't meet your threshold of “cluttered with subfolders”, I'm
at a loss for words to think where that threshold might be. It meets,
and exceeds by a long shot, my threshold for subfolder clutter.

Even adding a *single* subfolder in arbitrary directories is an
obnoxious act for a program to do automatically, and is not to be
undertaken lightly. It might be justified in this case, but that doesn't
mean we should open the gates to even more clutter.

-- 
 \  “Rightful liberty is unobstructed action, according to our |
  `\will, within limits drawn around us by the equal rights of |
_o__)   others.” —Thomas Jefferson |
Ben Finney



Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Vitor Bosshard
2010/1/31 Nick Coghlan :

>> Can't a VCS be configured to ignore a .pyr directory just as easily as
>> it can be configured to ignore a .pyc file?
>
> Yes they can.


Of course they can, but not out of the box. It was just an example off
the top of my head.

A trickier case: My GUI app offers scripting facilities. The
associated open file dialog hides all .pyc files, and users select
just from .py files. If subfolders are introduced, however, they can't
be hidden away as easily. Filtering by extension is standard
functionality in all GUI toolkits. Hiding a fraction of subfolders is
not.

The point is that creating large numbers of subfolders goes against
the expectations that Python (and pretty much any other program for
that matter) has established. If it's possible to keep in line with
those expectations, it should at least be considered, given that
existing programs could continue to work without needing to change
anything.

If the decision is made that folders will be used (for x y z reason),
then great care should be taken to make the change as minimally
invasive as possible.


Vitor


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Jeffrey Yasskin
On Sat, Jan 30, 2010 at 8:22 PM, Vitor Bosshard  wrote:
> 2010/1/31 Nick Coghlan :
>
>>> Can't a VCS be configured to ignore a .pyr directory just as easily as
>>> it can be configured to ignore a .pyc file?
>>
>> Yes they can.
>
>
> Of course they can, but not out of the box. It was just an example off
> the top of my head.

Mercurial, at least, doesn't ignore .pyc files out of the box either.
I'm sure it will be a terrible hardship for people to add one extra
line to their .ignore files.

> A trickier case: My GUI app offers scripting facilities. The
> associated open file dialog hides all .pyc files, and users select
> just from .py files. If subfolders are introduced, however, they can't
> be hidden away as easily. Filtering by extension is standard
> functionality in all GUI toolkits. Hiding a fraction of subfolders is
> not.

You're saying you can filter files by extension, but you cannot filter
directories by extension?
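
For what it's worth, in plain Python the same extension test works for both kinds of entry. The sketch below uses a made-up helper name, `visible_entries`, and says nothing about what any particular GUI toolkit's file dialog supports.

```python
import os


def visible_entries(directory, hidden_exts=(".pyc", ".pyo", ".pyr")):
    """List names in directory, hiding files and folders with cache extensions."""
    names = []
    with os.scandir(directory) as entries:
        for entry in entries:
            _, ext = os.path.splitext(entry.name)
            if ext in hidden_exts:
                continue  # applies equally to foo.pyc (file) and foo.pyr (directory)
            names.append(entry.name)
    return sorted(names)
```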


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Jeffrey Yasskin
+1 overall. I'm certainly not concerned with replacing pyc clutter
with pyr clutter. I do like that you haven't _increased_ the number of
extraneous siblings of .py files.

I have a couple bikesheddy or "why didn't you do this" comments. I'll
be perfectly satisfied with an answer or a line in the PEP.

1. Why the -R flag? It seems like this is a uniform improvement, so it
should be the default. Have faith in your design! ;-)

2. Vitor's suggestion to make 1 "pyr" directory per directory and
stick all the .pyc's there would solve the "pyc clutter" problem. Any
reason not to do that? Trying to make it 1-pyr-per-directory-hierarchy
as Ben suggested seems unworkable.  The one problem with this would
seem to be filename length limits; do we care about those anymore?

3. It seems like .pyr directories are nicely forward-compatible with
other uses like version-specific .so's or JIT caches. I don't think
this PEP needs to flesh out any of those other possibilities though.

4. -1 to a moratorium on bytecode changes. No moratorium can last
forever, and then packagers will be back to the same problem. The
rationale for 3003 doesn't seem to apply here.

On Sat, Jan 30, 2010 at 4:00 PM, Barry Warsaw  wrote:
> PEP: 3147
> Title: PYC Repository Directories
> Version: $Revision$
> Last-Modified: $Date$
> Author: Barry Warsaw 
> Status: Draft
> Type: Standards Track
> Content-Type: text/x-rst
> Created: 2009-12-16
> Python-Version: 3.2
> Post-History:
>
>
> Abstract
> ========
>
> This PEP describes an extension to Python's import mechanism which
> improves sharing of Python source code files among multiple installed
> different versions of the Python interpreter.  It does this by
> allowing many different byte compilation files (.pyc files) to be
> co-located with the Python source file (.py file).  The extension
> described here can also be used to support different Python
> compilation caches, such as JIT output that may be produced by an
> Unladen Swallow [1]_ enabled C Python.
>
>
> Rationale
> =========
>
> Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
> than one Python version at the same time to their users.  For example,
> Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
> Python 2.6 being the default.
>
> In order to ease the burden on operating system packagers for these
> distributions, the distribution packages do not contain Python version
> numbers [4]_; they are shared across all Python versions installed on
> the system.  Putting Python version numbers in the packages would be a
> maintenance nightmare, since all the packages - *and their
> dependencies* - would have to be updated every time a new Python
> release was added or removed from the distribution.  Because of the
> sheer number of packages available, this amount of work is infeasible.
>
> For pure Python modules, sharing is possible because upstream
> maintainers typically support multiple versions of Python in a source
> compatible way.  In practice though, it is well known that pyc files
> are not compatible across Python major releases.  A reading of
> import.c [5]_ in the Python source code proves that within recent
> memory, every new CPython major release has bumped the pyc magic
> number.
>
> Even C extensions can be source compatible across multiple versions of
> Python.  Compiled extension modules are usually not compatible though,
> and PEP 384 [6]_ has been proposed to address this by defining a
> stable ABI for extension modules.
>
> Because the distributions cannot share pyc files, elaborate mechanisms
> have been developed to put the resulting pyc files in non-shared
> locations while the source code is still shared.  Examples include the
> symlink-based Debian regimes python-support [7]_ and python-central
> [8]_.  These approaches make for much more complicated, fragile,
> inscrutable, and fragmented policies for delivering Python
> applications to a wide range of users.  Arguably more users get Python
> from their operating system vendor than from upstream tarballs.  Thus,
> solving this pyc sharing problem for CPython is a high priority for
> such vendors.
>
> This PEP proposes a solution to this problem.
>
>
> Proposal
> ========
>
> Python's import machinery is extended to search for byte code cache
> files in a directory co-located with the source file, but with an
> extension 'pyr'.  The pyr directory contains individual files with the
> cached byte compilation of the source code, identical to current pyc
> and pyo files.  The files inside the pyr directory retain their file
> extensions, but the base name is replaced by the hexlified [10]_ magic
> number of the Python version the byte code is compatible with.
>
> The file extension pyr was chosen because 'r' is a mnemonic for
> 'repository', and there appears to be no prior uses of the extension
> [9]_.
>
> For example, a module `foo` with source code in `foo.py` and byte
> compiled with Python 2.5, Python 2.6, Python 2.6 `-O`, Pyth

Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread R. David Murray
On Sat, 30 Jan 2010 19:00:05 -0500, Barry Warsaw  wrote:
> Linux distributions such as Ubuntu [2]_ and Debian [3]_ provide more
> than one Python version at the same time to their users.  For example,
> Ubuntu 9.10 Karmic Koala can install Python 2.5, 2.6, and 3.1, with
> Python 2.6 being the default.
> 
> In order to ease the burden on operating system packagers for these
> distributions, the distribution packages do not contain Python version
> numbers [4]_; they are shared across all Python versions installed on
> the system.  Putting Python version numbers in the packages would be a
> maintenance nightmare, since all the packages - *and their
> dependencies* - would have to be updated every time a new Python
> release was added or removed from the distribution.  Because of the
> sheer number of packages available, this amount of work is infeasible.

As a non-Debian user (I'm a Gentoo user), the above doesn't enlighten me,
even after skimming the referenced document.  Perhaps an example would
be helpful?

I'm also not sure how it motivates the PEP.  How would putting the Python
version numbers in the package names be an alternate solution to the
problem the PEP is trying to address?

Since Ubuntu is based on Debian, some discussion of what challenges
non-Debian distributions perceive, and what partial solutions they've
crafted, if any, would be a good addition to the motivation section.
(FYI, Gentoo just installs the pyc files into each of the installed
Python's site-packages that is supported by the package in question...disk
space is relatively cheap.)

> extension 'pyr'.  The pyr directory contains individual files with the
> cached byte compilation of the source code, identical to current pyc
> and pyo files.  The files inside the pyr directory retain their file
> extensions, but the base name is replaced by the hexlified [10]_ magic
> number of the Python version the byte code is compatible with.
[...]
> For example, a module `foo` with source code in `foo.py` and byte
> compiled with Python 2.5, Python 2.6, Python 2.6 `-O`, Python 2.6
> `-U`, and Python 3.1 would have the following file system layout::
> 
>     foo.py
>     foo.pyr/
>         f2b30a0d.pyc # Python 2.5
>         f2d10a0d.pyc # Python 2.6
>         f2d10a0d.pyo # Python 2.6 -O
>         f2d20a0d.pyc # Python 2.6 -U
>         0c4f0a0d.pyc # Python 3.1

This may be a bit silly, but I find the hexified magic numbers ugly
and uninformative.  I'd rather see version numbers in there.  I realize
there isn't a one to one correspondence, though, so perhaps there isn't
a more readable alternative.

> Effects on non-conforming Python versions
> =========================================
> 
> Python implementations which don't know anything about `pyr`
> directories will ignore them.  This means that they will read and
> write `pyc` files as usual.  A conforming implementation will still
> prefer any existing `foo.pyr/<magic>.pyc` file over an existing
> sibling `pyc` file.
> 
> The one possible conflicting state is where a sibling `pyc` file
> exists, but its magic number does not match.
>
> In the default case, when Python finds a `pyc` file with a
> non-matching magic number, it simply overwrites the `pyc` file with
> the new byte code and magic number.  In the absence of the `-R` flag,
> this remains unchanged.  When the `-R` flag was given, the
> non-matching sibling `pyc` file is ignored - it is neither removed nor
> overwritten - and a `foo.pyr/<magic>.pyc` file is written instead.

Shouldn't most of this discussion be in the section on the general
algorithm that is to be implemented?  The discussion of the effect on
non-conforming versions should then talk about what happens on a system
that has both non-conforming and conforming versions under various
scenarios, which I don't think the section as written really makes clear.

> The implementation of this PEP would have to ensure that the same
> directory level is returned from `__file__` as it does without the
> `pyr` directory, so that the common idiom above continues to work::
> 
> >>> import foo
> >>> foo.__file__
> 'foo.pyr'
> # baz is a package
> >>> import baz
> >>> baz.__file__
> 'baz/__init__.pyr'

This requirement should be in the specification portion, not in the
"alternatives" section.

> Note that some existing Python code only checks for `.py` and `.pyc`
> file extensions (and possibly `.pyo`).  These would have to be
> extended to also check for `.pyr` extensions.

Some of that code is in the stdlib :)

Perhaps it is time to add a library function that returns all the
extensions that compiled python files might take, regardless of whether
or not this PEP gets accepted.
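
As a point of comparison, later Python 3 releases expose lists along these lines in `importlib.machinery`. The sketch below assumes such an interpreter; the helper name `is_python_module_file` is made up for illustration.

```python
import importlib.machinery
import os


def is_python_module_file(filename):
    """True if filename carries a source or byte code suffix known to this interpreter."""
    suffixes = (importlib.machinery.SOURCE_SUFFIXES
                + importlib.machinery.BYTECODE_SUFFIXES)
    return os.path.splitext(filename)[1] in suffixes
```

Code that asks the interpreter for the suffix lists, rather than hard-coding `.py` and `.pyc`, would not need to change if new suffixes were added.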

> Multiple file extensions
> ------------------------
> 
> The PEP author also considered an approach where multiple thin byte
> compiled files lived in the same place, but used different file
> extensions to 

Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread R. David Murray
On Sat, 30 Jan 2010 20:37:32 -0800, Jeffrey Yasskin  wrote:
> On Sat, Jan 30, 2010 at 8:22 PM, Vitor Bosshard  wrote:
> > A trickier case: My GUI app offers scripting facilities. The
> > associated open file dialog hides all .pyc files, and users select
> > just from .py files. If subfolders are introduced, however, they can't
> > be hidden away as easily. Filtering by extension is standard
> > functionality in all GUI toolkits. Hiding a fraction of subfolders is
> > not.
> 
> You're saying you can filter files by extension, but you cannot filter
> directories by extension?

I would not be at all surprised to learn that filtering folders by
extension is not something GUI file managers usually support.

--David


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Nick Coghlan
Ben Finney wrote:
> Could we instead have a single subdirectory for each tree of module
> packages, keeping them tidily out of the way of the source files, while
> making them located just as deterministically::

Not easily. With the scheme currently proposed in the PEP, setting a
value for __file__ which is both reasonably accurate and backwards
compatible with existing file manipulation techniques is
straightforward: just use the name of the cache directory.

With a parallel tree, either __file__ will bear little relation to the
actual location of the cache files or it won't be backwards compatible
with existing usage.
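
A small sketch of the compatibility concern, using the common "find the directory next to the module" idiom. The paths are illustrative only, and `data_dir` is a made-up helper; the third path assumes `__file__` pointed at the real cache file inside a parallel `.pyr` tree.

```python
import os


def data_dir(module_file):
    """The common idiom: locate files that sit next to a module."""
    return os.path.dirname(os.path.abspath(module_file))


legacy = data_dir(os.path.join("pkg", "foo.pyc"))
per_module = data_dir(os.path.join("pkg", "foo.pyr"))
parallel = data_dir(os.path.join("pkg", ".pyr", "foo", "deadbeef.pyc"))

print(per_module == legacy)  # True: same directory level as today
print(parallel == legacy)    # False: the idiom now lands inside the cache tree
```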

Keep in mind that if this PEP becomes the norm, the level of readily
visible clutter will actually be reduced from "foo.py, foo.pyc, foo.pyo"
to just "foo.py, foo.pyr" (with the file/directory split making it even
easier to separate out the source file from the compiled cache files).

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia
---


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Nick Coghlan
Vitor Bosshard wrote:
>> There is no one-to-one correspondence between Python version and pyc
>> magic numbers. Different runtime options may change the magic number and
>> different versions may reuse a magic number
> 
> Good point. Runtime options would need to change the version (e.g.
> foo.25U.py), and versions that reuse magic numbers would be
> redundantly written to disk. However, the underlying issue as I see it
> is that the magic value is an implementation detail that should not be
> exposed.

I think this is actually a good point - while there needs to be a
shared namespace to allow different Python implementations to avoid
stepping on each other's toes, CPython's bytecode compatibility magic
number may not be the best choice as the distinguishing identifier.

It may be better to give the magic numbers a meaningful corresponding
string, such that the filenames would be more like:

foo.py
foo.pyr/
  cpython-25.pyc
  cpython-25U.pyc
  cpython-27.pyc
  cpython-27U.pyc
  cpython-32.pyc
  unladen-011.pyc
  wpython-11.pyc

If we don't change the bytecode for a given Python version, then the
name of the bytecode format used wouldn't change either.
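
As a rough sketch of how that lookup might work (the table, the numeric keys, and the function name are all illustrative assumptions, not anything the PEP specifies):

```python
import posixpath

# Hypothetical table: magic number -> readable bytecode-format tag.
# The numbers are placeholders for illustration, not authoritative values.
MAGIC_TAGS = {
    1001: "cpython-25",
    1002: "cpython-25U",
    1003: "cpython-27",
}

def cache_path(source_path, magic, tags=MAGIC_TAGS):
    """Return 'foo.pyr/<tag>.pyc' for 'foo.py' and a known magic number."""
    base, ext = posixpath.splitext(source_path)
    if ext != ".py":
        raise ValueError("expected a .py source file: %r" % (source_path,))
    # A KeyError here means the magic number has no registered tag.
    return posixpath.join(base + ".pyr", tags[magic] + ".pyc")

print(cache_path("foo.py", 1001))
```

Two releases that share a bytecode format would simply map to the same tag, so nothing would be written to disk twice.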

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Nick Coghlan
Ben Finney wrote:
> Nick Coghlan  writes:
> 
>> It won't be cluttered with subfolders - you will have at most one .pyr
>> per source .py file.
> 
> If that doesn't meet your threshold of “cluttered with subfolders”, I'm
> at a loss for words to think where that threshold might be. It meets,
> and exceeds by a long shot, my threshold for subfolder clutter.
> 
> Even adding a *single* subfolder in arbitrary directories is an
> obnoxious act for a program to do automatically, and is not to be
> undertaken lightly. It might be justified in this case, but that doesn't
> mean we should open the gates to even more clutter.

I think our key difference of opinion on this point is that I don't see
any significant difference between cluttering a directory with
automatically created files (which Python has done for years) and
cluttering it with automatically created folders (which the PEP proposes).

In this particular case, I actually see it as something of an improvement,
given that the proposed number of folders is half the maximum number of
files that may currently be generated.

However, a question the PEP should consider is whether or not these
folders should be given an initial dot in their filenames as well as
being explicitly flagged as hidden on filesystems that support that
indicator.

Cheers,
Nick.

-- 
Nick Coghlan   |   [email protected]   |   Brisbane, Australia


Re: [Python-Dev] PEP 3147: PYC Repository Directories

2010-01-30 Thread Henning von Bargen

I like the idea of the PEP.
On the other hand, I dislike using directories for it.
Others have explained enough reasons for why creating many
directories is a bad idea; and there may be other reasons
(file-system limits for number of directories, problems when
the directories are located on the network).

The solution is so obvious:

Why not use a .pyr file that is internally a zip file?
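
A rough sketch of what that could look like with the standard zipfile module. The function names and the "<tag>.pyc" member naming are assumptions for illustration only; nothing here is specified by the PEP.

```python
import io
import zipfile

def write_cache_entry(archive, tag, bytecode):
    """Append bytecode to the .pyr archive under the member '<tag>.pyc'."""
    # Mode "a" creates the archive if it is empty or missing. Writing a tag
    # that already exists adds a duplicate member rather than replacing it,
    # so a real implementation would have to rewrite the archive instead.
    with zipfile.ZipFile(archive, "a") as zf:
        zf.writestr(tag + ".pyc", bytecode)

def read_cache_entry(archive, tag):
    """Return the stored bytecode for tag, or None if it is not cached."""
    with zipfile.ZipFile(archive, "r") as zf:
        try:
            return zf.read(tag + ".pyc")
        except KeyError:
            return None

# Usage with an in-memory buffer standing in for foo.pyr on disk.
pyr = io.BytesIO()
write_cache_entry(pyr, "cpython-25", b"fake bytecode 2.5")
write_cache_entry(pyr, "cpython-27", b"fake bytecode 2.7")
print(read_cache_entry(pyr, "cpython-27"))
```

One cost of this layout is that updating a single entry means rewriting the archive, and concurrent writers would need some form of locking.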


Henning