Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Barry Warsaw
On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:

>Well, Brett said it would be optional, though perhaps the above
>paragraph is asking about doing it in our Windows build.  But the linux
>distros might make also use the option if it exists, so the question is
>very meaningful.  However, you'd have to ask the distro if the source
>would be shipped in the linux case, and I'd guess not in most cases.

It's very likely the .py files would still be shipped, but perhaps in a -dev
package that isn't normally installed.

>I don't know about anyone else, but on my own development systems it is
>not that unusual for me to *edit* the stdlib files (to add debug prints)
>while debugging my own programs.  Freeze would definitely interfere with
>that.  I could, of course, install a separate source build on my dev
>system, but I thought it worth mentioning as a factor.

I do this too, though usually in a VM or chroot and not in my live system.  A
very common situation for me, though, is pdb-stepping through my own code and
landing in (or passing through) the stdlib.

>On the other hand, if the distros go the way Nick has (I think) been
>advocating, and have a separate 'system python for system scripts' that
>is independent of the one installed for user use, having the system-only
>python be frozen and sourceless would actually make sense on a couple of
>levels.

Yep, we've talked about it in Debian-land too, but never quite gotten around
to doing anything.  Certainly I'd like to see some consistency among Linux
distros there (i.e. discussed on linux-sig@).

But even with system scripts, I do need to step through them occasionally.  If
it were a matter of changing a shebang or invoking the script with a different
Python (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full unpacked
source, that would be fine.

Cheers,
-Barry


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Yury Selivanov



On 2016-01-29 11:28 PM, Steven D'Aprano wrote:

On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote:

Hi,


tl;dr The summary is that I have a patch that improves CPython
performance up to 5-10% on macro benchmarks.  Benchmarks results on
Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available
at [1].  There are no slowdowns that I could reproduce consistently.

Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now
unmaintained, and the project repo on Google Code appears to be dead (I
get a 404), but I understand that it was significantly faster than
CPython back in the 2.6 days.

https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf




Thanks for bringing this up!

IIRC wpython was about using "fat" bytecodes, i.e. using 64 bits per 
bytecode instead of 8.  That makes it possible to reduce the number of 
bytecodes, and thus gain some performance.  TBH, I don't think it was 
"significantly faster".


If I were to do some big refactoring of the ceval loop, I'd probably 
consider implementing a register VM.  While register VMs are a bit 
faster than stack VMs (up to 20-30%), they would also allow us to apply 
more optimizations, and even bolt on a simple JIT compiler.
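
(For anyone wanting to see the distinction concretely, here is a small sketch
using the dis module; the register-style listing in the comments is a
hypothetical illustration only, not anything CPython emits today.)

import dis

def add(a, b):
    return a + b

dis.dis(add)

# Stack VM (CPython today):       Hypothetical register VM:
#   LOAD_FAST    a                  ADD    r2, r0, r1
#   LOAD_FAST    b                  RETURN r2
#   BINARY_ADD
#   RETURN_VALUE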


Yury


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 08:48 R. David Murray  wrote:

> On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano 
> wrote:
> > On Sun, Jan 31, 2016 at 08:23:00PM +, Brett Cannon wrote:
> > > So freezing the stdlib helps on UNIX and not on OS X (if my old
> testing is
> > > still accurate). I guess the next question is what it does on Windows
> and
> > > if we would want to ever consider freezing the stdlib as part of the
> build
> > > process (and if we would want to change the order of importers on
> > > sys.meta_path so frozen modules came after file-based ones).
> >
> > I find that being able to easily open stdlib .py files in a text editor
> > to read the source is extremely valuable. I've learned much more from
> > reading the source than from (e.g.) StackOverflow. Likewise, it's often
> > handy to do a grep over the stdlib. When you talk about freezing the
> > stdlib, what exactly does that mean?
> >
> > - will the source files still be there?
>
> Well, Brett said it would be optional, though perhaps the above
> paragraph is asking about doing it in our Windows build.


Nope, it would probably need to be across all OSs to have consistent
semantics.


>   But the linux
> distros might make also use the option if it exists, so the question is
> very meaningful.  However, you'd have to ask the distro if the source
> would be shipped in the linux case, and I'd guess not in most cases.
>
> I don't know about anyone else, but on my own development systems it is
> not that unusual for me to *edit* the stdlib files (to add debug prints)
> while debugging my own programs.  Freeze would definitely interfere with
> that.  I could, of course, install a separate source build on my dev
> system, but I thought it worth mentioning as a factor.
>

This is the sort of thing that would need to be discussed in terms of how to
handle it. For instance, we already do stuff in (I believe) site.py when we
detect the build is in a checkout, so we could in that instance make sure the
stdlib file directory takes precedence over any frozen code (which is why I
wondered whether the frozen importer on sys.meta_path should come after the
sys.path importer). If we did that then we could make installing the stdlib
files optional while still having them take precedence.
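
(A minimal sketch of that reordering, using only public importlib machinery;
where such code would actually live - e.g. site.py - is exactly the open
design question above.)

import sys
from importlib.machinery import FrozenImporter

# Demote the frozen importer so that .py files found by the path-based
# finder (if the stdlib sources are installed) shadow the frozen copies.
sys.meta_path = [f for f in sys.meta_path if f is not FrozenImporter]
sys.meta_path.append(FrozenImporter)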

It's all workable; it's just a question of whether we want to. This is why I
think we should get concrete benchmark numbers on Windows, Linux, and OS X
to see if this is even worth considering as something we provide in our own
binaries.


>
> On the other hand, if the distros go the way Nick has (I think) been
> advocating, and have a separate 'system python for system scripts' that
> is independent of the one installed for user use, having the system-only
> python be frozen and sourceless would actually make sense on a couple of
> levels.
>

It at least wouldn't hurt anything.


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Yury Selivanov

Hi Brett,

On 2016-02-01 12:18 PM, Brett Cannon wrote:


On Mon, 1 Feb 2016 at 09:08 Yury Selivanov > wrote:



[..]

If I were to do some big refactoring of the ceval loop, I'd probably
consider implementing a register VM.  While register VMs are a bit
faster than stack VMs (up to 20-30%), they would also allow us to
apply
more optimizations, and even bolt on a simple JIT compiler.


[..]

As for bolting on a JIT, the whole point of Pyjion is to see if that's 
worth it for CPython, so that's already being taken care of (and is 
actually easier with a stack-based VM since the JIT engine we're using 
is stack-based itself).


Sure, I have very high hopes for Pyjion and Pyston.  I really hope that 
Microsoft and Dropbox will keep pushing.


Yury


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread mike . romberg
> Barry Warsaw writes:

>> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote:

>> I don't know about anyone else, but on my own development
>> systems it is not that unusual for me to *edit* the stdlib
>> files (to add debug prints) while debugging my own programs.
>> Freeze would definitely interfere with that.  I could, of
>> course, install a separate source build on my dev system, but I
>> thought it worth mentioning as a factor.

   [snip]

 > But even with system scripts, I do need to step through them
 > occasionally.  If it were a matter of changing a shebang or
 > invoking the script with a different Python
 > (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the full
 > unpacked source, that would be fine.

  If the stdlib were to use implicit namespace packages
( https://www.python.org/dev/peps/pep-0420/ ) and the various
loaders/importers as well, then Python could do what I've done with an
embedded Python application for years.  Freeze the stdlib (or put it
in a zipfile or whatever is fast).  Then arrange PYTHONPATH to first
look on the filesystem and then look in the frozen/zipped storage.

  Normally the filesystem part is empty, so modules are loaded from
the frozen/zip area.  But if you want to override one of the frozen
modules, simply copy one or more .py files onto the filesystem.  I've
only been doing this with top-level modules, but implicit namespace
packages seem to open the door for doing this with packages as well.
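
(A minimal sketch of that layout, with hypothetical paths; a zip archive on
sys.path is handled transparently by zipimport.)

import sys

sys.path[:0] = [
    "/opt/app/overrides",   # normally empty; drop a .py here to shadow a module
    "/opt/app/stdlib.zip",  # fast, read-only storage for the stdlib
]

Copying, say, heapq.py into the overrides directory would then shadow the
zipped copy on the next run.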

Mike


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze

On 01.02.2016 17:54, Yury Selivanov wrote:
If I were to do some big refactoring of the ceval loop, I'd probably 
consider implementing a register VM.  While register VMs are a bit 
faster than stack VMs (up to 20-30%), they would also allow us to 
apply more optimizations, and even bolt on a simple JIT compiler.


How do a JIT and a register machine relate to each other? :)


Best,
Sven


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread R. David Murray
On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano  wrote:
> On Sun, Jan 31, 2016 at 08:23:00PM +, Brett Cannon wrote:
> > So freezing the stdlib helps on UNIX and not on OS X (if my old testing is
> > still accurate). I guess the next question is what it does on Windows and
> > if we would want to ever consider freezing the stdlib as part of the build
> > process (and if we would want to change the order of importers on
> > sys.meta_path so frozen modules came after file-based ones).
> 
> I find that being able to easily open stdlib .py files in a text editor 
> to read the source is extremely valuable. I've learned much more from 
> reading the source than from (e.g.) StackOverflow. Likewise, it's often 
> handy to do a grep over the stdlib. When you talk about freezing the 
> stdlib, what exactly does that mean?
> 
> - will the source files still be there?

Well, Brett said it would be optional, though perhaps the above
paragraph is asking about doing it in our Windows build.  But the linux
distros might also make use of the option if it exists, so the question is
very meaningful.  However, you'd have to ask the distro if the source
would be shipped in the linux case, and I'd guess not in most cases.

I don't know about anyone else, but on my own development systems it is
not that unusual for me to *edit* the stdlib files (to add debug prints)
while debugging my own programs.  Freeze would definitely interfere with
that.  I could, of course, install a separate source build on my dev
system, but I thought it worth mentioning as a factor.

On the other hand, if the distros go the way Nick has (I think) been
advocating, and have a separate 'system python for system scripts' that
is independent of the one installed for user use, having the system-only
python be frozen and sourceless would actually make sense on a couple of
levels.

--David


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze

On 01.02.2016 19:28, Brett Cannon wrote:
A search for [stack vs register based virtual machine] will get you 
some information.


Alright. :) Will go for that.

You aren't really supposed to yet. :) In Pyjion's case we are still 
working on compatibility, let alone trying to show a speed improvement 
so we have not said much beyond this mailing list (we have a talk 
proposal in for PyCon US that we hope gets accepted). We just happened 
to get picked up on Reddit and HN recently and so interest has spiked 
in the project.


Exciting. :)



So, it could be that we will see a jitted CPython when Pyjion
appears to be successful?


The ability to plug in a JIT, but yes, that's the hope.


Okay. Not sure what you mean by plugin. One thing I like about Python is 
that it just works. So, plugin sounds like unnecessary work.


[Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov

Hi,

This is the second email thread I've started about implementing an opcode 
cache in the ceval loop.  Since my first post on this topic:


- I've implemented another optimization (LOAD_ATTR);

- I've added detailed statistics mode so that I can "see" how the cache 
performs and tune it;


- some macro benchmarks are now 10-20% faster; 2to3 (a real application) 
is 7-8% faster;


- and I have some good insights on the memory footprint.

** The purpose of this email is to get a general approval from 
python-dev, so that I can start polishing the patches and getting them 
reviewed/committed. **



Summary of optimizations


When a code object is executed more than ~1000 times, it's considered 
"hot".  It gets its opcodes analyzed to initialize caches for 
LOAD_METHOD (a new opcode I propose to add in [1]), LOAD_ATTR, and 
LOAD_GLOBAL.


It's important to only optimize code objects that were executed "enough" 
times, to avoid optimizing code objects for modules, classes, and 
functions that were imported but never used.


The cache struct is defined in code.h [2], and is 32 bytes long. When a 
code object becomes hot, it gets a cache offset table allocated for it 
(+1 byte for each opcode) plus an array of cache structs.


To measure the max/average memory impact, I tuned my code to optimize 
*every* code object on *first* run.  Then I ran the entire Python test 
suite.  The Python test suite + standard library together contain around 
72395 code objects, which required 20 MB of memory for caches.  The test 
process consumed around 400 MB of memory.  Thus, in the absolute worst-case 
scenario, the overhead is about 5%.


Then I ran the test suite without any modifications to the patch. This 
means that only code objects that are called frequently enough are 
optimized.  In this mode, only 2072 code objects were optimized, using 
less than 1 MB of memory for the cache.



LOAD_ATTR
-

Damien George mentioned that they optimize a lot of dict lookups in 
MicroPython by memorizing last key/value offset in the dict object, thus 
eliminating lots of hash lookups.  I've implemented this optimization in 
my patch.  The results are quite good.  A simple micro-benchmark [3] 
shows ~30% speed improvement.  Here are some debug stats generated by 
2to3 benchmark:


-- Opcode cache LOAD_ATTR hits = 14778415 (83%)
-- Opcode cache LOAD_ATTR misses   = 750 (0%)
-- Opcode cache LOAD_ATTR opts = 282
-- Opcode cache LOAD_ATTR deopts   = 60
-- Opcode cache LOAD_ATTR total= 1912

Each "hit" makes LOAD_ATTR about 30% faster.


LOAD_GLOBAL
---

This turned out to be a very stable optimization.  Here is the debug 
output of the 2to3 test:


-- Opcode cache LOAD_GLOBAL hits   = 3940647 (100%)
-- Opcode cache LOAD_GLOBAL misses = 0 (0%)
-- Opcode cache LOAD_GLOBAL opts   = 252

All benchmarks (and real code) have stats like that.  Globals and 
builtins are very rarely modified, so the cache works really well.  With 
the LOAD_GLOBAL opcode cache, a global lookup is very cheap; there is no 
hash lookup for it at all.  It makes optimizations like "def foo(len=len)" 
obsolete.
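
(For context, this is the kind of hand-written trick that becomes
unnecessary; a small illustrative example, not code from the patch.)

def count_short(words, len=len):    # the old trick: builtin bound to a local
    return sum(1 for w in words if len(w) < 4)

def count_short_plain(words):       # plain version; the cache aims to make
    return sum(1 for w in words if len(w) < 4)   # this just as fast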



LOAD_METHOD
---

This is a new opcode I propose to add in [1].  The idea is to substitute 
LOAD_ATTR with it, and avoid instantiation of BoundMethod objects.


With the cache, we can store a reference to the method descriptor (I use 
type->tp_version_tag for cache invalidation, the same thing 
_PyType_Lookup is built around).


The cache makes LOAD_METHOD really efficient.  A simple micro-benchmark 
like [4], shows that with the cache and LOAD_METHOD, 
"s.startswith('abc')" becomes as efficient as "s[:3] == 'abc'".


LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than 
LOAD_ATTR/CALL_FUNCTION.  With the cache, it's about 30% faster.
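
(The gist in [4] isn't reproduced here, but a hedged re-creation of that
style of micro-benchmark looks roughly like this.)

import timeit

s = "abcdefghij"
print(timeit.timeit("s.startswith('abc')", globals={"s": s}))
print(timeit.timeit("s[:3] == 'abc'", globals={"s": s}))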


Here's the debug output of the 2to3 benchmark:

-- Opcode cache LOAD_METHOD hits   = 5164848 (64%)
-- Opcode cache LOAD_METHOD misses = 12 (0%)
-- Opcode cache LOAD_METHOD opts   = 94
-- Opcode cache LOAD_METHOD deopts = 12
-- Opcode cache LOAD_METHOD dct-chk= 1614801
-- Opcode cache LOAD_METHOD total  = 7945954


What's next?


First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110 
[1].  It's a very straightforward optimization, the patch is small and 
easy to review.


Second, I'd like to merge the new opcode cache, see issue 26219 [5].  
All unittests pass.  Memory usage increase is very moderate (<1 MB for 
the entire test suite), and the performance increase is significant.  
The only potential blocker for this is PEP 509 approval (which I'd be 
happy to assist with).


What do you think?

Thanks,
Yury


[1] http://bugs.python.org/issue26110
[2] https://github.com/1st1/cpython/blob/opcache5/Include/code.h#L10
[3] https://gist.github.com/1st1/37d928f1e84813bf1c44
[4] https://gist.github.com/1st1/10588e6e11c4d7c19445
[5] http://bugs.python.org/issue26219


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 09:08 Yury Selivanov  wrote:

>
>
> On 2016-01-29 11:28 PM, Steven D'Aprano wrote:
> > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote:
> >> Hi,
> >>
> >>
> >> tl;dr The summary is that I have a patch that improves CPython
> >> performance up to 5-10% on macro benchmarks.  Benchmarks results on
> >> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available
> >> at [1].  There are no slowdowns that I could reproduce consistently.
> > Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now
> > unmaintained, and the project repo on Google Code appears to be dead (I
> > get a 404), but I understand that it was significantly faster than
> > CPython back in the 2.6 days.
> >
> >
> https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
> >
> >
>
> Thanks for bringing this up!
>
> IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per
> bytecode instead of 8.  That allows to minimize the number of bytecodes,
> thus having some performance increase.  TBH, I don't think it was
> "significantly faster".
>
> If I were to do some big refactoring of the ceval loop, I'd probably
> consider implementing a register VM.  While register VMs are a bit
> faster than stack VMs (up to 20-30%), they would also allow us to apply
> more optimizations, and even bolt on a simple JIT compiler.
>

If you did tackle the register VM approach that would also settle a
long-standing question of whether a certain optimization works for Python.

As for bolting on a JIT, the whole point of Pyjion is to see if that's
worth it for CPython, so that's already being taken care of (and is
actually easier with a stack-based VM since the JIT engine we're using is
stack-based itself).


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Ethan Furman

On 02/01/2016 08:40 AM, R. David Murray wrote:

On Mon, 01 Feb 2016 14:12:27 +1100, Steven D'Aprano wrote:



I find that being able to easily open stdlib .py files in a text editor
to read the source is extremely valuable. I've learned much more from
reading the source than from (e.g.) StackOverflow. Likewise, it's often
handy to do a grep over the stdlib. When you talk about freezing the
stdlib, what exactly does that mean?

- will the source files still be there?


Well, Brett said it would be optional, though perhaps the above
paragraph is asking about doing it in our Windows build.  But the linux
distros might make also use the option if it exists, so the question is
very meaningful.  However, you'd have to ask the distro if the source
would be shipped in the linux case, and I'd guess not in most cases.

I don't know about anyone else, but on my own development systems it is
not that unusual for me to *edit* the stdlib files (to add debug prints)
while debugging my own programs.  Freeze would definitely interfere with
that.  I could, of course, install a separate source build on my dev
system, but I thought it worth mentioning as a factor.


Yup, so do I.



On the other hand, if the distros go the way Nick has (I think) been
advocating, and have a separate 'system python for system scripts' that
is independent of the one installed for user use, having the system-only
python be frozen and sourceless would actually make sense on a couple of
levels.


Agreed.

--
~Ethan~


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 11:11 Yury Selivanov  wrote:

> Hi,
>
> This is the second email thread I start regarding implementing an opcode
> cache in ceval loop.  Since my first post on this topic:
>
> - I've implemented another optimization (LOAD_ATTR);
>
> - I've added detailed statistics mode so that I can "see" how the cache
> performs and tune it;
>
> - some macro benchmarks are now 10-20% faster; 2to3 (a real application)
> is 7-8% faster;
>
> - and I have some good insights on the memory footprint.
>
> ** The purpose of this email is to get a general approval from
> python-dev, so that I can start polishing the patches and getting them
> reviewed/committed. **
>
>
> Summary of optimizations
> 
>
> When a code object is executed more than ~1000 times, it's considered
> "hot".  It gets its opcodes analyzed to initialize caches for
> LOAD_METHOD (a new opcode I propose to add in [1]), LOAD_ATTR, and
> LOAD_GLOBAL.
>
> It's important to only optimize code objects that were executed "enough"
> times, to avoid optimizing code objects for modules, classes, and
> functions that were imported but never used.
>
> The cache struct is defined in code.h [2], and is 32 bytes long. When a
> code object becomes hot, it gets an cache offset table allocated for it
> (+1 byte for each opcode) + an array of cache structs.
>
> To measure the max/average memory impact, I tuned my code to optimize
> *every* code object on *first* run.  Then I ran the entire Python test
> suite.  Python test suite + standard library both contain around 72395
> code objects, which required 20Mb of memory for caches.  The test
> process consumed around 400Mb of memory.  Thus, the absolute worst case
> scenario, the overhead is about 5%.
>
> Then I ran the test suite without any modifications to the patch. This
> means that only code objects that are called frequently enough are
> optimized.  In this more, only 2072 code objects were optimized, using
> less than 1Mb of memory for the cache.
>
>
> LOAD_ATTR
> -
>
> Damien George mentioned that they optimize a lot of dict lookups in
> MicroPython by memorizing last key/value offset in the dict object, thus
> eliminating lots of hash lookups.  I've implemented this optimization in
> my patch.  The results are quite good.  A simple micro-benchmark [3]
> shows ~30% speed improvement.  Here are some debug stats generated by
> 2to3 benchmark:
>
> -- Opcode cache LOAD_ATTR hits = 14778415 (83%)
> -- Opcode cache LOAD_ATTR misses   = 750 (0%)
> -- Opcode cache LOAD_ATTR opts = 282
> -- Opcode cache LOAD_ATTR deopts   = 60
> -- Opcode cache LOAD_ATTR total= 1912
>
> Each "hit" makes LOAD_ATTR about 30% faster.
>
>
> LOAD_GLOBAL
> ---
>
> This turned out to be a very stable optimization.  Here is the debug
> output of the 2to3 test:
>
> -- Opcode cache LOAD_GLOBAL hits   = 3940647 (100%)
> -- Opcode cache LOAD_GLOBAL misses = 0 (0%)
> -- Opcode cache LOAD_GLOBAL opts   = 252
>
> All benchmarks (and real code) have stats like that.  Globals and
> builtins are very rarely modified, so the cache works really well.  With
> LOAD_GLOBAL opcode cache, global lookup is very cheap, there is no hash
> lookup for it at all.  It makes optimizations like "def foo(len=len)"
> obsolete.
>
>
> LOAD_METHOD
> ---
>
> This is a new opcode I propose to add in [1].  The idea is to substitute
> LOAD_ATTR with it, and avoid instantiation of BoundMethod objects.
>
> With the cache, we can store a reference to the method descriptor (I use
> type->tp_version_tag for cache invalidation, the same thing
> _PyType_Lookup is built around).
>
> The cache makes LOAD_METHOD really efficient.  A simple micro-benchmark
> like [4], shows that with the cache and LOAD_METHOD,
> "s.startswith('abc')" becomes as efficient as "s[:3] == 'abc'".
>
> LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than
> LOAD_ATTR/CALL_FUNCTION.  With the cache, it's about 30% faster.
>
> Here's the debug output of the 2to3 benchmark:
>
> -- Opcode cache LOAD_METHOD hits   = 5164848 (64%)
> -- Opcode cache LOAD_METHOD misses = 12 (0%)
> -- Opcode cache LOAD_METHOD opts   = 94
> -- Opcode cache LOAD_METHOD deopts = 12
> -- Opcode cache LOAD_METHOD dct-chk= 1614801
> -- Opcode cache LOAD_METHOD total  = 7945954
>
>
> What's next?
> 
>
> First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
> [1].  It's a very straightforward optimization, the patch is small and
> easy to review.


+1 from me.


>
> Second, I'd like to merge the new opcode cache, see issue 26219 [5].
> All unittests pass.  Memory usage increase is very moderate (<1mb for
> the entire test suite), and the performance increase is significant.
> The only potential blocker for this is PEP 509 approval (which I'd be
> happy to assist with).
>

I think the fact that it improves performance across the board, as well as
eliminating the various tricks people use to cache globals and built-ins,
earns a big +1 from me. I guess that means Victor needs to ask for
pronouncement on PEP 509.

Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Sven R. Kunze
Thanks, Brett. I wasn't aware of lazy imports either. I think that one is 
even better at reducing startup time than freezing the stdlib.
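
(For readers who, like Donald below, were looking for docs: a minimal sketch
of wiring up importlib.util.LazyLoader, available since 3.5; the module name
is just an example, and note the caveat about issue 26186 in the quoted text.)

import importlib.util
import sys

def lazy_import(name):
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)          # sets up the lazy module, runs nothing yet
    return module

json = lazy_import("json")              # nothing in the json module has run yet
print(json.dumps({"lazy": True}))       # first attribute access triggers the real load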


On 31.01.2016 18:57, Brett Cannon wrote:
I have opened http://bugs.python.org/issue26252 to track writing the 
example (and before ppl go playing with the lazy loader, be aware of 
http://bugs.python.org/issue26186).


On Sun, 31 Jan 2016 at 09:26 Brett Cannon wrote:


There are no example docs for it yet, but enough people have asked
this week about how to set up a custom importer that I will write
up a generic example case which will make sense for a lazy loader
(need to file the issue before I forget).


On Sun, 31 Jan 2016, 09:11 Donald Stufft wrote:



On Jan 31, 2016, at 12:02 PM, Brett Cannon wrote:

A lazy importer was added in Python 3.5


Is there any docs on how to actually use the LazyLoader in
3.5? I can’t seem to find any but I don’t really know the
import system that well.

-
Donald Stufft
PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F
6E3C BCE9 3372 DCFA







Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Nikolaus Rath

On Feb 01 2016, [email protected] wrote:
> Barry Warsaw writes:


>> On Feb 01, 2016, at 11:40 AM, R. David Murray wrote: 
 
>> I don't know about anyone else, but on my own development 
>> systems it is not that unusual for me to *edit* the 
>> stdlib files (to add debug prints) while debugging my own 
>> programs.  Freeze would definitely interfere with that. 
>> I could, of course, install a separate source build on my 
>> dev system, but I thought it worth mentioning as a 
>> factor. 

   [snip] 

 > But even with system scripts, I do need to step through 
 > them occasionally.  If it were a matter of changing a 
 > shebang or invoking the script with a different Python 
 > (e.g. /usr/bin/python3s vs. /usr/bin/python3) to get the 
 > full unpacked source, that would be fine. 

  If the stdlib were to use implicit namespace packages 
( https://www.python.org/dev/peps/pep-0420/ ) and the various 
loaders/importers as well, then python could do what I've done 
with an embedded python application for years.  Freeze the 
stdlib (or put it in a zipfile or whatever is fast).  Then 
arrange PYTHONPATH to first look on the filesystem and then look 
in the frozen/ziped storage.


Presumably that would eliminate the performance advantages of the 
frozen/zipped storage because now Python would still have to issue 
all the stat calls to first check for the existence of a .py file.



Best,
-Nikolaus

(No Cc on replies please, I'm reading the list)
--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 12:16 Yury Selivanov  wrote:

> Brett,
>
> On 2016-02-01 3:08 PM, Brett Cannon wrote:
> >
> >
> > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov  > > wrote:
> >
> > Hi Brett,
> >
> [..]
> >
> >
> > The first two fields are used to make sure that we have objects of
> the
> > same type.  If it changes, we deoptimize the opcode immediately.
> Then
> > we try the offset.  If it's successful - we have a cache hit.  If
> not,
> > that's fine, we'll try another few times before deoptimizing the
> > opcode.
> >
> >
> > So this is a third "next step" that has its own issue?
>
> It's all in issue http://bugs.python.org/issue26219 right now.
>
> My current plan is to implement LOAD_METHOD/CALL_METHOD (just opcodes,
> no cache) in 26110.
>
> Then implement caching for LOAD_METHOD, LOAD_GLOBAL, and LOAD_ATTR in
> 26219.  I'm flexible to break down 26219 in three separate issues if
> that helps the review process (but that would take more of my time):
>
> - implement support for opcode caching (general infrastructure) +
> LOAD_GLOBAL optimization
> - LOAD_METHOD optimization
> - LOAD_ATTR optimization
>

I personally don't care how you break it down, just trying to keep all the
moving pieces in my head. :)

Anyway, it sounds like PEP 509 is blocking part of it, but the LOAD_METHOD
stuff can go in as-is. So are you truly blocked only on getting the latest
version of that patch up to http://bugs.python.org/issue26110 and getting a
code review?


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Sven R. Kunze



On 01.02.2016 18:18, Brett Cannon wrote:



On Mon, 1 Feb 2016 at 09:08 Yury Selivanov > wrote:




On 2016-01-29 11:28 PM, Steven D'Aprano wrote:
> On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote:
>> Hi,
>>
>>
>> tl;dr The summary is that I have a patch that improves CPython
>> performance up to 5-10% on macro benchmarks. Benchmarks results on
>> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are
available
>> at [1].  There are no slowdowns that I could reproduce
consistently.
> Have you looked at Cesare Di Mauro's wpython? As far as I know,
it's now
> unmaintained, and the project repo on Google Code appears to be
dead (I
> get a 404), but I understand that it was significantly faster than
> CPython back in the 2.6 days.
>
>

https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
>
>

Thanks for bringing this up!

IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per
bytecode instead of 8.  That allows to minimize the number of
bytecodes,
thus having some performance increase.  TBH, I don't think it was
"significantly faster".

If I were to do some big refactoring of the ceval loop, I'd probably
consider implementing a register VM.  While register VMs are a bit
faster than stack VMs (up to 20-30%), they would also allow us to
apply
more optimizations, and even bolt on a simple JIT compiler.


If you did tackle the register VM approach that would also settle a 
long-standing question of whether a certain optimization works for Python.


Are there some resources on why register machines are considered faster 
than stack machines?


As for bolting on a JIT, the whole point of Pyjion is to see if that's 
worth it for CPython, so that's already being taken care of (and is 
actually easier with a stack-based VM since the JIT engine we're using 
is stack-based itself).


Interesting. Haven't noticed these projects, yet.

So, it could be that we will see a jitted CPython when Pyjion appears to 
be successful?


Best,
Sven


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze

On 01.02.2016 20:51, Yury Selivanov wrote:
If LOAD_ATTR gets too many cache misses (20 in my current patch) it 
gets deoptimized, and the default implementation is used.  So if the 
code is very dynamic - there's no improvement, but no performance 
penalty either.


Will you re-try optimizing it?



Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 10:21 Sven R. Kunze  wrote:

>
>
> On 01.02.2016 18:18, Brett Cannon wrote:
>
>
>
> On Mon, 1 Feb 2016 at 09:08 Yury Selivanov < 
> [email protected]> wrote:
>
>>
>>
>> On 2016-01-29 11:28 PM, Steven D'Aprano wrote:
>> > On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote:
>> >> Hi,
>> >>
>> >>
>> >> tl;dr The summary is that I have a patch that improves CPython
>> >> performance up to 5-10% on macro benchmarks.  Benchmarks results on
>> >> Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available
>> >> at [1].  There are no slowdowns that I could reproduce consistently.
>> > Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now
>> > unmaintained, and the project repo on Google Code appears to be dead (I
>> > get a 404), but I understand that it was significantly faster than
>> > CPython back in the 2.6 days.
>> >
>> >
>> https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf
>> >
>> >
>>
>> Thanks for bringing this up!
>>
>> IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per
>> bytecode instead of 8.  That allows to minimize the number of bytecodes,
>> thus having some performance increase.  TBH, I don't think it was
>> "significantly faster".
>>
>> If I were to do some big refactoring of the ceval loop, I'd probably
>> consider implementing a register VM.  While register VMs are a bit
>> faster than stack VMs (up to 20-30%), they would also allow us to apply
>> more optimizations, and even bolt on a simple JIT compiler.
>>
>
> If you did tackle the register VM approach that would also settle a
> long-standing question of whether a certain optimization works for Python.
>
>
> Are there some resources on why register machines are considered faster
> than stack machines?
>

A search for [stack vs register based virtual machine] will get you some
information.


>
>
> As for bolting on a JIT, the whole point of Pyjion is to see if that's
> worth it for CPython, so that's already being taken care of (and is
> actually easier with a stack-based VM since the JIT engine we're using is
> stack-based itself).
>
>
> Interesting. Haven't noticed these projects, yet.
>

You aren't really supposed to yet. :) In Pyjion's case we are still working
on compatibility, let alone trying to show a speed improvement so we have
not said much beyond this mailing list (we have a talk proposal in for
PyCon US that we hope gets accepted). We just happened to get picked up on
Reddit and HN recently and so interest has spiked in the project.


>
> So, it could be that we will see a jitted CPython when Pyjion appears to
> be successful?
>

The ability to plug in a JIT, but yes, that's the hope.


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze

On 01.02.2016 21:35, Yury Selivanov wrote:
It's important to understand that if we have a lot of cache misses 
after the code object was executed 1000 times, it doesn't make sense 
to keep trying to update that cache.  It just means that the code, in 
that particular point, works with different kinds of objects.


So the assumption is that it's the code that makes the difference here, not 
time.  That could be true for production code.


FWIW, I experimented with different ideas (one is to never 
de-optimize), and the current strategy works best on the vast number 
of benchmarks.


Nice.

Regarding the magic constants (1000, 20): what is the process for updating 
them?



Best,
Sven


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov

Hi Damien,

On 2016-02-01 3:59 PM, Damien George wrote:

Hi Yury,

That's great news about the speed improvements with the dict offset cache!


The cache struct is defined in code.h [2], and is 32 bytes long. When a
code object becomes hot, it gets an cache offset table allocated for it
(+1 byte for each opcode) + an array of cache structs.

Ok, so each opcode has a 1-byte cache that sits separately to the
actual bytecode.  But a lot of opcodes don't use it so that leads to
some wasted memory, correct?


Each code object has a list of opcodes and their arguments
(bytes object == unsigned char array).

"Hot" code objects have an offset table (unsigned chars), and
a cache entries array (hope your email client will display
the following correctly):

   opcodes       offset    cache entries
                 table

   OPCODE        0         cache for 1st LOAD_ATTR
   ARG1          0         cache for 1st LOAD_GLOBAL
   ARG2          0         cache for 2nd LOAD_ATTR
   OPCODE        0         cache for 1st LOAD_METHOD
   LOAD_ATTR     1         ...
   ARG1          0
   ARG2          0
   OPCODE        0
   LOAD_GLOBAL   2
   ARG1          0
   ARG2          0
   LOAD_ATTR     3
   ARG1          0
   ARG2          0
   ...           ...
   LOAD_METHOD   4
   ...           ...

When, say, a LOAD_ATTR opcode executes, it first checks if the
code object has a non-NULL cache-entries table.

If it has, that LOAD_ATTR then uses the offset table (indexing
with its `INSTR_OFFSET()`) to find its position in
cache-entries.



But then how do you index the cache, do you keep a count of the
current opcode number?  If I remember correctly, CPython has some
opcodes taking 1 byte, and some taking 3 bytes, so the offset into the
bytecode cannot be easily mapped to a bytecode number.


First, when a code object is created, it doesn't have
an offset table and cache entries (those are set to NULL).

Each code object has a new field to count how many times
it was called.  Each time a code object is called with
PyEval_EvalFrameEx, that field is incremented.

Once a code object is called more than 1024 times we:

1. allocate memory for its offset table

2. iterate through its opcodes and count how many
LOAD_ATTR, LOAD_METHOD and LOAD_GLOBAL opcodes it has;

3. As part of (2) we initialize the offset-table with
correct mapping.  Some opcodes will have a non-zero
entry in the offset-table, some won't.  Opcode args
will always have zeros in the offset tables.

4. Then we allocate cache-entries table.
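
(A rough Python model of steps 2-3 above, purely illustrative; the real
implementation is in C, and LOAD_METHOD is the opcode proposed in issue
26110, so it won't appear in current bytecode.)

import dis

CACHED_OPS = {"LOAD_ATTR", "LOAD_GLOBAL", "LOAD_METHOD"}

def build_offset_table(code):
    offsets = [0] * len(code.co_code)    # one entry per bytecode byte; 0 == "no cache"
    n_entries = 0
    for instr in dis.get_instructions(code):
        if instr.opname in CACHED_OPS:
            n_entries += 1               # cache slots are numbered from 1
            offsets[instr.offset] = n_entries
    return offsets, n_entries            # step 4 allocates n_entries cache structs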

Yury


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov



On 2016-02-01 4:02 PM, Sven R. Kunze wrote:

On 01.02.2016 21:35, Yury Selivanov wrote:
It's important to understand that if we have a lot of cache misses 
after the code object was executed 1000 times, it doesn't make sense 
to keep trying to update that cache.  It just means that the code, in 
that particular point, works with different kinds of objects.


So, the assumption is that the code makes the difference here not 
time. That could be true for production code.


FWIW, I experimented with different ideas (one is to never 
de-optimize), and the current strategy works best on the vast number 
of benchmarks.


Nice.

Regarding the magic constants (1000, 20) what is the process of 
updating them?


Right now they are private constants in ceval.c.

I will (maybe) expose a private API via the _testcapi module to 
re-define them (set them to 1 or 0), only to write better unittests.  I 
have no plans to make those constants public or have a public API to 
tackle them.  IMHO, this is something that almost nobody will ever use.


Yury


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Andrew Barnert via Python-Dev
Looking over the thread and the two issues, you've got good arguments for why 
the improved code paths will be the most common ones, and good benchmarks for 
various kinds of real-life code, but it doesn't seem like you've tried to stress 
it on anything that could be made worse. From your explanations and your code, I 
wouldn't expect that @classmethods, functions stored in the object dict or 
generated by __getattr__, non-function callables as methods, etc. would go 
significantly slower, nor would code that mixes @properties or __getattr__ proxy 
attributes with real attributes, uses __slots__, or frequently writes to a 
global, etc. But it would be nice to _know_ that they don't instead of just 
expecting it.

Sent from my iPhone

> On Feb 1, 2016, at 11:10, Yury Selivanov  wrote:
> 
> Hi,
> 
> This is the second email thread I start regarding implementing an opcode 
> cache in ceval loop.  Since my first post on this topic:
> 
> - I've implemented another optimization (LOAD_ATTR);
> 
> - I've added detailed statistics mode so that I can "see" how the cache 
> performs and tune it;
> 
> - some macro benchmarks are now 10-20% faster; 2to3 (a real application) is 
> 7-8% faster;
> 
> - and I have some good insights on the memory footprint.
> 
> ** The purpose of this email is to get a general approval from python-dev, so 
> that I can start polishing the patches and getting them reviewed/committed. **
> 
> 
> Summary of optimizations
> 
> 
> When a code object is executed more than ~1000 times, it's considered "hot".  
> It gets its opcodes analyzed to initialize caches for LOAD_METHOD (a new 
> opcode I propose to add in [1]), LOAD_ATTR, and LOAD_GLOBAL.
> 
> It's important to only optimize code objects that were executed "enough" 
> times, to avoid optimizing code objects for modules, classes, and functions 
> that were imported but never used.
> 
> The cache struct is defined in code.h [2], and is 32 bytes long. When a code 
> object becomes hot, it gets an cache offset table allocated for it (+1 byte 
> for each opcode) + an array of cache structs.
> 
> To measure the max/average memory impact, I tuned my code to optimize *every* 
> code object on *first* run.  Then I ran the entire Python test suite.  Python 
> test suite + standard library both contain around 72395 code objects, which 
> required 20Mb of memory for caches.  The test process consumed around 400Mb 
> of memory.  Thus, the absolute worst case scenario, the overhead is about 5%.
> 
> Then I ran the test suite without any modifications to the patch. This means 
> that only code objects that are called frequently enough are optimized.  In 
> this more, only 2072 code objects were optimized, using less than 1Mb of 
> memory for the cache.
> 
> 
> LOAD_ATTR
> -
> 
> Damien George mentioned that they optimize a lot of dict lookups in 
> MicroPython by memorizing last key/value offset in the dict object, thus 
> eliminating lots of hash lookups.  I've implemented this optimization in my 
> patch.  The results are quite good.  A simple micro-benchmark [3] shows ~30% 
> speed improvement.  Here are some debug stats generated by 2to3 benchmark:
> 
> -- Opcode cache LOAD_ATTR hits = 14778415 (83%)
> -- Opcode cache LOAD_ATTR misses   = 750 (0%)
> -- Opcode cache LOAD_ATTR opts = 282
> -- Opcode cache LOAD_ATTR deopts   = 60
> -- Opcode cache LOAD_ATTR total= 1912
> 
> Each "hit" makes LOAD_ATTR about 30% faster.
> 
> 
> LOAD_GLOBAL
> ---
> 
> This turned out to be a very stable optimization.  Here is the debug output 
> of the 2to3 test:
> 
> -- Opcode cache LOAD_GLOBAL hits   = 3940647 (100%)
> -- Opcode cache LOAD_GLOBAL misses = 0 (0%)
> -- Opcode cache LOAD_GLOBAL opts   = 252
> 
> All benchmarks (and real code) have stats like that.  Globals and builtins 
> are very rarely modified, so the cache works really well.  With LOAD_GLOBAL 
> opcode cache, global lookup is very cheap, there is no hash lookup for it at 
> all.  It makes optimizations like "def foo(len=len)" obsolete.
> 
> 
> LOAD_METHOD
> ---
> 
> This is a new opcode I propose to add in [1].  The idea is to substitute 
> LOAD_ATTR with it, and avoid instantiation of BoundMethod objects.
> 
> With the cache, we can store a reference to the method descriptor (I use 
> type->tp_version_tag for cache invalidation, the same thing _PyType_Lookup is 
> built around).
> 
> The cache makes LOAD_METHOD really efficient.  A simple micro-benchmark like 
> [4], shows that with the cache and LOAD_METHOD, "s.startswith('abc')" becomes 
> as efficient as "s[:3] == 'abc'".
> 
> LOAD_METHOD/CALL_FUNCTION without cache is about 20% faster than 
> LOAD_ATTR/CALL_FUNCTION.  With the cache, it's about 30% faster.
> 
> Here's the debug output of the 2to3 benchmark:
> 
> -- Opcode cache LOAD_METHOD hits   = 5164848 (64%)
> -- Opcode cache LOAD_METHOD misses = 12 (0%)
> -- Opcode cache LOAD_METHOD opts   = 94
> -- Opcode cache LOAD_METHOD d

Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov

Hi Brett,

On 2016-02-01 2:30 PM, Brett Cannon wrote:



On Mon, 1 Feb 2016 at 11:11 Yury Selivanov > wrote:


Hi,


[..]


What's next?


First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
[1].  It's a very straightforward optimization, the patch is small and
easy to review.


+1 from me.


Second, I'd like to merge the new opcode cache, see issue 26219 [5].
All unittests pass.  Memory usage increase is very moderate (<1mb for
the entire test suite), and the performance increase is significant.
The only potential blocker for this is PEP 509 approval (which I'd be
happy to assist with).


I think the fact that it improves performance across the board as well 
as eliminates the various tricks people use to cache global and 
built-ins, a big +1 from me. I guess that means Victor needs to ask 
for pronouncement on PEP 509.


Great!  AFAIK Victor still needs to update the PEP with some changes 
(globally unique ma_version).  My patch includes the latest 
implementation of PEP 509, and it works fine (no regressions, no broken 
unittests).  I can also assist with reviewing Victor's implementation if 
the PEP is accepted.




BTW, where does LOAD_ATTR fit into all of this?


The LOAD_ATTR optimization doesn't use any of PEP 509's new stuff (if I 
understand your question correctly).  It's based on the following 
assumptions (the same ones that make JITs work so well):


1. Most classes don't implement __getattribute__.

2. A lot of attributes are stored in objects' __dict__s.

3. Most attributes aren't shadowed by descriptors/getters-setters; most 
code just uses "self.attr".


4. An average method/function works on objects of the same type. Which 
means that those objects were constructed in a very similar (if not 
exact) fashion.


For instance:

class F:
    def __init__(self, name):
        self.name = name
    def say(self):
        print(self.name)   # <- For all F instances,
                           #    offset of 'name' in `F().__dict__`s
                           #    will be the same

If LOAD_ATTR gets too many cache misses (20 in my current patch) it gets 
deoptimized, and the default implementation is used.  So if the code is 
very dynamic - there's no improvement, but no performance penalty either.


In my patch, I use the cache to store (for LOAD_ATTR specifically):

- pointer to object's type
- type->tp_version_tag
- the last successful __dict__ offset

The first two fields are used to make sure that we have objects of the 
same type.  If it changes, we deoptimize the opcode immediately.  Then 
we try the offset.  If it's successful - we have a cache hit.  If not, 
that's fine, we'll try another few times before deoptimizing the opcode.
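
(A purely illustrative Python model of those three cache fields and the
deoptimization rule; the miss limit of 20 comes from the description above,
the field and method names are assumptions, and the real code is C in
ceval.c.)

class LoadAttrCacheEntry:
    MAX_MISSES = 20                         # deoptimization threshold from the text

    def __init__(self, obj, version_tag):
        self.tp = type(obj)                 # stands in for the pointer to the type
        self.tp_version_tag = version_tag   # stands in for type->tp_version_tag
        self.dict_offset = None             # last successful __dict__ offset
        self.misses = 0
        self.deoptimized = False

    def type_still_matches(self, obj, version_tag):
        if type(obj) is not self.tp or version_tag != self.tp_version_tag:
            self.deoptimized = True         # type changed: deoptimize immediately
        return not self.deoptimized

    def record_miss(self):
        self.misses += 1
        if self.misses >= self.MAX_MISSES:  # too dynamic: fall back to default LOAD_ATTR
            self.deoptimized = True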




What do you think?


It all looks great to me!


Thanks!

Yury



Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Brett Cannon
On Mon, 1 Feb 2016 at 11:51 Yury Selivanov  wrote:

> Hi Brett,
>
> On 2016-02-01 2:30 PM, Brett Cannon wrote:
> >
> >
> > On Mon, 1 Feb 2016 at 11:11 Yury Selivanov  > > wrote:
> >
> > Hi,
> >
> [..]
> >
> > What's next?
> > 
> >
> > First, I'd like to merge the new LOAD_METHOD opcode, see issue 26110
> > [1].  It's a very straightforward optimization, the patch is small
> and
> > easy to review.
> >
> >
> > +1 from me.
> >
> >
> > Second, I'd like to merge the new opcode cache, see issue 26219 [5].
> > All unittests pass.  Memory usage increase is very moderate (<1mb for
> > the entire test suite), and the performance increase is significant.
> > The only potential blocker for this is PEP 509 approval (which I'd be
> > happy to assist with).
> >
> >
> > I think the fact that it improves performance across the board as well
> > as eliminates the various tricks people use to cache global and
> > built-ins, a big +1 from me. I guess that means Victor needs to ask
> > for pronouncement on PEP 509.
>
> Great!  AFAIK Victor still needs to update the PEP with some changes
> (globally unique ma_version).  My patch includes the latest
> implementation of PEP 509, and it works fine (no regressions, no broken
> unittests).  I can also assist with reviewing Victor's implementation if
> the PEP is accepted.
>
> >
> > BTW, where does LOAD_ATTR fit into all of this?
>
> LOAD_ATTR optimization doesn't use any of PEP 509 new stuff (if I
> understand you question correctly).  It's based on the following
> assumptions (that really make JITs work so well):
>
> 1. Most classes don't implement __getattribute__.
>
> 2. A lot of attributes are stored in objects' __dict__s.
>
> 3. Most attributes aren't shaded by descriptors/getters-setters; most
> code just uses "self.attr".
>
> 4. An average method/function works on objects of the same type. Which
> means that those objects were constructed in a very similar (if not
> exact) fashion.
>
> For instance:
>
> class F:
> def __init__(self, name):
> self.name = name
> def say(self):
> print(self.name)   # <- For all F instances,
># offset of 'name' in `F().__dict__`s
># will be the same
>
> If LOAD_ATTR gets too many cache misses (20 in my current patch) it gets
> deoptimized, and the default implementation is used.  So if the code is
> very dynamic - there's no improvement, but no performance penalty either.
>
> In my patch, I use the cache to store (for LOAD_ATTR specifically):
>
> - pointer to object's type
> - type->tp_version_tag
> - the last successful __dict__ offset
>
> The first two fields are used to make sure that we have objects of the
> same type.  If it changes, we deoptimize the opcode immediately.  Then
> we try the offset.  If it's successful - we have a cache hit.  If not,
> that's fine, we'll try another few times before deoptimizing the opcode.
>

So this is a third "next step" that has its own issue?

-Brett


>
> >
> > What do you think?
> >
> >
> > It all looks great to me!
>
> Thanks!
>
> Yury
>
>


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov

Brett,

On 2016-02-01 3:08 PM, Brett Cannon wrote:



On Mon, 1 Feb 2016 at 11:51 Yury Selivanov > wrote:


Hi Brett,


[..]



The first two fields are used to make sure that we have objects of the
same type.  If it changes, we deoptimize the opcode immediately.  Then
we try the offset.  If it's successful - we have a cache hit.  If not,
that's fine, we'll try another few times before deoptimizing the
opcode.


So this is a third "next step" that has its own issue?


It's all in issue http://bugs.python.org/issue26219 right now.

My current plan is to implement LOAD_METHOD/CALL_METHOD (just opcodes, 
no cache) in 26110.


Then implement caching for LOAD_METHOD, LOAD_GLOBAL, and LOAD_ATTR in 
26219.  I'm flexible to break down 26219 in three separate issues if 
that helps the review process (but that would take more of my time):


- implement support for opcode caching (general infrastructure) + 
LOAD_GLOBAL optimization

- LOAD_METHOD optimization
- LOAD_ATTR optimization

Yury


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov



On 2016-02-01 4:21 PM, Yury Selivanov wrote:

Hi Damien,

On 2016-02-01 3:59 PM, Damien George wrote:


[..]


But then how do you index the cache, do you keep a count of the
current opcode number?  If I remember correctly, CPython has some
opcodes taking 1 byte, and some taking 3 bytes, so the offset into the
bytecode cannot be easily mapped to a bytecode number.




Here are a few links that might explain the idea better:

https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L1229
https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L2610
https://github.com/1st1/cpython/blob/opcache5/Objects/codeobject.c#L167

Yury


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov



On 2016-02-01 3:21 PM, Brett Cannon wrote:


On Mon, 1 Feb 2016 at 12:16 Yury Selivanov > wrote:


Brett,

On 2016-02-01 3:08 PM, Brett Cannon wrote:
>
>
> > On Mon, 1 Feb 2016 at 11:51 Yury Selivanov wrote:
>
> Hi Brett,
>
[..]
>
>
> > The first two fields are used to make sure that we have objects of the
> > same type.  If it changes, we deoptimize the opcode immediately.  Then
> > we try the offset.  If it's successful - we have a cache hit.  If not,
> > that's fine, we'll try another few times before deoptimizing the
> > opcode.
>
>
> So this is a third "next step" that has its own issue?

It's all in issue http://bugs.python.org/issue26219 right now.

My current plan is to implement LOAD_METHOD/CALL_METHOD (just opcodes,
no cache) in 26110.

Then implement caching for LOAD_METHOD, LOAD_GLOBAL, and LOAD_ATTR in
26219.  I'm flexible about breaking 26219 down into three separate issues if
that helps the review process (though that would take more of my time):

- implement support for opcode caching (general infrastructure) +
LOAD_GLOBAL optimization
- LOAD_METHOD optimization
- LOAD_ATTR optimization


I personally don't care how you break it down; I'm just trying to keep all 
the moving pieces in my head. :)


Anyway, it sounds like PEP 509 is blocking part of it, but the 
LOAD_METHOD stuff can go in as-is. So are you truly blocked only on 
getting the latest version of that patch up to 
http://bugs.python.org/issue26110 and getting a code review?

Yep.  The initial implementation of LOAD_METHOD doesn't need PEP 509 / 
opcode caching.  I'll have to focus on something else this week, but 
early next week I can upload a new patch for 26110.


When we have 26110 committed and PEP 509 approved and committed, I can 
update the opcode cache patch (issue 26219) and we can start reviewing it.


Yury


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Andrew Barnert via Python-Dev
On Feb 1, 2016, at 09:59, [email protected] wrote:
> 
>  If the stdlib were to use implicit namespace packages
> ( https://www.python.org/dev/peps/pep-0420/ ) and the various
> loaders/importers as well, then python could do what I've done with an
> embedded python application for years.  Freeze the stdlib (or put it
> in a zipfile or whatever is fast).  Then arrange PYTHONPATH to first
> look on the filesystem and then look in the frozen/zipped storage.

This is a great solution for experienced developers, but I think it would be 
pretty bad for novices or transplants from other languages (maybe even 
including Python 2).

There are already multiple duplicate questions every month on StackOverflow 
from people asking "how do I find the source to stdlib module X". The canonical 
answer starts off by explaining how to import the module and use its __file__, 
which everyone is able to handle. If we have to instead explain how to work out 
the .py name from the qualified module name, how to work out the stdlib path 
from sys.path, and then how to find the source from those two things, with the 
caveat that it may not be installed at all on some platforms, and how to make 
sure what they're asking about really is a stdlib module, and how to make sure 
they aren't shadowing it with a module elsewhere on sys.path, that's a lot more 
complicated. Especially when you consider that some people on Windows and Mac 
are writing Python scripts without ever learning how to use the terminal or 
find their Python packages via Explorer/Finder. 
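
For contrast, the whole of today's canonical answer fits in a couple of
lines (the exact paths of course vary per install):

    import json, inspect
    print(json.__file__)                 # e.g. /usr/lib/python3.5/json/__init__.py
    print(inspect.getsourcefile(json))   # same idea, slightly more robust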

And meanwhile, other people would be asking why their app runs slower on one 
machine than another, because they didn't expect that installing python-dev on 
top of python would slow down startup.

Finally, on Linux and Mac, the stdlib will usually be somewhere that's not 
user-writable--and we shouldn't expect users to have to mess with stuff in 
/usr/lib or /System/Library even if they do have sudo access. Of course we 
could put a "stdlib shadow" location on the sys.path and configure it for 
/usr/local/lib and /Library and/or for somewhere in -, but that just makes the 
lookup process even more complicated--not to mention that we've just added
three stat calls to remove one open, at which point the optimization has 
probably become a pessimization.


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov



On 2016-02-01 3:27 PM, Sven R. Kunze wrote:

On 01.02.2016 20:51, Yury Selivanov wrote:
If LOAD_ATTR gets too many cache misses (20 in my current patch) it 
gets deoptimized, and the default implementation is used.  So if the 
code is very dynamic - there's no improvement, but no performance 
penalty either.


Will you re-try optimizing it?


No.

It's important to understand that if we see a lot of cache misses after 
the code object has been executed 1000 times, it doesn't make sense to keep 
trying to update that cache.  It just means that the code, at that 
particular point, works with different kinds of objects.  FWIW, I 
experimented with different ideas (one was to never de-optimize), and the 
current strategy works best across the vast majority of benchmarks.


Yury


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Mark Lawrence

On 01/02/2016 16:54, Yury Selivanov wrote:



On 2016-01-29 11:28 PM, Steven D'Aprano wrote:

On Wed, Jan 27, 2016 at 01:25:27PM -0500, Yury Selivanov wrote:

Hi,

tl;dr The summary is that I have a patch that improves CPython
performance up to 5-10% on macro benchmarks.  Benchmarks results on
Macbook Pro/Mac OS X, desktop CPU/Linux, server CPU/Linux are available
at [1].  There are no slowdowns that I could reproduce consistently.

Have you looked at Cesare Di Mauro's wpython? As far as I know, it's now
unmaintained, and the project repo on Google Code appears to be dead (I
get a 404), but I understand that it was significantly faster than
CPython back in the 2.6 days.

https://wpython.googlecode.com/files/Beyond%20Bytecode%20-%20A%20Wordcode-based%20Python.pdf



Thanks for bringing this up!

IIRC wpython was about using "fat" bytecodes, i.e. using 64bits per
bytecode instead of 8.  That allows to minimize the number of bytecodes,
thus having some performance increase.  TBH, I don't think it was
"significantly faster".



From https://code.google.com/archive/p/wpython/


WPython is a re-implementation of (some parts of) Python, which drops 
support for bytecode in favour of a wordcode-based model (where a 
word is 16 bits wide).


It also implements a hybrid stack-register virtual machine, and adds a 
lot of other optimizations.



--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence



Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Damien George
Hi Yury,

That's great news about the speed improvements with the dict offset cache!

> The cache struct is defined in code.h [2], and is 32 bytes long. When a
> code object becomes hot, it gets a cache offset table allocated for it
> (+1 byte for each opcode) + an array of cache structs.

Ok, so each opcode has a 1-byte cache that sits separately from the
actual bytecode.  But a lot of opcodes don't use it, so that leads to
some wasted memory, correct?

But then how do you index the cache?  Do you keep a count of the
current opcode number?  If I remember correctly, CPython has some
opcodes taking 1 byte and some taking 3 bytes, so the offset into the
bytecode cannot easily be mapped to a bytecode number.

Cheers,
Damien.


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze

On 01.02.2016 22:27, Yury Selivanov wrote:

Right now they are private constants in ceval.c.

I will (maybe) expose a private API via the _testcapi module to 
re-define them (set them to 1 or 0), only to write better unittests.  
I have no plans to make those constants public or have a public API to 
tackle them.  IMHO, this is something that almost nobody will ever use.


Alright. I agree with you on that.

What I actually meant was: how can we find the optimal values? I 
understand that 1000 and 20 are some hand-figured/subjective values for now.


Is there a standardized/objective way to find out the best values?  What 
does "best" even mean here?


Best,
Sven


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov

Sven,

On 2016-02-01 4:32 PM, Sven R. Kunze wrote:

On 01.02.2016 22:27, Yury Selivanov wrote:

Right now they are private constants in ceval.c.

I will (maybe) expose a private API via the _testcapi module to 
re-define them (set them to 1 or 0), only to write better unittests.  
I have no plans to make those constants public or have a public API 
to tackle them.  IMHO, this is something that almost nobody will ever 
use.


Alright. I agree with you on that.

What I actually meant was: how can we find the optimal values? I 
understand that 1000 and 20 are some hand-figured/subjective values 
for now.


Is there a standardized/objective way to find out the best values?  What 
does "best" even mean here?


Running lots of benchmarks and micro-benchmarks hundreds of times ;)  
I've done a lot of that, and I noticed that the numbers don't matter too 
much.


What matters is that we don't want to optimize code that runs 0 or 1 
times.  To save some memory, we also don't want to optimize code that runs 
only 10 times.  So 1000 seems to be about right.


We also need to deoptimize the code to avoid having too many cache 
misses/pointless cache updates.  I found that, for instance, LOAD_ATTR 
is either super stable (hits 100% of the time) or really unstable, so 20 
misses, again, seems to be about right.


I'm flexible about tweaking those values; I encourage you and everyone 
to experiment if you have time ;) 
https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L100
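
As a rough model of how those two knobs are meant to behave (hypothetical
Python rendering; the actual constants and counters are in ceval.c behind
the link above):

    HOT_THRESHOLD = 1000   # executions before a code object gets a cache
    MAX_MISSES = 20        # failed probes before an opcode is deoptimized

    class CodeStats:
        def __init__(self):
            self.run_count = 0
            self.optimized = False

        def on_call(self):
            # Don't spend memory on code that only runs a handful of times.
            self.run_count += 1
            if not self.optimized and self.run_count >= HOT_THRESHOLD:
                self.optimized = True   # allocate the opcode cache here

    class OpcodeStats:
        def __init__(self):
            self.misses = 0
            self.deoptimized = False

        def on_miss(self):
            # Once an instruction keeps missing, stop paying for cache
            # updates; there is no re-optimization after this point.
            self.misses += 1
            if self.misses >= MAX_MISSES:
                self.deoptimized = True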


Thanks,
Yury


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Sven R. Kunze

On 01.02.2016 22:43, Yury Selivanov wrote:

Sven,

On 2016-02-01 4:32 PM, Sven R. Kunze wrote:

On 01.02.2016 22:27, Yury Selivanov wrote:

Right now they are private constants in ceval.c.

I will (maybe) expose a private API via the _testcapi module to 
re-define them (set them to 1 or 0), only to write better 
unittests.  I have no plans to make those constants public or have a 
public API to tackle them.  IMHO, this is something that almost 
nobody will ever use.


Alright. I agree with you on that.

What I actually meant was: how can we find the optimal values? I 
understand that 1000 and 20 are some hand-figured/subjective values 
for now.


Is there a standardized/objective way to find out the best values?  What 
does "best" even mean here?


Running lots of benchmarks and micro-benchmarks hundreds of times ;)  
I've done a lot of that, and I noticed that the numbers don't matter 
too much.


That's actually pretty interesting. :)

Have you considered writing a blog post about this at some point?

What matters is that we don't want to optimize code that runs 0 or 
1 times.  To save some memory, we also don't want to optimize code that 
runs only 10 times.  So 1000 seems to be about right.


We also need to deoptimize the code to avoid having too many cache 
misses/pointless cache updates.  I found that, for instance, LOAD_ATTR 
is either super stable (hits 100% of the time) or really unstable, so 20 
misses, again, seems to be about right.


I'm flexible about tweaking those values; I encourage you and everyone 
to experiment if you have time ;) 
https://github.com/1st1/cpython/blob/opcache5/Python/ceval.c#L100


Right now, I am busy with the heap implementation but I think I can look 
into it later.


Best,
Sven


Re: [Python-Dev] Opcode cache in ceval loop

2016-02-01 Thread Yury Selivanov

Andrew,

On 2016-02-01 4:29 PM, Andrew Barnert wrote:

Looking over the thread and the two issues, you've got good arguments for why 
the improved code will be the most common code, and good benchmarks for various 
kinds of real-life code, but it doesn't seem like you've tried to stress it on 
anything that could be made worse. From your explanations and your code, I 
wouldn't expect that @classmethods, functions stored in the object dict or 
generated by __getattr__, non-function callables as methods, etc. would go 
significantly slower,


Right.  The caching, of course, has some overhead, albeit barely 
detectable.  The only way the slowdown might become "significant" is if 
there is a bug in the ceval.c code -- i.e. an opcode doesn't get 
de-optimized, etc.  That should be fixable.



  or code that mixes @properties or __getattr__ proxy attributes with real 
attributes, or uses __slots__,


No performance degradation for __slots__; we have a benchmark for that.  
I also tried adding __slots__ to every class in the Richards test - no 
improvement or degradation there.



  or code that frequently writes to a global, etc.  But it would be nice to 
_know_ that they don't, instead of just expecting it.


FWIW I've just tried to write a micro-benchmark for __getattr__: 
https://gist.github.com/1st1/22c1aa0a46f246a31515


The opcode cache gets deoptimized quickly with it, but, as expected, 
CPython with the opcode cache is <1% slower.  And that's 1% in a super 
micro-benchmark; of course the cost of having a cache that isn't used 
will show up.  In real code that doesn't consist only of LOAD_ATTRs, 
it won't even be possible to see any slowdown.
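
For readers who don't follow the gist link, the shape of such a
micro-benchmark is roughly this (not the linked code; names are made up):

    # Every p.x below goes through __getattr__, so the LOAD_ATTR site keeps
    # missing its cache and is expected to deoptimize quickly.
    import timeit

    class Target:
        def __init__(self):
            self.x = 1

    class Proxy:
        def __init__(self, target):
            self._target = target

        def __getattr__(self, name):
            return getattr(self._target, name)

    p = Proxy(Target())
    print(timeit.timeit('p.x', globals={'p': p}, number=1000000))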


Thanks,
Yury


Re: [Python-Dev] Speeding up CPython 5-10%

2016-02-01 Thread Greg Ewing

Sven R. Kunze wrote:
Are there some resources on why register machines are considered faster 
than stack machines?


If a register VM is faster, it's probably because each register
instruction does the work of about 2-3 stack instructions,
meaning fewer trips around the eval loop, so fewer unpredictable
branches and fewer pipeline flushes.

This assumes that bytecode dispatching is a substantial fraction
of the time taken to execute each instruction. For something
like cpython, where the operations carried out by the bytecodes
involve a substantial amount of work, this may not be true.

It also assumes the VM is executing the bytecodes directly. If
there is a JIT involved, it all gets translated into something
else anyway, and then it's more a matter of whether you find
it easier to design the JIT to deal with stack or register code.
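
To put a number on the first point, compare the stack code CPython emits
for a tiny expression with what a register VM could plausibly emit (the
register form in the comments is purely illustrative):

    import dis

    def f(a, b, c):
        return a + b * c

    dis.dis(f)
    # Prints (3.5-era opcode names): LOAD_FAST a; LOAD_FAST b; LOAD_FAST c;
    # BINARY_MULTIPLY; BINARY_ADD; RETURN_VALUE -- six dispatches.  A register
    # VM could plausibly do the same work in two or three instructions, e.g.
    #     MUL r1, b, c
    #     ADD r2, a, r1
    #     RETURN r2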

--
Greg


Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Terry Reedy

On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote:


There are already multiple duplicate questions every month on
StackOverflow from people asking "how do I find the source to stdlib
module X". The canonical answer starts off by explaining how to
import the module and use its __file__, which everyone is able to
handle.


Perhaps even easier: start IDLE, hit Alt-M, type in the module name as one 
would import it, and click OK.  If Python source is available, IDLE will 
open it in an editor window, with the path in the title bar.


 If we have to instead explain how to work out the .py name
from the qualified module name, how to work out the stdlib path from
sys.path, and then how to find the source from those two things, with
the caveat that it may not be installed at all on some platforms, and
how to make sure what they're asking about really is a stdlib module,
and how to make sure they aren't shadowing it with a module elsewhere
on sys.path, that's a lot more complicated.


The window has the path in the title bar, so one can tell what was loaded.

IDLE currently uses imp.find_module (this could be updated), with a 
backup of __import__(...).__file__, so it will load non-stdlib files 
that can be imported.


> Finally, on Linux and Mac, the stdlib will usually be somewhere
> that's not user-writable

On Windows, this depends on the install location.  Perhaps there should 
be an option for edit-save or view only to avoid accidental changes.


--
Terry Jan Reedy



Re: [Python-Dev] More optimisation ideas

2016-02-01 Thread Andrew Barnert via Python-Dev
On Feb 1, 2016, at 19:44, Terry Reedy wrote:
> 
>> On 2/1/2016 3:39 PM, Andrew Barnert via Python-Dev wrote:
>> 
>> There are already multiple duplicate questions every month on
>> StackOverflow from people asking "how do I find the source to stdlib
>> module X". The canonical answer starts off by explaining how to
>> import the module and use its __file__, which everyone is able to
>> handle.
> 
> Perhaps even easier: start IDLE, hit Alt-M, type in the module name as one would 
> import it, and click OK.  If Python source is available, IDLE will open it in an 
> editor window, with the path in the title bar.
> 
>> If we have to instead explain how to work out the .py name
>> from the qualified module name, how to work out the stdlib path from
>> sys.path, and then how to find the source from those two things, with
>> the caveat that it may not be installed at all on some platforms, and
>> how to make sure what they're asking about really is a stdlib module,
>> and how to make sure they aren't shadowing it with a module elsewhere
>> on sys.path, that's a lot more complicated.
> 
> The windows has the path on the title bar, so one can tell what was loaded.

The point of this thread is the suggestion that the stdlib modules be frozen or 
stored in a zipfile, unless a user modifies things in some way to make the 
source accessible. So, if a user hasn't done that (which no novice will know 
how to do), there won't be a path to show in the title bar, so IDLE won't be 
any more help than the command line.

(I suppose IDLE could grow a new feature to look up "associated source files" 
for a zipped stdlib or something, but that seems like a pretty big new feature.)

> IDLE currently uses imp.find_module (this could be updated), with a backup of 
> __import__(...).__file__, so it will load non-stdlib files that can be 
> imported.
> 
> > Finally, on Linux and Mac, the stdlib will usually be somewhere
> > that's not user-writable
> 
> On Windows, this depends on the install location.  Perhaps there should be an 
> option for edit-save or view only to avoid accidental changes.

The problem is that, if the standard way for users to see stdlib sources is to 
copy them from somewhere else (like $install/src/Lib) into a stdlib directory 
(like $install/Lib), then that stdlib directory has to be writable--and on Mac 
and Linux, it's not.
