Re: [Python-Dev] PEP 488: elimination of PYO files

2015-03-07 Thread Steven D'Aprano
On Fri, Mar 06, 2015 at 08:00:20PM -0500, Ron Adam wrote:

> Have you considered doing this by having different magic numbers in the 
> .pyc file for standard, -O, and -OO compiled bytecode files?  Python 
> already checks that number and recompiles the files if it's not what it's 
> expected to be.  And it wouldn't require any naming conventions or new 
> cache directories.  It seems to me it would be much easier to do as well.

And it would fail to solve the problem. The problem isn't just that the 
.pyo file can contain the wrong byte-code for the optimization level, 
that's only part of the problem. Another issue is that you cannot have 
pre-compiled byte-code for multiple different optimization levels. You 
can have a "no optimization" byte-code file, the .pyc file, but only one 
"optimized" byte-code file at the same time.

Brett's proposal will allow -O optimized and -OO optimized byte-code 
files to co-exist, as well as setting up a clear naming convention for 
future optimizers in either the Python compiler or third-party 
optimizers.

No new cache directories are needed. The __pycache__ directory has been 
used since Python 3.2 (PEP 3147). 
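
For concreteness, here is roughly what the proposed naming looks like via 
importlib.util.cache_from_source() (a sketch based on my reading of the PEP; 
the interpreter tag and the new optimization parameter are the PEP's 
proposal, not current behaviour):

import importlib.util

importlib.util.cache_from_source('spam.py')
# e.g. __pycache__/spam.cpython-35.pyc          (no optimization)
importlib.util.cache_from_source('spam.py', optimization=1)
# e.g. __pycache__/spam.cpython-35.opt-1.pyc    (-O)
importlib.util.cache_from_source('spam.py', optimization=2)
# e.g. __pycache__/spam.cpython-35.opt-2.pyc    (-OO)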



-- 
Steve


Re: [Python-Dev] PEP 488: elimination of PYO files

2015-03-07 Thread Ron Adam



On 03/07/2015 04:58 AM, Steven D'Aprano wrote:

On Fri, Mar 06, 2015 at 08:00:20PM -0500, Ron Adam wrote:


>Have you considered doing this by having different magic numbers in the
>.pyc file for standard, -O, and -OO compiled bytecode files?  Python
>already checks that number and recompiles the files if it's not what it's
>expected to be.  And it wouldn't require any naming conventions or new
>cache directories.  It seems to me it would be much easier to do as well.

And it would fail to solve the problem. The problem isn't just that the
.pyo file can contain the wrong byte-code for the optimization level,
that's only part of the problem. Another issue is that you cannot have
pre-compiled byte-code for multiple different optimization levels. You
can have a "no optimization" byte-code file, the .pyc file, but only one
"optimized" byte-code file at the same time.

Brett's proposal will allow -O optimized and -OO optimized byte-code
files to co-exist, as well as setting up a clear naming convention for
future optimizers in either the Python compiler or third-party
optimizers.


So all the different versions can be generated ahead of time. I think that 
is the main difference.


My suggestion would cause a recompile of all dependent Python files when 
different optimisation levels are used in different projects, which may be 
worse than not generating bytecode files at all.  OK



A few questions...

Can a submodule use an optimisation level that is different from the file 
that imports it?   (Other than the case this is trying to solve.)


Is there a way to specify that an imported module not use any optimisation 
level, or to always use a specific optimisation level?


Is there a way to run tests with all the different optimisation levels?


Cheers,
   Ron



Re: [Python-Dev] PEP 488: elimination of PYO files

2015-03-07 Thread Brett Cannon
On Sat, Mar 7, 2015 at 9:29 AM Ron Adam  wrote:

>
>
> On 03/07/2015 04:58 AM, Steven D'Aprano wrote:
> > On Fri, Mar 06, 2015 at 08:00:20PM -0500, Ron Adam wrote:
> >
> >> >Have you considered doing this by having different magic numbers in the
> >> >.pyc file for standard, -O, and -OO compiled bytecode files?  Python
> >> >already checks that number and recompiles the files if it's not what
> it's
> >> >expected to be.  And it wouldn't require any naming conventions or new
> >> >cache directories.  It seems to me it would be much easier to do as
> well.
> > And it would fail to solve the problem. The problem isn't just that the
> > .pyo file can contain the wrong byte-code for the optimization level,
> > that's only part of the problem. Another issue is that you cannot have
> > pre-compiled byte-code for multiple different optimization levels. You
> > can have a "no optimization" byte-code file, the .pyc file, but only one
> > "optimized" byte-code file at the same time.
> >
> > Brett's proposal will allow -O optimized and -OO optimized byte-code
> > files to co-exist, as well as setting up a clear naming convention for
> > future optimizers in either the Python compiler or third-party
> > optimizers.
>
> So all the different versions can be generated ahead of time. I think that
> is the main difference.
>
> My suggestion would cause a recompile of all dependent Python files when
> different optimisation levels are used in different projects, which may be
> worse than not generating bytecode files at all.  OK
>
>
> A few questions...
>
> Can a submodule use an optimisation level that is different from the file
> that imports it?   (Other than the case this is trying to solve.)
>

Currently yes; with this PEP, no (without purposefully doing it with some
custom loader).


>
> Is there a way to specify that an imported module not use any optimisation
> level, or to always use a specific optimisation level?
>

Not without a custom loader.
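
(If someone really wants to do that, a custom loader can be as small as
pointing a SourcelessFileLoader at the specific bytecode file -- a rough
sketch, with load_pyc being a made-up helper name:)

import importlib.util
from importlib.machinery import SourcelessFileLoader

def load_pyc(name, pyc_path):
    # Load a module from an explicitly chosen bytecode file, regardless of
    # the optimization level the interpreter is currently running with.
    loader = SourcelessFileLoader(name, pyc_path)
    spec = importlib.util.spec_from_loader(name, loader)
    module = importlib.util.module_from_spec(spec)
    loader.exec_module(module)
    return module

# e.g. mod = load_pyc('spam', '__pycache__/spam.cpython-35.opt-2.pyc')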


>
> Is there a way to run tests with all the different optimisation levels?
>

You have to remember that you can't change the optimization level of the
interpreter once it has started up. The change in semantics is handled
deep in the AST compiler and there is no exposed way to flip-flop the
setting once the interpreter starts. So testing the different optimization
levels would require either (a) implementing the optimizations as part of
some AST optimizer and doing the right thing in terms of reloading the
modules, or (b) simply running the tests again by running the interpreter
again with different flags (this is when something like tox is useful).
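
A minimal sketch of option (b), assuming a unittest-discoverable test suite
(the helper name is made up):

import subprocess
import sys

def run_tests_at_all_levels():
    # Re-run the whole suite in a fresh interpreter for each optimization
    # level, since the level cannot be changed after startup.
    for flags in ([], ['-O'], ['-OO']):
        subprocess.check_call([sys.executable] + flags +
                              ['-m', 'unittest', 'discover'])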


Re: [Python-Dev] PEP 488: elimination of PYO files

2015-03-07 Thread Scott Dial
On 2015-03-06 11:34 AM, Brett Cannon wrote:
> This PEP proposes eliminating the concept of PYO files from Python.
> To continue the support of the separation of bytecode files based on
> their optimization level, this PEP proposes extending the PYC file
> name to include the optimization level in bytecode repository
> directory (i.e., the ``__pycache__`` directory).

As a packager, this PEP is a bit silent on its expectations about what
will happen with (for instance) Debian and Fedora packages for Python.
My familiarity is with Fedora, and on that platform, we ship .pyc and
.pyo files (using -O for the .pyo). Is it your expectation that such
platforms will still distribute -O only? Or also -OO? In my world, all
of the __pycache__ directories are owned by root.

-- 
Scott Dial
[email protected]


[Python-Dev] Asking for review for Windows issues 21518 and 22080

2015-03-07 Thread Claudiu Popa
Hello,


The winreg module has a function for loading a registry key under
another registry key, called winreg.LoadKey. Unfortunately, the module
doesn't provide a way to unload that key after the user finishes
working with it. There's a patch [1] for exposing the RegUnloadKey
[2] API in the winreg module as winreg.UnloadKey, similar to how
RegLoadKey is exposed as winreg.LoadKey. The patch is complemented by
another one [3], which provides a new module,
test.support.windows_helper, for handling various issues on the
Windows platform, such as acquiring or releasing a privilege.
Unfortunately, it seems there's a dearth of reviewers for this
platform. Could someone knowledgeable about Windows be so kind as to
review these patches?
They could make a good addition to Python 3.5.
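
For illustration, the intended usage pattern would look roughly like this
(winreg.UnloadKey is the *proposed* name from [1] and does not exist yet;
the hive path and value name are made up):

import winreg

# Loading a hive requires the backup/restore privileges, which the
# windows_helper module from [3] is meant to help acquire.
winreg.LoadKey(winreg.HKEY_USERS, "TempHive", r"C:\temp\ntuser.dat")
try:
    with winreg.OpenKey(winreg.HKEY_USERS, r"TempHive\Software") as key:
        print(winreg.QueryValueEx(key, "SomeValue"))
finally:
    winreg.UnloadKey(winreg.HKEY_USERS, "TempHive")  # proposed API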

Thank you very much.


[1] http://bugs.python.org/issue21518 - Expose RegUnloadKey in winreg
[2] 
https://msdn.microsoft.com/en-us/library/windows/desktop/ms724924%28v=vs.85%29.aspx
[3] http://bugs.python.org/issue22080 - Add windows_helper module helper


/Claudiu


[Python-Dev] Optimize binary insertion sort algorithm in Timsort.

2015-03-07 Thread nha pham
This describes an optimization for "binary insertion sort" (BINSORT for
short).

BINSORT has been implemented in Python, in CPython, and in TimSort (the
default Arrays.sort() in Java SE 7 and Java SE 8).
I have read the BINSORT in TimSort

http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/TimSort.java#TimSort.binarySort%28java.lang.Object%5B%5D%2Cint%2Cint%2Cint%2Cjava.util.Comparator%29

and I think that I can optimize it a little more.

=
The old BINSORT:
The basic idea is to use binary search on the sorted list to find the final
position for a new element X, then insert X into the sorted list.
 [SORTED_LIST], [X in UNSORTED_LIST]  // pick X from UNSORTED_LIST
index = binarySearch([SORTED_LIST], X) // use binary search to find the
  // appropriate location for X in SORTED_LIST
[SORTED_LIST].add(index, X) // insert X at the index location
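
In Python terms, the old scheme for one chunk is roughly (an illustrative
sketch, not taken from any of the implementations above):

import bisect

def binsort_old(chunk):
    sorted_list = []
    for x in chunk:
        index = bisect.bisect_right(sorted_list, x)  # search the whole sorted part
        sorted_list.insert(index, x)                 # insert X at that index
    return sorted_list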

==
New BINSORT:
[SORTED_LIST], [A]  // A is an UNSORTED_LIST
j = compare(A[i+1], A[i]) // compare the next element to the current one
index = binarySearch([SORTED_LIST], A[i])  // use binary search to find the
  // appropriate location for A[i] in SORTED_LIST, and remember the index
[SORTED_LIST].add(index, A[i]) // insert A[i] at the index location
 // Now for A[i+1], we already know where it belongs relative to A[i]
if j >= 0:
    // A[i+1] >= A[i], so A[i+1] belongs on the right side of A[i]
    // We only have to search a reduced range:
    index = binarySearch(SORTED_LIST[index : length(SORTED_LIST)], A[i+1])
else:
    // A[i+1] < A[i], so we search on the left side of A[i]
    index = binarySearch(SORTED_LIST[0 : index], A[i+1])
[SORTED_LIST].add(index, A[i+1]) // insert A[i+1] at the index location
 // repeat the loop
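
The same idea as a runnable Python sketch (again illustrative, separate from
the benchmark code at the bottom):

import bisect

def binsort_new(chunk):
    if not chunk:
        return []
    sorted_list = [chunk[0]]
    index = 0
    for i in range(1, len(chunk)):
        x = chunk[i]
        if x >= chunk[i - 1]:
            # x is >= the element just placed at `index`: search to its right.
            index = bisect.bisect_right(sorted_list, x, index + 1)
        else:
            # x is smaller: search to its left.
            index = bisect.bisect_right(sorted_list, x, 0, index)
        sorted_list.insert(index, x)
    return sorted_list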
==
Run test.
Intuitively, the new BINSORT should do better as the array gets bigger,
because it reduces the range to search at the cost of only one extra
comparison.

I only care about small arrays, with length < 100 (we know that in Timsort
the list is divided into chunks of length 64, and BINSORT is applied to them).

So I made a big array, divided it into chunks, applied the new BINSORT to
each chunk, and compared against the OLD BINSORT.
The source code is at the bottom of this document. Here are the results:

cpuinfo:
model name  : Intel(R) Core(TM) i3 CPU   M 350  @ 2.27GHz
stepping: 2
microcode   : 0xc
cpu MHz : 933.000
cache size  : 3072 KB
-

random array:
ARRAY_SIZE: 100
CHUNK_SIZE: 100
DATA: randint(0, 100)

OLD BINSORT: 81.45754
new BINSORT: 5.26754
RATIO: (OLD - new) / new = 14.464
---
incremental array:
ARRAY_SIZE: 100
CHUNK_SIZE: 100
DATA: range(0, 100)

OLD BINSORT: 81.87927
new BINSORT: 5.41651
RATIO: (OLD - new) / new = 14.11659
---
decremental array:
ARRAY_SIZE: 100
CHUNK_SIZE: 100
DATA: range(0, 100)

OLD BINSORT: 81.45723
new BINSORT: 5.09823
RATIO: (OLD - new) / new = 14.97753

all equal array:
ARRAY_SIZE: 100
CHUNK_SIZE: 100
DATA: 5

OLD BINSORT: 40.46027
new BINSORT: 5.41221
RATIO: (OLD - new) / new = 6.47573


What should we do next:
- Tune my test code (I have just graphed it).
- Test other cases and bigger arrays (my laptop cannot handle arrays of more
than 10^6 elements).
- Modify TimSort in java.util and test whether it is better.


My test code, written in Python:

from timeit import Timer

setup ="""\
import bisect
from random import randint
from timeit import Timer
SIZE = 100
CHUNK = 100
NUM_CHUNK = SIZE/CHUNK
data = []
data2 = []
data3 = []
for i in range(0,SIZE):
data.append(randint(0,100))
#data.append(i)
#data = data[::-1]
"""
sample ="""\
for j in range(0,NUM_CHUNK):
low =  CHUNK*j
high=  low + CHUNK
data2.append(data[low])
index = low
for i in range(low,high):
x = data[i]
index = bisect.bisect_right(data2[low:], x, low, len(data2) - low-1)
data2.insert(index, x)
"""

new ="""\
for j in range(0,NUM_CHUNK):
low =  CHUNK*j
high=  low + CHUNK
data3.append(data[low])
index = low
for i in range(low,high):
x = data[i]
if x >= data[i-1]:
index = bisect.bisect_right(data3[low:len(data3) - low-1], x,
index, len(data3) - low-1)
else:
index = bisect.bisect_right(data3[low:index], x, low, index)
data3.insert(index, x)
"""
t2 = Timer(stmt = sample, setup=setup)
a = t2.timeit(1)
print a
t3 = Timer(stmt = new, setup=setup)
b = t3.timeit(1)
print b
print (str((a - b)/b))



Nha Pham
Mar 07 2015


Re: [Python-Dev] PEP 488: elimination of PYO files

2015-03-07 Thread Brett Cannon
On Sat, Mar 7, 2015 at 12:39 PM Scott Dial 
wrote:

> On 2015-03-06 11:34 AM, Brett Cannon wrote:
> > This PEP proposes eliminating the concept of PYO files from Python.
> > To continue the support of the separation of bytecode files based on
> > their optimization level, this PEP proposes extending the PYC file
> > name to include the optimization level in bytecode repository
> > directory (i.e., the ``__pycache__`` directory).
>
> As a packager, this PEP is a bit silent on its expectations about what
> will happen with (for instance) Debian and Fedora packages for Python.
> My familiarity is with Fedora, and on that platform, we ship .pyc and
> .pyo files (using -O for the .pyo). Is it your expectation that such
> platforms will still distribute -O only? Or also -OO? In my world, all
> of the __pycache__ directories are owned by root.
>

I assume they will generate .pyc files at all levels, but I don't know if
it's my place to dictate such a thing, since bytecode files are an
optimization internal to Python itself and do not influence how people
interact with the interpreter the way PEP 394 (The "python" Command on
Unix-Like Systems) does.


[Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-07 Thread Victor Stinner
Hi,

FYI I committed the implementation of os.scandir() written by Ben Hoyt.
I hope that it will be part of Python 3.5 alpha 2 (Ben just sent the
final patch today).

Please test this new feature. You may also want to benchmark it:
http://bugs.python.org/issue22524 contains some benchmark tools and
benchmark results for older versions of the patch.

The implementation was tested on Windows and Linux. I'm now watching
the buildbots to see how other platforms like os.scandir().

Bad news: OpenIndiana doesn't support d_type: the dirent structure has
no d_type field. I already fixed the implementation to support this
case. os.scandir() is still useful on OpenIndiana, because the stat
result is cached in the DirEntry, so only one syscall is required,
instead of multiple, when several DirEntry methods are called (e.g.
entry.is_dir() and not entry.is_symlink()).
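
A short usage sketch of the new API (a hypothetical helper built only on the
documented os.scandir()/DirEntry calls):

import os

def list_subdirs(path='.'):
    # One scandir() call; is_dir()/is_symlink() use d_type or the cached
    # stat result, so no extra syscall per entry where d_type is available.
    return [entry.name
            for entry in os.scandir(path)
            if entry.is_dir() and not entry.is_symlink()]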

Victor


Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-07 Thread Ben Hoyt
Thanks for committing this, Victor! And fixing the d_type issue on funky
platforms.

Others: if you want to benchmark this, the simplest way is to use my
os.walk() benchmark.py test program here: https://github.com/benhoyt/scandir
-- it compares the built-in os.walk() implemented with os.listdir() with a
version of walk() implemented with os.scandir(). I see huge gains on
Windows (12-50x) and modest gains on my Linux VM (3-5x).

Note that the actual CPython version of os.walk() doesn't yet use
os.scandir(). I intend to open a separate issue for that shortly (or Victor
can). But that part should be fairly straightforward, as I already have a
version available in my GitHub project.
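
(For context, a scandir-based walk is conceptually along these lines; this is
a simplified sketch that ignores error handling, symlinks and topdown
ordering, not the code in my repo:)

import os

def walk_sketch(top):
    dirs, nondirs = [], []
    for entry in os.scandir(top):
        (dirs if entry.is_dir() else nondirs).append(entry.name)
    yield top, dirs, nondirs
    for name in dirs:
        yield from walk_sketch(os.path.join(top, name))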

-Ben




Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-07 Thread Victor Stinner
2015-03-08 3:31 GMT+01:00 Ben Hoyt :
> Thanks for committing this, Victor! And fixing the d_type issue on funky
> platforms.

You're welcome.

> Note that the actual CPython version of os.walk() doesn't yet use
> os.scandir(). I intend to open a separate issue for that shortly (or Victor
> can). But that part should be fairly straightforward, as I already have a
> version available in my GitHub project.

Yes, I just opened an issue for os.walk():
http://bugs.python.org/issue23605

We need a patch and benchmarks on Linux and Windows for that
(including benchmarks on an NFS share for the Linux case).

I changed the status of PEP 471 to Final even though os.walk() has not
been modified yet. IMO the most important part was os.scandir(), since
"os.scandir()" is in the title of PEP 471.

Victor


Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5

2015-03-07 Thread Ryan Stuart
Hi,

On Sun, 8 Mar 2015 at 12:33 Ben Hoyt  wrote:

> Others: if you want to benchmark this, the simplest way is to use my
> os.walk() benchmark.py test program here:
> https://github.com/benhoyt/scandir -- it compares the built-in os.walk()
> implemented with os.listdir() with a version of walk() implemented with
> os.scandir(). I see huge gains on Windows (12-50x) and modest gains on my
> Linux VM (3-5x).
>

I have a MacBook Pro laptop running OS X 10.10.2. I did the following:

   - hg update -r 8ef4f75a8018
   - patch -p1 < scandir-8.patch
   - ./configure --with-pydebug && make -j7

I then ran ./python.exe ~/Workspace/python/scandir/benchmark.py and I got:

Creating tree at /Users/rstuart/Workspace/python/scandir/benchtree: depth=4, num_dirs=5, num_files=50
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
os.walk took 0.184s, scandir.walk took 0.158s -- 1.2x as fast

I then did ./python.exe ~/Workspace/python/scandir/benchmark.py -s and got:

Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.483s, scandir.walk took 0.463s -- 1.0x as fast

Hope this helps.

Cheers
