Re: [Python-Dev] PEP 488: elimination of PYO files
On Fri, Mar 06, 2015 at 08:00:20PM -0500, Ron Adam wrote:

> Have you considered doing this by having different magic numbers in the
> .pyc file for standard, -O, and -OO compiled bytecode files? Python
> already checks that number and recompiles the files if it's not what it's
> expected to be. And it wouldn't require any naming conventions or new
> cache directories. It seems to me it would be much easier to do as well.

And it would fail to solve the problem. The problem isn't just that the
.pyo file can contain the wrong byte-code for the optimization level;
that's only part of the problem. Another issue is that you cannot have
pre-compiled byte-code for multiple different optimization levels. You
can have a "no optimization" byte-code file, the .pyc file, but only one
"optimized" byte-code file at a time.

Brett's proposal will allow -O optimized and -OO optimized byte-code
files to co-exist, as well as setting up a clear naming convention for
future optimizers, whether in the Python compiler or in third-party
tools.

No new cache directories are needed. The __pycache__ directory has been
used since Python 3.2.

-- Steve
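To make the naming concrete: under the scheme the PEP sketches, a single
__pycache__ directory could hold the bytecode for every optimization
level side by side. A rough illustration follows; the "optimization"
parameter and the "opt-N" suffix shown here are taken from the draft PEP
and are assumptions, not a finished API.

import importlib.util

# Sketch of the proposed per-optimization-level cache file names.
for level in ('', 1, 2):
    print(importlib.util.cache_from_source('spam.py', optimization=level))

# Expected to print something along the lines of:
#   __pycache__/spam.cpython-35.pyc          (no optimizations)
#   __pycache__/spam.cpython-35.opt-1.pyc    (-O)
#   __pycache__/spam.cpython-35.opt-2.pyc    (-OO)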
Re: [Python-Dev] PEP 488: elimination of PYO files
On 03/07/2015 04:58 AM, Steven D'Aprano wrote:
> On Fri, Mar 06, 2015 at 08:00:20PM -0500, Ron Adam wrote:
>
>> Have you considered doing this by having different magic numbers in the
>> .pyc file for standard, -O, and -OO compiled bytecode files? Python
>> already checks that number and recompiles the files if it's not what it's
>> expected to be. And it wouldn't require any naming conventions or new
>> cache directories. It seems to me it would be much easier to do as well.
>
> And it would fail to solve the problem. The problem isn't just that the
> .pyo file can contain the wrong byte-code for the optimization level;
> that's only part of the problem. Another issue is that you cannot have
> pre-compiled byte-code for multiple different optimization levels. You
> can have a "no optimization" byte-code file, the .pyc file, but only one
> "optimized" byte-code file at a time.
>
> Brett's proposal will allow -O optimized and -OO optimized byte-code
> files to co-exist, as well as setting up a clear naming convention for
> future optimizers, whether in the Python compiler or in third-party
> tools.

So all the different versions can be generated ahead of time. I think
that is the main difference.

My suggestion would cause a recompile of all dependent Python files when
different optimisation levels are used in different projects, which may
be worse than not generating bytecode files at all. OK.

A few questions...

Can a submodule use an optimisation level that is different from the
file that imports it? (Other than the case this is trying to solve.)

Is there a way to specify that an imported module not use any
optimisation level, or always use a specific optimisation level?

Is there a way to run tests with all the different optimisation levels?

Cheers,
Ron
Re: [Python-Dev] PEP 488: elimination of PYO files
On Sat, Mar 7, 2015 at 9:29 AM Ron Adam wrote:

> So all the different versions can be generated ahead of time. I think
> that is the main difference.
>
> My suggestion would cause a recompile of all dependent Python files when
> different optimisation levels are used in different projects, which may
> be worse than not generating bytecode files at all. OK.
>
> A few questions...
>
> Can a submodule use an optimisation level that is different from the
> file that imports it? (Other than the case this is trying to solve.)

Currently yes; with this PEP, no (unless you purposefully do it with some
custom loader).

> Is there a way to specify that an imported module not use any
> optimisation level, or always use a specific optimisation level?

Not without a custom loader.

> Is there a way to run tests with all the different optimisation levels?

You have to remember that you can't change the optimization level of the
interpreter once it has started. The change in semantics is handled deep
in the AST compiler, and there is no exposed way to flip the setting once
the interpreter is running. So testing the different optimization levels
would require either (a) implementing the optimizations as part of some
AST optimizer and doing the right thing in terms of reloading the
modules, or (b) simply running the tests again by starting the
interpreter again with different flags (this is where something like tox
is useful).
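For option (b), a minimal sketch of driving the same test suite once per
optimization level from a small script (the test target name here is
just a placeholder, not a real module):

import subprocess
import sys

# Launch a fresh interpreter per optimization level and run the tests in
# it; "your_test_module" is a hypothetical placeholder.
for flags in ([], ["-O"], ["-OO"]):
    cmd = [sys.executable] + flags + ["-m", "unittest", "your_test_module"]
    subprocess.check_call(cmd)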
Re: [Python-Dev] PEP 488: elimination of PYO files
On 2015-03-06 11:34 AM, Brett Cannon wrote:
> This PEP proposes eliminating the concept of PYO files from Python.
> To continue the support of the separation of bytecode files based on
> their optimization level, this PEP proposes extending the PYC file
> name to include the optimization level in the bytecode repository
> directory (i.e., the ``__pycache__`` directory).

As a packager, this PEP is a bit silent about its expectations for what
will happen with (for instance) Debian and Fedora packages for Python.
My familiarity is with Fedora, and on that platform we ship .pyc and
.pyo files (using -O for the .pyo). Is it your expectation that such
platforms will still distribute -O only? Or also -OO? In my world, all
of the __pycache__ directories are owned by root.

--
Scott Dial
[email protected]
[Python-Dev] Asking for review for Windows issues 21518 and 22080
Hello,

The winreg module has a function for loading a registry key under
another registry key, called winreg.LoadKey. Unfortunately, the module
doesn't provide a way to unload that key after the user finishes
operating with it.

There's a patch [1] for exporting the RegUnloadKey [2] API in the winreg
module as winreg.UnloadKey, similar to how RegLoadKey is exported as
winreg.LoadKey. The patch is helped by another one [3], which provides a
new module, test.support.windows_helper, for handling various issues on
the Windows platform, such as acquiring or releasing a privilege.

Unfortunately, it seems there's a dearth of reviewers for this platform.
Could someone knowledgeable with Windows be so kind as to review these
patches? They could make a good addition for Python 3.5. Thank you very
much.

[1] http://bugs.python.org/issue21518 - Expose RegUnloadKey in winreg
[2] https://msdn.microsoft.com/en-us/library/windows/desktop/ms724924%28v=vs.85%29.aspx
[3] http://bugs.python.org/issue22080 - Add windows_helper module helper

/Claudiu
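For context, usage of the proposed API might look roughly like the
sketch below. winreg.UnloadKey does not exist in any released Python;
this assumes the issue 21518 patch is applied. The hive name, file path,
subkey, and value name are placeholders, and loading a hive additionally
requires the SeRestorePrivilege/SeBackupPrivilege privileges.

import winreg

# Hypothetical sketch: load a hive file under HKEY_USERS, read a value,
# then unload it again with the proposed winreg.UnloadKey().
winreg.LoadKey(winreg.HKEY_USERS, "TempHive", r"C:\temp\hive.dat")
try:
    with winreg.OpenKey(winreg.HKEY_USERS, r"TempHive\SomeSubKey") as key:
        value, value_type = winreg.QueryValueEx(key, "SomeValue")
finally:
    winreg.UnloadKey(winreg.HKEY_USERS, "TempHive")  # proposed API, not yet in winreg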
[Python-Dev] Optimize binary insertion sort algorithm in Timsort.
This describes an optimization for "binary insertion sort" (BINSORT for
short). BINSORT has been implemented in Python, Cython, and Timsort (the
default Arrays.sort() for objects in Java SE 7 and Java SE 8).

I have read the BINSORT in Timsort:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/8-b132/java/util/TimSort.java#TimSort.binarySort%28java.lang.Object%5B%5D%2Cint%2Cint%2Cint%2Cjava.util.Comparator%29
and I think that I can make a little more optimization.

= The old BINSORT:

The basic idea is to use binary search on the sorted list to find the
final position for a new element X, then insert X into the sorted list.

[SORTED_LIST], [X in UNSORTED_LIST]  // pick X in UNSORTED_LIST
index = binarySearch([SORTED_LIST], X)  // use binary search to find the
                                        // appropriate location for X in SORTED_LIST
[SORTED_LIST].add(index, X)  // insert X at the index location

== New BINSORT:

[SORTED_LIST], [A]  // A is an UNSORTED_LIST
j = compare(A[i+1], A[i])  // pick A[i], compare the next element to it
index = binarySearch([SORTED_LIST], A[i])  // use binary search to find the
    // appropriate location for A[i] in SORTED_LIST, and remember the index
[SORTED_LIST].add(index, A[i])  // insert A[i] at the index location

// Now for A[i+1], we already know where it belongs relative to A[i]:
if j >= 0:
    // A[i+1] >= A[i], so A[i+1] belongs on the right side of A[i].
    // We only have to search a reduced part of the array:
    index = binarySearch(SORTED_LIST[index : length(SORTED_LIST)], A[i+1])
else:
    // A[i+1] < A[i], so we search on the left side of A[i]:
    index = binarySearch(SORTED_LIST[0 : index], A[i+1])
[SORTED_LIST].add(index, A[i+1])  // insert A[i+1] at the index location
// repeat the loop

== Run test:

Intuitively, the new BINSORT gets better as the array gets bigger,
because it reduces the range to search at the cost of only one extra
comparison. I only care about small arrays, with length < 100 (we know
that in Timsort the list is divided into chunks of length 64, and
BINSORT is applied to each of them). So I make a big array, divide it
into chunks, apply the new BINSORT to each chunk, and compare against
the old BINSORT. The source code is at the bottom of this message. Here
are the results:

cpuinfo:
model name : Intel(R) Core(TM) i3 CPU M 350 @ 2.27GHz
stepping   : 2
microcode  : 0xc
cpu MHz    : 933.000
cache size : 3072 KB

- random array:
ARRAY_SIZE: 100  CHUNK_SIZE: 100  DATA: randint(0, 100)
OLD BINSORT: 81.45754
new BINSORT: 5.26754
RATIO: (OLD - new) / new = 14.464

- incremental array:
ARRAY_SIZE: 100  CHUNK_SIZE: 100  DATA: range(0, 100)
OLD BINSORT: 81.87927
new BINSORT: 5.41651
RATIO: (OLD - new) / new = 14.11659

- decremental array:
ARRAY_SIZE: 100  CHUNK_SIZE: 100  DATA: range(0, 100)
OLD BINSORT: 81.45723
new BINSORT: 5.09823
RATIO: (OLD - new) / new = 14.97753

- all-equal array:
ARRAY_SIZE: 100  CHUNK_SIZE: 100  DATA: 5
OLD BINSORT: 40.46027
new BINSORT: 5.41221
RATIO: (OLD - new) / new = 6.47573

What should we do next:
- Tune my test code (it is only a rough draft).
- Test other cases and bigger arrays (my laptop cannot handle arrays of
  more than 10^6 elements).
- Modify Timsort in java.util and test whether it is better.

My test code, written in Python, appears after the short sketch below.
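A compact sketch of the idea in plain Python, using bisect. This only
illustrates the narrowed search; it is not the benchmark code, which
follows.

import bisect

def new_binsort(chunk):
    # Sort a chunk by binary insertion, narrowing each binary search
    # using the insertion point of the previous element.
    if not chunk:
        return []
    out = [chunk[0]]
    prev_index, prev_value = 0, chunk[0]
    for x in chunk[1:]:
        if x >= prev_value:
            # x belongs to the right of where the previous element landed
            index = bisect.bisect_right(out, x, prev_index, len(out))
        else:
            # x belongs to the left of where the previous element landed
            index = bisect.bisect_right(out, x, 0, prev_index)
        out.insert(index, x)
        prev_index, prev_value = index, x
    return out

assert new_binsort([3, 1, 4, 1, 5, 9, 2, 6]) == [1, 1, 2, 3, 4, 5, 6, 9]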
from timeit import Timer

setup = """\
import bisect
from random import randint
from timeit import Timer
SIZE = 100
CHUNK = 100
NUM_CHUNK = SIZE/CHUNK
data = []
data2 = []
data3 = []
for i in range(0, SIZE):
    data.append(randint(0, 100))
    #data.append(i)
#data = data[::-1]
"""

sample = """\
for j in range(0, NUM_CHUNK):
    low = CHUNK*j
    high = low + CHUNK
    data2.append(data[low])
    index = low
    for i in range(low, high):
        x = data[i]
        index = bisect.bisect_right(data2[low:], x, low, len(data2) - low - 1)
        data2.insert(index, x)
"""

new = """\
for j in range(0, NUM_CHUNK):
    low = CHUNK*j
    high = low + CHUNK
    data3.append(data[low])
    index = low
    for i in range(low, high):
        x = data[i]
        if x >= data[i-1]:
            index = bisect.bisect_right(data3[low:len(data3) - low - 1], x,
                                        index, len(data3) - low - 1)
        else:
            index = bisect.bisect_right(data3[low:index], x, low, index)
        data3.insert(index, x)
"""

t2 = Timer(stmt=sample, setup=setup)
a = t2.timeit(1)
print a

t3 = Timer(stmt=new, setup=setup)
b = t3.timeit(1)
print b

print (str((a - b)/b))

Nha Pham
Mar 07 2015
Re: [Python-Dev] PEP 488: elimination of PYO files
On Sat, Mar 7, 2015 at 12:39 PM Scott Dial wrote:

> As a packager, this PEP is a bit silent about its expectations for what
> will happen with (for instance) Debian and Fedora packages for Python.
> My familiarity is with Fedora, and on that platform we ship .pyc and
> .pyo files (using -O for the .pyo). Is it your expectation that such
> platforms will still distribute -O only? Or also -OO? In my world, all
> of the __pycache__ directories are owned by root.

I assume they will generate .pyc files at all levels, but I don't know
if it's my place to dictate such a thing, since bytecode files are an
optimization for Python itself and do not influence how people interact
with the interpreter in the way PEP 394 (The "python" Command on
Unix-Like Systems) does.
[Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5
Hi, FYI I commited the implementation of os.scandir() written by Ben Hoyt. I hope that it will be part of Python 3.5 alpha 2 (Ben just sent the final patch today). Please test this new feature. You may benchmark here. http://bugs.python.org/issue22524 contains some benchmark tools and benchmark results of older versions of the patch. The implementation was tested on Windows and Linux. I'm now watching for buildbots to see how other platforms like os.scandir(). Bad news: OpenIndiana doesn't support d_type: the dirent structure has no d_type field. I already fixed the implementation to support this case. os.scandir() is still useful on OpenIndiana, because the stat result is cached in a DirEntry, so only one syscall is required, instead of multiple, when multiple DirEntry methods are called (ex: entry.is_dir() and not entry.is_symlink()). Victor ___ Python-Dev mailing list [email protected] https://mail.python.org/mailman/listinfo/python-dev Unsubscribe: https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5
Thanks for committing this, Victor! And for fixing the d_type issue on
funky platforms.

Others: if you want to benchmark this, the simplest way is to use my
os.walk() benchmark.py test program here:
https://github.com/benhoyt/scandir -- it compares the built-in os.walk()
implemented with os.listdir() against a version of walk() implemented
with os.scandir(). I see huge gains on Windows (12-50x) and modest gains
on my Linux VM (3-5x).

Note that the actual CPython version of os.walk() doesn't yet use
os.scandir(). I intend to open a separate issue for that shortly (or
Victor can). That part should be fairly straightforward, as I already
have a version available in my GitHub project.

-Ben
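For the curious, the core of a scandir()-based walk() fits in a few
lines. A minimal sketch, not Ben's implementation: the real os.walk()
also handles topdown/bottom-up order, symlink following, and errors.

import os

def walk(top):
    # One scandir() call per directory replaces listdir() plus a stat()
    # call per entry, which is where the speedup comes from.
    dirs, files = [], []
    for entry in os.scandir(top):
        (dirs if entry.is_dir() else files).append(entry.name)
    yield top, dirs, files
    for name in dirs:
        yield from walk(os.path.join(top, name))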
Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5
2015-03-08 3:31 GMT+01:00 Ben Hoyt:
> Thanks for committing this, Victor! And for fixing the d_type issue on
> funky platforms.

You're welcome.

> Note that the actual CPython version of os.walk() doesn't yet use
> os.scandir(). I intend to open a separate issue for that shortly (or
> Victor can). That part should be fairly straightforward, as I already
> have a version available in my GitHub project.

Yes, I just opened an issue for os.walk():
http://bugs.python.org/issue23605

We need a patch and benchmarks on Linux and Windows for that (including
benchmarks on an NFS share for the Linux case).

I changed the status of the PEP 471 to Final even though os.walk() has
not been modified yet. IMO the most important part was os.scandir()
itself, since "os.scandir()" is in the title of the PEP 471.

Victor
Re: [Python-Dev] PEP 471 Final: os.scandir() merged into Python 3.5
Hi,

On Sun, 8 Mar 2015 at 12:33 Ben Hoyt wrote:
> Others: if you want to benchmark this, the simplest way is to use my
> os.walk() benchmark.py test program here:
> https://github.com/benhoyt/scandir -- it compares the built-in
> os.walk() implemented with os.listdir() against a version of walk()
> implemented with os.scandir(). I see huge gains on Windows (12-50x)
> and modest gains on my Linux VM (3-5x).

I have a MacBook Pro laptop running OS X 10.10.2. I did the following:

- hg update -r 8ef4f75a8018
- patch -p1 < scandir-8.patch
- ./configure --with-pydebug && make -j7

I then ran ./python.exe ~/Workspace/python/scandir/benchmark.py and got:

Creating tree at /Users/rstuart/Workspace/python/scandir/benchtree: depth=4, num_dirs=5, num_files=50
Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
os.walk took 0.184s, scandir.walk took 0.158s -- 1.2x as fast

I then ran ./python.exe ~/Workspace/python/scandir/benchmark.py -s and got:

Using slower ctypes version of scandir
Comparing against builtin version of os.walk()
Priming the system's cache...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 1/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 2/3...
Benchmarking walks on /Users/rstuart/Workspace/python/scandir/benchtree, repeat 3/3...
os.walk size 23400, scandir.walk size 23400 -- equal
os.walk took 0.483s, scandir.walk took 0.463s -- 1.0x as fast

Hope this helps.

Cheers
