[issue10302] Add class-functions to hash many small objects with hashlib
New submission from Lukas Lueg : The objects provided by hashlib mainly serve the purpose of computing hashes over strings of arbitrary size. The user gets a new object (e.g. hashlib.sha1()), calls .update() with chunks of data and then finally uses .digest() or .hexdigest() to get the hash. For convenience reasons these steps can also be done in almost one step (e.g. hashlib.sha1('foobar').hexdigest()). While the above approach basically covers all use-cases for hash-functions, when computing hashes of many small strings it is yet inefficient (e.g. due to interpreter-overhead) and leaves out the possibility for performance improvements. There are many cases where we need the hashes of numerous (small) objects, most or all of which being available in memory at the same time. I therefor propose to extend the classes provided by hashlib with an additional function that takes an iterable object, computes the hash over the string representation of each member and returns the result. Due to the aim of this interface, the function is a member of the class (not the instance) and has therefor no state bound to an instance. Memory requirements are to be anticipated and met by the programmer. For example: foo = ['my_database_key1', 'my_database_key2'] hashlib.sha1.compute(foo) >> ('\x00\x00', '\xff\xff') I consider this interface to hashlib particular useful, as we can take advantage of vector-based implementations that compute multiple hashes in one pass (e.g. through SSE2). GCC has a vector-extension that provides a *somewhat* standard way to write code that can get compiled to SSE2 or similar machine code. Examples of vector-based implementations of SHA1 and MD5 can be found at https://code.google.com/p/pyrit/issues/detail?id=207 Contigency plan: We compile to code iterating over OpenSSL's EVP-functions if compiler is other than GCC or SSE2 is not available. The same approach can be used to cover hashlib-objects for which we don't have an optimized implementation. 
-- components: Library (Lib) messages: 120351 nosy: ebfe priority: normal severity: normal status: open title: Add class-functions to hash many small objects with hashlib type: feature request versions: Python 3.2, Python 3.3 ___ Python tracker <http://bugs.python.org/issue10302> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10302] Add class-functions to hash many small objects with hashlib
Lukas Lueg added the comment: Thanks for your comment; it is a very valid point to consider. However, as a vector-based implementation is roughly three to four times faster than what the current API can provide by design (reduced overhead and GIL-relaxation not included), I may disagree with it. I'm willing to make a proposal (read: patch) if you and the other overlords have enough confidence in this API-change and it has a chance to get submitted. -- ___ Python tracker <http://bugs.python.org/issue10302> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue11655] map() must not swallow exceptions from PyObject_GetIter
New submission from Lukas Lueg : The built-in function map() currently swallows any exception that might have occured while trying to get an iterator from any parameter. This produces unexpected behaviour for applications that require a certain type of exception to be raised when __iter__() is called on their objects. >From 24179f82b7de, inside map_new(): 973 /* Get iterator. */ 974 curseq = PyTuple_GetItem(args, i+1); 975 sqp->it = PyObject_GetIter(curseq); 976 if (sqp->it == NULL) { 977 static char errmsg[] = 978 "argument %d to map() must support iteration"; 979 char errbuf[sizeof(errmsg) + 25]; 980 PyOS_snprintf(errbuf, sizeof(errbuf), errmsg, i+2); 981 PyErr_SetString(PyExc_TypeError, errbuf); 982 goto Fail_2; 983 } We *must* check if there has been any other kind of exception already being set when returning from PyObject_GetIter before setting PyExc_TypeError in line 981. If there is none, it is ok to raise a TypeError; any other exception must be passed on. For example: raising TooManyCacheMissesException in __iter__() causes map(foobar, myobject) to raise TypeError instead of TooManyCacheMissesException. Workaround: use map(foobar, iter(myobject)). The explicit call to iter will either produce an iterator object (which returns self to map()) or raises the correct exception. Python 3 is not affected as map_new() does not throw it's own TypeError in case PyObject_GetIter() fails. -- components: None messages: 131932 nosy: ebfe priority: normal severity: normal status: open title: map() must not swallow exceptions from PyObject_GetIter versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue11655> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue9285] A decorator for cProfile and profile modules
Lukas Lueg added the comment: +1 -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue9285> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10576] Add a progress callback to gcmodule
Lukas Lueg added the comment: Why not make the start-callback be able to return a boolean value to the gcmodule that indicates if garbage collection should take place or not. For example, any value returned from the callback that evaluates to False (like null) will cause the module to evaluate any other callback and possibly collect garbage objects. Any value that evaluates to True (like True) returned from any callback causes all further callbacks to not be called and garbage collection not to take place now. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue10576> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10576] Add a progress callback to gcmodule
Lukas Lueg added the comment: Collection may re-occur at any time, there is no promise to the callback code. However, the callback can disable the gc, preventing further collection. I don't think we need the other callbacks to be informed. As the callbacks are worked down in the order they registered, whoever comes first is served first. Returning True from the callback is mereley a "I dont mind if gc happens now..." -- ___ Python tracker <http://bugs.python.org/issue10576> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10576] Add a progress callback to gcmodule
Lukas Lueg added the comment: Agreed, let's have the simple callback first. To solve 2) later on, we could have the callback proposed here be the 'execution'-callback. It neither has nor will have the capability to prevent garbage-collection. We can introduce another 'prepare'-callback later which is called when the gc-modules decides that it is time for collection. Callbacks may react with a negative value so execution does not happen and the execution-callbacks are also never called. -- ___ Python tracker <http://bugs.python.org/issue10576> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1501108] Add write buffering to gzip
Lukas Lueg added the comment: agreed -- ___ Python tracker <http://bugs.python.org/issue1501108> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4818] Patch for thread-support in md5module.c
Lukas Lueg added the comment: Sent the form by fax ___ Python tracker <http://bugs.python.org/issue4818> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4818] Patch for thread-support in md5module.c
Lukas Lueg added the comment: fixed naming, lock get's tried before releasing the gil to wait for it Added file: http://bugs.python.org/file12568/md5module_small_locks-2.diff ___ Python tracker <http://bugs.python.org/issue4818> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4818] Patch for thread-support in md5module.c
Changes by Lukas Lueg : Removed file: http://bugs.python.org/file12565/md5module_small_locks.diff ___ Python tracker <http://bugs.python.org/issue4818> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4821] Patches for thread-support in built-in SHA modules
New submission from Lukas Lueg : Here is the follow-up to issue #4818. The patches attached allow the built-in SHA modules to release the GIL. Also the build-in SHA modules will now no longer accept "s#" as input. Input is parsed just as in the openssl-driven classes where unicode-objects are explicitly rejected. The built-in hash modules have been not quite beautiful before even more code is now copy & pasted between them. Is there any interest in refactoring all those modules? AFAIK _sha1 and such are only used by hashlib.py ... -- messages: 78975 nosy: ebfe severity: normal status: open title: Patches for thread-support in built-in SHA modules ___ Python tracker <http://bugs.python.org/issue4821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4821] Patches for thread-support in built-in SHA modules
Changes by Lukas Lueg : -- keywords: +patch Added file: http://bugs.python.org/file12569/sha1module_small_locks.diff ___ Python tracker <http://bugs.python.org/issue4821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4821] Patches for thread-support in built-in SHA modules
Changes by Lukas Lueg : Added file: http://bugs.python.org/file12570/sha256module_small_locks.diff ___ Python tracker <http://bugs.python.org/issue4821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4821] Patches for thread-support in built-in SHA modules
Changes by Lukas Lueg : Added file: http://bugs.python.org/file12571/sha512module_small_locks.diff ___ Python tracker <http://bugs.python.org/issue4821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: The lock is created while having the GIL in EVP_update. No other function releases the GIL (besides the creator-function which does not need the local lock). Thereby no other thread can be in between ENTER and LEAVE while the lock is allocated. ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Changes by Lukas Lueg : Removed file: http://bugs.python.org/file12533/hashopenssl_threads-4.diff ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: I've modified haypo's patch as commented. The object's lock should be free 99.9% of the time so we try non-blocking first and can thereby skip releasing and re-locking the gil (to avoid a deadlock). Added file: http://bugs.python.org/file12587/hashlibopenssl_small_lock-4.diff ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4757] reject unicode in zlib
Lukas Lueg added the comment: The current behaviour may help the majority by ignorance and cause weird errors for others. We tell people that Python distincts between Text and Data but actually treat it all the same by implicit encoding. Modules that only operate on Bytes should reject Unicode-objects in Python3; it's a matter of 3 lines to display a warning in Python 2. Those modules that usually operate on Text but have single functions that operate on Bytes should display a warning but not enforce explicit encoding. Also see #4821 and #4818 where unicode already got rejected by the openssl-driven classes but silently accepted by the build-in ones. ___ Python tracker <http://bugs.python.org/issue4757> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4844] ZipFile doesn't range check in _EndRecData()
Lukas Lueg added the comment: please attach 64times01-double.zip if possible -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue4844> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3745] _sha256 et al. encode to UTF-8 by default
Lukas Lueg added the comment: solved in #4818 and #4821 -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue3745> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Changes by Lukas Lueg : Removed file: http://bugs.python.org/file12587/hashlibopenssl_small_lock-4.diff ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: PyThread_allocate_lock can fail without interference. object->lock will stay NULL and the GIL is simply not released. ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4858] Deprecation of MD5
New submission from Lukas Lueg : MD5 is one of the most popular cryptographic hash-functions around, mainly for it's good performance and availability throughout applications and libraries. The MD5 algorithm is currently implemented in python as part of the hashlib-module and (in more general terms) as part of SSL in the ssl-module. However, concerns about the security of MD5 have risen during the last few years. In 2007 a practical attack to create collisions in the compression-function has been released and on 12/31/2008 US-CERT issued a note to warn about the general insecurity of MD5 (http://www.kb.cert.org/vuls/id/836068). I propose and strongly suggest to start deprecate direct support for MD5 during this year and completly remove support for it afterwards. * MD5 is a cryptographic hash function, it's reason for being is security. By means of current hardware and attack vectors it's a matter of hours to create collisions and fool MD5 hashes. The reason for being has come to an end. * Python runs an uncountable number of exposed user interfaces on the web. Usually the programmers rely on the security of the backing libraries. Python can't provide this with MD5. * The functionality of MD5 can be easily replaced by using other hashes that are supported by python (e.g. SHA1). They supply compareable performance but are not binary-compatible (yay). * Programmers use MD5 in python without the need for it's cryptographic attributes (e.g. creating unique indexes). Keeping MD5 for this use however devaluates overall security of python for the good of few. I'd like to start a discussion about this. Please keep in mind that - although MD5 is currently still very popular and python's support for it is justifed by demand - it's existence will come to an end soon. We should now act and give people time to update their implementations. In a rough cut: - Patch haslib to throw a DeprecationWarning, starting during the first half of 2009. 
- Update documentation not to use MD5 for security reasons - Remove MD5 from python in 2010. - Keep accordance to PEP 4 Goodbye MD5 and thanks for all the fish. -- components: Extension Modules messages: 79281 nosy: ebfe severity: normal status: open title: Deprecation of MD5 versions: Python 2.7, Python 3.1 ___ Python tracker <http://bugs.python.org/issue4858> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4858] Deprecation of MD5
Lukas Lueg added the comment: As I already said to Raymond: At least we should update the documentation. The "FAQ" currently linked is from 2005. The CERT-Advisory from provides a clean and simple language: "In 2008, researchers demonstrated the practical vulnerability [...] We are currently unaware of a practical solution to this problem. *Do not use the MD5 algorithm*." ___ Python tracker <http://bugs.python.org/issue4858> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4858] Deprecation of MD5
Lukas Lueg added the comment: > It might be a good idea to remove the word "secure" from the > hashlib documentation, since security of these algorithms is > always limited to a certain period of time. I'm sorry, was that a boy attempted humor ? [Misuse quote from DH3: Check] Anyway, in fact that might be a good idea: Reflect that the hashlib module includes hash functions for the sake of compatibility and interoperability and not everlasting security. ___ Python tracker <http://bugs.python.org/issue4858> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4871] zipfile can't decrypt
Lukas Lueg added the comment: This is basically the same problem as with other bytes-orientated modules. The choice is either to reject unicode and force the caller to use .encode() on all his strings (so 'password' is an instance of bytes and 'ch' an instance of int). I'd prefer that. Or to check if 'password' is a unicode-object and encode to default within zipfile. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue4871> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4871] zipfile can't decrypt
Lukas Lueg added the comment: The default encoding is UTF8. Even without that there should be no problems while staying in the same environment as the character->translation is the same. However it violates the rule of least surprise to see zipfile throwing CRC errors because the password is something like 'u?è´´n' and encoded by some locale-changing-nitwit in korea... ___ Python tracker <http://bugs.python.org/issue4871> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: I'll do a patch for 2.7 -- versions: +Python 2.7 -Python 3.1 ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: yes, I got lost on that one. I'll create a patch for 2.7 tonight. ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: Patch for 2.7 Added file: http://bugs.python.org/file13057/hashlibopenssl_gil_py27.diff ___ Python tracker <http://bugs.python.org/issue4751> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5246] test.test_hashlib.HashLibTestCase fails on darwin
Lukas Lueg added the comment: test_case_md5_0 (__main__.HashLibTestCase) ... ok test_case_md5_1 (__main__.HashLibTestCase) ... ok test_case_md5_2 (__main__.HashLibTestCase) ... ok test_case_md5_huge (__main__.HashLibTestCase) ... ok test_case_md5_uintmax (__main__.HashLibTestCase) ... ok test_case_sha1_0 (__main__.HashLibTestCase) ... ok test_case_sha1_1 (__main__.HashLibTestCase) ... ok test_case_sha1_2 (__main__.HashLibTestCase) ... ok test_case_sha1_3 (__main__.HashLibTestCase) ... ok test_case_sha224_0 (__main__.HashLibTestCase) ... ok test_case_sha224_1 (__main__.HashLibTestCase) ... ok test_case_sha224_2 (__main__.HashLibTestCase) ... ok test_case_sha224_3 (__main__.HashLibTestCase) ... ok test_case_sha256_0 (__main__.HashLibTestCase) ... ok test_case_sha256_1 (__main__.HashLibTestCase) ... ok test_case_sha256_2 (__main__.HashLibTestCase) ... ok test_case_sha256_3 (__main__.HashLibTestCase) ... ok test_case_sha384_0 (__main__.HashLibTestCase) ... ok test_case_sha384_1 (__main__.HashLibTestCase) ... ok test_case_sha384_2 (__main__.HashLibTestCase) ... ok test_case_sha384_3 (__main__.HashLibTestCase) ... ok test_case_sha512_0 (__main__.HashLibTestCase) ... ok test_case_sha512_1 (__main__.HashLibTestCase) ... ok test_case_sha512_2 (__main__.HashLibTestCase) ... ok test_case_sha512_3 (__main__.HashLibTestCase) ... ok test_hexdigest (__main__.HashLibTestCase) ... ok test_large_update (__main__.HashLibTestCase) ... ok test_no_unicode (__main__.HashLibTestCase) ... ok test_unknown_hash (__main__.HashLibTestCase) ... ok -- Ran 29 tests in 0.399s OK [22842 refs] mac-lueg:py27 llueg$ ./python.exe Python 2.7a0 (trunk:69584, Feb 13 2009, 15:12:58) [GCC 4.0.1 (Apple Inc. build 5484)] on darwin Type "help", "copyright", "credits" or "license" for more information. 
>>> import hashlib [36860 refs] >>> hashlib.md5(u'spam') Traceback (most recent call last): File "", line 1, in TypeError: Unicode-objects must be encoded before hashing [36893 refs] ___ Python tracker <http://bugs.python.org/issue5246> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1533164] Installed but not listed *.pyo break bdist_rpm
Lukas Lueg added the comment: passing optimize=1 does not help when there is a script (...scripts=['bla.py']...) in the given distribution. The error will be thrown for bla.pyo and bla.pyc -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue1533164> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16427] Faster hash implementation
Lukas Lueg added the comment: I was investigating a callgrind dump of my code, showing how badly unicode_hash() was affecting my performance. Using google's cityhash instead of the builtin algorithm to hash unicode objects improves overall performance by about 15 to 20 percent for my case - that is quite a thing. Valgrind shows that the number of instructions spent by unicode_hash() drops from ~20% to ~11%. Amdahl crunches the two-fold performance increase to the mentioned 15 percent. Cityhash was chosen because of it's MIT license and advertisement for performance on short strings. I've now found this bug and attached a log for haypo's benchmark which compares native vs. cityhash. Caching was disabled during the test. Cityhash was compiled using -O3 -msse4.2 (cityhash uses cpu-native crc instructions). CPython's unittests fail due to known_hash and gdb output; besides that, everything else seems to work fine. Cityhash is advertised for it's performance with short strings, which does not seem to show in the benchmark. However, longer strings perform *much* better. If people are insterested, i can repeat the test on a armv7l -- nosy: +ebfe Added file: http://bugs.python.org/file30446/cityhash.txt ___ Python tracker <http://bugs.python.org/issue16427> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16427] Faster hash implementation
Lukas Lueg added the comment: It's a cache sitting between an informix db and and an internal web service. Stuff comes out of db, processed, json'ifed, cached and put on the wire. 10**6s of strings pass this process per request if uncached... I use CityHash64WithSeed, the seed being cpython's hash prefix (which I don't care about but found reassuring to put in anyway) -- ___ Python tracker <http://bugs.python.org/issue16427> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16427] Faster hash implementation
Lukas Lueg added the comment: Here are some benchmarks for a arm7l on a rk30-board. CityHash was compiled with -mcpu=native -O3. CityHash is around half as fast as the native algorithm for small strings and way, way slower on larger ones. My guess would be that the complex arithmetic in cityhash outweights the gains of better scheduling. The results are somewhat inconclusive, as the performance increases again for very long strings. -- Added file: http://bugs.python.org/file30447/cityhash_arm.txt ___ Python tracker <http://bugs.python.org/issue16427> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16427] Faster hash implementation
Lukas Lueg added the comment: The 10**4-case is an error (see insane %), I've never been able to reproduce. Having done more tests with fixed cpu frequency and other daemons' process priority reduced, cityhash always comes out much slower on arm7l. -- ___ Python tracker <http://bugs.python.org/issue16427> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16427] Faster hash implementation
Changes by Lukas Lueg : Added file: http://bugs.python.org/file30475/cityhash_fasthast3.txt ___ Python tracker <http://bugs.python.org/issue16427> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16427] Faster hash implementation
Lukas Lueg added the comment: Here are more benchmarks of vanilla 3.4 vs. cityhash vs. fast_hash_3 on both arm7l and x86-64. The patch was applied varbatim, only caching disabled. On arm7l, the cpu was fixed to maximum freq (it seems to take ages to switch frequencies, at least there is a lot of jitter with ondemand). The cityhash implementation was compiled with -O3 on both platforms and -msse4.2 on x86-64. CityHash and fh3 come out much better than vanilla on x86-64 with cityhash being slightly faster (which is surprising). On ARM7 CityHash performs much worse than vanilla and fh3 significantly better. -- Added file: http://bugs.python.org/file30474/cityhash_fashhash3_arm.txt ___ Python tracker <http://bugs.python.org/issue16427> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20957] test_smptnet Fail instead of Skip if SSL-port is unavailable
New submission from Lukas Lueg: If the SSL-port is unavailable due to firewall settings (or the host simply being down), the SSL-tests in test_smtpnet.py fail instead of being skipped. The tests should be skipped if the smtp.google.com can't be reached and fail only in case of unexpected behaviour. -- components: Tests files: test_smptnet.txt messages: 213855 nosy: ebfe priority: normal severity: normal status: open title: test_smptnet Fail instead of Skip if SSL-port is unavailable type: behavior versions: Python 3.4 Added file: http://bugs.python.org/file34461/test_smptnet.txt ___ Python tracker <http://bugs.python.org/issue20957> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue20957] test_smptnet Fail instead of Skip if SSL-port is unavailable
Lukas Lueg added the comment: Diff the make test_smtpnet pass if the network-resource is available but smtp.google.com's ssl-port can't be reached. Most probably there is a better way to do this. -- keywords: +patch Added file: http://bugs.python.org/file34462/cpython_hg_89810_to_89811.diff ___ Python tracker <http://bugs.python.org/issue20957> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21213] Memory bomb by incorrect custom serializer to json.dumps
Lukas Lueg added the comment: The behavior is triggered in Modules/_json.c:encoder_listencode_obj(). It actually has nothing to do with the TypeError itself, any object that produces a new string representation of itself will do. The function encoder_listencode_obj() calls the user-supplied function with the instance to get a string, float, integer or whatever it knows to how convert to json by itself. As the function keeps returning new instances of TypeError, the recursion builds up. The MemoryError is ultimately triggered by the fact that repr() keeps escaping all single quotes from the previous repr(), generating a huge string. Also see "repr(repr(repr("'")))" Testing with 2gb of ram and no swap (disable to to prevent starvation instead of immediate crash!), cpython dies within 34 recursion levels. The obj-parameter for encoder_listencode_obj() looks like "Foo(obj='\\\'>">\\\'>\\\'>\'>')". My two cents: This is expected behavior. The json-module has no way to tell in advance if the encoding-function never returns. The fact that repr() causes this blowup here can't be fixed. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue21213> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue21213] Memory bomb by incorrect custom serializer to json.dumps
Lukas Lueg added the comment: It's perfectly fine for the function to return an object that can't be put directly into a json string. The function may not convert the object directly but in multiple steps; the encoder will call the function again with the new object until everything boils down to a str, an integer etc.. If one keeps returning objects that never converge to one of those basic types, the interpreter faces death by infinite recursion. The situation described here adds the oom condition caused by repr() blowing up. -- ___ Python tracker <http://bugs.python.org/issue21213> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16381] Introduce option to force the interpreter to exit upon MemoryErrors
Lukas Lueg added the comment: I have to say this feels like spooky action at a distance. Wouldnt it be less intrusive - while achieving the same result - to make MemoryError uncatchable if the flag is set? -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue16381> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16381] Introduce option to force the interpreter to exit upon MemoryErrors
Lukas Lueg added the comment: In any strategy only a heuristic could be used in order to decide wether or not it's safe to raise MemoryError. How exactly is memory pressure expected for x=[2]*200 but not for x=2*200 ? I don't think a new function could ultimatly achieve it's goal. If MemoryError occurs, all bets are off. Shutting down as fast as possible is the best we could do, if requested. -- ___ Python tracker <http://bugs.python.org/issue16381> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue16381] Introduce option to force the interpreter to exit upon MemoryErrors
Lukas Lueg added the comment: The heuristic basically has to decide if memory pressure is so high that it's not safe to return to the interpreter. Even if there is a chosen value (e.g. failed allocation attempts below 1 MB are considered fatal), there can always be another OS-thread in the interpreter process that eats away exactly that memory while we are returning MemoryError - the program might still hang. FWICS, all MemoryErrors are to be considered fatal or none of them. -- ___ Python tracker <http://bugs.python.org/issue16381> ___
[issue16385] evaluating literal dict with repeated keys gives no warnings/errors
Lukas Lueg added the comment: This could be avoided by lives_in_init = (('lion', ['Africa', 'America']), ('lion', ['Europe'])) lives_in = {} for k, v in lives_in_init: assert k not in lives_in lives_in[k] = v del lives_in_init which is fast enough if executed only during module-loading. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue16385> ___
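For reference, the behavior this workaround guards against: a dict literal with a repeated key raises no warning and silently keeps only the last value:

```python
# Duplicate keys in a dict literal produce no error or warning;
# the later entry simply overwrites the earlier one.
d = {'lion': ['Africa', 'America'], 'lion': ['Europe']}
assert d == {'lion': ['Europe']}
```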
[issue16385] evaluating literal dict with repeated keys gives no warnings/errors
Lukas Lueg added the comment: PyLint or PyChecker can only do this if the keys are all simple objects like ints or strings. Consider a class with a custom __hash__. -- ___ Python tracker <http://bugs.python.org/issue16385> ___
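A sketch of why static analysis falls short here: with user-defined hashing and equality, two textually distinct keys can still collide at runtime, which no linter can see (the Key class is purely illustrative):

```python
class Key:
    def __init__(self, label):
        self.label = label

    def __hash__(self):
        return 0                       # every Key lands in the same hash bucket

    def __eq__(self, other):
        return isinstance(other, Key)  # ...and all Keys compare equal

# A linter sees two distinct keys; at runtime the second entry
# silently replaces the first.
d = {Key('a'): 1, Key('b'): 2}
assert len(d) == 1
assert list(d.values()) == [2]
```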
[issue16547] IDLE segfaults in tkinter after fresh file's text has been rendered
New submission from Lukas Lueg: IDLE crashes due to what seems to be a use-after-free bug. Opening a file from the 'Open...'-menu leads to a segfault after the text has been rendered. It seems this can be reproduced 100% of the time if the file is big (e.g. 150 KB) and the window receives events (clicks...) while the colorization is going on. I've been able to reproduce this behaviour on MacOS 10.6 (3.2, 3.3 and 3.4a) and Windows 7 (3.3) but not on Fedora 17. Python 3.4.0a0 (default:a728056347ec, Nov 23 2012, 19:52:20) [GCC 4.2.1 (Apple Inc. build 5664)] on darwin I'll attach a backtrace and a malloc_history (from a different trace). Any guidance on further debugging? -- components: IDLE files: backtrace.txt messages: 176307 nosy: ebfe priority: normal severity: normal status: open title: IDLE segfaults in tkinter after fresh file's text has been rendered type: crash versions: Python 3.2, Python 3.3, Python 3.4 Added file: http://bugs.python.org/file28098/backtrace.txt ___ Python tracker <http://bugs.python.org/issue16547> ___
[issue16547] IDLE segfaults in tkinter after fresh file's text has been rendered
Lukas Lueg added the comment: Using NSZombieEnabled and MallocStackLoggingNoCompact, we can see the use-after-free behaviour. -- Added file: http://bugs.python.org/file28099/malloc_history.txt ___ Python tracker <http://bugs.python.org/issue16547> ___
[issue16547] IDLE segfaults in tkinter after fresh file's text has been rendered
Lukas Lueg added the comment: Switching to ActiveState's Tcl fixes the problem on MacOS 10.6. I won't be able to produce a trace for a debug-build on Windows; attaching a semi-useless trace anyway. -- Added file: http://bugs.python.org/file28135/backtrace_windows.txt ___ Python tracker <http://bugs.python.org/issue16547> ___
[issue16547] IDLE segfaults in tkinter after fresh file's text has been rendered
Lukas Lueg added the comment: On Windows, IDLE only crashes if executed via pythonw.exe; if executed under python.exe, the attached traceback is dumped to stderr. -- Added file: http://bugs.python.org/file28136/excp_traceback.txt ___ Python tracker <http://bugs.python.org/issue16547> ___
[issue16547] IDLE segfaults in tkinter after fresh file's text has been rendered
Lukas Lueg added the comment: self.io is set to null before the colorization is finished. When IDLE's text-window is closed, the AttributeErrors printed to stderr cause IDLE to crash due to #13582. One can also trigger the exceptions on any other OS as described in OP. While #13582 is an issue of its own, this is still a race condition to be considered :-) -- ___ Python tracker <http://bugs.python.org/issue16547> ___
[issue16547] IDLE raises an exception in tkinter after fresh file's text has been rendered
Lukas Lueg added the comment: @Roger: Triggering the segfault on MacOS 10.6 requires some interaction with the text-window while the text is being rendered. This includes moving the window or just clicking into its canvas. Carefully leaving the window alone while colorization is going on avoids the segfault thereafter. My guess is that Tk's event queue gets upset; the segfault was fixed when switching to ActiveState Tk. -- ___ Python tracker <http://bugs.python.org/issue16547> ___
[issue16606] hashlib memory leak
Lukas Lueg added the comment: Thorsten, the problem is that you are using line-based syntax. The code 'for buffer in f:' will read one line per iteration and bind it to 'buffer'; for a file opened in binary mode, the iterator will always read ahead to the next b'\n'. Depending on the content of the file, Python may have to read tons of data before the next b'\n' appears. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue16606> ___
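A sketch of the fixed-size-chunk pattern that avoids the line-iterator pitfall for binary files (the function name and chunk size are illustrative; Python 3.11 later added hashlib.file_digest() for exactly this purpose):

```python
import hashlib
import os
import tempfile

def file_digest(path, chunk_size=64 * 1024):
    # Read fixed-size chunks; iterating over the file object instead
    # would scan for b'\n' and may buffer arbitrarily large "lines".
    md = hashlib.sha1()
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            md.update(chunk)
    return md.hexdigest()

# Demo against an in-memory reference digest: a binary payload with
# no b'\n' at all, which the line iterator would slurp in one piece.
payload = b'\x00' * 100_000 + b'tail'
fd, path = tempfile.mkstemp()
os.write(fd, payload)
os.close(fd)
digest = file_digest(path)
os.unlink(path)
assert digest == hashlib.sha1(payload).hexdigest()
```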
[issue16632] Enable DEP and ASLR
Lukas Lueg added the comment: The only way to be sure: enable & announce for 3.5 and wait for bug reports. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue16632> ___
[issue16764] Make zlib accept keyword-arguments
New submission from Lukas Lueg: The patch "zlib_keywords.patch" makes zlib's classes and functions accept keyword arguments as documented. It also fixes two cases in which the docstrings differ from the documentation (decompress(data) vs. decompress(string) and compressobj(memlevel) vs. compressobj(memLevel)). Additional tests are provided. -- components: Library (Lib) files: zlib_keywords.patch keywords: patch messages: 178053 nosy: ebfe priority: normal severity: normal status: open title: Make zlib accept keyword-arguments versions: Python 3.4 Added file: http://bugs.python.org/file28418/zlib_keywords.patch ___ Python tracker <http://bugs.python.org/issue16764> ___
[issue16764] Make zlib accept keyword-arguments
Lukas Lueg added the comment: Attaching a patch to fix all pep8/pyflakes warnings and errors in test_zlib.py -- Added file: http://bugs.python.org/file28419/zlib_tests_pep8.patch ___ Python tracker <http://bugs.python.org/issue16764> ___
[issue16764] Make zlib accept keyword-arguments
Lukas Lueg added the comment: Nothing of what you mention is a problem of this patch. The memLevel-keyword was not supported as of now, only the docstring ("memLevel") and the documentation ("memlevel") mentioned it. There is no third-party code that could have used it. The current docstring says that a "string"-keyword should be used with decompress(), the documentation talks about a "data"-keyword. Both are not supported, the patch adds support for a "data"-keyword and fixes the docstring. -- components: +Library (Lib) -Extension Modules type: enhancement -> ___ Python tracker <http://bugs.python.org/issue16764> ___
[issue1054967] bdist_deb - Debian packager
Lukas Lueg added the comment: Count me in -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue1054967> ___
[issue1497532] C API to retain GIL during Python Callback
Lukas Lueg added the comment: I'm not sure if such an API is feasible. The very nature of Python makes it impossible to tell in advance what the interpreter will do when getting called. This is even true for simple functions - think of your function getting decorated... Let's consider the following scenario: - Python-Thread 1 is already running and owns a lock on some object. While it still owns the lock, it releases the GIL. - We are executing in Python-Thread 2 and call from Python to C. The C function has the GIL, "locks" it and calls back to Python. - The Python function executes in the same thread, still having the GIL. It tries to acquire the lock on the same object as Thread 1. To prevent a deadlock between those two threads, it releases the GIL and waits for the object-lock. - The GIL is "locked" to the current thread and the current thread is the only one that we can allow to unlock it again; this narrows our options down to Py_BEGIN_ALLOW_THREADS becoming a no-op in such situations. - Py_BEGIN_ALLOW_THREADS executed in the second C-function is silently ignored. Thread 2 waits for the object-lock with Thread 1 never having a chance to release it. - The interpreter just deadlocked. AFAICS we can't guarantee not to deadlock if there are other threads running before we lock the GIL to the current thread. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue1497532> ___
[issue1501108] Add write buffering to gzip
Lukas Lueg added the comment: This is true for all objects whose input could be concatenated. For example with hashlib: data = ['foobar']*10 mdX = hashlib.sha1() for d in data: mdX.update(d) mdY = hashlib.sha1() mdY.update("".join(data)) mdX.digest() == mdY.digest() the second version is multiple times faster... -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue1501108> ___
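The speed difference is easy to measure; a rough benchmark sketch in the Python 3 spelling with bytes (the exact ratio depends on the build, so the sketch only verifies that both paths agree on the digest):

```python
import hashlib
import timeit

data = [b'foobar'] * 10000

def many_updates():
    # One update() call per small chunk - pays interpreter and
    # call overhead 10000 times.
    md = hashlib.sha1()
    for d in data:
        md.update(d)
    return md.digest()

def one_update():
    # A single update() over the pre-joined buffer amortizes the
    # per-call overhead across the whole input.
    return hashlib.sha1(b''.join(data)).digest()

# Both strategies must produce the identical digest.
assert many_updates() == one_update()

print('many:', timeit.timeit(many_updates, number=50))
print('one: ', timeit.timeit(one_update, number=50))
```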
[issue5541] File's current position inconsistent with 'a+' mode
New submission from Lukas Lueg : The file pointer's behaviour after opening a file in 'a+b' mode is not consistent among platforms: The pointer is set to the beginning of the file on Linux and to the end of the file on MacOS. You have to call .seek(0) before calling .read() to get consistent behaviour on all platforms. While this is not a serious problem, it somewhat violates the rule of least surprise. Also we are not bound to this behaviour and can make sure that all file objects have their respective positions well-defined after object-creation. Thoughts? -- messages: 83997 nosy: ebfe severity: normal status: open title: File's current position inconsistent with 'a+' mode versions: Python 2.5 ___ Python tracker <http://bugs.python.org/issue5541> ___
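A portable pattern that sidesteps the inconsistency: always seek explicitly after opening in 'a+b' before reading (the temp file here is purely for illustration):

```python
import os
import tempfile

# Create a file with known content.
fd, path = tempfile.mkstemp()
os.write(fd, b'existing data')
os.close(fd)

with open(path, 'a+b') as f:
    # The initial read position after an 'a+' open is platform-dependent,
    # so seek explicitly before reading.
    f.seek(0)
    data = f.read()
    # Writes still go to the end regardless of the read position,
    # because the file is in append mode.
    f.write(b'!')

assert data == b'existing data'
os.unlink(path)
```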
[issue4751] Patch for better thread support in hashlib
Lukas Lueg added the comment: Bump. hashlibopenssl_gil_py27.diff has not yet been applied to py27 and no longer applies cleanly. Here is an updated version. -- status: pending -> open Added file: http://bugs.python.org/file13646/hashlibopenssl_gil_py27_2.diff ___ Python tracker <http://bugs.python.org/issue4751> ___
[issue4751] Patch for better thread support in hashlib
Changes by Lukas Lueg : Removed file: http://bugs.python.org/file13057/hashlibopenssl_gil_py27.diff ___ Python tracker <http://bugs.python.org/issue4751> ___
[issue1054967] bdist_deb - Debian packager
Lukas Lueg added the comment: Thanks for your efforts. I don't think you are stepping on anyone's toes when picking up an issue that has been unsolved for almost 5 years :-) Please post patches to this bug for review/comments/help/whatever. -- ___ Python tracker <http://bugs.python.org/issue1054967> ___
[issue16632] Enable DEP and ASLR
Changes by Lukas Lueg : -- nosy: -ebfe ___ Python tracker <http://bugs.python.org/issue16632> ___
[issue9285] Add a profile decorator to profile and cProfile
Changes by Lukas Lueg : -- nosy: -ebfe ___ Python tracker <http://bugs.python.org/issue9285> ___
[issue16381] Introduce option to force the interpreter to exit upon MemoryErrors
Lukas Lueg added the comment: Another proposal: add a new base class that, if inherited from, causes an exception to be uncatchable (e.g. class HardMemoryError(MemoryError, UncatchableException)). -- ___ Python tracker <http://bugs.python.org/issue16381> ___
[issue25465] Pickle uses O(n) memory overhead
Lukas Lueg added the comment: I very strongly doubt that it actually crashes your kernel - it basically can't. Your desktop becomes unresponsive for up to several minutes as the kernel pages out about every single bit of memory to disk, raising access times by several orders of magnitude. Disable your swap and try again; the process will simply die. -- nosy: +ebfe ___ Python tracker <http://bugs.python.org/issue25465> ___