[issue1778443] robotparser.py fixes
Jim Jewett added the comment: On line 108 (new 104), spaces should probably be added on both sides of the comparison operator, instead of only after the ">=". The "%s" changes might end up getting changed again as part of 2to3, but this is a clear improvement over status quo, particularly with the loops. I recommend applying. -- nosy: +jimjjewett _ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1778443> _ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1177] urllib* 20x responses not OK?
New submission from Jim Jewett:

Under the HTTP protocol, any 2xx response is OK. urllib.py and urllib2.py hardcoded only response 200 (the most common).

http://bugs.python.org/issue912845 added 206 as acceptable to urllib2, but not any other 20x responses. (It also didn't fix urllib.)

Suggested for 2.6, as it does change behavior.

(Also see duplicate http://bugs.python.org/issue971965, which I will try to close after opening this.)

--
components: Library (Lib)
messages: 56009
nosy: jimjjewett
severity: normal
status: open
title: urllib* 20x responses not OK?
type: behavior
versions: Python 2.6

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1177> __
[issue1177] urllib* 20x responses not OK?
Jim Jewett added the comment:

Jafo: His fix is great for urllib2, but the same issue applies to the original urllib.

The ticket should not be closed until a similar fix is made to lines 330 and 417 of urllib.py. That is, change

    "if errcode == 200:"

to

    "if 200 <= errcode < 300:"

(Or, if rejecting the change, add a comment saying that it is left that way intentionally for backwards compatibility.)

-jJ

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1177> __
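The proposed range check can be sketched in isolation. This is only an illustration of the comparison suggested above; `is_success` is a hypothetical helper name, not a function in urllib or urllib2.

```python
def is_success(status_code):
    """Return True for any HTTP 2xx status code (hypothetical helper)."""
    # A chained comparison covers 200..299 without listing individual codes.
    return 200 <= status_code < 300


# 200 and 206 are both accepted; redirects and errors are not.
accepted = [code for code in (199, 200, 201, 206, 299, 300, 404) if is_success(code)]
```

Here `accepted` ends up holding only the codes in the 2xx range, which is the behavior the comment asks urllib.py to adopt.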
[issue1177] urllib* 20x responses not OK?
Jim Jewett added the comment:

The change still missed the httpS copy. I'm attaching a minimal change.

I think it might be better to just combine the methods -- as was already done in Py3K. Unfortunately, the py3K code doesn't run cleanly in 2.5, and I haven't yet had a chance to test a backported equivalent. (Hopefully tonight.)

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1177> __

*** urllibhead.py
--- urllib.py
***************
*** 435,441 ****
              # something went wrong with the HTTP status line
              raise IOError, ('http protocol error', 0,
                              'got a bad status line', None)
!         if errcode == 200:
              return addinfourl(fp, headers, "https:" + url)
          else:
              if data is None:
--- 435,443 ----
              # something went wrong with the HTTP status line
              raise IOError, ('http protocol error', 0,
                              'got a bad status line', None)
!         # According to RFC 2616, "2xx" code indicates that the client's
!         # request was successfully received, understood, and accepted.
!         if 200 <= errcode < 300:
              return addinfourl(fp, headers, "https:" + url)
          else:
              if data is None:
[issue1209] IOError won't accept tuples longer than 3
New submission from Jim Jewett:

EnvironmentError (including subclass IOError) has special treatment when constructed with a 2-tuple or 3-tuple. A four-tuple turns off this special treatment (and was used by urllib for that reason).

As of 2.5, a four-tuple raises a TypeError instead of just turning off the special treatment.

--
components: Extension Modules
messages: 56150
nosy: jimjjewett
severity: normal
status: open
title: IOError won't accept tuples longer than 3
type: behavior
versions: Python 2.5

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1209> __
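For readers unfamiliar with the "special treatment", the following is a small sketch of how it looks in current Python 3, where IOError and EnvironmentError are aliases of OSError. It does not reproduce the 2.5 behavior reported above; the file name and messages are made up for illustration.

```python
# With (errno, strerror, filename) the constructor fills in attributes.
err = OSError(2, "No such file or directory", "missing.txt")
fields = (err.errno, err.strerror, err.filename)

# With a single argument there is no special treatment: the value just
# lands in .args and the errno-style attributes stay unset.
plain = OSError("http protocol error")
plain_fields = (plain.errno, plain.strerror, plain.args)

# In Python 3 the old names refer to the same class.
same_class = IOError is OSError
```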
[issue1401] urllib2 302 POST
Jim Jewett added the comment:

> But you said that #2 solution was more RFC compliant...
> Could you please quote the RFC part that describes this behaviour?

RFC 2616 http://www.faqs.org/rfcs/rfc2616.html

Section 4.3 Message Body
...
The presence of a message-body in a request is signaled by the inclusion of a Content-Length or Transfer-Encoding header field in the request's message-headers. A message-body MUST NOT be included in a request if the specification of the request method (section 5.1.1) does not allow sending an entity-body in requests.

[I couldn't actually find a quote saying that GET has no body, but ... it doesn't.]

Section 10.3 Redirection 3xx says
The action required MAY be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD.

In other words, changing it to GET may not be quite pure, but leaving it as POST would technically mean that the user MUST confirm that the redirect is OK. This MUST NOT becomes more explicit later, such as in 10.3.2 (301 Moved Permanently). Section 10.3.3 (302 Found) says that 307 was added specifically to insist on keeping it a POST, and even 307 says it MUST NOT automatically redirect unless it can be confirmed by the user.

Which is why user agents change redirects to a GET and try that...

--
components: +XML -None
nosy: +jimjjewett

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1401> __
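The redirect policy described in this comment can be sketched as a small decision function. This is a simplified illustration of the reasoning above, not urllib2's actual code; `redirected_method` is a hypothetical name, and returning None stands in for "do not follow automatically".

```python
def redirected_method(status_code, method):
    """Pick the method for the follow-up request (hypothetical helper).

    Returns None when the redirect should not be followed without
    asking the user.
    """
    method = method.upper()
    if method in ("GET", "HEAD"):
        # GET and HEAD may be redirected automatically.
        return method if status_code in (301, 302, 303, 307) else None
    if method == "POST" and status_code in (301, 302, 303):
        # Common user-agent practice: retry the new location as a GET.
        return "GET"
    # For example POST with 307: keeping POST would need user confirmation.
    return None


post_302 = redirected_method(302, "POST")
post_307 = redirected_method(307, "POST")
```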
[issue1501] 0 ** 0 documentation
New submission from Jim Jewett:

http://docs.python.org/lib/typesnumeric.html contains a table listing the mathematical operators.

Please add a note to the final row (x ** y meaning x to the power y) indicating that Python has chosen to define 0**0==1

Note 6: Python defines 0**0 to be 1. For background, please see http://en.wikipedia.org/wiki/Exponentiation#Zero_to_the_zero_power

This doc change should have prevented issue 1461; I *think* there have been similar issues in the past.

--
components: Documentation
messages: 57855
nosy: jimjjewett
severity: minor
status: open
title: 0 ** 0 documentation
type: rfe
versions: Python 2.5

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1501> __
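The convention the note would document is easy to check interactively. A minimal sketch, covering the operator, the builtin, and the math module:

```python
import math

# Each spelling of "zero to the zero power" evaluates to one.
results = {
    "int ** int": 0 ** 0,
    "float ** int": 0.0 ** 0,
    "pow()": pow(0, 0),
    "math.pow()": math.pow(0, 0),
}
```

All four values compare equal to 1; the float variants return 1.0.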
[issue13604] update PEP 393 (match implementation)
New submission from Jim Jewett:

The implementation has a larger state.kind.

Clarified wording on wstr_length and surrogate pairs.

Clarified that the canonical "data" format doesn't always have a data pointer.

Mentioned that calling PyUnicode_READY would finalize a string, so that it couldn't be resized.

Changed section head "Other macros" to "Finalization macro" and removed the non-existent PyUnicode_CONVERT_BYTES (there is a similarly named private macro).

--
files: pep-0393.txt.patch
keywords: patch
messages: 149497
nosy: Jim.Jewett
priority: normal
severity: normal
status: open
title: update PEP 393 (match implementation)
Added file: http://bugs.python.org/file23960/pep-0393.txt.patch

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13604] update PEP 393 (match implementation)
Changes by Jim Jewett:

--
assignee: -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 3.3
Added file: http://bugs.python.org/file23961/pep-0393.txt

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13604] update PEP 393 (match implementation)
Changes by Jim Jewett:

Added file: http://bugs.python.org/file23968/pep-0393.txt

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13604] update PEP 393 (match implementation)
Jim Jewett added the comment:

Updated to resolve most of Victor's concerns, but this meant enough changes that I'm not sure it quite counts as editorial only.

A few questions that I couldn't answer:

(1) Upon string creation, do we want to *promise* to discard the UTF-8 and wstr, so that the caller can memory manage?

(2) PyUnicode_AS_DATA(), Py_UNICODE_strncpy, Py_UNICODE_strncmp seemed to be there in the code I was looking at.

(3) I can't justify the born-deprecated function "PyUnicode_AsUnicodeAndSize". Perhaps rename it with a leading underscore? Though I'm not sure it is really needed at all.

(4) I tried to reword the "for compatibility" ... "redundant" part ... but I'm not sure I resolved it.

--

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13604] update PEP 393 (match implementation)
Jim Jewett added the comment:

>> So even if a third party module uses the legacy Unicode API, the PEP
>> 393 will still optimize the memory usage thanks to implicit calls to
>> PyUnicode_READY() (done everywhere in Python source code).

> ... unless they inspect a given Unicode string, in which case it
> will use twice the memory (or 1.5x).

Why is the utf-8 representation not cached when it is generated for ParseTuple et alia? It seems like these parameters are likely to either be re-used as parameters (in which case caching makes sense) or not re-used at all (in which case, the whole string can go away).

> Well, I meant the resizing of strings that doesn't move the object
> in memory (i.e. unicode_resize).

This may easily fail because the new size can't be found at that location; wouldn't it be better to just encourage proper sizing in the first place?

>> (1) Upon string creation, do we want to *promise* to discard
>> the UTF-8 and wstr, so that the caller can memory manage?

> I don't understand the question. Assuming "discards" means
> "releases" here, then there is no API which releases memory
> during creation of the string object - let alone that there is
> any promise to do so. I'm also not aware of any candidate buffer
> that you might want to release.

When a string is created from a wchar_t array, who is responsible for releasing the original wchar_t array? As I read it now, Python doesn't release the buffer, and the caller can't because maybe Python just pointed to it as memory shared with the canonical representation.

>> (2) PyUnicode_AS_DATA(), Py_UNICODE_strncpy, Py_UNICODE_strncmp
>> seemed to be there in the code I was looking at.

> That's very well possible. What's the question?

Victor listed them as missing. I now suspect he meant "missing from the PEP list of deprecated functions and macros", and I just misunderstood.

--
Added file: http://bugs.python.org/file23970/pep-0393.txt

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13604] update PEP 393 (match implementation)
Changes by Jim Jewett:

Added file: http://bugs.python.org/file23971/pep-0393v20111215.patch

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13608] remove born-deprecated PyUnicode_AsUnicodeAndSize
New submission from Jim Jewett:

In reviewing issue 13604 (aligning PEP 393 with the implementation), Victor Stinner noticed that PyUnicode_AsUnicodeAndSize is new in 3.3, but that it is already deprecated (because it relies on the old PyUnicode type).

This born-deprecated function is just a shortcut for PyUnicode_AsUnicode plus PyUnicode_GET_SIZE, and should be removed.

--
components: Unicode
messages: 149585
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: remove born-deprecated PyUnicode_AsUnicodeAndSize
versions: Python 3.3

___ Python tracker <http://bugs.python.org/issue13608> ___
[issue13604] update PEP 393 (match implementation)
Jim Jewett added the comment:

>> Why is the utf-8 representation not cached when it is generated for
>> ParseTuple et alia?

My error -- I read something backwards.

>> When a string is created from a wchar_t array, who is responsible for
>> releasing the original wchar_t array?

> The caller.

OK, I'll document that.

>> As I read it now, Python
>> doesn't release the buffer, and the caller can't because maybe Python
>> just pointed to it as memory shared with the canonical
>> representation.

> But Python won't; it will always make a copy for itself.

I thought I found an example each way, but it is possible that the shared version was something python had already copied. If not, I'll raise that as a separate issue to get the code changed.

(Note that I may not be able to look at this again until after Christmas, so I'm likely to go silent for a while.)

--
Added file: http://bugs.python.org/file23979/pep-0393.txt

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13604] update PEP 393 (match implementation)
Changes by Jim Jewett:

Added file: http://bugs.python.org/file23980/pep-0393_20111216.txt.patch

___ Python tracker <http://bugs.python.org/issue13604> ___
[issue13677] correct docstring for builtin compile
New submission from Jim Jewett:

The current docstring for compile suggests that the flags are strictly for selecting future statements. These are not the only flags.

It also suggests that the source must be source code and the result will be bytecode, which isn't quite true.

I suggest changing:

"The flags argument, if present, controls which future statements influence the compilation of the code."

to:

"The flags argument, if present, largely controls which future statements influence the compilation of the code. (Additional flags are documented in the AST module.)"

--
assignee: docs@python
components: Documentation
files: bltinmodule.c.patch
keywords: patch
messages: 150337
nosy: Jim.Jewett, docs@python
priority: normal
severity: normal
status: open
title: correct docstring for builtin compile
type: behavior
Added file: http://bugs.python.org/file24105/bltinmodule.c.patch

___ Python tracker <http://bugs.python.org/issue13677> ___
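Both points in this report can be seen with the `ast.PyCF_ONLY_AST` flag, which is not a future-statement flag. A short sketch, with a made-up one-line program as input:

```python
import ast

# With PyCF_ONLY_AST, compile() returns a syntax tree, not a code object,
# so the result is not always bytecode.
tree = compile("x = 1 + 2", "<example>", "exec", flags=ast.PyCF_ONLY_AST)
is_tree = isinstance(tree, ast.Module)

# The source does not have to be text either: an AST is accepted too.
code = compile(tree, "<example>", "exec")
namespace = {}
exec(code, namespace)
```

After the `exec` call, `namespace["x"]` holds the value computed by the compiled tree.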
[issue13677] correct docstring for builtin compile
Jim Jewett added the comment:

I'm not sure we're looking at the same thing. I was talking about the docstring that shows up at the interactive prompt in response to

>>> help(compile)

Going to hg.python.org/cpython and selecting branches, then default, then browse, got me to http://hg.python.org/cpython/file/7010fa9bd190/Python/bltinmodule.c which still doesn't mention AST. I also don't see any reference to "src" or "dst", or any "source" that looks like it should be capitalized.

I agree that there is (to my knowledge, at this time) only one additional flag. I figured ast or future was needed to get the compilation constants, so it made sense to delegate -- but you seem to be reading something newer than I am.

--

___ Python tracker <http://bugs.python.org/issue13677> ___
[issue13776] formatter_unicode.c still assumes ASCII
New submission from Jim Jewett:

http://docs.python.org/library/string.html#format-specification-mini-language defines fill::= and the text also excludes '{'. It does not require that the fill character be ASCII.

However, function parse_internal_render_format_spec http://hg.python.org/cpython/file/c2153ce1b5dd/Python/formatter_unicode.c#l277 raises a ValueError if fill_char > 127.

I'm honestly not certain which of the three is correct, but they should be consistent, and if anything but '{' is excluded, it would be best to explain why.

--
components: Unicode
messages: 151128
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: formatter_unicode.c still assumes ASCII
type: behavior

___ Python tracker <http://bugs.python.org/issue13776> ___
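As a point of comparison, current Python 3 releases accept non-ASCII fill characters. The sketch below shows that present-day behavior only; it says nothing about the revision of formatter_unicode.c linked above.

```python
# Fill characters outside ASCII, used with centre and right alignment.
centered = format("x", "é^5")
right_aligned = format(42, "→>6")

# The same spec works through str.format().
via_method = "{:é^5}".format("x")
```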
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Jim Jewett added the comment:

The currently applied patch ( http://hg.python.org/cpython/rev/f7e05d205a52 ) left some dead code in unicodeobject.c.

Function fixup ( http://hg.python.org/cpython/file/f7e05d205a52/Objects/unicodeobject.c#l9386 ) has a shortcut for when the fixer doesn't make any actual changes. The removed fixers (like fixupper) returned 0 rather than maxchar to indicate that. The only remaining fixer, fix_decimal_and_space_to_ascii (line 8839), does not.

(I think fix_decimal_and_space_to_ascii *should* add a touched flag, but until it does, the shortcut dedup code is dead.)

Also, around line 10502, there is an #if 0 section with code that relied on one of the removed fixers; is it time to remove that section?

--
nosy: +Jim.Jewett

___ Python tracker <http://bugs.python.org/issue12736> ___
[issue10576] Add a progress callback to gcmodule
Jim Jewett added the comment:

I like the idea, but I do quibble with the signature. As nearly as I can tell, you're assuming

(a) Only one callback. I would prefer a sequence of callbacks, to make cooperation easier. (This does mean you would need a callback removal, instead of just setting it to None.)

(b) The callback will be called once before collecting generations, and once after (with the number of objects that weren't collected). Should these be separate callbacks, rather than the same one with a boolean? And why does it need the number of uncollected objects? (This might be a case where Practicality Beats Purity, but it is worth documenting.)

--
nosy: +jimjjewett

___ Python tracker <http://bugs.python.org/issue10576> ___
[issue10576] Add a progress callback to gcmodule
Jim Jewett added the comment:

> Does anyone think that it is simpler to register two different callbacks than one?

Moderately, yes.

Functions that actually help with cleanup should normally be run only in one phase; it is just stats-gathering and logging functions that might run both times, and I don't mind registering those twice.

For functions that are run only once (which I personally think is the more normal case), the choices are between

    @register_gc
    def my_callback(actually_run_flag, mydict):
        if not actually_run_flag:
            return
        ...

vs

    @register_gc_before
    def my_callback(mydict):
        ...

--

___ Python tracker <http://bugs.python.org/issue10576> ___
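For comparison with the two decorator spellings above (which are proposals, not real APIs), current CPython (3.3 and later) exposes a `gc.callbacks` list: each registered callable receives a phase string and an info dict, and removal is ordinary list removal. A minimal sketch, where `log_phase` and `events` are illustrative names:

```python
import gc

events = []


def log_phase(phase, info):
    # phase is "start" or "stop"; info carries "generation", and after a
    # collection also meaningful "collected" / "uncollectable" counts.
    events.append((phase, info["generation"]))


gc.callbacks.append(log_phase)      # register: a sequence of callbacks
try:
    gc.collect()                    # triggers one start/stop pair
finally:
    gc.callbacks.remove(log_phase)  # unregister

phases = [phase for phase, _generation in events]
```

A callback that should only act in one phase simply returns early when it sees the other phase string, which is close to the first of the two styles discussed in the comment.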
[issue3025] batch/IDLE differ: print broken for chraracters>ascii
New submission from Jim Jewett <[EMAIL PROTECTED]>:

The str->Unicode change widened the IDLE/batch discrepancy.

In python 2.x, bytes are printable.

>>> for i in range(256): print i, chr(i)

works fine.

In python 3, chr has become (the old) unichr, and whether a unicode character is printable depends on the environment. In particular, under my Windows XP, the equivalent

>>> for i in range(256): print (i, chr(i))

will still work fine under IDLE, but will now crash with a UnicodeEncodeError when run from the command line.

Unfortunately, I'm not sure what the right solution actually is, other than a mention in the What's New document.

I believe the 2.5 code was using a system page to print those characters, as they often looked like letters rather than . Copying that would probably be the wrong solution.

Limiting IDLE would add consistency, but might be a lot of work for the equivalent of a --pedantic flag.

PEP 3138 seems to be proposing a default stdout BackslashReplace, which may at least help.

--
assignee: georg.brandl
components: Documentation, Unicode
messages: 67617
nosy: georg.brandl, jimjjewett
severity: normal
status: open
title: batch/IDLE differ: print broken for chraracters>ascii
type: behavior
versions: Python 3.0

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3025> ___
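The failure mode and the backslash-replacing alternative mentioned at the end of this report can be reproduced without a console, by encoding to a narrow codec directly. A sketch, using ASCII as a stand-in for a limited terminal encoding and a made-up sample string:

```python
text = "caf\u00e9 \u2192"  # contains two non-ASCII characters

# Strict encoding fails, as printing to a limited console would.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# The "backslashreplace" error handler substitutes escape sequences.
safe = text.encode("ascii", errors="backslashreplace")
```

With the handler in place, the unencodable characters come out as visible escapes instead of raising an exception.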
[issue775544] Tk.quit leads to crash in python.exe
Jim Jewett <[EMAIL PROTECTED]> added the comment:

Were you using IDLE at the time?

When I try this (Windows XP SP2), the button and its window do not go away (which is arguably a bug), but it does not crash.

If I then try to close the window using the little X (from the window manager),

(1) A qb started from the command-line interface exits, as it should.

(2) A qb started from within IDLE becomes non-responsive, and Windows asks whether or not I want to continue shutting it down.

--
nosy: +jimjjewett

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue775544> ___
[issue3300] urllib.quote and unquote - Unicode issues
Jim Jewett <[EMAIL PROTECTED]> added the comment:

Is there still disagreement over anything except:

(1) The type signature of quote and unquote (as opposed to the explicit "quote_as_bytes" or "quote_as_string").

(2) The default encoding (Latin-1 vs UTF-8), and (if UTF-8) what to do with invalid byte sequences?

(3) Would waiting for 3.1 cause too many compatibility problems?

--
nosy: +jimjjewett

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3300> ___
[issue3300] urllib.quote and unquote - Unicode issues
Jim Jewett <[EMAIL PROTECTED]> added the comment:

Matt pointed out that the email package assumes Latin-1 rather than UTF-8; I assume Bill could patch his patch the same way Matt did, and this would resolve the email tests. (Unless you pronounce to stick with Latin-1)

The cookiejar failure probably has the same root cause; that test is encoding (non-ASCII) Latin-1 characters, and urllib.parse.py/Quoter assumes Latin-1.

So I see some evidence (probably not enough) for sticking with Latin-1 instead of UTF-8. But I don't see any evidence that fixing the semantics (encoded results should be bytes) at the same time made the conversion any more painful.

On the other hand, Matt shows that some of those extra str->byte code changes might never need to be done at all, except for purity.

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3300> ___
[issue3300] urllib.quote and unquote - Unicode issues
Jim Jewett <[EMAIL PROTECTED]> added the comment:

> http://codereview.appspot.com/2827/diff/1/5#newcode1450
> Line 1450: "%3c%3c%0Anew%C3%A5/%C3%A5",
> I'm guessing this test broke otherwise?

Yes; that is one of the breakages you found in Bill's patch. (He didn't modify the test.)

> Given that this references an RFC,
> is it correct to just fix it this way?

Probably. Looking at http://www.faqs.org/rfcs/rfc2965.html

(1) That is not among the exact tests in the RFC.

(2) The RFC does not specify charset for the cookie in general, but the Comment field MUST be in UTF-8, and the only other reference I could find to a specific charset was "possibly in a server-selected printable ASCII encoding."

Whether we have to use Latin-1 (or document charset) in practice for compatibility reasons, I don't know.

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3300> ___
[issue3300] urllib.quote and unquote - Unicode issues
Jim Jewett <[EMAIL PROTECTED]> added the comment:

Matt,

Bill's main concern is with a policy decision; I doubt he would object to using your code once that is resolved.

The purpose of the quoting functions is to turn a string (representing the human-readable version) into bytes (that go over the wire). If everything is ASCII, there isn't any disagreement -- but it also isn't obvious that they're bytes instead of characters. So people started (well, continued, since it dates to pre-unicode C) treating them as though they were strings.

The fact that ASCII (and therefore most wire protocols) looks the same as bytes or as characters was one of the strongest arguments against splitting the bytes and string types. Now that this has been done, Bill feels we should be consistent. (You feel wire-protocol bytes should be treated as strings, if only as bytestrings, because the libraries use them that way -- but this is a policy decision.)

To quote the final paragraph of 1.2.1

"""
In local or regional contexts and with improving technology, users might benefit from being able to use a wider range of characters; such use is not defined by this specification. Percent-encoded octets (Section 2.1) may be used within a URI to represent characters outside the range of the US-ASCII coded character set if this representation is allowed by the scheme or by the protocol element in which the URI is referenced. Such a definition should specify the character encoding used to map those characters to octets prior to being percent-encoded for the URI.
"""

So the mapping to bytes (or "octets") for non-ASCII isn't defined (here), and if you want to use it, you need to specify charset. But in practice, people do use it without specifying a charset. Which charset should be assumed? The old code (and test cases) assumed Latin-1. You want to assume UTF-8 (though you took the document charset when available -- which might also make sense).

___ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3300> ___
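The charset question debated in this thread is visible in the `urllib.parse` API of current Python 3, which takes an explicit `encoding` argument and also offers bytes-level variants. The sketch below shows present-day behavior, not the state of the code under review at the time; the sample character is arbitrary.

```python
from urllib.parse import quote, quote_from_bytes, unquote, unquote_to_bytes

char = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE

# The same character percent-encodes differently per charset.
as_utf8 = quote(char)                         # default encoding is UTF-8
as_latin1 = quote(char, encoding="latin-1")

# Decoding must assume the same charset to round-trip.
back_utf8 = unquote(as_utf8)
back_latin1 = unquote(as_latin1, encoding="latin-1")

# Bytes-oriented variants avoid picking a charset at all.
from_bytes = quote_from_bytes(b"\xe5")
to_bytes = unquote_to_bytes(as_utf8)
```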
[issue1657] [patch] epoll and kqueue wrappers for the select module
Jim Jewett <[EMAIL PROTECTED]> added the comment:

Is pyepoll a good prefix? To me, it looks a lot like the _Py and Py reserved namespaces, but not quite...

--
nosy: +jimjjewett

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue1657> __
[issue2679] email.feedparser regex duplicate
New submission from Jim Jewett <[EMAIL PROTECTED]>:

feedparser defines four regexes for end-of-line, but two are redundant.

NLCRE checks for the three common line endings.
NLCRE_crack also captures the line ending.
NLCRE_eol also adds a $ to ensure it is at the end.
NLCRE_bol ... is identical to NLCRE_crack.

It should either use a ^ to insist on line-start, or be explicitly the same. (e.g., NLCRE_bol=NLCRE_crack.)

(It gets away with not listing the ^ because the current code only uses NLCRE_bol.match.)

(Actually, if the regexes are considered private, then the current code could just use the bound methods directly ... setting NLCRE_bol to the .match method, NLCRE_eol to the .search method, and NLCRE_crack to the .split method.)

--
components: Library (Lib)
messages: 65723
nosy: jimjjewett
severity: normal
status: open
title: email.feedparser regex duplicate
versions: Python 2.6, Python 3.0

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2679> __
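The point about `.match` making the `^` unnecessary can be demonstrated with stand-in patterns. The definitions below are assumptions reconstructed from the description in this report, not copied from email.feedparser, and the real module may differ in detail.

```python
import re

# Assumed patterns, modelled on the description above.
NLCRE = re.compile(r"\r\n|\r|\n")          # the three common line endings
NLCRE_crack = re.compile(r"(\r\n|\r|\n)")  # same, but captures the ending
NLCRE_eol = re.compile(r"(\r\n|\r|\n)$")   # ending only at end of string
NLCRE_bol = NLCRE_crack                    # explicitly the same object

# .match() only tries position 0, so no "^" is needed for line-start checks.
starts_with_newline = NLCRE_bol.match("\nbody") is not None
not_at_start = NLCRE_bol.match("body\n") is None

# .split() with a capturing group keeps the separators in the result.
pieces = NLCRE_crack.split("a\r\nb\nc")

# .search() with "$" finds a trailing line ending.
trailing = NLCRE_eol.search("line\r\n").group()
```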
[issue2636] Regexp 2.6 (modifications to current re 2.2.2)
Jim Jewett <[EMAIL PROTECTED]> added the comment:

> These features are to bring the Regexp code closer in line with Perl 5.10

Why 5.1 instead of 5.8 or at least 5.6? Is it just a scope-creep issue?

> as well as add a few python-specific

because this also adds to the scope.

> 2) Make named matches direct attributes
> of the match object; i.e. instead of m.group('foo'),
> one will be able to write simply m.foo.

> 3) (maybe) make Match objects subscriptable, such
> that m[n] is equivalent to m.group(n) and allow slicing.

(2) and (3) would both be nice, but I'm not sure it makes sense to do *both* instead of picking one.

> 5) Add a well-formed, python-specific comment modifier,
> e.g. (?P#...);

[handles parens in comments without turning on verbose, but is slower]

Why? It adds another incompatibility, so it has to be very useful or clear. What exactly is the advantage over just turning on verbose?

> 9) C-Engine speed-ups. ...
> a number of Macros are being eliminated where appropriate.

Be careful on those, particularly on str/unicode and different compile options.

--
nosy: +jimjjewett

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2636> __
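Of the two access styles weighed in items (2) and (3), the subscript form is the one available in the standard `re` module today (Python 3.6 and later); attribute access by group name is not. A short sketch of the current behavior, with illustrative group names:

```python
import re

m = re.match(r"(?P<first>\w+) (?P<second>\w+)", "hello world")

# Subscripting is equivalent to m.group(...), by name or by index.
by_name = m["first"]
by_index = m[2]
whole = m[0]

# Named groups are not exposed as attributes on the match object.
has_attribute = hasattr(m, "first")
```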
[issue2636] Regexp 2.6 (modifications to current re 2.2.2)
Jim Jewett <[EMAIL PROTECTED]> added the comment:

Python 2.6 isn't the last, but Guido has said that there won't be a 2.10.

> Match object is a C-struct with python binding
> and I'm not exactly sure how to add either feature to it

I may be misunderstanding -- isn't this just a matter of writing the function and setting it in the tp_as_sequence and tp_as_mapping slots?

> Larry Wall and Guido agreed long ago that we, the python
> community, own all expressions of the form (?P...)

Cool -- that reference should probably be added to the docs. For someone trying to learn or translate regular expressions, it helps to know that (?P ...) is explicitly a python extension (even if Perl adopts it later).

Definitely put the example in the doc.

r'He(?# 2 (TWO) ls)llo' should match "Hello" but it doesn't.

Maybe even without the change, as documentation on the current situation.

Does VERBOSE really have to be the first flag, or does it just have to be on the whole pattern instead of an internal switch?

I'm not sure I fully understand what you said about template. Is this a special undocumented switch, or just an internal optimization mode that should be triggered whenever the repeat operators don't happen to occur?

__ Tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue2636> __
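The comment example from this message can be tried against the standard `re` module. The sketch below shows what happens in current Python 3: an inline `(?#...)` comment ends at the first closing parenthesis, so nested parentheses break the pattern, while a VERBOSE-mode comment copes with them.

```python
import re

# The inline comment stops at the ")" after "TWO", leaving a stray ")".
try:
    re.compile(r"He(?# 2 (TWO) ls)llo")
    nested_comment_compiles = True
except re.error:
    nested_comment_compiles = False

# Without nested parentheses the inline comment is fine.
simple_ok = re.fullmatch(r"He(?# two ls)llo", "Hello") is not None

# In VERBOSE mode, "#" comments run to end of line and may contain parens.
verbose_ok = re.fullmatch(r"""He   # 2 (TWO) ls
                              llo""", "Hello", re.VERBOSE) is not None
```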
[issue4882] Behavior of backreferences to named groups in regular expressions unclear
Jim Jewett added the comment:

That sounds like a good idea, particularly since it is a bit different from Perl. Please do write up a clarification.

Typically, I have either attached a file with the suggested wording, or included it in a comment from which a committer could cut-and-paste. (If Georg has different preferences on how to submit the patch, they should probably go into a FAQ anyhow.)

--
nosy: +jimjjewett

___ Python tracker <http://bugs.python.org/issue4882> ___
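As background for the clarification being requested, here is a brief sketch of the ways a named group can be referred back to in the `re` module. The sample strings and group names are made up; the syntax shown is standard.

```python
import re

# Inside the pattern, (?P=name) refers back to a named group.
doubled = re.search(r"\b(?P<word>\w+) (?P=word)\b", "it was the the best")
repeated_word = doubled.group("word")
matched_text = doubled.group()

# A named group is also numbered, so \1 works as well.
numbered = re.search(r"\b(?P<word>\w+) \1\b", "it was the the best")
same_match = numbered.group() == matched_text

# In a replacement string, the spelling is \g<name> instead.
swapped = re.sub(r"(?P<a>\w+)-(?P<b>\w+)", r"\g<b>-\g<a>", "left-right")
```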
[issue4888] misplaced (or misleading) assert in ceval.c
Jim Jewett added the comment: I agree with Raymond. A comment *might* be sufficient, but ... in some sense, that is the purpose of an assert. The loop is reasonably long; it already includes macros which could (but currently don't) change the value, and function calls which might plausibly (but don't) reset a "why" variable. The why variable is technically local, but the scope is still pretty large, so that isn't clear at first. It took me some work to verify the assertion, and I'm not at all confident that a later change wouldn't violate it. Nor am I confident that the symptoms would make for straightforward debugging. (Would it look like stack corruption? Would it take several more opcodes before a problem was visible?) -- nosy: +jimjjewett ___ Python tracker <http://bugs.python.org/issue4888> ___
[issue24275] lookdict_* give up too soon
Jim Jewett added the comment: What is the status on this? If you are losing interest, would you like someone else to turn your patch into a pull request? -- ___ Python tracker <https://bugs.python.org/issue24275> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue44548] ttk Indeterminate Progressbar Not Animating Correctly After `start`
Jim Jewett added the comment: It sounds like the fix is a configuration change already included in the next version, so ... I think that counts as a fix. -- nosy: +Jim.Jewett resolution: -> fixed status: open -> pending ___ Python tracker <https://bugs.python.org/issue44548> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24275] lookdict_* give up too soon
Jim Jewett added the comment: This was originally "can be reopened if a patch is submitted" and Hristo Venev has now done so. Therefore, I am reopening. -- resolution: rejected -> remind stage: -> patch review status: closed -> open versions: +Python 3.10 ___ Python tracker <https://bugs.python.org/issue24275> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24275] lookdict_* give up too soon
Jim Jewett added the comment: Based on Hristo's timing, it appears to be a clear win. A near-wash for truly string-only dicts that shouldn't be affected; a near-wash for looking up non-(exact-)strings, and a nearly 40% speedup for the target case of looking up but not inserting a non-string or string subclass, then looking up strings thereafter. Additional comments: Barring objections, I will promote from patch review to commit review when I've had a chance to look more closely. I don't have commit privs, but I think some of the others following this issue do. The test looks pretty good -- good enough that I wonder if I'm missing something on the parts that seem odd. It would be great if you either cleaned them up or commented to explain why: Why is the first key vx1, which seems, if anything, like a variable? Why not k1 or string_key? Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"? Using a str subclass in the test is a great idea, and you've created a truly minimal one. It would probably be good to *also* test with a non-string, like 3 or 42.0. I can't imagine this affecting things (unless you missed an eager lookdict demotion somewhere), but it would be good to have that path documented against regression. This seems like a test that could probably be rolled into a bigger testfile for the actual commit. I don't have the name of such an appropriate file at hand right now, but will try to find it on a deeper review. -- ___ Python tracker <https://bugs.python.org/issue24275> ___
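A rough sketch of the kind of extra coverage suggested above, not the test from the patch itself. The class and function names here are hypothetical; the idea is only that looking up (without inserting) a str subclass or a non-string key must leave later plain-string lookups working.

```python
class MinimalStr(str):
    """Hypothetical minimal str subclass used only for this sketch."""


def check_mixed_lookups():
    d = {"x1": 1, "y2": 2}

    # Look up keys that are not exact str instances, without inserting them.
    found_subclass = MinimalStr("x1") in d   # equal to "x1", same hash
    found_int = 3 in d
    found_float = 42.0 in d

    # Plain string lookups must behave the same afterwards.
    still_ok = d["x1"] == 1 and d["y2"] == 2 and "zz" not in d
    return found_subclass, found_int, found_float, still_ok


print(check_mixed_lookups())
```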
[issue39542] Cleanup object.h header
Jim Jewett added the comment: Raymond, did you replace the screenshot with a later one showing that things are fixed now? The timestamp suggests it went up at the same time as your comment, but what I see in the .png file is that the two are identical other than addresses. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue39542> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41212] Emoji Unicode failing in standard release of Python 3.8.3 / tkinter 8.6.8
Jim Jewett added the comment: @Ben Griffin -- Unicode has defined astral characters for a while, but they were explicitly intended for rare characters, with any living languages intended for the basic plane. It is only the most recent releases of unicode that have broken the "most people won't need this" expectation, so it wasn't unreasonable for languages targeting memory-constrained devices to make astral support at best a compile-time operation. I've seen a draft for an upcoming spec update of an old but still-supported language (extended Gerber, for photoplotting machines) that "handles" this simply by clarifying that their unicode support is limited to characters < 65K. Given that their use of unicode is essentially limited to comments, and there is plenty of hardware that can't be updated ... this may well be correct. Python itself does the right thing, and tcl can't do the right thing anyhow without font support ... so this may be fixed in less time than it would take to replace Tk/Tcl. If you need a faster workaround, consider a private-use-area and private font. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue41212> ___
[issue41217] Obsolete note for default asyncio event loop on Windows
Jim Jewett added the comment: Looks good to me. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue41217> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41246] IOCP Proactor same socket overlapped callbacks
Jim Jewett added the comment: Looks good to me. I at first worried that the different function names were useful metadata that was getting lost -- but the names were already duplicated in several cases. *If* that is still a concern for the committer, then instead of repeating the code (as current production does), each section should just say newname=origname before registering the static method (as the patch does), and should bind a distinct name for each usage. -- nosy: +Jim.Jewett versions: +Python 3.10 ___ Python tracker <https://bugs.python.org/issue41246> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41220] add optional make_key argument to lru_cache
Jim Jewett added the comment: Going back to Raymond's analysis, this is useful when at least some of the parameters either do not change the result, or are not hashable. At a minimum, you need to figure out which parameters those are, and whether to drop them or transform them. Is this already sufficiently rare or tricky that a subclass is justified, instead of trying to shoehorn things into a single key method? -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue41220> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
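To make the trade-off concrete, here is a minimal sketch of what a key-function wrapper might look like. `cached_with_key` is a hypothetical helper, not an existing `functools` API; it assumes the key function returns something hashable, and it is not thread-safe.

```python
import functools


def cached_with_key(make_key, maxsize=128):
    """Cache on make_key(*args, **kwargs) instead of the raw arguments."""
    def decorator(func):
        pending = {}  # key -> (args, kwargs) for the call currently being computed

        @functools.lru_cache(maxsize=maxsize)
        def cached(key):
            args, kwargs = pending[key]
            return func(*args, **kwargs)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = make_key(*args, **kwargs)
            pending[key] = (args, kwargs)
            try:
                return cached(key)
            finally:
                pending.pop(key, None)

        wrapper.cache_info = cached.cache_info
        return wrapper
    return decorator


calls = []


# Transform the unhashable list into a tuple; drop `verbose`, which does not
# change the result.
@cached_with_key(lambda data, verbose=False: tuple(data))
def total(data, verbose=False):
    calls.append(data)
    return sum(data)


first = total([1, 2, 3])
second = total([1, 2, 3], verbose=True)
print(first, second, len(calls))
```

Even in this small form, the caller still has to decide per parameter whether to drop or transform it, which is the judgment call the comment above is asking about.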
[issue41405] python 3.9.0b5 test
Jim Jewett added the comment: Is this a platform where 3.8 was working? The curses test seems to think you have too many color-pairs defined, and this might well be part of a semi-compatible curses library. I guess I would add some output to the test showing how many (and which) color pairs it thinks there are. The pwd complaint is correct, but seems like it is complaining about the interface between python and your OS. The tkinter problem is really a failure to round a floating point, and I would be surprised if python had made changes there recently. I would be slightly less surprised if something in the compile chain of tk for your system hard-coded a specific rounding format. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue41405> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41409] deque.pop(index) is not supported
Jim Jewett added the comment: It may well have been intentional, as deques should normally be mutated only at the ends. But Raymond did make changes to conform to the ABC, so this should probably be supported too. Go ahead and include docstrings and/or comments discouraging it, though, except for i=0 and i=-1. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue41409> ___
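For reference, a small sketch of the current behavior and one possible workaround. `pop_at` is a hypothetical helper name; it relies only on indexing and `del`, which `collections.deque` already supports.

```python
from collections import deque


def pop_at(d, index):
    """Remove and return d[index]; O(n) away from the ends of a deque."""
    value = d[index]
    del d[index]
    return value


d = deque("abcde")

# deque.pop() takes no index argument, so this raises TypeError.
try:
    d.pop(1)
except TypeError:
    pop_rejects_index = True
else:
    pop_rejects_index = False

middle = pop_at(d, 2)
print(pop_rejects_index, middle, list(d))
```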
[issue40841] Provide mimetypes.sniff API as stdlib
Jim Jewett added the comment: The standard itself says that it only applies to content served over http; if the content is retrieved by ftp or from a file system, then you should trust that. I don't notice that in the code you pointed to. So maybe filetype is the right answer if the data isn't coming over the network? For whatwg demonstration code, it is reasonable to assume that, but in python -- at a minimum, you should document the assumption prominently in the docs and docstring. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue40841> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue18280] Documentation is too personalized
Jim Jewett added the comment: I won't speak of nroff or troff in particular, but many programs had trouble distinguishing the end of a sentence from an honorific abbreviation, such as Mr. Spock or Dr. Seuss. -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue18280> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue41407] Tricky behavior of builtin-function map
Jim Jewett added the comment: Why would you raise StopIteration if you didn't want to stop the nearest iteration loop? I agree that the result of your sample code seems strange, but that is because it is strange code. I agree with Steven D'Aprano that changing it would cause more pain than it would remove. Unless it gets a lot more support by the first week of August, I recommend closing this request as rejected. -- nosy: +Jim.Jewett status: open -> pending ___ Python tracker <https://bugs.python.org/issue41407> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
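A minimal illustration of the behavior under discussion, using a made-up function name: a `StopIteration` raised inside the mapped function escapes through the `map` iterator, and the consumer treats it as the normal end of iteration.

```python
def stop_on_two(x):
    # Raising StopIteration here looks, to the consumer of map(), exactly
    # like the iterator being exhausted.
    if x == 2:
        raise StopIteration
    return x * 10


result = list(map(stop_on_two, [1, 2, 3]))
print(result)  # the 3 is never processed
```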
[issue31904] Python should support VxWorks RTOS
Jim Jewett added the comment: Is it safe to say that there is now an intent to support VxWorks within the main tree, with Wind River agreeing to be primary support? And this ticket has become a tracking ticket for the status on getting it there, small PR by small PR plus buildbot? -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue31904> ___
[issue41391] Make test_unicodedata pass when running without network
Jim Jewett added the comment: Looks Good To Me -- nosy: +Jim.Jewett ___ Python tracker <https://bugs.python.org/issue41391> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue40841] Provide mimetypes.sniff API as stdlib
Jim Jewett added the comment: There are a zillion reasons a filename could be wrong -- but the standard says to trust the filesystem. So if it sniffs based on contents, it isn't quite following the standard. It is probably still a useful tool, but it won't be the One Right Way, and it isn't even clear that it should replace current heuristics. On Mon, Jul 27, 2020 at 7:22 PM Guido van Rossum wrote: > > Guido van Rossum added the comment: > > Whether the data was retrieved over a network has nothing to do with it. > > There are complementary ways of guessing what data you are working with -- > guess based on the filename extension or sniff based on the contents of the > file (or downloaded data). > > There are a zillion reasons why the filename could be a lie -- e.g. a user > could pick the wrong extension, or rename a file, or a tool could save a > file using the wrong extension or no extension at all. Then again sometimes > the contents of the file might not be enough, e.g. > ``` > foo() // bar > ``` > is both valid Python and valid JavaScript. :-) > > -- > > ___ > Python tracker > <https://bugs.python.org/issue40841> > ___ > -- ___ Python tracker <https://bugs.python.org/issue40841> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
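The two complementary approaches in this exchange can be sketched side by side. `mimetypes.guess_type` is the existing filename-based heuristic; `sniff_type` and its tiny signature table are hypothetical and far smaller than what the WHATWG sniffing standard describes.

```python
import mimetypes

# Hypothetical, deliberately tiny signature table for illustration only.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"%PDF-", "application/pdf"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff_type(data):
    """Guess a MIME type from leading bytes; None if nothing matches."""
    for prefix, mime in SIGNATURES:
        if data.startswith(prefix):
            return mime
    return None


# A PDF that someone saved with the wrong extension.
name = "report.png"
content = b"%PDF-1.7\n..."

by_name = mimetypes.guess_type(name)[0]
by_content = sniff_type(content)
print(by_name, by_content)
```

Which answer to trust when the two disagree is exactly the policy question raised above; the sketch does not settle it.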
[issue41405] python 3.9.0b5 test
Jim Jewett added the comment: Then I suspect they also exist in even earlier versions, and are actually tied to your development setup. That should still be fixed, but it is probably not in Python's own code. It might be in python's build process, which is still on us. Or it might be in your distribution, or in a dependency like Tk, or in your personal C compiler or setup. Could you look to see what your system's actual passwd file says, and how tcl rounds outside of python, and how many color pairs your curses supports or has? -- versions: +Python 3.8, Python 3.9 ___ Python tracker <https://bugs.python.org/issue41405> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13828] Further improve casefold documentation
Jim Jewett added the comment: Unicode probably won't make the correction, because of backwards compatibility. I do support the sentence suggested in Thorsten's most recent reply. Is expanding ligatures the only other normalization it does? Ideally, we should also mention that it shifts to the canonical case, which is usually (but not always) lowercase. I think Cherokee is one that folds to the upper case. On Mon, Aug 24, 2020 at 11:02 AM Thorsten wrote: > > Thorsten added the comment: > > I see. I found the documents. That's an issue. That usage is incorrect. It > is still valid to upper case "ß" to SS since "ẞ" is fairly new as an > official German character, but the other way around is not valid. > > As such the current sentence in documentation also just does not make > sense. > > >"Since it is already lowercase, lower() would do nothing to 'ß'" > > Exactly. Why would it? It is nonsensical to change an already lowercase > character with a lowercase function. > > Suggest to update to: > > "For example, the Unicode standard for German lower case letter 'ß' > prescribes full casefolding to 'ss'. Since it is already lowercase, lower() > would do nothing to 'ß'; casefold() converts it to 'ss'. > In addition to full lowercasing, this function also expands ligatures, for > example, 'ﬁ' becomes 'fi'." -- ___ Python tracker <https://bugs.python.org/issue13828> ___
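The cases mentioned in this exchange can be checked directly. The Cherokee line is an assumption on my part about which code point to try (U+AB70, a Cherokee small letter); it prints whatever the running interpreter's Unicode tables produce, so no particular output is claimed for it.

```python
sharp_s = "\u00df"      # 'ß', LATIN SMALL LETTER SHARP S
fi_ligature = "\ufb01"  # 'ﬁ', LATIN SMALL LIGATURE FI

lowered = sharp_s.lower()          # already lowercase, so unchanged
folded = sharp_s.casefold()        # full case folding expands it
ligature_folded = fi_ligature.casefold()

print(lowered == sharp_s, folded, ligature_folded)

# Cherokee small letter; inspect the code point of the folded result.
cherokee_small = "\uab70"
print(hex(ord(cherokee_small.casefold())))
```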
[issue41246] IOCP Proactor same socket overlapped callbacks
Change by Jim Jewett : -- stage: patch review -> commit review ___ Python tracker <https://bugs.python.org/issue41246> ___ ___ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation
Jim Jewett added the comment: Why was the delta-processing removed from the casing functions? As best I can tell, the whole point of going through multiple levels of indirection (courtesy splitbins) is to maximize compression and minimize the amount of cache that unicode might occupy. By using deltas, only one record is needed for each combination of (upper - lower, upper - title), which is generally only one or two combinations per script. Without deltas, nearly every cased letter needs its own record, and the index tables also get bigger. (It seems to be about 2.6 times as large, but cache effects may be worse, since letters from the same script will no longer be in the same record or the same index chain.) If it is a concern about not enough room for flags, then the decimal/digit chars could be combined. They are always the same, unless the number isn't decimal (in which case the flag is enough). -- ___ Python tracker <http://bugs.python.org/issue12736> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
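A rough way to see the compression argument, using only the simple one-to-one mappings exposed by `str.upper()` and `str.lower()`. This is an illustration, not how `makeunicodedata.py` builds its records: it counts how many distinct records are needed when the cased forms are stored as absolute code points versus as offsets from the character itself.

```python
def casing_record_counts(limit=0x10000):
    """Return (absolute_records, delta_records) for code points below limit."""
    absolute = set()
    deltas = set()
    for cp in range(limit):
        ch = chr(cp)
        up, lo = ch.upper(), ch.lower()
        if len(up) != 1 or len(lo) != 1:
            continue  # skip full (multi-character) mappings in this sketch
        if up == ch and lo == ch:
            continue  # uncased
        absolute.add((ord(up), ord(lo)))
        deltas.add((ord(up) - cp, ord(lo) - cp))
    return len(absolute), len(deltas)


absolute_count, delta_count = casing_record_counts()
print(absolute_count, delta_count)
```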
[issue13793] hasattr, delattr, getattr fail with unnormalized names
New submission from Jim Jewett : The documentation for hasattr, getattr, and delattr states that they are equivalent to object.attribute access; this isn't quite true, because object.attribute uses an NFKC-normalized version of the string as only the secondary location, while hasattr, getattr, and delattr (assuming an object rather than an Identifier or string) don't seem to do the normalization at all. I think the simplest fix would be to normalize and retry when hasattr, getattr, and delattr fail with a string, but I'm not sure that normalization shouldn't be the only string tried.

>>> o.º
Traceback (most recent call last):
  File "", line 1, in
    o.º
AttributeError: 'Object' object has no attribute 'o'
>>> o.o
Traceback (most recent call last):
  File "", line 1, in
    o.o
AttributeError: 'Object' object has no attribute 'o'
>>> o.º=[]
>>> hasattr(o, "º")
False
>>> getattr(o, "º")
Traceback (most recent call last):
  File "", line 1, in
    getattr(o, "º")
AttributeError: 'Object' object has no attribute 'º'
>>> delattr(o, "º")
Traceback (most recent call last):
  File "", line 1, in
    delattr(o, "º")
AttributeError: º
>>> o.º
[]
>>> o.º is o.o
True
>>> o.o
[]
>>> del o.º
>>> o.o
Traceback (most recent call last):
  File "", line 1, in
    o.o
AttributeError: 'Object' object has no attribute 'o'
>>> o.º = 5
>>> hasattr(o, "º")
False
>>> hasattr(o, "o")
True
>>> hasattr(o, "o")
True
>>> o.º
5
>>> delattr(o, "o")
>>> o.º

-- components: Unicode messages: 151320 nosy: Jim.Jewett, ezio.melotti priority: normal severity: normal status: open title: hasattr, delattr, getattr fail with unnormalized names ___ Python tracker <http://bugs.python.org/issue13793> ___
[issue13793] hasattr, delattr, getattr fail with unnormalized names
Jim Jewett added the comment: Why is normalization in getattr unacceptable? I won't pretend to *like* it, but the difference between two canonically equal strings really is (by definition) just a representational issue. Would it be OK to normalize in object's own implementation, so that custom classes could avoid the normalization, but it would happen by default? -- ___ Python tracker <http://bugs.python.org/issue13793> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
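A short sketch of the mismatch and of the caller-side workaround, assuming the builtins keep their current no-normalization behavior. `hasattr_nfkc` is a hypothetical helper, not a proposed API.

```python
import unicodedata


class Obj:
    pass


def hasattr_nfkc(obj, name):
    """hasattr() after normalizing the name the way the parser does."""
    return hasattr(obj, unicodedata.normalize("NFKC", name))


ORDINAL = "\u00ba"  # 'º', which NFKC-normalizes to a plain 'o'

o = Obj()
o.o = 5  # what the parser would store for the source text "o.º = 5"

plain = hasattr(o, ORDINAL)             # the builtin does not normalize
normalized = hasattr_nfkc(o, ORDINAL)   # the helper does
print(plain, normalized)
```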
[issue13165] Integrate stringbench in the Tools directory
Jim Jewett added the comment: The URL got mangled in at least my browser, so I'm repasting it on its own line: http://svn.python.org/projects/sandbox/trunk/stringbench -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue13165> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: To be more explicit about Marc-Andre Lemburg's msg151121 (which I agree with): Count the collisions on a single lookup. If they exceed a threshold, do something different. Marc-Andre's strawman proposal was threshold=1000, and raise. It would be just as easy to say "whoa! 5 collisions -- time to use the alternative hash instead" (and, possibly, to issue a warning). Even that slight tuning removes the biggest objection, because it won't ever actually fail. Note that the use of a (presumably stronger 2nd) hash wouldn't come into play until (and unless) there was a problem for that specific key in that specific dictionary. For the normal case, nothing changes -- unless we take advantage of the existence of a 2nd hash to simplify the first few rounds of collision resolution. (Linear probing is more cache-friendly, but also more vulnerable to worst-case behavior -- but if probing stops at 4 or 8, that may not matter much.) For quick scripts, the 2nd hash will almost certainly never be needed, so startup won't pay the penalty. The only down side I see is that the 2nd (presumably randomized) hash won't be cached without another slot, which takes more memory and shouldn't be done in a bugfix release. -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue13703> ___
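The idea can be sketched with a toy open-addressing table. Everything here is illustrative: `weak_hash` is deliberately terrible, `strong_hash` stands in for the "alternative hash", `LIMIT` plays the role of the 5-collision threshold, and the table supports neither deletion nor resizing. It is not how CPython's dict is implemented.

```python
import hashlib

LIMIT = 5  # collisions tolerated on the primary hash before switching


def weak_hash(key):
    return len(key)  # every key of the same length collides


def strong_hash(key):
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class ToyTable:
    def __init__(self, size=64):
        self.size = size
        self.slots = [None] * size

    def _probe(self, key):
        # First a few linear probes from the cheap hash ...
        start = weak_hash(key) % self.size
        for step in range(LIMIT):
            yield (start + step) % self.size
        # ... then, only for keys that kept colliding, the alternative hash.
        start = strong_hash(key) % self.size
        for step in range(self.size):
            yield (start + step) % self.size

    def insert(self, key, value):
        for index in self._probe(key):
            entry = self.slots[index]
            if entry is None or entry[0] == key:
                self.slots[index] = (key, value)
                return
        raise RuntimeError("table full")

    def lookup(self, key):
        for index in self._probe(key):
            entry = self.slots[index]
            if entry is None:
                raise KeyError(key)
            if entry[0] == key:
                return entry[1]
        raise KeyError(key)


table = ToyTable()
keys = ["k%02d" % n for n in range(20)]  # all the same length: worst case
for n, key in enumerate(keys):
    table.insert(key, n)
print([table.lookup(key) for key in keys])
```

Keys that never collide more than `LIMIT` times never pay for `strong_hash`, which is the "normal case, nothing changes" property described above.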
[issue13820] 2.6 is no longer in the future
New submission from Jim Jewett : http://docs.python.org/reference/lexical_analysis.html Changed in version 2.5: Both as and with are only recognized when the with_statement future feature has been enabled. It will always be enabled in Python 2.6. See section The with statement for details. Note that using as and with as identifiers will always issue a warning, even when the with_statement future directive is not in effect. That was reasonable wording for 2.5 itself, but at this point, I think it would be simpler to add a Changed in version 2.6 entry. Perhaps: Changed in version 2.5: Using as or with as identifiers triggers a warning. Using them as statements requires the with_statement future feature. Changed in Python 2.6: as and with became full keywords. -- assignee: docs@python components: Documentation messages: 151595 nosy: Jim.Jewett, docs@python priority: normal severity: normal status: open title: 2.6 is no longer in the future type: enhancement versions: Python 2.7 ___ Python tracker <http://bugs.python.org/issue13820> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13821] misleading return from isidentifier
New submission from Jim Jewett : Python identifiers are in NFKC form; string method .isidentifier() returns true on strings that are not in that form. In some contexts, these non-canonical strings will be replaced with their NFKC equivalent, but in other contexts (such as the builtins hasattr, getattr, delattr) they will not.

>>> cha=chr(170)
>>> cha
'ª'
>>> cha.isidentifier()
True
>>> uc.normalize("NFKC", cha)
'a'
>>> obj.ª = 5
>>> hasattr(obj, "ª")
False
>>> obj.a
5

-- components: Unicode messages: 151597 nosy: Jim.Jewett, ezio.melotti priority: normal severity: normal status: open title: misleading return from isidentifier ___ Python tracker <http://bugs.python.org/issue13821> ___
[issue13821] misleading return from isidentifier
Jim Jewett added the comment: My preference would be for non_NFKC.isidentifier() to return False, but that may be a problem for backwards compatibility. It *may* be worth adding an asidentifier() method that returns either False or the canonicalized string that should be used instead. At a minimum, the documentation (including docstring) should warn that the method doesn't check for NFKC form, and that if the input is not ASCII, the caller should first ensure this by calling str1=unicodedata.normalize("NFKC", str1) -- ___ Python tracker <http://bugs.python.org/issue13821> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
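A sketch of what the suggested helper might look like as a plain function. `asidentifier` does not exist on `str`; the name and the False-or-string return convention are taken from the suggestion above, and keywords are not treated specially here.

```python
import unicodedata


def asidentifier(text):
    """Return the NFKC form of text if it is a valid identifier, else False."""
    normalized = unicodedata.normalize("NFKC", text)
    if normalized.isidentifier():
        return normalized
    return False


feminine_ordinal = chr(170)  # 'ª'
print(asidentifier(feminine_ordinal), asidentifier("1abc"), asidentifier("spam"))
```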
[issue13821] misleading return from isidentifier
Jim Jewett added the comment: @Benjamin -- the catch is, if it isn't already in NFKC form, then python won't really accept it as an identifier. Sometimes it will silently canonicalize it for you so that it seems to work, but other times it won't. And a program calling isidentifier is likely to be a program that uses the strings directly for access, instead of always routing them through the parser. -- ___ Python tracker <http://bugs.python.org/issue13821> ___
[issue13828] Further improve casefold documentation
New submission from Jim Jewett :

> http://hg.python.org/cpython/rev/0b5ce36a7a24
> changeset: 74515:0b5ce36a7a24
> + Casefolding is similar to lowercasing but more aggressive because it is
> + intended to remove all case distinctions in a string. For example, the German
> + lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already
> + lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold`
> + converts it to ``"ss"``.

Perhaps add the recommendation to canonicalize as well. A complete, but possibly too long, try is below:

Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold` converts it to ``"ss"``.

Note that most case-insensitive matches should also match compatibility equivalent characters. The casefolding algorithm is described in section 3.13 of the Unicode Standard. Per D146, a compatibility caseless match can be achieved by

    from unicodedata import normalize

    def caseless_compat(string):
        nfd_string = normalize("NFD", string)
        nfkd1_string = normalize("NFKD", nfd_string.casefold())
        return normalize("NFKD", nfkd1_string.casefold())

-- assignee: docs@python components: Documentation messages: 151644 nosy: Jim.Jewett, benjamin.peterson, docs@python priority: normal severity: normal status: open title: Further improve casefold documentation versions: Python 3.3 ___ Python tracker <http://bugs.python.org/issue13828> ___
[issue13828] Further improve casefold documentation
Jim Jewett added the comment: Frankly, I do think that sample code is too long, but correctness matters ... perhaps a better solution would be to add either a method or a unicodedata function that does the work, then the extra note could just say Note that most case-insensitive matches should also match compatibility equivalent characters; see unicodedata.compatibility_casefold -- ___ Python tracker <http://bugs.python.org/issue13828> ___
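As a sketch of what such a helper could do, here is the D146-style fold written as a standalone function. The name `compatibility_casefold` follows the suggestion above and is hypothetical; no such function exists in `unicodedata` as far as I know.

```python
from unicodedata import normalize


def compatibility_casefold(string):
    """NFKD(casefold(NFKD(casefold(NFD(string))))), per Unicode D146."""
    nfd_string = normalize("NFD", string)
    nfkd_once = normalize("NFKD", nfd_string.casefold())
    return normalize("NFKD", nfkd_once.casefold())


def caseless_compat_equal(a, b):
    return compatibility_casefold(a) == compatibility_casefold(b)


ligature_match = caseless_compat_equal("\ufb01sh", "FISH")      # 'ﬁsh' vs 'FISH'
sharp_s_match = caseless_compat_equal("Stra\u00dfe", "STRASSE")  # 'Straße'
different = caseless_compat_equal("spam", "eggs")
print(ligature_match, sharp_s_match, different)
```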
[issue13832] tokenization assuming ASCII whitespace; missing multiline case
New submission from Jim Jewett : Parser/parsetok.c was recently changed (e.g. http://hg.python.org/cpython/rev/2bd7f40108b4 ) to raise an error if multiple statements were found in a single-statement compile call. It sensibly ignores trailing whitespace and comments. Unfortunately, (1) It looks only at (c == ' ' || c == '\t' || c == '\n' || c == '\014') as opposed to using Py_UNICODE_ISSPACE(ch) (2) It assumes that a "#" means the rest of the line is OK, instead of looking for additional linebreaks. Not sure whether to mark this a bug or an enhancement, since it is already strictly better than the 3.2 behavior of never warning about extra text. -- components: Interpreter Core messages: 151652 nosy: Jim.Jewett priority: normal severity: normal status: open title: tokenization assuming ASCII whitespace; missing multiline case versions: Python 3.3 ___ Python tracker <http://bugs.python.org/issue13832> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13832] tokenization assuming ASCII whitespace; missing multiline case
Jim Jewett added the comment: Ignoring non-ascii whitespace is defensible, and I agree that it should match the rest of the parser. Ignoring 2nd lines is still a problem, and supposedly part of what got fixed. Test case:

s="""x=5 # comment
x=6
"""
compile(s, "", 'single')

-- ___ Python tracker <http://bugs.python.org/issue13832> ___
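The test case can be wrapped so that the three situations are compared side by side. The helper name is made up. The first two results are what I would expect on Python 3.3 and later; the third is the case this report is about, so it is printed without a claimed result.

```python
def compiles_as_single(source):
    """True if source is accepted by compile() in 'single' mode."""
    try:
        compile(source, "<example>", "single")
    except SyntaxError:
        return False
    return True


one_statement = compiles_as_single("x=5  # comment\n")
two_statements = compiles_as_single("x=5\nx=6\n")
hidden_second = compiles_as_single("x=5 # comment\nx=6\n")

print(one_statement, two_statements, hidden_second)
```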
[issue13703] Hash collision security issue
Jim Jewett added the comment: Marc-Andre Lemburg: >> So you get the best of both worlds and randomization would only >> kick in when it's really needed to keep the application running. Charles-François Natali > The only argument in favor the collision counting is that it will not > break applications relying on dict order: There is also the "taxes suck" argument; if hashing is made complex, then every object (or at least almost every string) pays a price, even if it will never be stuck in a dict big enough to matter. With collision counting, there are no additional operations unless and until there is at least one collision -- in other words, after the base hash algorithm has already started to fail for that particular piece of data. In fact, the base algorithm can be safely simplified further, precisely because it does not need to be quite as adequate for reprobes on data that does have at least one collision. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Fri, Jan 20, 2012 at 7:58 AM, STINNER Victor > If the hash output depends on an argument, the result cannot be > cached. They can still be cached in a separate dict based on id, rather than string contents. It may also be possible to cache them in the dict itself; for a string-only dict, the hash of each entry is already cached on the object, and the cache member of the entry is technically redundant. Entering a key with the alternative hash can also switch the lookup function to one that handles that possibility, just as entering a non-string key currently does. > It would require to add an > optional argument to hash functions, or add a new function to some > (or all?) builtin types. For backports, the alternative hashing could be done privately within dict and set, and would not require new slots on other types. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Mon, Jan 23, 2012 at 4:39 PM, Marc-Andre Lemburg wrote: > Running (part of (*)) the test suite with debugging enabled on a 64-bit > machine shows that slot collisions are much more frequent than > hash collisions, which only account for less than 0.01% of all > collisions. Even 1 in 10,000 seems pretty high, though I suppose it is a result of non-random input. (For a smalldict with 8 == 2^3 slots, on a 64-bit machine, true hash collisions "should" only account for 1 in 2^61 slot collisions.) > It also shows that slot collisions in the low 1-10 range are > most frequent, with very few instances of a dict lookup > reaching 20 slot collisions (less than 0.0002% of all > collisions). Thus the argument that collisions > N implies (possibly malicious) data that really needs a different hash -- and that this dict instance in particular should take the hit to use an alternative hash. (Do note that this alternative hash could be stored in the hash member of the PyDictEntry; if anything actually *equal* to the key comes along, it will have gone through just as many collisions, and therefore also have been rehashed.) > The great number of cases with 1 or 2 slot collisions surprised > me. It seems that there's potential for improvement of > the perturbation formula left. In retrospect, this makes sense. for (perturb = hash; ; perturb >>= PERTURB_SHIFT) { i = (i << 2) + i + perturb + 1; If two objects collided then they have the same last few bits in their hashes -- which means they also have the same last few bits in their initial perturb. And since the first probe is to slot 6i+1, it funnels down to only even consider half the slots until the second probe. Also note that this explains why Randomization could make the Django tests fail, even though 64-bit users haven't complained. The initial hash(&mask) is the same, and the first probe is the same, and (for a small enough dict) so are the next several. 
In a dict with 2^12 slots, the first 6 tries will be the same ... so I doubt the test cases have sufficiently large amounts of sufficiently unlucky data to notice very often -- unless the hash itself is changed, as in the patch. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
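The probing argument in the message above can be checked with a small simulation. This is only a sketch of the quoted loop in Python: `probe_slots` is a hypothetical helper, PERTURB_SHIFT = 5 is assumed to match dictobject.c of that era, and hashes are treated as non-negative integers (the C code works on an unsigned value).

```python
PERTURB_SHIFT = 5  # assumed to match CPython's dictobject.c at the time


def probe_slots(hash_value, table_bits, count):
    """Return the first `count` (>= 1) slot indices probed for a
    non-negative `hash_value` in a table with 2**table_bits slots."""
    mask = (1 << table_bits) - 1
    i = hash_value & mask          # initial slot: hash & mask
    slots = [i]
    perturb = hash_value
    while len(slots) < count:
        # Same recurrence as the quoted C loop: i = 5*i + perturb + 1
        i = (i << 2) + i + perturb + 1
        slots.append(i & mask)
        perturb >>= PERTURB_SHIFT
    return slots


# Two made-up 64-bit hashes that agree only in their low 32 bits.
h1 = 0x12345678
h2 = h1 + (0xABC << 32)

first_six_match = probe_slots(h1, 12, 6) == probe_slots(h2, 12, 6)
seventh_differs = probe_slots(h1, 12, 7) != probe_slots(h2, 12, 7)
print(first_six_match, seventh_differs)
```

With these inputs the first six probes coincide and the seventh does not, which is consistent with the "first 6 tries" estimate for a 2^12-slot table when only the low 32 bits of the hash are shared.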
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm added the comment: > hybrid-approach-dmalcolm-2012-01-25-001.patch > As per haypo's random-8.patch, a randomization seed is read at startup. Why not wait until it is needed? I suspect a lot of scripts will never need it for any dict, so why add the overhead to startup? > Once a dict has transitioned to paranoid mode, it isn't using > PyObject_Hash anymore, and thus isn't using cached object values The alternative hashes could be stored in an id-keyed dict performing a more expensive calculation, but I believe this calculation is essentially constant-time. > > This preserves hash() and dict order for the cases where you're not under > attack, and gracefully handles the attack without having to raise an > exception: it doesn't introduce any new exception types. > > It preserves ABI, assuming no-one else is reusing ma_smalltable. > > It is suitable for backporting to 3.2, 2.7, and earlier (I'm investigating > fixing this going all the way back to Python 2.2) > > Under the old implementation, there were 4 types of PyDictObject, given these > two booleans: > * "small vs large" i.e ma_table == ma_smalltable vs ma_table != ma_smalltable > * "all keys are str" vs arbitary keys i.e ma_lookdict == lookdict_unicode vs > lookdict > > Under this implementation, this doubles to 8 kinds, adding the boolean: > * normal hash vs randomized hash (normal vs "paranoid"). > > This is expressed via the ma_lookdict callback, adding two new variants, > lookdict_unicode_paranoid and lookdict_paranoid > > Note that if a paranoid dict goes small again (ma_table == ma_smalltable), it > stays paranoid. This is for simplicity: it avoids having to rebuild all of > the non-randomized me_hash values again (which could fail). > > Naturally the patch adds selftests. I had to add some diagnostic methods to > support them; dict gains _stats() and _make_paranoid() methods, and sys gains > a _getrandomizedhash() method. 
These could be hidden more thoroughly if need > be (see DICT_PROTECTION_TRACKING in dictobject.c). Amongst other things, the > selftests measure wallclock time taken for various dict operations (and so > might introduce failures on a heavily-loaded machine, I guess). > > Hopefully this approach is a viable way forward. > > Caveats and TODO items: > > TODO: I haven't yet tuned the safety threshold. According to > http://bugs.python.org/issue13703#msg151850: >> slot collisions are much more frequent than >> hash collisions, which only account for less than 0.01% of all >> collisions. >> >> It also shows that slot collisions in the low 1-10 range are >> most frequent, with very few instances of a dict lookup >> reaching 20 slot collisions (less than 0.0002% of all >> collisions). > > This suggests that the threshold of 32 slot/hash collisions per lookup may > already be high enough. > > TODO: in a review of an earlier version of the complexity detection idea, > Antoine Pitrou suggested that make the protection scale factor be a run-time > configurable value, rather than a #define. This isn't done yet. > > TODO: run more extensive tests (e.g. Django and Twisted), monitoring the > worst-case complexity that's encountered > > TODO: not yet benchmarked and optimized. I want to get feedback on the > approach before I go in and hand-optimize things (e.g. by hand-inlining > check_iter_count, and moving the calculations out of the loop etc). I > believe any performance issues ought to be fixable, in that the we can get > the cost of this for the "we're not under attack" case to be negligible, and > the "under attack" case should transition from O(N^2) to O(N), albeit it with > a larger constant factor. > > TODO: this doesn't cover sets, but assuming this approach works, the patch > can be extended to cover it in an analogous way. > > TODO: should it cover PyMemoryViewObject, buffer object, etc? > > TODO: should it cover the hashing in Modules/expat/xmlparse.c? 
FWIW I rip > this code out when doing my downstream builds in RHEL and Fedora, and instead > dynamically link against a system copy of expat > > TODO: only tested on Linux so far (which is all I've got). Fedora 15 x86_64 > fwiw > > Doc/using/cmdline.rst | 6 > Include/bytesobject.h | 2 > Include/object.h | 8 > Include/pythonrun.h | 2 > Include/unicodeobject.h | 2 > Lib/os.py | 17 -- > Lib/test/regrtest.py | 5 > Lib/test/test_dict.py | 298 + &g
[issue13703] Hash collision security issue
Jim Jewett added the comment: Sorry; hit the wrong key... intended message below: On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm added the comment: [lots of good stuff] > hybrid-approach-dmalcolm-2012-01-25-001.patch > As per haypo's random-8.patch, a randomization seed is read at > startup. Why not wait until it is needed? I suspect a lot of scripts will never need it for any dict, so why add the overhead to startup? > Once a dict has transitioned to paranoid mode, it isn't using > PyObject_Hash anymore, and thus isn't using cached object values The alternative hashes could be stored in an id-keyed WeakKeyDictionary; that would handle at least the normal case of using exactly the same string for the lookup. > Note that if a paranoid dict goes small again > (ma_table == ma_smalltable), it stays paranoid. As I read it, that couldn't happen, because paranoid dicts couldn't shrink at all. (Not letting them shrink beneath 2*PyDict_MINSIZE does seem like a reasonable solution.) Additional TODOs... The checks for Unicode and Dict should not be exact; it is OK to do on a subclass so long as they are using the same lookdict (and, for unicode, the same eq). Additional small strings should be excluded from the new hash, to avoid giving away the secret. At a minimum, single-char strings should be excluded, and I would prefer to exclude all strings of length <= N (where N defaults to 4). -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Wed, Jan 25, 2012 at 1:05 PM, Antoine Pitrou added the comment: > It looks like that approach will break any non-builtin type (in either C > or Python) which can compare equal to bytes or str objects. If that's > the case, then I think the likelihood of acceptance is close to zero. (1) Isn't that true of *any* patch that changes hashing? (Thus the PYTHONHASHSEED=0 escape hatch.) (2) I think it would still work for the lookdict_string (or lookdict_unicode) case ... which is the normal case, and also where most vulnerabilities should appear. (3) If the alternate hash is needed for non-string keys, there is no perfect resolution, but I suppose you could get closer with if obj == str(obj): newhash=hash(str(obj)) -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue6056] socket.setdefaulttimeout affecting multiprocessing Manager
Jim Jewett added the comment: The wording in 138415 suggested this patch was changing socket to not support timeouts -- which would be unacceptable. But the actual patch only seems to touch multiprocessing/connection.py -- a far more reasonable change. Unfortunately, the patch no longer applies to the development tip. I *think* the places you wanted to change are still there, and just moved. (1) Is it sufficiently clear that this is not-a-feature to justify a backport? (2) Are the problems already fixed by some of the other changes? (It doesn't look like it, but I'm not sure.) (3) Can you produce an updated patch? (The current tip is http://hg.python.org/cpython/file/fec45282dc28/Lib/multiprocessing/connection.py ) (4) If I understand the intent, then s.setblocking(True) would be slightly more clear than s.settimeout(None), though that change obviously isn't essential. -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue6056> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
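Point (4) above relies on s.setblocking(True) and s.settimeout(None) leaving the socket in the same state. A minimal sketch that checks this on an unconnected socket (no network traffic is needed; the 5.0 second timeout is an arbitrary value for illustration):

```python
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    s.settimeout(5.0)          # start from a non-default timeout
    s.settimeout(None)         # spelling used in the patch
    after_settimeout = s.gettimeout()

    s.settimeout(5.0)
    s.setblocking(True)        # spelling suggested in point (4)
    after_setblocking = s.gettimeout()
finally:
    s.close()

print(after_settimeout, after_setblocking)
```

Both calls leave gettimeout() returning None, so the choice between them is a matter of readability rather than behavior.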
[issue13867] misleading comment in weakrefobject.h
New submission from Jim Jewett : http://hg.python.org/cpython/file/fec45282dc28/Include/weakrefobject.h#l54

The comment makes sense -- but doesn't appear to be true (the macro calls PyWeakref_CheckRef() first, not last), so perhaps it is the macro that should change.

    /* This macro calls PyWeakref_CheckRef() last since that can involve a
       function call; this makes it more likely that the function call
       will be avoided. */
    #define PyWeakref_Check(op) \
            (PyWeakref_CheckRef(op) || PyWeakref_CheckProxy(op))

-- assignee: docs@python components: Documentation, Extension Modules messages: 151983 nosy: Jim.Jewett, docs@python priority: normal severity: normal status: open title: misleading comment in weakrefobject.h ___ Python tracker <http://bugs.python.org/issue13867> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue10042] total_ordering
Jim Jewett added the comment: I like Nick Coghlan's suggestion in msg140493, but I think he was giving up too soon in the "or" cases, and I think the confusion could be slightly reduced by some re-spellings around return values and comments about short-circuiting.

    def not_op(op, other):
        # "not a < b" handles "a >= b"
        # "not a <= b" handles "a > b"
        # "not a >= b" handles "a < b"
        # "not a > b" handles "a <= b"
        op_result = op(other)
        if op_result is NotImplemented:
            return NotImplemented
        return not op_result

    def op_or_eq(op, self, other):
        # "a < b or a == b" handles "a <= b"
        # "a > b or a == b" handles "a >= b"
        op_result = op(other)
        if op_result is NotImplemented:
            return self.__eq__(other) or NotImplemented
        if op_result:
            return True
        return self.__eq__(other)

    def not_op_and_not_eq(op, self, other):
        # "not (a < b or a == b)" handles "a > b"
        # "not a < b and a != b" is equivalent
        # "not (a > b or a == b)" handles "a < b"
        # "not a > b and a != b" is equivalent
        op_result = op(other)
        if op_result is NotImplemented:
            return NotImplemented
        if op_result:
            return False
        return self.__ne__(other)

    def not_op_or_eq(op, self, other):
        # "not a <= b or a == b" handles "a >= b"
        # "not a >= b or a == b" handles "a <= b"
        op_result = op(other)
        if op_result is NotImplemented:
            return self.__eq__(other) or NotImplemented
        if op_result:
            return self.__eq__(other)
        return True

    def op_and_not_eq(op, self, other):
        # "a <= b and not a == b" handles "a < b"
        # "a >= b and not a == b" handles "a > b"
        op_result = op(other)
        if op_result is NotImplemented:
            return NotImplemented
        if op_result:
            return self.__ne__(other)
        return False

-- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue10042> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
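To make the intended use of these helpers concrete, here is a small sketch that wires op_or_eq into a class by hand. The Version class is hypothetical and exists only for illustration; its __eq__ always returns a bool, so the "or NotImplemented" branch is exercised only with a False left operand.

```python
def op_or_eq(op, self, other):
    # "a < b or a == b" handles "a <= b" (helper as proposed above)
    op_result = op(other)
    if op_result is NotImplemented:
        return self.__eq__(other) or NotImplemented
    if op_result:
        return True
    return self.__eq__(other)


class Version:
    """Hypothetical class defining only __eq__ and __lt__ directly."""

    def __init__(self, n):
        self.n = n

    def __eq__(self, other):
        return isinstance(other, Version) and self.n == other.n

    def __lt__(self, other):
        if not isinstance(other, Version):
            return NotImplemented
        return self.n < other.n

    def __le__(self, other):
        # Derive "<=" from "<" and "==" via the helper.
        return op_or_eq(self.__lt__, self, other)


less = Version(1) <= Version(2)
equal = Version(2) <= Version(2)
greater = Version(3) <= Version(2)
unrelated = Version(1).__le__("not a version")
print(less, equal, greater, unrelated)
```

The last call shows the point of the re-spelling: when the underlying comparison is not implemented and the objects are not equal, the derived method reports NotImplemented instead of guessing False.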
[issue13870] false comment in collections/__init__.py ordered dict
New submission from Jim Jewett : http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l37 states that the prev/next links are weakref proxies; as of http://hg.python.org/cpython/diff/3977dc349ae7/Lib/collections.py this is no longer true of the next links. It could be fixed by changing # The prev/next links are weakref proxies (to prevent circular references). to # The prev links are weakref proxies (to prevent circular references). -- components: Library (Lib) files: collections_init.patch keywords: patch messages: 151996 nosy: Jim.Jewett priority: normal severity: normal status: open title: false comment in collections/__init__.py ordered dict Added file: http://bugs.python.org/file24326/collections_init.patch ___ Python tracker <http://bugs.python.org/issue13870> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13871] namedtuple does not normalize field names when sanitizing
New submission from Jim Jewett : collections.namedtuple raises a ValueError if any of the field names are not valid identifiers, or are duplicates. It does not normalize the identifiers when checking for duplicates. (Similar issue with the typename)

>>> namedtuple("dup_fields", ["a", "a"])
Traceback (most recent call last):
  File "", line 1, in
    namedtuple("dup_fields", ["a", "a"])
  File "C:\python32\lib\collections.py", line 345, in namedtuple
    raise ValueError('Encountered duplicate field name: %r' % name)
ValueError: Encountered duplicate field name: 'a'

>>> namedtuple("nfk_tester", ["a", "ª"])
Traceback (most recent call last):
  File "", line 1, in
    namedtuple("nfk_tester", ["a", "ª"])
  File "C:\python32\lib\collections.py", line 365, in namedtuple
    raise SyntaxError(e.msg + ':\n\n' + class_definition)
  File "", line None
SyntaxError: duplicate argument 'a' in function definition: ...

and

>>> namedtuple("justª", "ª")
Traceback (most recent call last):
  File "", line 1, in
    namedtuple("justª", "ª")
  File "C:\python32\lib\collections.py", line 366, in namedtuple
    result = namespace[typename]
KeyError: 'justª'

-- messages: 151997 nosy: Jim.Jewett priority: normal severity: normal status: open title: namedtuple does not normalize field names when sanitizing ___ Python tracker <http://bugs.python.org/issue13871> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
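The collision reported above comes from identifier normalization: Python applies NFKC to identifiers, and NFKC maps "ª" to "a". A minimal sketch of a duplicate check that normalizes first; `find_normalized_duplicates` is a hypothetical helper, not part of collections:

```python
import unicodedata


def find_normalized_duplicates(field_names):
    """Return (first_seen, later_name) pairs whose NFKC forms collide."""
    seen = {}          # normalized name -> original spelling first seen
    duplicates = []
    for name in field_names:
        key = unicodedata.normalize("NFKC", name)
        if key in seen:
            duplicates.append((seen[key], name))
        else:
            seen[key] = name
    return duplicates


collisions = find_normalized_duplicates(["a", "ª"])
no_collisions = find_normalized_duplicates(["a", "b"])
print(collisions, no_collisions)
```

Running the same normalization on the typename would presumably address the KeyError case as well, since the generated class ends up stored under the normalized name.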
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Thu, Jan 26, 2012 at 8:19 PM, Antoine Pitrou wrote: > If I read your [Martin v. Löwis' ] patch correctly, collisions will > produce additional allocations ... That's a pretty massive > change in memory consumption for string dicts Not in practice. The point I first missed is that this triggers only when the hash is *fully* equal; if the hashes are merely equal after masking, then today's try-another-slot approach will still be used, even for strings. Per ( http://bugs.python.org/issue13703#msg151850 ) Marc-Andre Lemburg's measurements, full-hash equality explains only 1 in 10,000 collisions. From a performance standpoint, we can almost ignore a case that rare; it is almost certainly dwarfed by resizing. I *am* a bit concerned that the possible contents of a dictentry change; this could cause easily-missed-in-testing breakage for anything that treats table as an array. That said, it doesn't seem much worse than the search finger, and there seemed to be recent consensus that even promising an exact dict -- subclasses not allowed -- didn't mean that direct access was sanctioned. So it still seems safer than changing the de-facto iteration order. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: Given PYTHONHASHSEED, what is the point of PYTHONHASHRANDOMIZATION? Alternative: On startup, python reads a config file with the seed (which defaults to zero). Add a function to write a random value to that config file for the next startup. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Mon, Jan 30, 2012 at 12:31 PM, Dave Malcolm added the comment: > It's useful for the selftests, so I've kept PYTHONHASHSEED. The reason to read PYTHONHASHSEED was so that multiple members of a cluster could use the same hash. It would have been nice to have fewer environment variables, but I'll grant that it is hard to say "use something random that we have *not* precomputed" without either a config file or a magic value for PYTHONHASHSEED. -jJ -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Mon, Feb 6, 2012 at 8:12 AM, Marc-Andre Lemburg wrote: > > Marc-Andre Lemburg added the comment: > > Antoine Pitrou wrote: >> >> The simple collision counting approach leaves a gaping hole open, as >> demonstrated by Frank. > Could you elaborate on this ? > Note that I've updated the collision counting patch to cover both > possible attack cases I mentioned in > http://bugs.python.org/issue13703#msg150724. > If there's another case I'm unaware of, please let me know. The problematic case is, roughly, (1) Find out what N will trigger collision-counting countermeasures. (2) Insert N-1 colliding entries, to make it as slow as possible. (3) Keep looking up (or updating) the N-1th entry, so that the slow-as-possible-without-countermeasures path keeps getting rerun. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Mon, Feb 6, 2012 at 12:07 PM, Marc-Andre Lemburg wrote: > > Marc-Andre Lemburg added the comment: > > Jim Jewett wrote: >> The problematic case is, roughly, >> (1) Find out what N will trigger collision-counting countermeasures. >> (2) Insert N-1 colliding entries, to make it as slow as possible. >> (3) Keep looking up (or updating) the N-1th entry, so that the >> slow-as-possible-without-countermeasures path keeps getting rerun. > Since N is constant, I don't see how such an "attack" could be used > to trigger the O(n^2) worst-case behavior. Agreed; it tops out with a constant, but if it takes only 16 bytes of input to force another run through a 1000-long collision, that may still be too much leverage. > BTW: If you set the limit N to e.g. 100 (which is reasonable given > Victor's and my tests), Agreed. Frankly, I think 5 would be more than reasonable so long as there is a fallback. > the time it takes to process one of those > sets only takes 0.3 ms on my machine. That's hardly usable as basis > for an effective DoS attack. So it would take around 3Mb to cause a minute's delay... -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Mon, Feb 6, 2012 at 1:53 PM, Frank Sievertsen wrote: >>> BTW: If you set the limit N to e.g. 100 (which is reasonable given >>> Victor's and my tests), >> So it would take around 3Mb to cause a minute's delay... > How did you calculate that? 16 bytes/entry * 3300 entries/second * 60 seconds/minute But if there is indeed a way to cut that 16 bytes/entry, that is worse. Switching dict implementations at 5 collisions is still acceptable, except from a complexity standpoint. -jJ -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
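The estimate above is plain arithmetic and can be spelled out. The inputs are the figures quoted in the thread (0.3 ms per colliding set, rounded to 3300 entries per second, and 16 bytes of input per entry); nothing here is a new measurement.

```python
bytes_per_entry = 16         # attacker input needed per lookup, per the thread
entries_per_second = 3300    # roughly 1 / 0.3 ms, rounded as in the message
seconds_per_minute = 60

bytes_per_minute = bytes_per_entry * entries_per_second * seconds_per_minute
megabytes_per_minute = bytes_per_minute / 1_000_000

print(bytes_per_minute, round(megabytes_per_minute, 2))
```

That is 3,168,000 bytes, or about 3 megabytes of input for each minute of CPU time consumed.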
[issue13958] Comment _PyUnicode_FromId
New submission from Jim Jewett : Add a comment explaining why _PyUnicode_FromId can (and should) assume ASCII-only identifiers. /* PEP3131 guarantees that all python-internal identifiers are ASCII-only. Violating this would break some supported C compilers. */ See http://mail.python.org/pipermail/python-dev/2012-February/116234.html -- components: Unicode messages: 152775 nosy: Jim.Jewett, ezio.melotti priority: normal severity: normal status: open title: Comment _PyUnicode_FromId versions: Python 3.3 ___ Python tracker <http://bugs.python.org/issue13958> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13958] Comment _PyUnicode_FromId
Jim Jewett added the comment: On Mon, Feb 6, 2012 at 4:25 PM, Martin v. Löwis wrote: > Martin v. Löwis added the comment: > This has nothing to do with PEP 3131. Python could (and does) > support non-ASCII identifiers just fine, regardless of C compiler > limitations. I *think* you're saying that the _Py_Identifier( ) is a smaller set than identifiers in general. Would the following be more accurate? /* PEP3131 does allow non-ASCII identifiers in user code, but limits their use within the implementation itself. In particular, a _Py_Identifier may be passed directly to C code; such identifiers are restricted to ASCII to avoid breaking some supported C compilers. */ -- ___ Python tracker <http://bugs.python.org/issue13958> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13958] Comment _PyUnicode_FromId
Jim Jewett added the comment: And is there a way to characterize the compilers that would break? Is it a few specific compilers, or "compilers that do not implement UTF8, which is not required by the C standard", or ... -- ___ Python tracker <http://bugs.python.org/issue13958> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13958] Comment _PyUnicode_FromId
Jim Jewett added the comment: After clarification, the original change was backed out. These are C Identifiers, and nothing beyond ASCII is guaranteed, but other characters are in practice possible. -- resolution: -> fixed status: open -> closed ___ Python tracker <http://bugs.python.org/issue13958> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13977] importlib simplification
New submission from Jim Jewett : http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l974

    # The hell that is fromlist ...
    if not fromlist:
        # Return up to the first dot in 'name'. This is complicated by the fact
        # that 'name' may be relative.
        if level == 0:
            return sys.modules[name.partition('.')[0]]
        elif not name:
            return module
        else:
            cut_off = len(name) - len(name.partition('.')[0])
            return sys.modules[module.__name__[:-cut_off]]

If level is 0, should name == module.__name__? Yes. If so, then I think that simplifies to

    if not name:
        return module
    genericname = module.__name__.rpartition(".")[0]
    return sys.modules[genericname]

Seems right. Can you file a bug and assign it to me?

-- messages: 152970 nosy: Jim.Jewett, brett.cannon priority: normal severity: normal status: open title: importlib simplification ___ Python tracker <http://bugs.python.org/issue13977> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
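One way to sanity-check the proposed simplification is to compare the two string expressions on sample names without touching sys.modules. This is only a sketch: both helper functions and the sample package names are made up for illustration, and it covers just the relative-import branch.

```python
def parent_via_cut_off(name, module_name):
    """Slice computed by the quoted _bootstrap.py code (final else branch)."""
    cut_off = len(name) - len(name.partition('.')[0])
    return module_name[:-cut_off]


def parent_via_rpartition(module_name):
    """Expression used in the proposed simplification."""
    return module_name.rpartition('.')[0]


# (name as written in the import, resulting module.__name__)
cases = [("a.b", "pkg.a.b"), ("a.b.c", "pkg.a.b.c")]
results = [
    (name, parent_via_cut_off(name, mod), parent_via_rpartition(mod))
    for name, mod in cases
]
for row in results:
    print(row)
```

In this sketch the two expressions agree when name contains a single dot and differ for deeper names ("pkg.a" versus "pkg.a.b"), so the simplification may need a test with multi-level names before it is applied.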
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Fri, Feb 10, 2012 at 6:02 PM, STINNER Victor wrote:

> - PYTHONHASHSEED doc is not clear: it should be mentioned
>   that the variable is ignored if PYTHONHASHRANDOMIZATION
>   is not set

*That* is why this two-envvar solution bothers me. PYTHONHASHSEED has to be a string anyhow, so why not just get rid of PYTHONHASHRANDOMIZATION? Use PYTHONHASHSEED=random to use randomization. Other values that cannot be turned into an integer will be (currently) undefined. (You may want to raise a fatal error, on the assumption that errors should not pass silently.) A missing PYTHONHASHSEED then has the pleasant interpretation of defaulting to "0" for backwards compatibility.

-- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
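The single-variable proposal above amounts to a small parsing rule. The following is a sketch of that rule in Python, not the interpreter's actual startup code: `parse_hash_seed` is a hypothetical name, a ValueError stands in for the suggested fatal error, and the 64-bit width of the random seed is an assumption.

```python
import os


def parse_hash_seed(value):
    """Interpret a PYTHONHASHSEED-style string as proposed in the message."""
    if value is None:
        return 0                      # missing: seed 0, backwards compatible
    if value == "random":
        # Assumed width: 8 random bytes read from the OS.
        return int.from_bytes(os.urandom(8), "little")
    try:
        return int(value)
    except ValueError:
        # Stand-in for a fatal error at interpreter startup.
        raise ValueError("PYTHONHASHSEED must be 'random' or an integer: %r"
                         % (value,))


default_seed = parse_hash_seed(None)
fixed_seed = parse_hash_seed("42")
random_seed = parse_hash_seed("random")
print(default_seed, fixed_seed, 0 <= random_seed < 2 ** 64)
```

A caller would pass os.environ.get("PYTHONHASHSEED"), so that an unset variable reaches the function as None.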
[issue8087] Unupdated source file in traceback
Jim Jewett added the comment: Martin v. Löwis (loewis) wrote: > Displaying a warning whenever the code has changed on disk is > clearly unacceptable As clarified, the request is only for when a traceback is being created (or perhaps even only for when one is being printed). I agree that we don't want to watch every file every time any code is run, but by the time a traceback is being displayed, any tight loops are ending. Nick Coghlan (ncoghlan) wrote: > There are a few different cases: ... > 2. Source has been changed, but module has not been reloaded ... > 3. Source has been changed, module has been reloaded, but object ... Given that a traceback is being displayed, I think it is reasonable to rerun the find-module portion of import, and verify that there is not stale byte-code. Frankly, I think it would be worth storing a file timestamp on modules, and verifying that whatever-would-be-imported-if-imported-now matches that timestamp. This would also catch case (3). I also think that -- on traceback display -- it might be worth verifying that the code's __globals__ is the __globals__ associated with the module of that name in sys.modules. This would warn about some intentional manipulations, but would catch case (3) even more accurately. -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue8087> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
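The timestamp idea above could look roughly like the following sketch. It assumes the modification time was recorded when the module was loaded, which is the storage this comment proposes rather than something modules are known to provide; `source_changed_since` is a hypothetical helper, and the temporary file stands in for a module's source.

```python
import os
import tempfile


def source_changed_since(path, recorded_mtime):
    """Return True if `path` no longer has the mtime recorded at load time."""
    try:
        return os.stat(path).st_mtime != recorded_mtime
    except OSError:
        return True   # a missing or unreadable source also counts as stale


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "mod.py")
    with open(path, "w") as f:
        f.write("x = 1\n")
    recorded = os.stat(path).st_mtime      # what would be stored at import

    unchanged = source_changed_since(path, recorded)

    # Simulate a later edit by moving the mtime forward ten seconds.
    os.utime(path, (recorded + 10, recorded + 10))
    changed = source_changed_since(path, recorded)

print(unchanged, changed)
```

Because the check is a single stat() per frame, running it only while a traceback is being formatted keeps it out of any hot path.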
[issue14014] codecs.StreamWriter.reset contract not fulfilled
New submission from Jim Jewett :

    def reset(self):
        """ Flushes and resets the codec buffers used for keeping state.

            Calling this method should ensure that the data on the
            output is put into a clean state, that allows appending
            of new fresh data without having to rescan the whole
            stream to recover state.
        """
        pass

This does not ensure that the stream is flushed, as the docstring promises. I believe the following would work better.

    def reset(self):
        """ Flushes and resets the codec buffers used for keeping state.

            Calling this method should ensure that the data on the
            output is put into a clean state, that allows appending
            of new fresh data without having to rescan the whole
            stream to recover state.
        """
        if hasattr(self.stream, "flush"):
            self.stream.flush()

-- components: Unicode messages: 153354 nosy: Jim.Jewett, ezio.melotti priority: normal severity: normal status: open title: codecs.StreamWriter.reset contract not fulfilled type: behavior versions: Python 3.2 ___ Python tracker <http://bugs.python.org/issue14014> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
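The proposed behavior can be tried without patching the standard library by overriding reset() in a subclass. This is a sketch under assumptions: FlushingWriter and CountingStream are hypothetical names, and the UTF-8 writer is just a convenient concrete StreamWriter to subclass.

```python
import codecs
import io


class FlushingWriter(codecs.getwriter("utf-8")):
    """StreamWriter whose reset() also flushes the underlying stream."""

    def reset(self):
        super().reset()
        if hasattr(self.stream, "flush"):
            self.stream.flush()


class CountingStream(io.BytesIO):
    """In-memory stream that counts flush() calls, for demonstration."""

    flush_calls = 0

    def flush(self):
        self.flush_calls += 1
        super().flush()


stream = CountingStream()
writer = FlushingWriter(stream)
writer.write("héllo")
calls_before_reset = stream.flush_calls
writer.reset()
calls_after_reset = stream.flush_calls
print(calls_before_reset, calls_after_reset)
```

The hasattr() guard mirrors the proposal: a stream object that offers no flush() method is left alone rather than raising AttributeError.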
[issue14015] surrogateescape largely missing from documentation
New submission from Jim Jewett : Recent discussion on the mailing lists and in http://bugs.python.org/issue13997 makes it clear that the best way to get python2 results for "ASCII-in-the-parts-I-might-process-or-change" is to replace

    f = open(fname)

with

    f = open(fname, encoding="ascii", errors="surrogateescape")

Unfortunately, surrogateescape (let alone this recipe) is not easily discoverable.

http://docs.python.org/dev/library/functions.html#open lists 5 error-handlers -- but not this one. It says that other error handlers are possible if they are registered with http://docs.python.org/dev/library/codecs.html#codecs.register_error but I haven't found a way to determine which error handlers are already registered.

The codecs.register (as opposed to register_error) documentation does list it as a possible value, but that is the only reference. The other 5 error handlers are also available as module-level functions within the codecs module, and have their own documentation sections within http://docs.python.org/dev/library/codecs.html

Neither help(open) nor import codecs; help(codecs) provides any hints of the existence of surrogateescape. Both explicitly suggest that it does not exist, by enumerating other values.

-- assignee: docs@python components: Documentation, Unicode messages: 153359 nosy: Jim.Jewett, docs@python, ezio.melotti priority: normal severity: normal status: open title: surrogateescape largely missing from documentation versions: Python 3.1, Python 3.2, Python 3.3 ___ Python tracker <http://bugs.python.org/issue14015> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
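A short sketch of the recipe in use: a file containing a non-ASCII byte is read with errors="surrogateescape", its ASCII part is edited, and the original byte survives the round trip. The file contents are invented for illustration. The last line uses codecs.lookup_error(), which can confirm that one named handler is registered, although it does not enumerate all of them.

```python
import codecs
import os
import tempfile

raw = b"name=caf\xe9\n"   # 0xE9 is not valid ASCII

with tempfile.TemporaryDirectory() as tmp:
    fname = os.path.join(tmp, "data.txt")
    with open(fname, "wb") as f:
        f.write(raw)

    # Undecodable bytes become lone surrogates instead of raising.
    with open(fname, encoding="ascii", errors="surrogateescape") as f:
        text = f.read()

    # Change only the ASCII part; the escaped byte is written back as-is.
    with open(fname, "w", encoding="ascii", errors="surrogateescape",
              newline="") as f:
        f.write(text.replace("name", "NAME"))

    with open(fname, "rb") as f:
        round_tripped = f.read()

handler = codecs.lookup_error("surrogateescape")  # LookupError if unknown
print(repr(text), round_tripped, callable(handler))
```

The decoded text carries the byte 0xE9 as the surrogate U+DCE9, which is why code that only touches the ASCII parts can leave the rest untouched.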
[issue13997] Clearly explain the bare minimum Python 3 users should know about Unicode
Jim Jewett added the comment: See http://bugs.python.org/issue14015 for one reason that surrogateescape isn't better known. -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue13997> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13703] Hash collision security issue
Jim Jewett added the comment: On Mon, Feb 13, 2012 at 3:37 PM, Dave Malcolm added the comment: > * added comments about the specialcasing of length 0: > /* > We make the hash of the empty string be 0, rather than using > (prefix ^ suffix), since this slightly obfuscates the hash secret > */ Frankly, other short strings may give away even more, because you can put several into the same dict. I would prefer that the randomization not kick in until strings are at least 8 characters, but I think excluding length 1 is a pretty obvious win. -- ___ Python tracker <http://bugs.python.org/issue13703> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14067] Avoid more stat() calls in importlib
Jim Jewett added the comment: As long as the interpreter knows about files that *it* wrote, skipping repeat checks during startup seems utterly reasonable; sneaking in a new or changed file is inherently a race condition. I think it would also be reasonable for general use, so long as there was also a way to say "for this particular directory, always check". -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue14067> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue13903] New shared-keys dictionary implementation
Jim Jewett added the comment: As of Feb 28, 2012, the PEP mentions an additional optimization of storing the values in an array indexed by (effectively) key insertion order, rather than key position. ("Alternative Implementation") It states that this would reduce memory usage for the values array by 1/3. 1/3 is a worst-case measurement; average is 1/2. (At savings of less than 1/3, the keys would resize, to initial savings of 2/3. And yes, that means in practice, the average savings would be greater than half, because the frequency of dicts of size N decreases with N.) It states that the keys table would need an additional "values_size" field, but in the absence of dummies, this is just ma_used. Note that this would also simplify resizing, as the values arrays would not have to be re-ordered, and would not have to be modified at all unless/until that particular instance received a value for a position beyond its current size. -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue13903> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14205] Raise an error if a dict is modified during a lookup
Jim Jewett added the comment: Can't this be triggered by non-malicious code that just happened to have a python comparison and get hit with a thread switch? I'm not sure how often it happens, but today it would not be visible to the user; after the patch, users will see a sporadic failure that they can't easily defend against. Would it be worth adding a counter to lookdict, so that one modification is OK, but 5 aren't? -- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue14205> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue14205] Raise an error if a dict is modified during a lookup
Jim Jewett added the comment:

On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:
> Jim Jewett:
>> Can't this be triggered by non-malicious code that just happened
>> to have a Python comparison and get hit with a thread switch?

> So, they are writing to a dict in one thread while reading from the
> same dict in another thread, without any external locks and with
> keys written in Python.

Correct. For example, it could be a configuration manager, or a cache, or even a worklist. If they're just adding new keys, or even deleting other (==> NOT the one being looked up) keys, why should that keep them from finding the existing, unchanged keys?

>> I'm not sure how often it happens, but today it would not be visible
>> to the user; after the patch, users will see a sporadic failure that
>> they can't easily defend against.

> I suspect, they are already seeing sporadic failures.

How? The chain terminates as soon as the dict doesn't resize; without malicious intent, the odds of hitting several resizes in a row are so minuscule that it probably hasn't even slowed them down.

-- ___ Python tracker <http://bugs.python.org/issue14205> ___
[issue7652] Merge C version of decimal into py3k.
Jim Jewett added the comment:

(1) I think this module would benefit greatly from a map explaining what each file does, and perhaps from some reorganization. As best I can tell so far, there are about 130 files, over a dozen directories, but the only ones that directly affect the implementation are a subset (~33) of the *.c and *.h files in Modules/_decimal/ (and not subdirectories). Even files that do affect the implementation, such as mpdecimal.c, also seem to have functions thrown in just for testing small pieces of functionality, such as Newton Division. There may also be some code that really isn't needed, except possibly for backwards compatibility, and could be #ifdef'ed or at least commented. For example, the comments above the io.c function _mpd_strneq(const char *s, const char *l, const char *u, size_t n) mention working around the Turkish un/dotted-i problem when lowercasing -- but why is a decimal library even worried about casing?

(2) Is assembly allowed? If not, please make it clear that vcdiv64.asm is just an optional speedup, and that the code doesn't rely upon it.

(3) Are there parts of this library that provide functionality NOT in the current decimal library? If so, this should be at least documented, and perhaps either removed or exposed.

-- nosy: +Jim.Jewett ___ Python tracker <http://bugs.python.org/issue7652> ___