[issue1778443] robotparser.py fixes

2007-09-18 Thread Jim Jewett


Jim Jewett added the comment:

On line 108 (new 104), spaces should probably be added on both sides of the 
comparison operator, instead of only after the ">=".

The "%s" changes might end up getting changed again as part of 2to3, but 
this is a clear improvement over the status quo, particularly with the loops.

I recommend applying.

--
nosy: +jimjjewett

_
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1778443>
_
___
Python-bugs-list mailing list 
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue1177] urllib* 20x responses not OK?

2007-09-18 Thread Jim Jewett

New submission from Jim Jewett:

Under the http protocol, any 2xx response is OK.

urllib.py and urllib2.py hardcoded only response 200 (the most common).


http://bugs.python.org/issue912845 added 206 as acceptable to urllib2, but 
not any other 20x responses.  (It also didn't fix urllib.)

Suggested for 2.6, as it does change behavior.

(Also see duplicate http://bugs.python.org/issue971965 which I will try to 
close after opening this.)

--
components: Library (Lib)
messages: 56009
nosy: jimjjewett
severity: normal
status: open
title: urllib* 20x responses not OK?
type: behavior
versions: Python 2.6

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1177>
__



[issue1177] urllib* 20x responses not OK?

2007-09-18 Thread Jim Jewett


Jim Jewett added the comment:

Jafo:  His fix is great for urllib2, but the same issue applies to the 
original urllib.

The ticket should not be closed until a similar fix is made to lines 330 and 
417 of urllib.py.

That is, change

"if errcode == 200:" to "if 200 <= errcode < 300:"

(Or, if rejecting the change, add a comment saying that it is left that way 
intentionally for backwards compatibility.)
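A minimal sketch of the proposed check, assuming errcode is the integer status code urllib already extracts; is_success is a hypothetical helper name, not part of urllib:

```python
def is_success(errcode):
    """Return True for any HTTP 2xx status (hypothetical helper)."""
    # Chained comparison: same as 200 <= errcode and errcode < 300.
    return 200 <= errcode < 300


# The old test (errcode == 200) treats 201, 204, 206, ... as failures.
for code in (200, 201, 204, 206, 299, 300, 302, 404):
    print(code, is_success(code))
```

The chained comparison is exactly the suggested replacement line, so it could equally be inlined at both call sites.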

-jJ

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1177>
__



[issue1177] urllib* 20x responses not OK?

2007-09-24 Thread Jim Jewett


Jim Jewett added the comment:

The change still missed the httpS copy.  I'm attaching a minimal change.

I think it might be better to just combine the methods -- as was already 
done in Py3K.  Unfortunately, the Py3K code doesn't run cleanly in 2.5, and 
I haven't yet had a chance to test a backported equivalent.  (Hopefully 
tonight.)

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1177>
__
*** urllibhead.py
--- urllib.py
***************
*** 435,441 ****
              # something went wrong with the HTTP status line
              raise IOError, ('http protocol error', 0,
                              'got a bad status line', None)
!         if errcode == 200:
              return addinfourl(fp, headers, "https:" + url)
          else:
              if data is None:
--- 435,443 ----
              # something went wrong with the HTTP status line
              raise IOError, ('http protocol error', 0,
                              'got a bad status line', None)
!         # According to RFC 2616, a "2xx" status code indicates that the client's
!         # request was successfully received, understood, and accepted.
!         if 200 <= errcode < 300:
              return addinfourl(fp, headers, "https:" + url)
          else:
              if data is None:



[issue1209] IOError won't accept tuples longer than 3

2007-09-26 Thread Jim Jewett

New submission from Jim Jewett:

EnvironmentError (including subclass IOError) has special treatment when 
constructed with a 2-tuple or 3-tuple.  A four-tuple turns off this special 
treatment (and was used by urllib for that reason).  As of 2.5, a four-tuple 
raises a TypeError instead of just turning off the special treatment.
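For comparison, a sketch of the 2- and 3-argument special treatment as it looks in current Python 3, where IOError and EnvironmentError are aliases of OSError. This shows the modern behavior, not the 2.5 TypeError described above; the errno value 13 is just an example:

```python
# In Python 3, IOError and EnvironmentError are both aliases of OSError.
assert IOError is OSError and EnvironmentError is OSError

# Two arguments: interpreted as (errno, strerror).
e2 = OSError(13, "Permission denied")
print(e2.errno, e2.strerror, e2.filename)  # 13 Permission denied None
print(str(e2))                             # [Errno 13] Permission denied

# Three arguments: the third becomes the filename.
e3 = OSError(13, "Permission denied", "/tmp/x")
print(str(e3))  # [Errno 13] Permission denied: '/tmp/x'

# One argument: no special treatment, errno stays None.
e1 = OSError("just a message")
print(e1.errno, e1.args)  # None ('just a message',)
```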

--
components: Extension Modules
messages: 56150
nosy: jimjjewett
severity: normal
status: open
title: IOError won't accept tuples longer than 3
type: behavior
versions: Python 2.5

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1209>
__



[issue1401] urllib2 302 POST

2007-11-14 Thread Jim Jewett

Jim Jewett added the comment:

> But you said that #2 solution was more RFC compliant... 
> Could you please quote the RFC part that describes this behaviour?

RFC 2616
http://www.faqs.org/rfcs/rfc2616.html

section 4.3 Message Body ...

   The presence of a message-body in a request is signaled by the
   inclusion of a Content-Length or Transfer-Encoding header field in
   the request's message-headers. A message-body MUST NOT be included in
   a request if the specification of the request method (section 5.1.1)
   does not allow sending an entity-body in requests.

[I couldn't actually find a quote saying that GET has no body, but ... it 
doesn't.]

Section 10.3 Redirection 3xx says 

   The action
   required MAY be carried out by the user agent without interaction
   with the user if and only if the method used in the second request is
   GET or HEAD.

In other words, changing it to GET may not be quite pure, but leaving it as 
POST would technically mean that the user MUST confirm that the redirect is 
OK.  This becomes an explicit MUST NOT later, such as in 10.3.2 (301 Moved 
Permanently).  Section 10.3.3 (302 Found) says that 307 was added 
specifically to insist on keeping it a POST, and even 307 says it MUST NOT 
automatically redirect unless it can be confirmed by the user.

Which is why user agents change redirects to a GET and try that...
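A minimal sketch of the redirect policy described above, assuming only the 301/302/303/307 codes discussed; redirect_method is a hypothetical helper, not urllib2's actual implementation:

```python
def redirect_method(status, method):
    """Method to use when following a redirect automatically, or None if
    the user must confirm first (hypothetical helper, RFC 2616 sec. 10.3)."""
    if status not in (301, 302, 303, 307):
        raise ValueError("not a handled redirect status: %d" % status)
    if method in ("GET", "HEAD"):
        # Safe methods may be redirected without asking the user.
        return method
    if status == 307:
        # 307 exists to keep the original method, so a POST must not be
        # re-sent automatically; the caller has to ask the user.
        return None
    # 301/302/303: what user agents do in practice is retry as a GET,
    # dropping the request body (a GET carries no message-body).
    return "GET"


print(redirect_method(302, "POST"))  # GET
print(redirect_method(307, "POST"))  # None
print(redirect_method(301, "HEAD"))  # HEAD
```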

--
components: +XML -None
nosy: +jimjjewett

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1401>
__



[issue1501] 0 ** 0 documentation

2007-11-26 Thread Jim Jewett

New submission from Jim Jewett:

http://docs.python.org/lib/typesnumeric.html contains a table listing the 
mathematical operators.  Please add a note to the final row (x ** y meaning 
x to the power y) indicating that Python has chosen to define 0**0 == 1:

Note 6:  Python defines 0**0 to be 1.  For background, please see 
http://en.wikipedia.org/wiki/Exponentiation#Zero_to_the_zero_power

This doc change would have prevented issue 1461; I *think* there have been 
similar issues in the past.
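A quick check of the behavior the proposed note would document, as seen in current CPython (math.pow follows the C99 rule that pow(x, 0) is 1 for any x):

```python
import math

# Integer, float, builtin pow and math.pow all treat zero to the zero as one.
print(0 ** 0)          # 1
print(0.0 ** 0.0)      # 1.0
print(pow(0, 0))       # 1
print(math.pow(0, 0))  # 1.0
```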

--
components: Documentation
messages: 57855
nosy: jimjjewett
severity: minor
status: open
title: 0 ** 0 documentation
type: rfe
versions: Python 2.5

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1501>
__



[issue13604] update PEP 393 (match implementation)

2011-12-14 Thread Jim Jewett

New submission from Jim Jewett :

The implementation has a larger state.kind
Clarified wording on wstr_length and surrogate pairs.
Clarified that the canonical "data" format doesn't always have a data pointer.
Mentioned that calling PyUnicode_READY would finalize a string, so that it 
couldn't be resized.
Changed section head "Other macros" to "Finalization macro" and removed the 
non-existent PyUnicode_CONVERT_BYTES (there is a similarly named private macro).
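The effect of the kind field can be observed from Python.  A sketch relying on a CPython implementation detail (sys.getsizeof reports the compact string's allocation, which grows by one unit of the kind per character); bytes_per_char is a hypothetical helper:

```python
import sys


def bytes_per_char(ch):
    """Estimate the per-character width CPython picked for strings of ch.

    Adding 1000 characters of the same kind only grows the character
    data, so the size difference divided by 1000 is the PEP 393 kind.
    CPython-specific; other implementations may report anything.
    """
    small = sys.getsizeof(ch * 1000)
    large = sys.getsizeof(ch * 2000)
    return (large - small) // 1000


print(bytes_per_char("a"))           # 1  (ASCII)
print(bytes_per_char("\u00e9"))      # 1  (Latin-1, still the 1-byte kind)
print(bytes_per_char("\u0100"))      # 2  (BMP, 2-byte kind)
print(bytes_per_char("\U0001F600"))  # 4  (outside the BMP, 4-byte kind)
```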

--
files: pep-0393.txt.patch
keywords: patch
messages: 149497
nosy: Jim.Jewett
priority: normal
severity: normal
status: open
title: update PEP 393 (match implementation)
Added file: http://bugs.python.org/file23960/pep-0393.txt.patch

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13604] update PEP 393 (match implementation)

2011-12-14 Thread Jim Jewett

Changes by Jim Jewett :


--
assignee:  -> docs@python
components: +Documentation
nosy: +docs@python
versions: +Python 3.3
Added file: http://bugs.python.org/file23961/pep-0393.txt

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13604] update PEP 393 (match implementation)

2011-12-15 Thread Jim Jewett

Changes by Jim Jewett :


Added file: http://bugs.python.org/file23968/pep-0393.txt

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13604] update PEP 393 (match implementation)

2011-12-15 Thread Jim Jewett

Jim Jewett  added the comment:

Updated to resolve most of Victor's concerns, but this meant enough changes 
that I'm not sure it quite counts as editorial only.

A few questions that I couldn't answer:

(1)  Upon string creation, do we want to *promise* to discard the UTF-8 and 
wstr, so that the caller can memory manage?

(2)  PyUnicode_AS_DATA(), Py_UNICODE_strncpy, Py_UNICODE_strncmp seemed to be 
there in the code I was looking at.

(3)  I can't justify the born-deprecated function "PyUnicode_AsUnicodeAndSize".  
Perhaps rename it with a leading underscore?  Though I'm not sure it is really 
needed at all.

(4)  I tried to reword the "for compatibility" ... "redundant" part ... but I'm 
not sure I resolved it.

--

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13604] update PEP 393 (match implementation)

2011-12-15 Thread Jim Jewett

Jim Jewett  added the comment:

>> So even if a third party module uses the legacy Unicode API, the PEP
>> 393 will still optimize the memory usage thanks to implicit calls to
>> PyUnicode_READY() (done everywhere in Python source code).

> ... unless they inspect a given Unicode string, in which case it
> will use twice the memory (or 1.5x).

Why is the utf-8 representation not cached when it is generated for ParseTuple 
et alia?

It seems like these parameters are likely to either be re-used as parameters 
(in which case caching makes sense) or not re-used at all (in which case, the 
whole string can go away).

> Well, I meant the resizing of strings that doesn't move the object
> in memory (i.e. unicode_resize).

This may easily fail because the new size can't be found at that location; 
wouldn't it be better to just encourage proper sizing in the first place?

>> (1)  Upon string creation, do we want to *promise* to discard
>> the UTF-8 and wstr, so that the caller can memory manage?

> I don't understand the question. Assuming "discards" means
> "releases" here, then there is no API which releases memory
> during creation of the string object - let alone that there is
> any promise to do so. I'm also not aware of any candidate buffer
> that you might want to release.

When a string is created from a wchar_t array, who is responsible for releasing 
the original wchar_t array?  As I read it now, Python doesn't release the 
buffer, and the caller can't because maybe Python just pointed to it as memory 
shared with the canonical representation.  

>> (2)  PyUnicode_AS_DATA(), Py_UNICODE_strncpy, Py_UNICODE_strncmp 
>> seemed to be there in the code I was looking at.

> That's very well possible. What's the question?

Victor listed them as missing.  I now suspect he meant "missing from the PEP 
list of deprecated functions and macros", and I just misunderstood.

--
Added file: http://bugs.python.org/file23970/pep-0393.txt

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13604] update PEP 393 (match implementation)

2011-12-15 Thread Jim Jewett

Changes by Jim Jewett :


Added file: http://bugs.python.org/file23971/pep-0393v20111215.patch

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13608] remove born-deprecated PyUnicode_AsUnicodeAndSize

2011-12-15 Thread Jim Jewett

New submission from Jim Jewett :

In reviewing issue 13604 (aligning PEP 393 with the implementation) Victor 
Stinner noticed that PyUnicode_AsUnicodeAndSize is new in 3.3, but that it is 
already deprecated (because it relies on the old Py_UNICODE type).

This born-deprecated function is just a shortcut for PyUnicode_AsUnicode plus 
PyUnicode_GET_SIZE, and should be removed.

--
components: Unicode
messages: 149585
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: remove born-deprecated PyUnicode_AsUnicodeAndSize
versions: Python 3.3

___
Python tracker 
<http://bugs.python.org/issue13608>
___



[issue13604] update PEP 393 (match implementation)

2011-12-16 Thread Jim Jewett

Jim Jewett  added the comment:

>> Why is the utf-8 representation not cached when it is generated for
>> ParseTuple et alia?

My error -- I read something backwards.

>> When a string is created from a wchar_t array, who is responsible for
>> releasing the original wchar_t array?

> The caller.

OK, I'll document that.

>> As I read it now, Python
>> doesn't release the buffer, and the caller can't because maybe Python
>> just pointed to it as memory shared with the canonical
>> representation.

> But Python won't; it will always make a copy for itself.

I thought I found an example each way, but it is possible that the shared 
version was something python had already copied.  If not, I'll raise that as a 
separate issue to get the code changed.
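One way to check the copy behavior from Python: a sketch assuming CPython with ctypes.pythonapi available.  It builds a str from a caller-owned wchar_t buffer with PyUnicode_FromWideChar, then overwrites the buffer; if the str is unchanged, Python kept its own copy and the caller remains free to release the buffer.

```python
import ctypes

# Caller-owned, NUL-terminated wchar_t buffer allocated by ctypes.
buf = ctypes.create_unicode_buffer("hello")

# PyObject *PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size)
from_wide = ctypes.pythonapi.PyUnicode_FromWideChar
from_wide.argtypes = [ctypes.c_wchar_p, ctypes.c_ssize_t]
from_wide.restype = ctypes.py_object  # ctypes takes ownership of the new reference

s = from_wide(buf, 5)

# Mutate the caller's buffer after the str has been created.
buf[0] = "J"

print(s)          # hello  (the str does not share the buffer)
print(buf.value)  # Jello
```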

(Note that I may not be able to look at this again until after Christmas, so 
I'm likely to go silent for a while.)

--
Added file: http://bugs.python.org/file23979/pep-0393.txt

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13604] update PEP 393 (match implementation)

2011-12-16 Thread Jim Jewett

Changes by Jim Jewett :


Added file: http://bugs.python.org/file23980/pep-0393_20111216.txt.patch

___
Python tracker 
<http://bugs.python.org/issue13604>
___



[issue13677] correct docstring for builtin compile

2011-12-29 Thread Jim Jewett

New submission from Jim Jewett :

The current docstring for compile suggests that the flags are strictly for 
selecting future statements.  These are not the only flags.

It also suggests that the source must be source code and the result will be 
bytecode, which isn't quite true.

I suggest changing:

"The flags argument, if present, controls which future statements influence the 
compilation of the code."

to:

"The flags argument, if present, largely controls which future 
statements influence the compilation of the code.  (Additional 
flags are documented in the AST module.)"
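To illustrate both points, a sketch using two real flag sources: ast.PyCF_ONLY_AST, which is not a future statement and makes compile return an AST instead of a code object, and a __future__ feature's compiler_flag.  It also shows that the source argument may itself be an AST:

```python
import ast
import __future__

src = "x = 1 + 2"

# Not a future statement: ask compile() for an AST instead of a code object.
tree = compile(src, "<example>", "exec", ast.PyCF_ONLY_AST)
print(type(tree).__name__)  # Module

# The source argument need not be text; an AST compiles to a code object.
code = compile(tree, "<example>", "exec")
ns = {}
exec(code, ns)
print(ns["x"])  # 3

# A future-statement flag, taken from the __future__ module:
# annotations are kept as strings, so the undefined name is never evaluated.
flags = __future__.annotations.compiler_flag
code2 = compile("def f(a: undefined_name): pass", "<example>", "exec", flags)
ns2 = {}
exec(code2, ns2)
print(ns2["f"].__annotations__)  # {'a': 'undefined_name'}
```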

--
assignee: docs@python
components: Documentation
files: bltinmodule.c.patch
keywords: patch
messages: 150337
nosy: Jim.Jewett, docs@python
priority: normal
severity: normal
status: open
title: correct docstring for builtin compile
type: behavior
Added file: http://bugs.python.org/file24105/bltinmodule.c.patch

___
Python tracker 
<http://bugs.python.org/issue13677>
___



[issue13677] correct docstring for builtin compile

2011-12-30 Thread Jim Jewett

Jim Jewett  added the comment:

I'm not sure we're looking at the same thing.  I was talking about the 
docstring that shows up at the interactive prompt in response to 
>>> help(compile)

Going to hg.python.org/cpython and selecting branches, then default, then 
browse, got me to
http://hg.python.org/cpython/file/7010fa9bd190/Python/bltinmodule.c
which still doesn't mention AST.  I also don't see any reference to "src" or 
"dst", or any "source" that looks like it should be capitalized.

I agree that there is (to my knowledge, at this time) only one additional flag.  
I figured ast or future was needed to get the compilation constants, so it 
made sense to delegate -- but you seem to be reading something newer than I am.

--

___
Python tracker 
<http://bugs.python.org/issue13677>
___



[issue13776] formatter_unicode.c still assumes ASCII

2012-01-12 Thread Jim Jewett

New submission from Jim Jewett :

http://docs.python.org/library/string.html#format-specification-mini-language 
defines

fill ::= <any character>

and the text also excludes '{'.  It does not require that the fill character be 
ASCII.

However, function parse_internal_render_format_spec 
http://hg.python.org/cpython/file/c2153ce1b5dd/Python/formatter_unicode.c#l277 
raises a ValueError if fill_char > 127.

I'm honestly not certain which of the three is correct, but they should be 
consistent, and if anything but '{' is excluded, it would be best to explain 
why.
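For comparison, current Python 3 accepts a non-ASCII fill character, matching the grammar, and the built-in format() accepts '{' as a fill even though it cannot appear literally in a str.format() replacement field.  A short check:

```python
# Non-ASCII fill character, centered in a width of 5.
print(format("x", "\u00e9^5"))    # ééxéé
print("{:\u00e9^5}".format("x"))  # ééxéé

# '{' works as a fill when the spec is passed to format() directly.
print(format("x", "{^5"))         # {{x{{
```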

--
components: Unicode
messages: 151128
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: formatter_unicode.c still assumes ASCII
type: behavior

___
Python tracker 
<http://bugs.python.org/issue13776>
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2012-01-12 Thread Jim Jewett

Jim Jewett  added the comment:

The currently applied patch ( http://hg.python.org/cpython/rev/f7e05d205a52 ) 
left some dead code in unicodeobject.c

function fixup ( 
http://hg.python.org/cpython/file/f7e05d205a52/Objects/unicodeobject.c#l9386 ) 
has a shortcut for when the fixer doesn't make any actual changes.  The removed 
fixers (like fixupper) returned 0 rather than maxchar to indicate that.  The 
only remaining fixer, fix_decimal_and_space_to_ascii (line 8839), does not.  (I 
think fix_decimal_and_space_to_ascii *should* add a touched flag, but until it 
does, the shortcut dedup code is dead.)
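A Python sketch of the touched-flag idea, assuming the shortcut only needs to know whether the fixer changed anything.  fixup and fix_decimal_and_space_to_ascii here are hypothetical Python stand-ins for the C functions, not their actual code:

```python
import unicodedata


def fix_decimal_and_space_to_ascii(chars):
    """Rewrite Unicode whitespace to ' ' and decimal digits to '0'-'9'
    in place; return True only if some character actually changed."""
    touched = False
    for i, ch in enumerate(chars):
        if ch.isspace():
            new = " "
        elif ch.isdecimal():
            new = str(unicodedata.decimal(ch))
        else:
            continue
        if new != ch:
            chars[i] = new
            touched = True
    return touched


def fixup(s, fixer):
    """Apply fixer to a copy of s; return s itself if nothing changed."""
    chars = list(s)
    if not fixer(chars):
        return s  # the shortcut: no new string object needed
    return "".join(chars)


original = "abc 123"
print(fixup(original, fix_decimal_and_space_to_ascii) is original)  # True
print(fixup("\u0661\u0662\u00a0x", fix_decimal_and_space_to_ascii))  # 12 x
```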

Also, around line 10502, there is an #if 0 section with code that relied on one 
of the removed fixers; is it time to remove that section?

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue12736>
___



[issue10576] Add a progress callback to gcmodule

2010-11-29 Thread Jim Jewett

Jim Jewett  added the comment:

I like the idea, but I do quibble with the signature.  As nearly as I can tell, 
you're assuming

(a)  Only one callback.  I would prefer a sequence of callbacks, to make 
cooperation easier.  (This does mean you would need a callback removal, instead 
of just setting it to None.)

(b)  The callback will be called once before collecting generations, and once 
after (with the number of objects that weren't collected).  Should these be 
separate callbacks, rather than the same one with a boolean?  And why does it 
need the number of uncollected objects?  (This might be a case where 
Practicality Beats Purity, but it is worth documenting.)
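For comparison, CPython 3.3 and later provide gc.callbacks: a plain list, so several callbacks can coexist and removal is just list.remove, with each callback called once with phase "start" and once with phase "stop", plus an info dict.  A sketch of a callback that records both phases:

```python
import gc

events = []


def on_gc(phase, info):
    # gc calls every registered callback twice per collection:
    # once with phase "start" and once with phase "stop".
    if phase == "start":
        events.append(("start", info["generation"]))
    else:
        # The "stop" call reports what the collection found.
        events.append(("stop", info["collected"], info["uncollectable"]))


gc.callbacks.append(on_gc)      # other callbacks may share the list
try:
    gc.collect()                # full collection of generation 2
finally:
    gc.callbacks.remove(on_gc)  # unregistering is a plain list.remove

print(events)
```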

--
nosy: +jimjjewett

___
Python tracker 
<http://bugs.python.org/issue10576>
___



[issue10576] Add a progress callback to gcmodule

2010-12-03 Thread Jim Jewett

Jim Jewett  added the comment:

> Does anyone think that it is simpler to register two different 
> callbacks than one? 

Moderately, yes.

Functions that actually help with cleanup should normally be run only in one 
phase; it is just stats-gathering and logging functions that might run both 
times, and I don't mind registering those twice.

For functions that are run only once (which I personally think is the more 
normal case), the choices are between

@register_gc
def my_callback(actually_run_flag, mydict):
    if not actually_run_flag:
        return
    ...

vs

@register_gc_before
def my_callback(mydict):
    ...

--

___
Python tracker 
<http://bugs.python.org/issue10576>
___



[issue3025] batch/IDLE differ: print broken for characters>ascii

2008-06-01 Thread Jim Jewett

New submission from Jim Jewett <[EMAIL PROTECTED]>:

The str->Unicode change widened the IDLE/batch discrepancy.

In Python 2.x, bytes are printable.

>>> for i in range(256): print i, chr(i)

works fine.  In Python 3, chr has become (the old) unichr, and whether a 
unicode character is printable depends on the environment.  In particular, 
under my Windows XP, the equivalent

>>> for i in range(256): print (i, chr(i))

will still work fine under IDLE, but will now crash with an 
UnicodeEncodeError when run from the command line.



Unfortunately, I'm not sure what the right solution actually is, other than 
a mention in the What's New document.

I believe the 2.5 code was using a system page to print those characters, as 
they often looked like letters rather than .  Copying that would 
probably be the wrong solution.

Limiting IDLE would add consistency, but might be a lot of work for the 
equivalent of a --pedantic flag.

PEP 3138 seems to be proposing a default stdout BackslashReplace, which may 
at least help.
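The backslash-replace approach can be sketched with the standard error handler: encode the text with the stream's encoding and errors="backslashreplace", so unencodable characters come out as escapes instead of raising UnicodeEncodeError.  safe_print is a hypothetical helper, and falling back to ASCII when a stream reports no encoding is an assumption:

```python
import io
import sys


def safe_print(*args, file=None):
    """print() that never raises UnicodeEncodeError (hypothetical helper)."""
    stream = file if file is not None else sys.stdout
    encoding = getattr(stream, "encoding", None) or "ascii"
    text = " ".join(str(a) for a in args)
    # Round-trip through the stream's encoding, escaping what won't fit.
    safe = text.encode(encoding, "backslashreplace").decode(encoding)
    print(safe, file=stream)


# The encoding step on its own, for a strict ASCII console:
print("caf\u00e9".encode("ascii", "backslashreplace"))  # b'caf\\xe9'

buf = io.StringIO()  # reports no encoding, so the ASCII fallback applies
safe_print("caf\u00e9", file=buf)
print(buf.getvalue())  # caf\xe9
```

Current Python 3 can also apply this per stream with sys.stdout.reconfigure(errors="backslashreplace") (3.7 and later).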

--
assignee: georg.brandl
components: Documentation, Unicode
messages: 67617
nosy: georg.brandl, jimjjewett
severity: normal
status: open
title: batch/IDLE differ: print broken for characters>ascii
type: behavior
versions: Python 3.0

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3025>
___



[issue775544] Tk.quit leads to crash in python.exe

2008-06-09 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Were you using IDLE at the time?

When I try this (Windows XP SP2), the button and its window do not go away 
(which is arguably a bug), but it does not crash.

If I then try to close the window using the little X (from the window 
manager),

(1)  A qb started from the command-line interface exits, as it should.
(2)  A qb started from within IDLE becomes non-responsive, and Windows 
asks whether or not I want to continue shutting it down.

--
nosy: +jimjjewett

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue775544>
___



[issue3300] urllib.quote and unquote - Unicode issues

2008-08-06 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Is there still disagreement over anything except:

(1)  The type signature of quote and unquote (as opposed to the 
explicit "quote_as_bytes" or "quote_as_string").

(2)  The default encoding (Latin-1 vs. UTF-8), and (if UTF-8) what to do 
with invalid byte sequences?

(3)  Would waiting for 3.1 cause too many compatibility problems?

--
nosy: +jimjjewett

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
___



[issue3300] urllib.quote and unquote - Unicode issues

2008-08-06 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Matt pointed out that the email package assumes Latin-1 rather than UTF-8; I 
assume Bill could patch his patch the same way Matt did, and this would 
resolve the email tests.  (Unless you pronounce to stick with Latin-1.)

The cookiejar failure probably has the same root cause; that test is 
encoding (non-ASCII) Latin-1 characters, and urllib.parse.py/Quoter assumes 
Latin-1.

So I see some evidence (probably not enough) for sticking with Latin-1 
instead of UTF-8.  But I don't see any evidence that fixing the semantics 
(encoded results should be bytes) at the same time made the conversion any 
more painful.  

On the other hand, Matt shows that some of those extra str->byte code 
changes might never need to be done at all, except for purity.

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
___



[issue3300] urllib.quote and unquote - Unicode issues

2008-08-06 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

> http://codereview.appspot.com/2827/diff/1/5#newcode1450
> Line 1450: "%3c%3c%0Anew%C3%A5/%C3%A5",
> I'm guessing this test broke otherwise?  

Yes; that is one of the breakages you found in Bill's patch.  (He didn't 
modify the test.)

> Given that this references an RFC,
> is it correct to just fix it this way?

Probably.  Looking at http://www.faqs.org/rfcs/rfc2965.html

(1)  That is not among the exact tests in the RFC.
(2)  The RFC does not specify charset for the cookie in general, but the 
Comment field MUST be in UTF-8, and the only other reference I could find to 
a specific charset was "possibly in a server-selected printable ASCII 
encoding."

Whether we have to use Latin-1 (or document charset) in practice for 
compatibility reasons, I don't know.

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
___



[issue3300] urllib.quote and unquote - Unicode issues

2008-08-09 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Matt,

Bill's main concern is with a policy decision; I doubt he would object to 
using your code once that is resolved.

The purpose of the quoting functions is to turn a string (representing the 
human-readable version) into bytes (that go over the wire).  If everything 
is ASCII, there isn't any disagreement -- but it also isn't obvious that 
they're bytes instead of characters.  So people started (well, continued, 
since it dates to pre-unicode C) treating them as though they were strings.

The fact that ASCII (and therefore most wire protocols) looks the same as 
bytes or as characters was one of the strongest arguments against splitting 
the bytes and string types.  Now that this has been done, Bill feels we 
should be consistent.  (You feel wire-protocol bytes should be treated as 
strings, if only as bytestrings, because the libraries use them that way -- 
but this is a policy decision.)

To quote the final paragraph of RFC 3986, section 1.2.1:
"""
   In local or regional contexts and with improving technology, users
   might benefit from being able to use a wider range of characters;
   such use is not defined by this specification.  Percent-encoded
   octets (Section 2.1) may be used within a URI to represent characters
   outside the range of the US-ASCII coded character set if this
   representation is allowed by the scheme or by the protocol element in
   which the URI is referenced.  Such a definition should specify the
   character encoding used to map those characters to octets prior to
   being percent-encoded for the URI.
"""

So the mapping to bytes (or "octets") for non-ASCII isn't defined (here), 
and if you want to use it, you need to specify charset.  But in practice, 
people do use it without specifying a charset.  Which charset should be 
assumed?  The old code (and test cases) assumed Latin-1.  You want to 
assume UTF-8 (though you took the document charset when available -- which 
might also make sense).
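In current Python 3, urllib.parse exposes this charset choice as an encoding parameter that defaults to UTF-8, plus byte-level variants for callers who want octets.  A short illustration of how the choice changes the result for a non-ASCII character:

```python
from urllib.parse import quote, quote_from_bytes, unquote, unquote_to_bytes

s = "\u00e5"  # LATIN SMALL LETTER A WITH RING ABOVE

print(quote(s))                      # %C3%A5  (default: UTF-8)
print(quote(s, encoding="latin-1"))  # %E5     (the old Latin-1 assumption)

print(unquote("%C3%A5"))                   # å
print(unquote("%E5", encoding="latin-1"))  # å
# Invalid UTF-8 under the default errors="replace" becomes U+FFFD.
print(repr(unquote("%E5")))                # '\ufffd' is printed as the glyph

# The byte-oriented variants skip the charset decision entirely.
print(quote_from_bytes(b"\xe5"))   # %E5
print(unquote_to_bytes("%C3%A5"))  # b'\xc3\xa5'
```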

___
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3300>
___



[issue1657] [patch] epoll and kqueue wrappers for the select module

2008-03-20 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Is pyepoll a good prefix?  To me, it looks a lot like the _Py and Py 
reserved namespaces, but not quite...

--
nosy: +jimjjewett

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1657>
__



[issue2679] email.feedparser regex duplicate

2008-04-24 Thread Jim Jewett

New submission from Jim Jewett <[EMAIL PROTECTED]>:

feedparser defines four regexes for end-of-line, but two are redundant.

NLCRE checks for the three common line endings.
NLCRE_crack also captures the line ending.
NLCRE_eol also adds a $ to ensure it is at the end.
NLCRE_bol ... is identical to NLCRE_crack.

It should either use a ^ to insist on line-start, or be explicitly the 
same (e.g., NLCRE_bol = NLCRE_crack).  (It gets away with not listing the ^ 
because the current code only uses NLCRE_bol.match.)

(Actually, if the regexes are considered private, then the current code 
could just use the bound methods directly ... setting NLCRE_bol to the 
.match method, NLCRE_eol to the .search method, and NLCRE_crack to the 
.split method.)
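A sketch of the cleanup proposed above.  The patterns follow the three line endings the text describes; the lowercase bound-method names are hypothetical, and \Z is used instead of $ because $ also matches just before a final newline:

```python
import re

# One captured alternation for the three common line endings.
_EOL = r"(\r\n|\r|\n)"

NLCRE = re.compile(r"\r\n|\r|\n")     # any line ending
NLCRE_crack = re.compile(_EOL)        # same, but captured (for split)
NLCRE_bol = NLCRE_crack               # explicitly the same object
NLCRE_eol = re.compile(_EOL + r"\Z")  # a line ending at the very end

# If the patterns are private, store the bound methods instead:
nlcre_bol_match = NLCRE_bol.match
nlcre_eol_search = NLCRE_eol.search
nlcre_crack_split = NLCRE_crack.split

print(nlcre_crack_split("a\nb\r\nc"))               # ['a', '\n', 'b', '\r\n', 'c']
print(repr(nlcre_eol_search("line\r\n").group(1)))  # '\r\n'
print(repr(nlcre_bol_match("\rrest").group(0)))     # '\r'
```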

--
components: Library (Lib)
messages: 65723
nosy: jimjjewett
severity: normal
status: open
title: email.feedparser regex duplicate
versions: Python 2.6, Python 3.0

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2679>
__



[issue2636] Regexp 2.6 (modifications to current re 2.2.2)

2008-04-24 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

> These features are to bring the Regexp code closer in line with Perl 5.10

Why 5.10 instead of 5.8 or at least 5.6?  Is it just a scope-creep issue?

> as well as add a few python-specific

because this also adds to the scope.


> 2) Make named matches direct attributes 
> of the match object; i.e. instead of m.group('foo'), 
> one will be able to write simply m.foo.

> 3) (maybe) make Match objects subscriptable, such 
> that m[n] is equivalent to m.group(n) and allow slicing.

(2) and (3) would both be nice, but I'm not sure it makes sense to do 
*both* instead of picking one.
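For reference, current Python 3 (3.6 and later) supports the subscripting form, option (3), but not attribute access to named groups.  A quick sketch:

```python
import re

m = re.match(r"(?P<first>\w+) (?P<last>\w+)", "Jim Jewett")

print(m.group("first"))     # Jim
print(m["first"])           # Jim         (subscript by group name)
print(m[2])                 # Jewett      (subscript by group number)
print(m[0])                 # Jim Jewett  (whole match, like m.group(0))
print(hasattr(m, "first"))  # False       (no attribute access)
```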

> 5) Add a well-formed, python-specific comment modifier, 
> e.g. (?P#...);  

[handles parens in comments without turning on verbose, but is slower]

Why?  It adds another incompatibility, so it has to be very useful or 
clear.  What exactly is the advantage over just turning on verbose?

> 9) C-Engine speed-ups. ...
> a number of Macros are being eliminated where appropriate.

Be careful with those, particularly with str/unicode and different compile options.

--
nosy: +jimjjewett

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2636>
__



[issue2636] Regexp 2.6 (modifications to current re 2.2.2)

2008-04-24 Thread Jim Jewett

Jim Jewett <[EMAIL PROTECTED]> added the comment:

Python 2.6 isn't the last, but Guido has said that there won't be a 2.10.

> Match object is a C-struct with python binding
> and I'm not exactly sure how to add either feature to it

I may be misunderstanding -- isn't this just a matter of writing the 
function and setting it in the tp_as_sequence and tp_as_mapping slots?

> Larry Wall and Guido agreed long ago that we, the python
> community, own all expressions of the form (?P...)

Cool -- that reference should probably be added to the docs.  For someone 
trying to learn or translate regular expressions, it helps to know that 
(?P...) is explicitly a python extension (even if Perl adopts it later).

Definitely put the example in the docs.

r'He(?# 2 (TWO) ls)llo' should match "Hello", but it doesn't.  Maybe add that to 
the docs even without the change, as documentation of the current situation.
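The current behavior can be checked directly with the standard `re` module. In this small sketch, the inline comment ends at the first `)`, so the pattern fails to compile. The same comment written under `re.VERBOSE` works. The helper name `compiles` is just for illustration.

```python
import re


def compiles(pattern, flags=0):
    """Return True if the pattern compiles, False on re.error."""
    try:
        re.compile(pattern, flags)
    except re.error:
        return False
    return True


# (?#...) stops at the first ")", leaving " ls)llo" with an unbalanced ")".
inline = r'He(?# 2 (TWO) ls)llo'

# Under VERBOSE, "#" starts a comment that runs to the end of the line,
# and unescaped whitespace in the pattern is ignored.
verbose = r'''He   # 2 (TWO) ls
              llo'''

print(compiles(inline))                                  # False
print(bool(re.fullmatch(verbose, "Hello", re.VERBOSE)))  # True
```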

Does VERBOSE really have to be the first flag, or does it just have to be on 
the whole pattern instead of an internal switch?

I'm not sure I fully understand what you said about template.  Is this a 
special undocumented switch, or just an internal optimization mode that 
should be triggered whenever the repeat operators don't happen to occur?

__
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue2636>
__



[issue4882] Behavior of backreferences to named groups in regular expressions unclear

2009-01-09 Thread Jim Jewett

Jim Jewett  added the comment:

That sounds like a good idea, particularly since it is a bit different 
from Perl.  Please do write up a clarification.
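For reference while drafting that clarification, here is a short sketch of how named-group backreferences behave in Python's `re` today. It uses the standard library only, and the sample string is made up. A named group is also numbered, so `(?P=word)` and `\1` refer to the same group. In replacement strings, the form to use is `\g<word>`.

```python
import re

text = "it is is a test test"

# (?P=word) is the named backreference; \1 reaches the same group by number.
named = re.findall(r"\b(?P<word>\w+) (?P=word)\b", text)
numbered = re.findall(r"\b(?P<word>\w+) \1\b", text)

# In a replacement string the group is referenced as \g<word>.
deduped = re.sub(r"\b(?P<word>\w+) (?P=word)\b", r"\g<word>", text)

print(named, numbered)  # ['is', 'test'] ['is', 'test']
print(deduped)          # it is a test
```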

Typically, I have either attached a file with the suggested wording, or 
included it in a comment from which a committer could cut and paste.

(If Georg has different preferences on how to submit the patch, they 
should probably go into a FAQ anyhow.)

--
nosy: +jimjjewett

___
Python tracker 
<http://bugs.python.org/issue4882>
___



[issue4888] misplaced (or misleading) assert in ceval.c

2009-01-13 Thread Jim Jewett

Jim Jewett  added the comment:

I agree with Raymond.  A comment *might* be sufficient, but ... in some 
sense, that is the purpose of an assert.

The loop is reasonably long; it already includes macros which could 
(but currently don't) change the value, and function calls which might 
plausibly (but don't) reset a "why" variable.  The why variable is 
technically local, but its scope is still pretty large, so that isn't 
clear at first.

It took me some work to verify the assertion, and I'm not at all 
confident that a later change wouldn't violate it.  Nor am I confident 
that the symptoms would make for straightforward debugging.  (Would it 
look like stack corruption?  Would it take several more opcodes before 
a problem was visible?)

--
nosy: +jimjjewett

___
Python tracker 
<http://bugs.python.org/issue4888>
___



[issue24275] lookdict_* give up too soon

2021-03-31 Thread Jim Jewett


Jim Jewett  added the comment:

What is the status on this?  If you are losing interest, would you like someone 
else to turn your patch into a pull request?

--

___
Python tracker 
<https://bugs.python.org/issue24275>
___



[issue44548] ttk Indeterminate Progressbar Not Animating Correctly After `start`

2021-07-01 Thread Jim Jewett


Jim Jewett  added the comment:

It sounds like the fix is a configuration change already included in the next 
version, so ... I think that counts as a fix.

--
nosy: +Jim.Jewett
resolution:  -> fixed
status: open -> pending

___
Python tracker 
<https://bugs.python.org/issue44548>
___



[issue24275] lookdict_* give up too soon

2021-01-30 Thread Jim Jewett


Jim Jewett  added the comment:

This was originally "can be reopened if a patch is submitted" and Hristo Venev 
has now done so. Therefore, I am reopening.

--
resolution: rejected -> remind
stage:  -> patch review
status: closed -> open
versions: +Python 3.10

___
Python tracker 
<https://bugs.python.org/issue24275>
___



[issue24275] lookdict_* give up too soon

2021-01-30 Thread Jim Jewett


Jim Jewett  added the comment:

Based on Hristo's timing, it appears to be a clear win.  

A near-wash for truly string-only dicts that shouldn't be affected; a near-wash 
for looking up non-(exact-)strings, and a nearly 40% speedup for the target 
case of looking up but not inserting a non-string or string subclass, then 
looking up strings thereafter. 

Additional comments:

Barring objections, I will promote from patch review to commit review when I've 
had a chance to look more closely.  I don't have commit privs, but I think some 
of the others following this issue do.

The test looks pretty good -- good enough that I wonder if I'm missing 
something on the parts that seem odd.  It would be great if you either cleaned 
them up or commented to explain why:

Why is the first key vx1, which seems, if anything, like a variable?  Why not k1 or string_key?

Why is the first key built up as vx='x'; vx += '1' instead of just k1="x1"?

Using a str subclass in the test is a great idea, and you've created a truly 
minimal one.  It would probably be good to *also* test with a non-string, like 
3 or 42.0.  I can't imagine this affecting things (unless you missed an eager 
lookdict demotion somewhere), but it would be good to have that path documented 
against regression.
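Here is a sketch of what such a regression test might look like. It assumes `unittest` and uses hypothetical names (`StrSubclass`, `LookupAfterOddKeyTest`). It exercises a str subclass, an int, and a float. Only correctness can be asserted this way, because the internal lookdict demotion the patch targets is not observable from Python. The timing still needs a separate benchmark.

```python
import unittest


class StrSubclass(str):
    """Minimal str subclass, similar in spirit to the one in the patch."""


class LookupAfterOddKeyTest(unittest.TestCase):
    def check_lookups_after(self, odd_key):
        d = {"x1": 1, "x2": 2}
        # Look up, but do not insert, a key that is not an exact str.
        self.assertNotIn(odd_key, d)
        # Ordinary string lookups must keep working afterwards.
        self.assertEqual(d["x1"], 1)
        self.assertEqual(d.get("x2"), 2)
        self.assertNotIn("missing", d)

    def test_str_subclass(self):
        self.check_lookups_after(StrSubclass("other"))
        # An equal str subclass instance must still find the str key.
        self.assertEqual({"x1": 1}[StrSubclass("x1")], 1)

    def test_int(self):
        self.check_lookups_after(3)

    def test_float(self):
        self.check_lookups_after(42.0)
```

Run it with `python -m unittest` as usual.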

This seems like a test that could probably be rolled into a bigger testfile for 
the actual commit.  I don't have the name of such an appropriate file at hand 
right now, but will try to find it on a deeper review.

--

___
Python tracker 
<https://bugs.python.org/issue24275>
___



[issue39542] Cleanup object.h header

2020-07-11 Thread Jim Jewett


Jim Jewett  added the comment:

Raymond, did you replace the screenshot with a later one showing that things 
are fixed now?  The timestamp suggests it went up at the same time as your 
comment, but what I see in the .png file is that the two are identical other 
than addresses.

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue39542>
___



[issue41212] Emoji Unicode failing in standard release of Python 3.8.3 / tkinter 8.6.8

2020-07-11 Thread Jim Jewett


Jim Jewett  added the comment:

@Ben Griffin -- Unicode has defined astral characters for a while, but they 
were explicitly intended for rare characters, with any living languages 
intended for the basic plane.  It is only the most recent releases of unicode 
that have broken the "most people won't need this" expectation, so it wasn't 
unreasonable for languages targeting memory-constrained devices to make astral 
support at best a compile-time option.  

I've seen a draft for an upcoming spec update of an old but still-supported 
language (extended Gerber, for photoplotting machines) that "handles" this 
simply by clarifying that their unicode support is limited to characters < 65K.  
Given that their use of unicode is essentially limited to comments, and there 
is plenty of hardware that can't be updated ... this may well be correct.

Python itself does the right thing, and tcl can't do the right thing anyhow 
without font support ... so this may be fixed in less time than it would take 
to replace Tk/Tcl.  If you need a faster workaround, consider a 
private-use-area and private font.
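A cruder, lossy stop-gap is to detect astral characters (code points above U+FFFF) and substitute them before the text reaches Tk. This is a minimal sketch with hypothetical helper names. It is not the private-use-area approach mentioned above, and it simply throws the characters away.

```python
BMP_LAST = 0xFFFF  # last code point of the Basic Multilingual Plane


def has_astral(text):
    """True if text contains any character outside the BMP (above U+FFFF)."""
    return any(ord(ch) > BMP_LAST for ch in text)


def replace_astral(text, replacement="\ufffd"):
    """Swap each astral character for `replacement` (U+FFFD by default)."""
    return "".join(replacement if ord(ch) > BMP_LAST else ch for ch in text)


label = "smile \U0001F600"
if has_astral(label):
    label = replace_astral(label)  # e.g. before passing it to an older Tk
print(label)
```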

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue41212>
___



[issue41217] Obsolete note for default asyncio event loop on Windows

2020-07-11 Thread Jim Jewett


Jim Jewett  added the comment:

Looks good to me.

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue41217>
___



[issue41246] IOCP Proactor same socket overlapped callbacks

2020-07-11 Thread Jim Jewett


Jim Jewett  added the comment:

Looks good to me.  

I at first worried that the different function names were useful metadata that 
was getting lost -- but the names were already duplicated in several cases.  
*If* that is still a concern for the committer, then instead of repeating the 
code (as current production does), each section should just say 
newname=origname before registering the static method (as the patch does), and 
should bind a distinct name for each usage.

--
nosy: +Jim.Jewett
versions: +Python 3.10

___
Python tracker 
<https://bugs.python.org/issue41246>
___



[issue41220] add optional make_key argument to lru_cache

2020-07-11 Thread Jim Jewett


Jim Jewett  added the comment:

Going back to Raymond's analysis, this is useful when at least some of the 
parameters either do not change the result, or are not hashable.

At a minimum, you need to figure out which parameters those are, and whether to 
drop them or transform them.

Is this already sufficiently rare or tricky that a subclass is justified, 
instead of trying to shoehorn things into a single key method?
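To make the "drop or transform" step concrete, here is a sketch of a hypothetical `lru_cache_with_key` decorator. It is not a functools API. It is built on `OrderedDict` rather than on `functools.lru_cache` itself. The caller's `make_key` drops a parameter that does not change the result and turns an unhashable list into a tuple.

```python
from collections import OrderedDict
import functools


def lru_cache_with_key(make_key, maxsize=128):
    """Hypothetical decorator: like functools.lru_cache, but the cache key
    is computed by make_key(*args, **kwargs) instead of from all arguments."""

    def decorator(func):
        cache = OrderedDict()  # key -> result, least recently used first

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = make_key(*args, **kwargs)
            if key in cache:
                cache.move_to_end(key)  # mark as most recently used
                return cache[key]
            result = func(*args, **kwargs)
            cache[key] = result
            if len(cache) > maxsize:
                cache.popitem(last=False)  # evict least recently used
            return result

        return wrapper

    return decorator


calls = []  # records real (uncached) calls, for illustration


# Drop `verbose` (it does not change the result) and turn the unhashable
# list into a tuple so it can serve as a key.
@lru_cache_with_key(lambda items, verbose=False: tuple(items))
def total(items, verbose=False):
    calls.append(tuple(items))
    if verbose:
        print("summing", items)
    return sum(items)
```

Whether this belongs as a `make_key` argument or in a subclass, the hard part is the same: deciding, per function, which parameters to drop and which to transform.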

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue41220>
___



[issue41405] python 3.9.0b5 test

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

Is this a platform where 3.8 was working?

The curses test seems to think you have too many color-pairs defined, and this 
might well be part of a semi-compatible curses library. I guess I would add 
some output to the test showing how many (and which) color pairs it thinks 
there are.

The pwd complaint is correct, but seems like it is complaining about the 
interface between python and your OS.

The tkinter problem is really a failure to round a floating point, and I would 
be surprised if python had made changes there recently.  I would be slightly 
less surprised if something in the compile chain of tk for your system 
hard-coded a specific rounding format.

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue41405>
___



[issue41409] deque.pop(index) is not supported

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

It may well have been intentional, as deques should normally be mutated only at 
the ends.  But Raymond did make changes to conform to the ABC, so this should 
probably be supported too.  Go ahead and include docstrings and/or documentation 
discouraging it, though, except for i=0 and i=-1.
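Until `deque.pop()` accepts an index, the same effect is available through indexing and `del`. This sketch uses a hypothetical helper, `deque_pop_at`. Note that `deque.pop()` itself takes no argument, and removal from the middle of a deque is O(n).

```python
from collections import deque


def deque_pop_at(d, index=-1):
    """Hypothetical helper: remove and return d[index], like list.pop(index).

    Removing from the middle is O(n), so this is best kept for the ends
    (index 0 or -1)."""
    value = d[index]  # IndexError if the deque is empty or index is out of range
    del d[index]
    return value


d = deque([1, 2, 3, 4])
print(deque_pop_at(d, 1), list(d))  # 2 [1, 3, 4]
```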

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue41409>
___



[issue40841] Provide mimetypes.sniff API as stdlib

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

The standard itself says that it only applies to content served over http; if 
the content is retrieved by ftp or from a file system, then you should trust 
that.  I don't notice that in the code you pointed to.

So maybe filetype is the right answer if the data isn't coming over the 
network?  For whatwg demonstration code, it is reasonable to assume that, but 
in python -- at a minimum, you should document the assumption prominently in 
the docs and docstring.
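Here is a minimal sketch of the policy described above: trust the name for local or ftp data, and sniff only for HTTP. It is simplified and is not the WHATWG algorithm. The file name stands in for the supplied type, the signature table is a tiny hand-picked subset, and `guess_type_for` is a hypothetical name. Only `mimetypes.guess_type` is real stdlib.

```python
import mimetypes

# Tiny hand-picked set of leading-byte signatures (assumption, not complete).
_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
]


def sniff(data):
    """Return a MIME type based on the first bytes, or None."""
    for prefix, mime in _SIGNATURES:
        if data.startswith(prefix):
            return mime
    return None


def guess_type_for(name, data, from_http):
    """Hypothetical policy: sniff content for HTTP, trust the name otherwise."""
    by_name, _encoding = mimetypes.guess_type(name)
    if from_http:
        return sniff(data) or by_name
    return by_name or sniff(data)
```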

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue40841>
___



[issue18280] Documentation is too personalized

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

I won't speak of nroff or troff in particular, but many programs had trouble 
distinguishing the end of a sentence from an honorific abbreviation, such as 
Mr. Spock or Dr. Seuss.
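Here is a small illustration of that failure mode using Python's `re`. A naive splitter breaks after "Mr.", while one with fixed-width negative lookbehinds for a few honorifics does not. The abbreviation list is a hypothetical, deliberately tiny sample, and real tools need far more.

```python
import re

# Naive rule: any ".", "!" or "?" followed by whitespace ends a sentence.
_NAIVE = re.compile(r"(?<=[.!?])\s+")

# Same rule, but not right after a few known honorifics (tiny sample list).
_BOUNDARY = re.compile(r"(?<!\bMr\.)(?<!\bMrs\.)(?<!\bDr\.)(?<=[.!?])\s+")


def naive_split(text):
    return _NAIVE.split(text)


def split_sentences(text):
    return _BOUNDARY.split(text)


print(naive_split("I met Mr. Spock. He waved."))      # ['I met Mr.', 'Spock.', 'He waved.']
print(split_sentences("I met Mr. Spock. He waved."))  # ['I met Mr. Spock.', 'He waved.']
```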

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue18280>
___



[issue41407] Tricky behavior of builtin-function map

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

Why would you raise StopIteration if you didn't want to stop the nearest 
iteration loop?  I agree that the result of your sample code seems strange, but 
that is because it is strange code.
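Here is a short demonstration of the behavior under discussion, compared with a generator expression. `map()` lets a `StopIteration` raised by the mapped function escape from its `__next__`, and `list()` treats that as ordinary exhaustion. Since PEP 479 (the default from Python 3.7), the same exception raised inside a generator is turned into a `RuntimeError`. The function `stop_at_two` is a made-up example.

```python
def stop_at_two(x):
    # Deliberately raise StopIteration from inside the mapped function.
    if x == 2:
        raise StopIteration
    return x * 10


# map(): the result is silently truncated.
mapped = list(map(stop_at_two, [1, 2, 3]))

# Generator expression: PEP 479 converts it to RuntimeError instead.
try:
    list(stop_at_two(x) for x in [1, 2, 3])
    generator_error = None
except RuntimeError as exc:
    generator_error = exc

print(mapped)           # [10]
print(generator_error)  # generator raised StopIteration
```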

I agree with Steven D'Aprano that changing it would cause more pain than it 
would remove.

Unless it gets a lot more support by the first week of August, I recommend 
closing this request as rejected.

--
nosy: +Jim.Jewett
status: open -> pending

___
Python tracker 
<https://bugs.python.org/issue41407>
___



[issue31904] Python should support VxWorks RTOS

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

Is it safe to say that there is now an intent to support VxWorks within the 
main tree, with Wind River agreeing to be primary support?

And this ticket has become a tracking ticket for the status on getting it 
there, small PR by small PR plus buildbot?

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue31904>
___



[issue41391] Make test_unicodedata pass when running without network

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

Looks Good To Me

--
nosy: +Jim.Jewett

___
Python tracker 
<https://bugs.python.org/issue41391>
___



[issue40841] Provide mimetypes.sniff API as stdlib

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

There are a zillion reasons a filename could be wrong -- but the standard
says to trust the filesystem.  So if it sniffs based on contents, it isn't
quite following the standard.  It is probably still a useful tool, but it
won't be the One Right Way, and it isn't even clear that it should replace
current heuristics.

On Mon, Jul 27, 2020 at 7:22 PM Guido van Rossum wrote:

>
> Guido van Rossum  added the comment:
>
> Whether the data was retrieved over a network has nothing to do with it.
>
> There are complementary ways of guessing what data you are working with --
> guess based on the filename extension or sniff based on the contents of the
> file (or downloaded data).
>
> There are a zillion reasons why the filename could be a lie -- e.g. a user
> could pick the wrong extension, or rename a file, or a tool could save a
> file using the wrong extension or no extension at all. Then again sometimes
> the contents of the file might not be enough, e.g.
> ```
> foo() // bar
> ```
> is both valid Python and valid JavaScript. :-)
>
> --
>
> ___
> Python tracker 
> <https://bugs.python.org/issue40841>
> ___
>

--

___
Python tracker 
<https://bugs.python.org/issue40841>
___



[issue41405] python 3.9.0b5 test

2020-07-27 Thread Jim Jewett


Jim Jewett  added the comment:

Then I suspect they also exist in even earlier versions, and are actually tied 
to your development setup.  That should still be fixed, but it is probably not 
in Python's own code.  It might be in python's build process, which is still on 
us.  Or it might be in your distribution, or in a dependency like Tk, or in 
your personal C compiler or setup.

Could you look to see what your system's actual passwd file says, and how tcl 
rounds outside of python, and how many color pairs your curses supports or has?

--
versions: +Python 3.8, Python 3.9

___
Python tracker 
<https://bugs.python.org/issue41405>
___



[issue13828] Further improve casefold documentation

2020-08-24 Thread Jim Jewett

Jim Jewett  added the comment:

Unicode probably won't make the correction, because of backwards
compatibility.  I do support the sentence suggested in Thorsten's most
recent reply.  Is expanding ligatures the only other normalization it does?

Ideally, we should also mention that it shifts to the canonical case, which
is usually (but not always) lowercase.  I think Cherokee is one that folds
to the upper case.
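Here is a quick check of these cases against Python's own `str.lower()` and `str.casefold()`, using only the standard library. U+00DF and U+FB01 show the expansions. U+1E9E (capital sharp s) lowercases to 'ß' but folds to 'ss'. U+AB70 (Cherokee small letter a) folds to its uppercase counterpart, which matches the Cherokee exception mentioned above.

```python
import unicodedata


def case_report(ch):
    """Return (character name, lower(), casefold()) for one character."""
    return unicodedata.name(ch), ch.lower(), ch.casefold()


for ch in ["\u00df", "\u1e9e", "\ufb01", "\uab70"]:
    print(case_report(ch))
```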

On Mon, Aug 24, 2020 at 11:02 AM Thorsten  wrote:

>
> Thorsten  added the comment:
>
> I see. I found the documents. That's an issue. That usage is incorrect. It
> is still valid to upper case "ß" to SS since "ẞ" is fairly new as an
> official German character, but the other way around is not valid.
>
> As such the current sentence in documentation also just does not make
> sense.
>
> >"Since it is already lowercase, lower() would do nothing to 'ß'"
>
> Exactly. Why would it? It is nonsensical to change an already lowercase
> character with a lowercase function.
>
> Suggest to update to:
>
> "For example, the Unicode standard for German lower case letter 'ß'
> prescribes full casefolding to 'ss'. Since it is already lowercase, lower()
> would do nothing to 'ß'; casefold() converts it to 'ss'.
> In addition to full lowercasing, this function also expands ligatures, for
> example, 'ﬁ' becomes 'fi'."
>
> --
>
> ___
> Python tracker 
> <https://bugs.python.org/issue13828>
> ___
>

--

___
Python tracker 
<https://bugs.python.org/issue13828>
___



[issue41246] IOCP Proactor same socket overlapped callbacks

2020-08-29 Thread Jim Jewett


Change by Jim Jewett :


--
stage: patch review -> commit review

___
Python tracker 
<https://bugs.python.org/issue41246>
___



[issue12736] Request for python casemapping functions to use full not simple casemaps per Unicode's recommendation

2012-01-15 Thread Jim Jewett

Jim Jewett  added the comment:

Why was the delta-processing removed from the casing functions?

As best I can tell, the whole point of going through multiple levels of 
indirection (courtesy splitbins) is to maximize compression and minimize the 
amount of cache that unicode might occupy.

By using deltas, only one record is needed for each combination of (upper - 
lower, upper - title), which is generally only one or two combinations per 
script.  

Without deltas, nearly every cased letter needs its own record, and the index 
tables also get bigger. (It seems to be about 2.6 times as large, but cache 
effects may be worse, since letters from the same script will no longer be in 
the same record or the same index chain.)

If it is a concern about not enough room for flags, then the decimal/digit 
chars could be combined.  They are always the same, unless the number isn't 
decimal (in which case the flag is enough).
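Here is a rough model of why deltas compress so well. It counts distinct (is_upper, upper, lower, title) records for a set of characters, storing the mappings either as absolute code points or as offsets from the character itself. This is an illustration only, not the actual makeunicodedata.py record layout, which also carries flags and numeric fields. The function name `count_records` is hypothetical.

```python
import string


def count_records(chars, use_deltas):
    """Count distinct (is_upper, upper, lower, title) records for chars.

    With use_deltas=True the mappings are offsets from the character
    itself; with use_deltas=False they are absolute code points."""
    records = set()
    for ch in chars:
        upper, lower, title = ch.upper(), ch.lower(), ch.title()
        if len(upper) != 1 or len(lower) != 1 or len(title) != 1:
            continue  # skip multi-character (special-casing) mappings
        base = ord(ch) if use_deltas else 0
        records.add((ch.isupper(),
                     ord(upper) - base, ord(lower) - base, ord(title) - base))
    return len(records)


print(count_records(string.ascii_letters, use_deltas=True))   # 2
print(count_records(string.ascii_letters, use_deltas=False))  # 52
```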

--

___
Python tracker 
<http://bugs.python.org/issue12736>
___



[issue13793] hasattr, delattr, getattr fail with unnormalized names

2012-01-15 Thread Jim Jewett

New submission from Jim Jewett :

The documentation for hasattr, getattr, and delattr states that they are 
equivalent to object.attribute access; this isn't quite true, because 
object.attribute uses an NFKC-normalized version of the string as only the 
secondary location, while hasattr, getattr, and delattr (assuming an object 
rather than an Identifier or string) don't seem to do the normalization at all.

I think the simplest fix would be to normalize and retry when hasattr, getattr, 
and delattr fail with a string, but I'm not sure that normalization shouldn't 
be the only string tried. 
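Here is a sketch of the "normalize and retry" idea as a user-level helper. The name `getattr_normalized` is hypothetical, and this is not a change to the builtin. It tries the name as given, then its NFKC form, which mirrors what the compiler does for identifiers written in source.

```python
import unicodedata

_MISSING = object()


def getattr_normalized(obj, name, default=_MISSING):
    """Hypothetical helper: getattr() that retries with the NFKC form of name."""
    # dict.fromkeys de-duplicates while keeping order: original name first.
    for candidate in dict.fromkeys([name, unicodedata.normalize("NFKC", name)]):
        try:
            return getattr(obj, candidate)
        except AttributeError:
            continue
    if default is _MISSING:
        raise AttributeError(name)
    return default


class Obj:
    pass


o = Obj()
o.o = 5                                 # the attribute the parser creates for o.º
print(hasattr(o, "\u00ba"))             # False: the builtins do not normalize
print(getattr_normalized(o, "\u00ba"))  # 5
```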

>>> o.º
Traceback (most recent call last):
  File "", line 1, in 
o.º
AttributeError: 'Object' object has no attribute 'o'
>>> o.o
Traceback (most recent call last):
  File "", line 1, in 
o.o
AttributeError: 'Object' object has no attribute 'o'
>>> o.º=[]
>>> hasattr(o, "º")
False
>>> getattr(o, "º")
Traceback (most recent call last):
  File "", line 1, in 
getattr(o, "º")
AttributeError: 'Object' object has no attribute 'º'
>>> delattr(o, "º")
Traceback (most recent call last):
  File "", line 1, in 
delattr(o, "º")
AttributeError: º
>>> o.º
[]
>>> o.º is o.o
True
>>> o.o
[]
>>> del o.º
>>> o.o
Traceback (most recent call last):
  File "", line 1, in 
o.o
AttributeError: 'Object' object has no attribute 'o'

>>> o.º = 5
>>> hasattr(o, "º")
False
>>> hasattr(o, "o")
True
>>> hasattr(o, "o")
True
>>> o.º
5
>>> delattr(o, "o")
>>> o.º

--
components: Unicode
messages: 151320
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: hasattr, delattr, getattr fail with unnormalized names

___
Python tracker 
<http://bugs.python.org/issue13793>
___



[issue13793] hasattr, delattr, getattr fail with unnormalized names

2012-01-16 Thread Jim Jewett

Jim Jewett  added the comment:

Why is normalization in getattr unacceptable?  I won't pretend to *like* it, 
but the difference between two canonically equal strings really is (by 
definition) just a representational issue.

Would it be OK to normalize in object's own implementation, so that custom 
classes could avoid the normalization, but it would happen by default?

--

___
Python tracker 
<http://bugs.python.org/issue13793>
___



[issue13165] Integrate stringbench in the Tools directory

2012-01-16 Thread Jim Jewett

Jim Jewett  added the comment:

The URL got mangled in at least my browser, so I'm repasting it on its own line:

http://svn.python.org/projects/sandbox/trunk/stringbench

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue13165>
___



[issue13703] Hash collision security issue

2012-01-17 Thread Jim Jewett

Jim Jewett  added the comment:

To be more explicit about Marc-Andre Lemburg's msg151121 (which I agree with):

Count the collisions on a single lookup. 
If they exceed a threshold, do something different.

Marc-Andre's strawman proposal was threshold=1000, and raise.  It would be just as 
easy to say "whoa!  5 collisions -- time to use the alternative hash instead" 
(and, possibly, to issue a warning).  

Even that slight tuning removes the biggest objection, because it won't ever 
actually fail.

Note that the use of a (presumably stronger 2nd) hash wouldn't come into play 
until (and unless) there was a problem for that specific key in that specific 
dictionary.  For the normal case, nothing changes -- unless we take advantage 
of the existence of a 2nd hash to simplify the first few rounds of collision 
resolution.  (Linear probing is more cache-friendly, but also more vulnerable 
to worst-case behavior -- but if probing stops at 4 or 8, that may not matter 
much.)  For quick scripts, the 2nd hash will almost certainly never be needed, 
so startup won't pay the penalty.

The only down side I see is that the 2nd (presumably randomized) hash won't be 
cached without another slot, which takes more memory and shouldn't be done in a 
bugfix release.
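Here is a toy model of the idea: an open-addressing table with linear probing that counts collisions per probe sequence. Once a sequence exceeds the threshold, it switches that one table to a salted secondary hash. All names are hypothetical, the table has a fixed size and only str keys, and salted `blake2b` merely stands in for whatever stronger randomized hash would actually be chosen. For simplicity, the toy only reacts during insertion.

```python
import hashlib
import os


class CollisionCountingTable:
    """Toy fixed-size open-addressing table for str keys (illustration only)."""

    def __init__(self, capacity=64, threshold=5, primary_hash=hash):
        self.capacity = capacity
        self.threshold = threshold
        self.primary_hash = primary_hash
        self.salt = os.urandom(16)      # per-table secret for the 2nd hash
        self.use_secondary = False
        self.slots = [None] * capacity  # each slot: None or (key, value)

    def _hash(self, key):
        if not self.use_secondary:
            return self.primary_hash(key)
        digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8,
                                 key=self.salt).digest()
        return int.from_bytes(digest, "little")

    def _probe(self, key):
        """Return (slot index, collisions) for key or its first empty slot."""
        i = self._hash(key) % self.capacity
        collisions = 0
        while self.slots[i] is not None and self.slots[i][0] != key:
            collisions += 1
            if collisions >= self.capacity:
                raise RuntimeError("table is full")
            i = (i + 1) % self.capacity  # linear probing
        return i, collisions

    def __setitem__(self, key, value):
        i, collisions = self._probe(key)
        if collisions > self.threshold and not self.use_secondary:
            self._switch_to_secondary()
            i, _ = self._probe(key)
        self.slots[i] = (key, value)

    def __getitem__(self, key):
        i, _ = self._probe(key)
        if self.slots[i] is None:
            raise KeyError(key)
        return self.slots[i][1]

    def _switch_to_secondary(self):
        entries = [slot for slot in self.slots if slot is not None]
        self.use_secondary = True
        self.slots = [None] * self.capacity
        for key, value in entries:
            self.slots[self._probe(key)[0]] = (key, value)
```

Passing `primary_hash=lambda key: 0` simulates attacker-chosen keys: every insert collides, and the table flips to the secondary hash after a handful of entries. With the default `hash` and a few keys, it never switches, so the normal case pays nothing extra.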

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13820] 2.6 is no longer in the future

2012-01-18 Thread Jim Jewett

New submission from Jim Jewett :

http://docs.python.org/reference/lexical_analysis.html

Changed in version 2.5: Both as and with are only recognized when the 
with_statement future feature has been enabled. It will always be enabled in 
Python 2.6. See section The with statement for details. Note that using as and 
with as identifiers will always issue a warning, even when the with_statement 
future directive is not in effect.


That was reasonable wording for 2.5 itself, but at this point, I think it would 
be simpler to add a Changed in version 2.6 entry.  Perhaps:

Changed in version 2.5: Using as or with as identifiers triggers a warning.  
Using them as statements requires the with_statement future feature.
Changed in version 2.6: as and with became full keywords.

--
assignee: docs@python
components: Documentation
messages: 151595
nosy: Jim.Jewett, docs@python
priority: normal
severity: normal
status: open
title: 2.6 is no longer in the future
type: enhancement
versions: Python 2.7

___
Python tracker 
<http://bugs.python.org/issue13820>
___



[issue13821] misleading return from isidentifier

2012-01-18 Thread Jim Jewett

New submission from Jim Jewett :

Python identifiers are in NFKC form; string method .isidentifier() returns true 
on strings that are not in that form.  In some contexts, these non-canonical 
strings will be replaced with their NFKC equivalent, but in other contexts 
(such as the builtins hasattr, getattr, delattr) they will not.


>>> cha=chr(170)
>>> cha
'ª'

>>> cha.isidentifier()
True

>>> uc.normalize("NFKC", cha)
'a'

>>> obj.ª = 5
>>> hasattr(obj, "ª")
False
>>> obj.a
5

--
components: Unicode
messages: 151597
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: misleading return from isidentifier

___
Python tracker 
<http://bugs.python.org/issue13821>
___



[issue13821] misleading return from isidentifier

2012-01-18 Thread Jim Jewett

Jim Jewett  added the comment:

My preference would be for non_NFKC.isidentifier() to return False, but that 
may be a problem for backwards compatibility.

It *may* be worth adding an asidentifier() method that returns either False or 
the canonicalized string that should be used instead.

At a minimum, the documentation (including docstring) should warn that the 
method doesn't check for NFKC form, and that if the input is not ASCII, the 
caller should first ensure this by calling str1=unicodedata.normalize("NFKC", 
str1)
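Here is a sketch of the suggested `asidentifier()` idea as plain functions. Both names, `as_identifier` and `is_canonical_identifier`, are hypothetical and not str methods. The first returns the NFKC form when that form is a valid identifier and returns False otherwise. The second accepts only identifiers already in NFKC form.

```python
import unicodedata


def as_identifier(s):
    """Hypothetical helper: NFKC form of s if that is an identifier, else False."""
    normalized = unicodedata.normalize("NFKC", s)
    return normalized if normalized.isidentifier() else False


def is_canonical_identifier(s):
    """True only for identifiers that are already in NFKC form."""
    return s.isidentifier() and unicodedata.normalize("NFKC", s) == s


cha = chr(170)  # FEMININE ORDINAL INDICATOR, as in the report above
print(cha.isidentifier(), is_canonical_identifier(cha), as_identifier(cha))
# True False a
```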

--

___
Python tracker 
<http://bugs.python.org/issue13821>
___



[issue13821] misleading return from isidentifier

2012-01-18 Thread Jim Jewett

Jim Jewett  added the comment:

@Benjamin -- the catch is, if it isn't already in NFKC form, then python won't 
really accept it as an identifier.  Sometimes it will silently canonicalize it 
for you so that it seems to work, but other times it won't.  And a program 
calling isidentifier is likely to be one that uses the strings directly 
for access, instead of always routing them through the parser.

--

___
Python tracker 
<http://bugs.python.org/issue13821>
___



[issue13828] Further improve casefold documentation

2012-01-19 Thread Jim Jewett

New submission from Jim Jewett :

> http://hg.python.org/cpython/rev/0b5ce36a7a24
> changeset:   74515:0b5ce36a7a24


> +   Casefolding is similar to lowercasing but more aggressive because it is
> +   intended to remove all case distinctions in a string. For example, the German
> +   lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already
> +   lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold`
> +   converts it to ``"ss"``.

Perhaps add the recommendation to canonicalize as well.

A complete, but possibly too long, try is below:


Casefolding is similar to lowercasing but more aggressive because it is 
intended to remove all case distinctions in a string. For example, the German 
lowercase letter ``'ß'`` is equivalent to ``"ss"``. Since it is already 
lowercase, :meth:`lower` would do nothing to ``'ß'``; :meth:`casefold` converts 
it to ``"ss"``.  Note that most case-insensitive matches should also match 
compatibility equivalent characters.  

The casefolding algorithm is described in section 3.13 of the Unicode Standard. 
Per D146, a compatibility caseless match can be achieved by comparing the 
results of:

    from unicodedata import normalize

    def caseless_compat(string):
        # D146: NFKD(casefold(NFKD(casefold(NFD(X)))))
        # a and b match when caseless_compat(a) == caseless_compat(b)
        nfd_string = normalize("NFD", string)
        nfkd1_string = normalize("NFKD", nfd_string.casefold())
        return normalize("NFKD", nfkd1_string.casefold())

--
assignee: docs@python
components: Documentation
messages: 151644
nosy: Jim.Jewett, benjamin.peterson, docs@python
priority: normal
severity: normal
status: open
title: Further improve casefold documentation
versions: Python 3.3

___
Python tracker 
<http://bugs.python.org/issue13828>
___



[issue13828] Further improve casefold documentation

2012-01-19 Thread Jim Jewett

Jim Jewett  added the comment:

Frankly, I do think that sample code is too long, but correctness matters ... 
perhaps a better solution would be to add either a method or a unicodedata 
function that does the work, then the extra note could just say

Note that most case-insensitive matches should also match compatibility 
equivalent characters; see unicodedata.compatibility_casefold

--

___
Python tracker 
<http://bugs.python.org/issue13828>
___



[issue13832] tokenization assuming ASCII whitespace; missing multiline case

2012-01-19 Thread Jim Jewett

New submission from Jim Jewett :

Parser/parsetok.c was recently changed (e.g. 
http://hg.python.org/cpython/rev/2bd7f40108b4 ) to raise an error if multiple 
statements were found in a single-statement compile call.  It sensibly ignores 
trailing whitespace and comments.  Unfortunately,

(1)  It looks only at (c == ' ' || c == '\t' || c == '\n' || c == '\014') as 
opposed to using Py_UNICODE_ISSPACE(ch)
(2)  It assumes that a "#" means the rest of the line is OK, instead of looking 
for additional linebreaks.
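To see what point (1) leaves out, this sketch compares the four-character check quoted from parsetok.c with `str.isspace()`, which CPython implements with `Py_UNICODE_ISSPACE`. Both the helper name and the constant are illustrative.

```python
# Characters the quoted C check treats as ignorable: space, tab,
# newline, and form feed ('\014' in octal, '\x0c' in hex).
ASCII_TOKENIZER_SPACE = frozenset(" \t\n\x0c")


def whitespace_missed_by_ascii_check():
    """Code points str.isspace() accepts but the four-character check does not."""
    return [chr(cp) for cp in range(0x110000)
            if chr(cp).isspace() and chr(cp) not in ASCII_TOKENIZER_SPACE]


missed = whitespace_missed_by_ascii_check()
print(len(missed), [hex(ord(ch)) for ch in missed[:4]])
```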

Not sure whether to mark this a bug or an enhancement, since it is already 
strictly better than the 3.2 behavior of never warning about extra text.

--
components: Interpreter Core
messages: 151652
nosy: Jim.Jewett
priority: normal
severity: normal
status: open
title: tokenization assuming ASCII whitespace; missing multiline case
versions: Python 3.3

___
Python tracker 
<http://bugs.python.org/issue13832>
___



[issue13832] tokenization assuming ASCII whitespace; missing multiline case

2012-01-20 Thread Jim Jewett

Jim Jewett  added the comment:

Ignoring non-ascii whitespace is defensible, and I agree that it should match 
the rest of the parser.  Ignoring 2nd lines is still a problem, and supposedly 
part of what got fixed.  Test case:

s="""x=5  # comment
x=6
"""
compile(s, "", 'single')

--

___
Python tracker 
<http://bugs.python.org/issue13832>
___



[issue13703] Hash collision security issue

2012-01-20 Thread Jim Jewett

Jim Jewett  added the comment:

Marc-Andre Lemburg:
>> So you get the best of both worlds and randomization would only
>> kick in when it's really needed to keep the application running.

Charles-François Natali
> The only argument in favor the collision counting is that it will not
> break applications relying on dict order:

There is also the "taxes suck" argument; if hashing is made complex,
then every object (or at least almost every string) pays a price, even
if it will never be stuck in a dict big enough to matter.

With collision counting, there are no additional operations unless and
until there is at least one collision -- in other words, after the
base hash algorithm has already started to fail for that particular
piece of data.

In fact, the base algorithm can be safely simplified further,
precisely because it does not need to be quite as adequate for
reprobes on data that does have at least one collision.
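Below is a toy model of the collision-counting idea. It is a simplified sketch, not any of the patches under discussion, and it makes three assumptions:

- linear probing, which is simpler than CPython's perturbed probing;
- a keyed BLAKE2 digest as a hypothetical stand-in for the alternative hash;
- an arbitrary threshold.

The table uses plain hash() until a single insertion steps over more than `threshold` occupied slots. Only that table then pays for a secret and a rebuild.

```python
import hashlib
import os


class CollisionCountingTable:
    """Toy open-addressing table with linear probing (str or int keys only).

    Keys use the ordinary hash() until one insertion steps over more than
    `threshold` occupied slots; only then does this one table switch to a
    keyed, randomized hash and rebuild itself.
    """

    def __init__(self, capacity=8, threshold=32):
        self._slots = [None] * capacity   # capacity must be a power of two
        self._threshold = threshold
        self._secret = None               # None means "plain hash()"
        self._used = 0

    def _hash(self, key):
        if self._secret is None:
            return hash(key)
        data = repr(key).encode("utf-8", "surrogatepass")
        digest = hashlib.blake2b(data, key=self._secret, digest_size=8).digest()
        return int.from_bytes(digest, "little")

    def _probe(self, key):
        """Return (slot index for key, number of occupied slots stepped over)."""
        mask = len(self._slots) - 1
        i = self._hash(key) & mask
        collisions = 0
        while self._slots[i] is not None and self._slots[i][0] != key:
            collisions += 1
            i = (i + 1) & mask
        return i, collisions

    def _rebuild(self, capacity):
        entries = [e for e in self._slots if e is not None]
        self._slots = [None] * capacity
        for key, value in entries:
            i, _ = self._probe(key)
            self._slots[i] = (key, value)

    def __setitem__(self, key, value):
        if (self._used + 1) * 3 >= len(self._slots) * 2:   # keep load below 2/3
            self._rebuild(len(self._slots) * 2)
        i, collisions = self._probe(key)
        if collisions > self._threshold and self._secret is None:
            self._secret = os.urandom(16)   # randomization is paid for only now
            self._rebuild(len(self._slots))
            i, _ = self._probe(key)
        if self._slots[i] is None:
            self._used += 1
        self._slots[i] = (key, value)

    def __getitem__(self, key):
        i, _ = self._probe(key)
        if self._slots[i] is None:
            raise KeyError(key)
        return self._slots[i][1]

    @property
    def randomized(self):
        return self._secret is not None
```

Keys such as `n * 1024` all land in slot 0 while the capacity stays below 1024. With `threshold=32`, the switch therefore happens on the 34th insertion. Filling the table with `range(20)` never triggers it.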

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-20 Thread Jim Jewett

Jim Jewett  added the comment:

On Fri, Jan 20, 2012 at 7:58 AM, STINNER Victor
> If the hash output depends on an argument, the result cannot be
> cached.

They can still be cached in a separate dict based on id, rather than
string contents.

It may also be possible to cache them in the dict itself; for a
string-only dict, the hash of each entry is already cached on the
object, and the cache member of the entry is technically redundant.
Entering a key with the alternative hash can also switch the lookup
function to one that handles that possibility, just as entering a
non-string key currently does.

> It would require to add an
> optional argument to hash functions, or add a new function to some
> (or all?) builtin types.

For backports, the alternative hashing could be done privately within
dict and set, and would not require new slots on other types.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-23 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Jan 23, 2012 at 4:39 PM, Marc-Andre Lemburg
 wrote:

> Running (part of (*)) the test suite with debugging enabled on a 64-bit
> machine shows that slot collisions are much more frequent than
> hash collisions, which only account for less than 0.01% of all
> collisions.

Even 1 in 10,000 seems pretty high, though I suppose it is a result of
non-random input.  (For a smalldict with 8 == 2^3 slots, on a 64-bit
machine, true hash collisions "should" only account for 1 in 2^61 slot
collisions.)
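The 2^61 figure follows if 64-bit hashes are treated as uniformly random. In an 8-slot table, a slot collision only requires the low 3 bits to match, so the chance that the full hashes also match is:

$$
P\left(h_a = h_b \mid h_a \bmod 2^3 = h_b \bmod 2^3\right) = \frac{2^{-64}}{2^{-3}} = 2^{-61}.
$$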

> It also shows that slot collisions in the low 1-10 range are
> most frequent, with very few instances of a dict lookup
> reaching 20 slot collisions (less than 0.0002% of all
> collisions).

Thus the argument that collisions > N implies (possibly malicious)
data that really needs a different hash -- and that this dict instance
in particular should take the hit to use an alternative hash.  (Do
note that this alternative hash could be stored in the hash member of
the PyDictEntry; if anything actually *equal* to the key comes along,
it will have gone through just as many collisions, and therefore also
have been rehashed.)

> The great number of cases with 1 or 2 slot collisions surprised
> me. It seems that there's potential for improvement of
> the perturbation formula left.

In retrospect, this makes sense.

    for (perturb = hash; ; perturb >>= PERTURB_SHIFT) {
        i = (i << 2) + i + perturb + 1;

If two objects collided, then they have the same last few bits
in their hashes, which means they also have the same last few bits
in their initial perturb.  Since those low bits of perturb equal i,
the first probe goes to slot 6i+1 (modulo the table size), which is
always odd; until the second probe, only half the slots are even
considered.
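The recurrence can be checked with a small Python model of the loop above. The model assumes PERTURB_SHIFT is 5 (its value in dictobject.c at the time) and treats the hash as unsigned; `probe_sequence` is a hypothetical helper. Because the low bits of perturb equal i, the first reprobe is (6i+1) & mask. That value depends only on the starting slot and is always odd.

```python
PERTURB_SHIFT = 5  # value used by dictobject.c at the time


def probe_sequence(h, mask, n):
    """First n slot indices the lookup loop visits for (unsigned) hash h."""
    i = h & mask
    perturb = h
    slots = [i]
    for _ in range(n - 1):
        i = (i << 2) + i + perturb + 1   # 5*i + perturb + 1, masked below
        slots.append(i & mask)
        perturb >>= PERTURB_SHIFT
    return slots


# Hashes 5 and 37 share their low 3 bits: in an 8-slot table they start at the
# same slot and make the same first reprobe before diverging.
print(probe_sequence(5, 7, 3), probe_sequence(37, 7, 3))  # [5, 7, 4] [5, 7, 5]
```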

Also note that this explains why randomization could make the Django
tests fail, even though 64-bit users haven't complained.  The initial
hash(&mask) is the same, and the first probe is the same, and (for a
small enough dict) so are the next several.  In a dict with 2^12
slots, the first 6 tries will be the same ... so I doubt the test
cases have sufficiently large amounts of sufficiently unlucky data to
notice very often -- unless the hash itself is changed, as in the
patch.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-25 Thread Jim Jewett

Jim Jewett  added the comment:

On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm 
added the comment:

>  hybrid-approach-dmalcolm-2012-01-25-001.patch

> As per haypo's random-8.patch, a randomization seed is read at startup.

Why not wait until it is needed?  I suspect a lot of scripts will
never need it for any dict, so why add the overhead to startup?

> Once a dict has transitioned to paranoid mode, it isn't using
> PyObject_Hash anymore, and thus isn't using cached object values

The alternative hashes could be stored in an id-keyed dict

> performing a more expensive calculation, but I believe this
> calculation is essentially constant-time.
>
> This preserves hash() and dict order for the cases where you're not under 
> attack, and gracefully handles the attack without having to raise an 
> exception: it doesn't introduce any new exception types.
>
> It preserves ABI, assuming no-one else is reusing ma_smalltable.
>
> It is suitable for backporting to 3.2, 2.7, and earlier (I'm investigating 
> fixing this going all the way back to Python 2.2)
>
> Under the old implementation, there were 4 types of PyDictObject, given these 
> two booleans:
>  * "small vs large" i.e ma_table == ma_smalltable vs ma_table != ma_smalltable
>  * "all keys are str" vs arbitary keys i.e ma_lookdict == lookdict_unicode vs 
> lookdict
>
> Under this implementation, this doubles to 8 kinds, adding the boolean:
>  * normal hash vs randomized hash (normal vs "paranoid").
>
> This is expressed via the ma_lookdict callback, adding two new variants, 
> lookdict_unicode_paranoid and lookdict_paranoid
>
> Note that if a paranoid dict goes small again (ma_table == ma_smalltable), it 
> stays paranoid.  This is for simplicity: it avoids having to rebuild all of 
> the non-randomized me_hash values again (which could fail).
>
> Naturally the patch adds selftests.  I had to add some diagnostic methods to 
> support them; dict gains _stats() and _make_paranoid() methods, and sys gains 
> a _getrandomizedhash() method.  These could be hidden more thoroughly if need 
> be (see DICT_PROTECTION_TRACKING in dictobject.c).  Amongst other things, the 
> selftests measure wallclock time taken for various dict operations (and so 
> might introduce failures on a heavily-loaded machine, I guess).
>
> Hopefully this approach is a viable way forward.
>
> Caveats and TODO items:
>
> TODO: I haven't yet tuned the safety threshold.  According to 
> http://bugs.python.org/issue13703#msg151850:
>> slot collisions are much more frequent than
>> hash collisions, which only account for less than 0.01% of all
>> collisions.
>>
>> It also shows that slot collisions in the low 1-10 range are
>> most frequent, with very few instances of a dict lookup
>> reaching 20 slot collisions (less than 0.0002% of all
>> collisions).
>
> This suggests that the threshold of 32 slot/hash collisions per lookup may 
> already be high enough.
>
> TODO: in a review of an earlier version of the complexity detection idea, 
> Antoine Pitrou suggested that make the protection scale factor be a run-time 
> configurable value, rather than a #define.  This isn't done yet.
>
> TODO: run more extensive tests (e.g. Django and Twisted), monitoring the 
> worst-case complexity that's encountered
>
> TODO: not yet benchmarked and optimized.  I want to get feedback on the 
> approach before I go in and hand-optimize things (e.g. by hand-inlining 
> check_iter_count, and moving the calculations out of the loop etc).  I 
> believe any performance issues ought to be fixable, in that the we can get 
> the cost of this for the "we're not under attack" case to be negligible, and 
> the "under attack" case should transition from O(N^2) to O(N), albeit it with 
> a larger constant factor.
>
> TODO: this doesn't cover sets, but assuming this approach works, the patch 
> can be extended to cover it in an analogous way.
>
> TODO: should it cover PyMemoryViewObject, buffer object, etc?
>
> TODO: should it cover the hashing in Modules/expat/xmlparse.c?  FWIW I rip 
> this code out when doing my downstream builds in RHEL and Fedora, and instead 
> dynamically link against a system copy of expat
>
> TODO: only tested on Linux so far (which is all I've got).  Fedora 15 x86_64 
> fwiw
>
>  Doc/using/cmdline.rst     |    6
>  Include/bytesobject.h     |    2
>  Include/object.h          |    8
>  Include/pythonrun.h       |    2
>  Include/unicodeobject.h   |    2
>  Lib/os.py                 |   17 --
>  Lib/test/regrtest.py      |    5
>  Lib/test/test_dict.py     |  298 +

[issue13703] Hash collision security issue

2012-01-25 Thread Jim Jewett

Jim Jewett  added the comment:

Sorry; hit the wrong key... intended message below:

On Wed, Jan 25, 2012 at 6:06 AM, Dave Malcolm 
added the comment:

[lots of good stuff]

>  hybrid-approach-dmalcolm-2012-01-25-001.patch

> As per haypo's random-8.patch, a randomization seed is read at
> startup.

Why not wait until it is needed?  I suspect a lot of scripts will
never need it for any dict, so why add the overhead to startup?

> Once a dict has transitioned to paranoid mode, it isn't using
> PyObject_Hash anymore, and thus isn't using cached object values

The alternative hashes could be stored in an id-keyed
WeakKeyDictionary; that would handle at least the normal case of using
exactly the same string for the lookup.

> Note that if a paranoid dict goes small again
> (ma_table == ma_smalltable), it stays paranoid.

As I read it, that couldn't happen, because paranoid dicts couldn't
shrink at all.  (Not letting them shrink beneath 2*PyDict_MINSIZE does
seem like a reasonable solution.)

Additional TODOs...

The checks for Unicode and Dict should not be exact; it is OK to accept
a subclass so long as it uses the same lookdict (and, for
unicode, the same eq).

Additionally, small strings should be excluded from the new hash, to
avoid giving away the secret.  At a minimum, single-char strings
should be excluded, and I would prefer to exclude all strings of
length <= N (where N defaults to 4).
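Below is a sketch of that exclusion rule, with N as `max_excluded_len`. Both hash functions are hypothetical stand-ins:

- 64-bit FNV-1a plays the role of the existing unkeyed string hash.
- Keyed BLAKE2 plays the role of the new randomized hash.

Short strings keep the unkeyed hash, so observing their hashes reveals nothing about the secret.

```python
import hashlib
import os

SECRET = os.urandom(16)  # hypothetical per-process secret


def legacy_hash(s: str) -> int:
    """Stand-in for the old unkeyed string hash (64-bit FNV-1a, illustrative)."""
    h = 0xcbf29ce484222325
    for byte in s.encode("utf-8", "surrogatepass"):
        h ^= byte
        h = (h * 0x100000001b3) & 0xFFFFFFFFFFFFFFFF
    return h


def alternative_hash(s: str, max_excluded_len: int = 4) -> int:
    # Strings of length <= N keep the unkeyed hash, so an attacker who sees
    # their hash values learns nothing about SECRET.
    if len(s) <= max_excluded_len:
        return legacy_hash(s)
    data = s.encode("utf-8", "surrogatepass")
    digest = hashlib.blake2b(data, key=SECRET, digest_size=8).digest()
    return int.from_bytes(digest, "little")
```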

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-25 Thread Jim Jewett

Jim Jewett  added the comment:

On Wed, Jan 25, 2012 at 1:05 PM,  Antoine Pitrou 
added the comment:

> It looks like that approach will break any non-builtin type (in either C
> or Python) which can compare equal to bytes or str objects. If that's
> the case, then I think the likelihood of acceptance is close to zero.

(1)  Isn't that true of *any* patch that changes hashing?  (Thus the
PYTHONHASHSEED=0 escape hatch.)

(2)  I think it would still work for the lookdict_string (or
lookdict_unicode) case ... which is the normal case, and also where
most vulnerabilities should appear.

(3)  If the alternate hash is needed for non-string keys, there is no
perfect resolution, but I suppose you could get closer with

    if obj == str(obj):
        newhash = hash(str(obj))
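Below is a slightly fuller sketch of (3). Here `string_hash` stands for whatever alternative string hash is in use (a hypothetical parameter), and `Symbol` is a made-up example of a non-str type that compares equal to a str. Anything equal to a str key has to hash like that str. Otherwise a lookup through one will miss an entry stored under the other.

```python
def alternate_hash(obj, string_hash):
    """Hash obj with string_hash if it compares equal to its own str()."""
    text = str(obj)
    if obj == text:
        return string_hash(text)
    return hash(obj)  # no string equivalent: keep the ordinary hash


class Symbol:
    """Hypothetical non-str type that compares equal to its name."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name

    def __eq__(self, other):
        if isinstance(other, Symbol):
            other = other.name
        return self.name == other

    def __hash__(self):
        return hash(self.name)


# Symbol("spam") == "spam", so both must get the same alternate hash.
```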

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue6056] socket.setdefaulttimeout affecting multiprocessing Manager

2012-01-25 Thread Jim Jewett

Jim Jewett  added the comment:

The wording in 138415 suggested this patch was changing socket to not support 
timeouts -- which would be unacceptable.  

But the actual patch only seems to touch multiprocessing/connection.py -- a far 
more reasonable change.

Unfortunately, the patch no longer applies to the development tip.  I *think* 
the places you wanted to change are still there, and just moved.

(1)  Is it sufficiently clear that this is not-a-feature to justify a backport?

(2)  Are the problems already fixed by some of the other changes?  (It doesn't 
look like it, but I'm not sure.)

(3)  Can you produce an updated patch?  (The current tip is 
http://hg.python.org/cpython/file/fec45282dc28/Lib/multiprocessing/connection.py
  )

(4)  If I understand the intent, then s.setblocking(True) would be slightly 
more clear than s.settimeout(None), though that change obviously isn't 
essential.
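For context, below is a short illustration of how the global default reaches sockets that a library creates internally, and of point (4). It relies only on documented socket behavior:

- setdefaulttimeout() applies to sockets created afterwards.
- s.setblocking(True) is equivalent to s.settimeout(None).

```python
import socket

socket.setdefaulttimeout(5.0)        # process-wide default for *new* sockets
try:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(s.gettimeout())            # 5.0: inherited from the default
    s.setblocking(True)              # same effect as s.settimeout(None)
    print(s.gettimeout())            # None: this socket blocks again
    s.close()
finally:
    socket.setdefaulttimeout(None)   # restore the global default
```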

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue6056>
___



[issue13867] misleading comment in weakrefobject.h

2012-01-25 Thread Jim Jewett

New submission from Jim Jewett :

http://hg.python.org/cpython/file/fec45282dc28/Include/weakrefobject.h#l54

The comment makes sense, but it doesn't appear to be true: the macro calls
PyWeakref_CheckRef() first, not last.  Perhaps it is the macro that should change.


 
/* This macro calls PyWeakref_CheckRef() last since that can involve a
   function call; this makes it more likely that the function call
   will be avoided. */
#define PyWeakref_Check(op) \
        (PyWeakref_CheckRef(op) || PyWeakref_CheckProxy(op))

--
assignee: docs@python
components: Documentation, Extension Modules
messages: 151983
nosy: Jim.Jewett, docs@python
priority: normal
severity: normal
status: open
title: misleading comment in weakrefobject.h

___
Python tracker 
<http://bugs.python.org/issue13867>
___



[issue10042] total_ordering

2012-01-25 Thread Jim Jewett

Jim Jewett  added the comment:

I like Nick Coghlan's suggestion in msg140493, but I think he was giving up too 
soon in the "or" cases, and I think the confusion could be slightly reduced by 
some re-spellings around return values and comments about short-circuiting.
   
# In each helper, `op` is the bound comparison method the class does define
# (for example self.__lt__), so op(other) computes "a < b".
def not_op(op, other):
    # "not a < b" handles "a >= b"
    # "not a <= b" handles "a > b"
    # "not a >= b" handles "a < b"
    # "not a > b" handles "a <= b"
    op_result = op(other)
    if op_result is NotImplemented:
        return NotImplemented
    return not op_result

def op_or_eq(op, self, other):
    # "a < b or a == b" handles "a <= b"
    # "a > b or a == b" handles "a >= b"
    op_result = op(other)
    if op_result is NotImplemented:
        # Don't give up yet: a true "a == b" still settles it.  Testing
        # eq_result explicitly avoids calling bool() on NotImplemented,
        # which Python 3.9+ deprecates.
        eq_result = self.__eq__(other)
        if eq_result is NotImplemented or not eq_result:
            return NotImplemented
        return eq_result
    if op_result:
        return True
    return self.__eq__(other)

def not_op_and_not_eq(op, self, other):
    # "not (a < b or a == b)" handles "a > b"
    # "not a < b and a != b" is equivalent
    # "not (a > b or a == b)" handles "a < b"
    # "not a > b and a != b" is equivalent
    op_result = op(other)
    if op_result is NotImplemented:
        return NotImplemented
    if op_result:
        return False
    return self.__ne__(other)

def not_op_or_eq(op, self, other):
    # "not a <= b or a == b" handles "a >= b"
    # "not a >= b or a == b" handles "a <= b"
    op_result = op(other)
    if op_result is NotImplemented:
        # Same early-out as in op_or_eq.
        eq_result = self.__eq__(other)
        if eq_result is NotImplemented or not eq_result:
            return NotImplemented
        return eq_result
    if op_result:
        return self.__eq__(other)
    return True

def op_and_not_eq(op, self, other):
    # "a <= b and not a == b" handles "a < b"
    # "a >= b and not a == b" handles "a > b"
    op_result = op(other)
    if op_result is NotImplemented:
        return NotImplemented
    if op_result:
        return self.__ne__(other)
    return False

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue10042>
___



[issue13870] false comment in collections/__init__.py ordered dict

2012-01-25 Thread Jim Jewett

New submission from Jim Jewett :

http://hg.python.org/cpython/file/tip/Lib/collections/__init__.py#l37 states 
that the prev/next links are weakref proxies; as of 
http://hg.python.org/cpython/diff/3977dc349ae7/Lib/collections.py this is no 
longer true of the next links.  

It could be fixed by changing

# The prev/next links are weakref proxies (to prevent circular references).

to 

# The prev links are weakref proxies (to prevent circular references).
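Below is a small model of the linking scheme the corrected comment describes, loosely patterned on the pure-Python OrderedDict. The names `Link`, `build_links` and `forward_keys` are illustrative. Next links are ordinary references. Prev links, and the last link back to the root, are weakref proxies. The strong references therefore form a simple chain with no cycle, and in CPython dropping the root frees every link through reference counting alone.

```python
import weakref


class Link:
    __slots__ = ("prev", "next", "key", "__weakref__")


_ROOT = object()  # sentinel key marking the root link


def build_links(keys):
    hardroot = Link()
    hardroot.key = _ROOT
    root = weakref.proxy(hardroot)
    root.prev = root.next = root
    for key in keys:
        link = Link()
        last = root.prev
        link.prev, link.next, link.key = last, root, key
        last.next = link                  # next link: strong reference
        root.prev = weakref.proxy(link)   # prev link: weak proxy
    return hardroot


def forward_keys(hardroot):
    keys, link = [], hardroot.next
    while link.key is not _ROOT:
        keys.append(link.key)
        link = link.next
    return keys


print(forward_keys(build_links("abc")))  # ['a', 'b', 'c']
```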

--
components: Library (Lib)
files: collections_init.patch
keywords: patch
messages: 151996
nosy: Jim.Jewett
priority: normal
severity: normal
status: open
title: false comment in collections/__init__.py ordered dict
Added file: http://bugs.python.org/file24326/collections_init.patch

___
Python tracker 
<http://bugs.python.org/issue13870>
___



[issue13871] namedtuple does not normalize field names when sanitizing

2012-01-25 Thread Jim Jewett

New submission from Jim Jewett :

collections.namedtuple raises a ValueError if any of the field names are not 
valid identifiers, or are duplicates.

It does not NFKC-normalize the identifiers (as the parser itself does, per PEP 3131) when checking for duplicates.

(Similar issue with the typename)

>>> namedtuple("dup_fields", ["a", "a"])
Traceback (most recent call last):
  File "", line 1, in 
namedtuple("dup_fields", ["a", "a"])
  File "C:\python32\lib\collections.py", line 345, in namedtuple
raise ValueError('Encountered duplicate field name: %r' % name)
ValueError: Encountered duplicate field name: 'a'



>>> namedtuple("nfk_tester", ["a", "ª"])
Traceback (most recent call last):
  File "", line 1, in 
namedtuple("nfk_tester", ["a", "ª"])
  File "C:\python32\lib\collections.py", line 365, in namedtuple
raise SyntaxError(e.msg + ':\n\n' + class_definition)
  File "", line None
SyntaxError: duplicate argument 'a' in function definition:
...



and 


>>> namedtuple("justª", "ª")
Traceback (most recent call last):
  File "", line 1, in 
namedtuple("justª", "ª")
  File "C:\python32\lib\collections.py", line 366, in namedtuple
result = namespace[typename]
KeyError: 'justª'
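Below is a sketch of a duplicate check that compares names the way the parser will see them, assuming NFKC normalization as specified by PEP 3131. `check_field_names` is hypothetical and is not namedtuple's actual code.

```python
import unicodedata


def check_field_names(field_names):
    """Validate field names and return them in NFKC form."""
    seen = set()
    normalized_names = []
    for name in field_names:
        if not name.isidentifier():
            raise ValueError(f"not a valid identifier: {name!r}")
        normalized = unicodedata.normalize("NFKC", name)
        if normalized in seen:
            raise ValueError(f"duplicate field name: {name!r} "
                             f"(same as {normalized!r} after normalization)")
        seen.add(normalized)
        normalized_names.append(normalized)
    return normalized_names


print(check_field_names(["a", "b"]))  # ['a', 'b']
print(check_field_names(["ª"]))       # ['a']
```

Normalizing the typename the same way would also avoid the final KeyError. The class statement actually binds the name 'justa', not 'justª'.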

--
messages: 151997
nosy: Jim.Jewett
priority: normal
severity: normal
status: open
title: namedtuple does not normalize field names when sanitizing

___
Python tracker 
<http://bugs.python.org/issue13871>
___



[issue13703] Hash collision security issue

2012-01-27 Thread Jim Jewett

Jim Jewett  added the comment:

On Thu, Jan 26, 2012 at 8:19 PM, Antoine Pitrou  wrote:

> If I read your [Martin v. Löwis' ] patch correctly, collisions will
> produce additional allocations ... That's a pretty massive
> change in memory consumption for string dicts

Not in practice.

The point I first missed is that this triggers only when the hash is
*fully* equal; if the hashes are merely equal after masking, then
today's try-another-slot approach will still be used, even for
strings.

Per ( http://bugs.python.org/issue13703#msg151850 ) Marc-Andre
Lemburg's measurements, full-hash equality explains only 1 in 10,000
collisions.  From a performance standpoint, we can almost ignore a
case that rare; it is almost certainly dwarfed by resizing.

I *am* a bit concerned that the possible contents of a dictentry
change; this could cause easily-missed-in-testing breakage for
anything that treats table as an array.  That said, it doesn't seem
much worse than the search finger, and there seemed to be recent
consensus that even promising an exact dict -- subclasses not allowed
-- didn't mean that direct access was sanctioned.  So it still seems
safer than changing the de-facto iteration order.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-29 Thread Jim Jewett

Jim Jewett  added the comment:

Given PYTHONHASHSEED, what is the point of PYTHONHASHRANDOMIZATION?

Alternative:

On startup, Python reads a config file with the seed (which defaults to zero).

Add a function to write a random value to that config file for the next startup.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-01-30 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Jan 30, 2012 at 12:31 PM,  Dave Malcolm 
added the comment:

> It's useful for the selftests, so I've kept PYTHONHASHSEED.

The reason to read PYTHONHASHSEED was so that multiple members of a
cluster could use the same hash.

It would have been nice to have fewer environment variables, but I'll
grant that it is hard to say "use something random that we have *not*
precomputed" without either a config file or a magic value for
PYTHONHASHSEED.

-jJ

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Feb 6, 2012 at 8:12 AM, Marc-Andre Lemburg
 wrote:
>
> Marc-Andre Lemburg  added the comment:
>
> Antoine Pitrou wrote:
>>
>> The simple collision counting approach leaves a gaping hole open, as
>> demonstrated by Frank.

> Could you elaborate on this ?

> Note that I've updated the collision counting patch to cover both
> possible attack cases I mentioned in 
> http://bugs.python.org/issue13703#msg150724.
> If there's another case I'm unaware of, please let me know.

The problematic case is, roughly,

(1)  Find out what N will trigger collision-counting countermeasures.
(2)  Insert N-1 colliding entries, to make it as slow as possible.
(3)  Keep looking up (or updating) the N-1th entry, so that the
slow-as-possible-without-countermeasures path keeps getting rerun.

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Feb 6, 2012 at 12:07 PM, Marc-Andre Lemburg
 wrote:
>
> Marc-Andre Lemburg  added the comment:
>
> Jim Jewett wrote:

>> The problematic case is, roughly,

>> (1)  Find out what N will trigger collision-counting countermeasures.
>> (2)  Insert N-1 colliding entries, to make it as slow as possible.
>> (3)  Keep looking up (or updating) the N-1th entry, so that the
>> slow-as-possible-without-countermeasures path keeps getting rerun.

> Since N is constant, I don't see how such an "attack" could be used
> to trigger the O(n^2) worst-case behavior.

Agreed; it tops out with a constant, but if it takes only 16 bytes of
input to force another run through a 1000-long collision, that may
still be too much leverage.

> BTW: If you set the limit N to e.g. 100 (which is reasonable given
> Victor's and my tests),

Agreed.  Frankly, I think 5 would be more than reasonable so long as
there is a fallback.

> the time it takes to process one of those
> sets only takes 0.3 ms on my machine. That's hardly usable as basis
> for an effective DoS attack.

So it would take around 3 MB to cause a minute's delay...

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13703] Hash collision security issue

2012-02-06 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Feb 6, 2012 at 1:53 PM, Frank Sievertsen  wrote:

>>> BTW: If you set the limit N to e.g. 100 (which is reasonable given
>>> Victor's and my tests),

>> So it would take around 3Mb to cause a minute's delay...

> How did you calculate that?

16 bytes/entry * 3300 entries/second * 60 seconds/minute
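Spelled out: Marc-Andre's 0.3 ms per run gives roughly 3300 runs per second, so:

$$
16\ \tfrac{\text{bytes}}{\text{entry}} \times 3300\ \tfrac{\text{entries}}{\text{s}} \times 60\ \tfrac{\text{s}}{\text{min}} \approx 3.2 \times 10^{6}\ \text{bytes per minute} \approx 3\ \text{MB per minute}.
$$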

But if there is indeed a way to cut that 16 bytes/entry, that is worse.

Switching dict implementations at 5 collisions is still acceptable,
except from a complexity standpoint.

-jJ

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue13958] Comment _PyUnicode_FromId

2012-02-06 Thread Jim Jewett

New submission from Jim Jewett :

Add a comment explaining why _PyUnicode_FromId can (and should) assume 
ASCII-only identifiers.


/* PEP3131 guarantees that all python-internal identifiers
   are ASCII-only.  Violating this would break some supported
   C compilers. */

See http://mail.python.org/pipermail/python-dev/2012-February/116234.html

--
components: Unicode
messages: 152775
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: Comment _PyUnicode_FromId
versions: Python 3.3

___
Python tracker 
<http://bugs.python.org/issue13958>
___



[issue13958] Comment _PyUnicode_FromId

2012-02-06 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Feb 6, 2012 at 4:25 PM, Martin v. Löwis  wrote:

> Martin v. Löwis  added the comment:

> This has nothing to do with PEP 3131. Python could (and does)
> support non-ASCII identifiers just fine, regardless of C compiler
> limitations.

I *think* you're saying that the _Py_Identifier( ) is a smaller set
than identifiers in general.  Would the following be more accurate?

/* PEP3131 does allow non-ASCII identifiers in user code, but
   limits their use within the implementation itself.
   In particular, a _Py_Identifier may be passed directly to
   C code; such identifiers are restricted to ASCII to avoid
   breaking some supported C compilers. */

--

___
Python tracker 
<http://bugs.python.org/issue13958>
___



[issue13958] Comment _PyUnicode_FromId

2012-02-06 Thread Jim Jewett

Jim Jewett  added the comment:

And is there a way to characterize the compilers that would break?  Is
it a few specific compilers, or "compilers that do not implement UTF8,
which is not required by the C standard", or ...

--

___
Python tracker 
<http://bugs.python.org/issue13958>
___



[issue13958] Comment _PyUnicode_FromId

2012-02-09 Thread Jim Jewett

Jim Jewett  added the comment:

After clarification, the original change was backed out.

These are C identifiers, and nothing beyond ASCII is guaranteed, but other 
characters are possible in practice.

--
resolution:  -> fixed
status: open -> closed

___
Python tracker 
<http://bugs.python.org/issue13958>
___



[issue13977] importlib simplification

2012-02-09 Thread Jim Jewett

New submission from Jim Jewett :

http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l974


    # The hell that is fromlist ...
    if not fromlist:
        # Return up to the first dot in 'name'. This is complicated by the fact
        # that 'name' may be relative.
        if level == 0:
            return sys.modules[name.partition('.')[0]]
        elif not name:
            return module
        else:
            cut_off = len(name) - len(name.partition('.')[0])
            return sys.modules[module.__name__[:-cut_off]]

If level is 0, should name == module.__name__?

Yes.
 

If so, then I think that simplifies to

    if not name:
        return module
    genericname = module.__name__.rpartition(".")[0]
    return sys.modules[genericname]

Seems right. Can you file a bug and assign it to me?
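Assuming, as confirmed above, that name == module.__name__ when level is 0, all three branches can be folded into one slice. The sketch below keeps everything in module.__name__ up to the first dot of name; `module_name_to_return` is a hypothetical helper. It differs from the rpartition version in two ways:

- For names with more than one dot, it keeps the first component. For 'a.b.c' it gives 'a', as the level == 0 branch does, while rpartition('.') gives 'a.b'.
- It slices with `len(module_name) - cut_off` instead of `-cut_off`, because `s[:-0]` is the empty string.

```python
def module_name_to_return(module_name: str, name: str) -> str:
    """sys.modules key to return when fromlist is empty (sketch).

    Assumes module_name ends with name: true for relative imports and, with
    name == module_name, for level == 0.
    """
    cut_off = len(name) - len(name.partition(".")[0])
    return module_name[:len(module_name) - cut_off]


# usage inside __import__:  return sys.modules[module_name_to_return(module.__name__, name)]
print(module_name_to_return("a.b.c", "a.b.c"))   # a
print(module_name_to_return("pkg.b.c", "b.c"))   # pkg.b
```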

--
messages: 152970
nosy: Jim.Jewett, brett.cannon
priority: normal
severity: normal
status: open
title: importlib simplification

___
Python tracker 
<http://bugs.python.org/issue13977>
___



[issue13703] Hash collision security issue

2012-02-10 Thread Jim Jewett

Jim Jewett  added the comment:

On Fri, Feb 10, 2012 at 6:02 PM, STINNER Victor

>  - PYTHONHASHSEED doc is not clear: it should be mentionned
> that the variable is ignored if PYTHONHASHRANDOMIZATION
> is not set

*That* is why this two-envvar solution bothers me.

PYTHONHASHSEED has to be a string anyhow, so why not just get rid of
PYTHONHASHRANDOMIZATION?

Use PYTHONHASHSEED=random to use randomization.

Behavior for other values that cannot be turned into an integer would be
undefined, at least for now.  (You may want to raise a fatal error, on the
assumption that errors should not pass silently.)

A missing PYTHONHASHSEED then has the pleasant interpretation of
defaulting to "0" for backwards compatibility.
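Below is a sketch of how the single-variable scheme could be parsed. `hash_seed_from_env` is hypothetical, and the 32-bit width of the random seed is an arbitrary choice. Raising on an unparseable value follows the suggestion above.

```python
import os


def hash_seed_from_env(environ=None):
    """Seed selected by PYTHONHASHSEED under the proposed scheme (sketch)."""
    if environ is None:
        environ = os.environ
    value = environ.get("PYTHONHASHSEED")
    if value is None:
        return 0  # unset: old, non-randomized hashing
    if value == "random":
        return int.from_bytes(os.urandom(4), "little")
    try:
        return int(value)
    except ValueError:
        # errors should not pass silently
        raise ValueError(
            f"PYTHONHASHSEED must be 'random' or an integer, not {value!r}"
        ) from None
```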

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue8087] Unupdated source file in traceback

2012-02-14 Thread Jim Jewett

Jim Jewett  added the comment:

Martin v. Löwis (loewis) wrote:

> Displaying a warning whenever the code has changed on disk is
> clearly unacceptable

As clarified, the request is only for when a traceback is being created (or 
perhaps even only for when one is being printed).  

I agree that we don't want to watch every file every time any code is run, but 
by the time a traceback is being displayed, any tight loops are ending.

Nick Coghlan (ncoghlan) wrote:

> There are a few different cases: ...
> 2. Source has been changed, but module has not been reloaded ...
> 3. Source has been changed, module has been reloaded, but object ...

Given that a traceback is being displayed, I think it is reasonable to rerun 
the find-module portion of import, and verify that there is not stale 
byte-code.  

Frankly, I think it would be worth storing a file timestamp on modules, and 
verifying that whatever-would-be-imported-if-imported-now matches that 
timestamp.  This would also catch case (3).

I also think that -- on traceback display -- it might be worth verifying that 
the code's __globals__ is the __globals__ associated with the module of that 
name in sys.modules.  This would warn about some intentional manipulations, but 
would catch case (3) even more accurately.
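A minimal sketch of both checks, meant to run only when a traceback is displayed. It assumes an import-time record of each source file's modification time; the recorded_mtimes mapping is hypothetical (it is not an existing module attribute), and all function names are illustrative. Reloading a module re-executes it in its existing namespace, so globals_are_current only flags code whose module object in sys.modules has been replaced.

```python
import os
import sys


def source_changed_since(path, recorded_mtime):
    """True if the file at `path` was modified after `recorded_mtime`
    (or can no longer be found)."""
    try:
        return os.path.getmtime(path) > recorded_mtime
    except OSError:
        return True  # missing source: certainly not what was imported


def globals_are_current(func):
    """True if func.__globals__ is the namespace of the module that
    sys.modules currently holds under func.__module__."""
    module = sys.modules.get(func.__module__)
    return module is not None and vars(module) is func.__globals__


def stale_frames(tb, recorded_mtimes):
    """Yield (filename, lineno) for each traceback entry whose source file
    changed after the mtime recorded for it.

    `recorded_mtimes` is a hypothetical dict (filename -> mtime) that would
    have to be filled in at import time.
    """
    while tb is not None:
        filename = tb.tb_frame.f_code.co_filename
        recorded = recorded_mtimes.get(filename)
        if recorded is not None and source_changed_since(filename, recorded):
            yield filename, tb.tb_lineno
        tb = tb.tb_next
```

A traceback printer could call stale_frames() and append a warning line for each entry it yields, so nothing is checked while code is running normally.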

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue8087>
___



[issue14014] codecs.StreamWriter.reset contract not fulfilled

2012-02-14 Thread Jim Jewett

New submission from Jim Jewett:

    def reset(self):

        """ Flushes and resets the codec buffers used for keeping state.

            Calling this method should ensure that the data on the
            output is put into a clean state, that allows appending
            of new fresh data without having to rescan the whole
            stream to recover state.

        """
        pass

This does not ensure that the stream is flushed, as the docstring promises.  I 
believe the following would work better.


    def reset(self):
        """ Flushes and resets the codec buffers used for keeping state.

            Calling this method should ensure that the data on the
            output is put into a clean state, that allows appending
            of new fresh data without having to rescan the whole
            stream to recover state.

        """
        if hasattr(self.stream, "flush"):
            self.stream.flush()
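A minimal sketch of the proposed behavior, written as a subclass rather than as a patch to codecs.py. FlushingStreamWriter and RecordingStream are hypothetical names; using codecs.utf_8_encode as the encode function follows the pattern of the standard UTF-8 codec module.

```python
import codecs
import io


class FlushingStreamWriter(codecs.StreamWriter):
    """StreamWriter whose reset() actually flushes the underlying stream."""

    # A builtin function does not bind as a method, so
    # self.encode(obj, errors) calls codecs.utf_8_encode(obj, errors).
    encode = codecs.utf_8_encode

    def reset(self):
        """Flush the underlying stream, if it supports flushing."""
        if hasattr(self.stream, "flush"):
            self.stream.flush()


class RecordingStream(io.BytesIO):
    """In-memory byte stream that counts flush() calls (for demonstration)."""

    def __init__(self):
        super().__init__()
        self.flush_count = 0

    def flush(self):
        self.flush_count += 1
        super().flush()


stream = RecordingStream()
writer = FlushingStreamWriter(stream)
writer.write("h\u00e9llo")
writer.reset()
print(stream.getvalue(), stream.flush_count)  # b'h\xc3\xa9llo' 1
```

The hasattr() guard keeps reset() harmless for write-only objects that have no flush() method.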

--
components: Unicode
messages: 153354
nosy: Jim.Jewett, ezio.melotti
priority: normal
severity: normal
status: open
title: codecs.StreamWriter.reset contract not fulfilled
type: behavior
versions: Python 3.2

___
Python tracker 
<http://bugs.python.org/issue14014>
___



[issue14015] surrogateescape largely missing from documentation

2012-02-14 Thread Jim Jewett

New submission from Jim Jewett:

Recent discussion on the mailing lists and in http://bugs.python.org/issue13997 
makes it clear that the best way to get Python 2 results for 
"ASCII-in-the-parts-I-might-process-or-change" is to replace 

    f = open(fname)
with
    f = open(fname, encoding="ascii", errors="surrogateescape")
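A minimal sketch of the recipe, assuming a file whose only non-ASCII content is bytes the program never needs to interpret. The helper name replace_in_file is hypothetical. Each undecodable byte becomes a lone surrogate (0xE9 decodes to U+DCE9) and is turned back into the same byte when written with the same settings.

```python
import os
import tempfile


def replace_in_file(path, old, new):
    """Replace ASCII text `old` with `new`, passing non-ASCII bytes through unchanged."""
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        text = f.read()
    with open(path, "w", encoding="ascii", errors="surrogateescape") as f:
        f.write(text.replace(old, new))


def demo():
    raw = b"caf\xe9 ok"  # 0xE9 is not ASCII, so a strict ASCII decode would fail
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt")
        with open(path, "wb") as f:
            f.write(raw)
        replace_in_file(path, "ok", "OK")
        with open(path, "rb") as f:
            return f.read()


print(b"caf\xe9".decode("ascii", "surrogateescape") == "caf\udce9")  # True
print(demo())  # b'caf\xe9 OK'
```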

Unfortunately, surrogateescape (let alone this recipe) is not easily 
discoverable.  

http://docs.python.org/dev/library/functions.html#open lists 5 error-handlers 
-- but not this one.  It says that other error handlers are possible if they 
are registered with 
http://docs.python.org/dev/library/codecs.html#codecs.register_error but I 
haven't found a way to determine which error handlers are already registered.
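codecs.lookup_error() does at least answer the narrower question of whether a *given* name is registered: it returns the handler or raises LookupError. A minimal sketch (the helper name is hypothetical); it still cannot enumerate the registry, which is the gap described above.

```python
import codecs


def error_handler_registered(name):
    """Return True if `name` is currently registered as a codec error handler."""
    try:
        codecs.lookup_error(name)
    except LookupError:
        return False
    return True


for name in ("strict", "replace", "surrogateescape", "no-such-handler"):
    # Expected: True for the first three names, False for the last.
    print(name, error_handler_registered(name))
```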

The codecs.register (as opposed to register_error) documentation does list it 
as a possible value, but that is the only reference.

The other 5 error handlers are also available as module-level functions within 
the codecs module, and have their own documentation sections within 
http://docs.python.org/dev/library/codecs.html

Neither help(open) nor help(codecs) (after import codecs) gives any hint that 
surrogateescape exists.  Both actually suggest that it does not, by 
enumerating the other values.

--
assignee: docs@python
components: Documentation, Unicode
messages: 153359
nosy: Jim.Jewett, docs@python, ezio.melotti
priority: normal
severity: normal
status: open
title: surrogateescape largely missing from documentation
versions: Python 3.1, Python 3.2, Python 3.3

___
Python tracker 
<http://bugs.python.org/issue14015>
___



[issue13997] Clearly explain the bare minimum Python 3 users should know about Unicode

2012-02-14 Thread Jim Jewett

Jim Jewett  added the comment:

See http://bugs.python.org/issue14015 for one reason that surrogateescape isn't 
better known.

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue13997>
___



[issue13703] Hash collision security issue

2012-02-14 Thread Jim Jewett

Jim Jewett  added the comment:

On Mon, Feb 13, 2012 at 3:37 PM,  Dave Malcolm
 added the comment:

>  * added comments about the specialcasing of length 0:
>    /*
>      We make the hash of the empty string be 0, rather than using
>      (prefix ^ suffix), since this slightly obfuscates the hash secret
>    */

Frankly, other short strings may give away even more, because you can
put several into the same dict.

I would prefer that the randomization not kick in until strings are at
least 8 characters, but I think excluding length 1 is a pretty obvious
win.
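A toy sketch of the suggestion, assuming a hash modelled loosely on the string-hashing loop being patched: multiply by 1000003 and XOR in each character, with secret prefix and suffix values XORed in at the start and end. The min_len threshold is a hypothetical parameter: strings shorter than it get the unrandomized hash, so they reveal nothing about the secret. This is an illustration, not CPython's code.

```python
MASK = (1 << 64) - 1  # emulate 64-bit unsigned arithmetic


def toy_hash(s, prefix, suffix, min_len=2):
    """Toy randomized string hash; the secret is only mixed in for len(s) >= min_len."""
    if not s:
        return 0  # empty string: fixed hash, as in the patch under discussion
    randomize = len(s) >= min_len
    x = prefix if randomize else 0
    x ^= ord(s[0]) << 7
    for ch in s:
        x = ((1000003 * x) ^ ord(ch)) & MASK
    x ^= len(s)
    if randomize:
        x ^= suffix
    return x


# Length-1 strings hash identically whatever the secret is ...
print(toy_hash("a", 0x1234, 0x5678) == toy_hash("a", 0x9ABC, 0xDEF0))  # True
# ... while longer strings depend on it.
print(toy_hash("ab", 0, 0) == toy_hash("ab", 0, 1))  # False
```

Raising min_len to 8 would correspond to the stricter preference above; the trade-off is that short keys then stay fully predictable, so collisions among them can still be precomputed.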

--

___
Python tracker 
<http://bugs.python.org/issue13703>
___



[issue14067] Avoid more stat() calls in importlib

2012-02-26 Thread Jim Jewett

Jim Jewett  added the comment:

As long as the interpreter knows about files that *it* wrote, skipping repeat 
checks during startup seems entirely reasonable; sneaking in a new or changed 
file is inherently a race condition.

I think it would also be reasonable for general use, so long as there was also 
a way to say "for this particular directory, always check".
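A minimal sketch of that policy, assuming a finder that caches directory listings. DirListingCache, note_written and always_check are hypothetical names, not importlib's API: cached listings are trusted, files the interpreter writes itself are added to the cache directly, and directories named in always_check are re-listed on every lookup.

```python
import os


class DirListingCache:
    """Hypothetical directory-listing cache for a module finder."""

    def __init__(self, always_check=()):
        self._listings = {}
        self._always_check = {os.path.abspath(p) for p in always_check}

    def listdir(self, directory):
        """Return the (possibly cached) set of names in `directory`."""
        directory = os.path.abspath(directory)
        if directory in self._always_check or directory not in self._listings:
            self._listings[directory] = set(os.listdir(directory))
        return frozenset(self._listings[directory])

    def note_written(self, path):
        """Record a file the interpreter itself wrote (e.g. a freshly compiled .pyc)."""
        directory, name = os.path.split(os.path.abspath(path))
        if directory in self._listings:
            self._listings[directory].add(name)
```

A file dropped into an ordinary directory by another process stays invisible until the cache is rebuilt, which is exactly the race the comment above accepts.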

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue14067>
___



[issue13903] New shared-keys dictionary implementation

2012-02-29 Thread Jim Jewett

Jim Jewett  added the comment:

As of Feb 28, 2012, the PEP mentions an additional optimization of storing the 
values in an array indexed by (effectively) key insertion order, rather than 
key position. ("Alternative Implementation")

It states that this would reduce memory usage for the values array by 1/3.  1/3 
is the worst case; the average is 1/2.  (Once the savings would fall below 1/3, 
the keys table resizes, which restores savings of 2/3.  And yes, that means 
that in practice the average savings would be greater than half, because dicts 
of size N become less common as N grows.)

It states that the keys table would need an additional "values_size" field, but 
in the absence of dummies, this is just ma_used.

Note that this would also simplify resizing, as the values arrays would not 
have to be re-ordered, and would not have to be modified at all unless/until 
that particular instance received a value for a position beyond its current 
size.
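A rough Python model of the alternative layout, showing why the values never need reordering. SharedKeys and SplitDict are hypothetical names, and a Python dict stands in for the shared hash table: each key gets a fixed position when first inserted, and each instance's values list is indexed by that position and only grows when the instance stores a value beyond its current size.

```python
_MISSING = object()  # marks a shared key this instance has no value for


class SharedKeys:
    """Keys table shared by many instances; positions follow insertion order."""

    def __init__(self):
        self._position = {}  # key -> insertion index (stand-in for the hash table)

    def position(self, key, add=False):
        pos = self._position.get(key)
        if pos is None and add:
            pos = self._position[key] = len(self._position)
        return pos


class SplitDict:
    """Per-instance values stored densely, indexed by key insertion order."""

    def __init__(self, shared):
        self.shared = shared
        self.values = []  # length == highest position this instance has used + 1

    def __setitem__(self, key, value):
        pos = self.shared.position(key, add=True)
        if pos >= len(self.values):
            # Grow only this instance, and only when it needs a later position.
            self.values.extend([_MISSING] * (pos + 1 - len(self.values)))
        self.values[pos] = value

    def __getitem__(self, key):
        pos = self.shared.position(key)
        if pos is None or pos >= len(self.values) or self.values[pos] is _MISSING:
            raise KeyError(key)
        return self.values[pos]


keys = SharedKeys()
a, b = SplitDict(keys), SplitDict(keys)
a["x"], a["y"] = 1, 2
b["x"] = 3
print(a["y"], b["x"], len(b.values))  # 2 3 1
```

Because positions depend only on insertion order, resizing the shared hash table would never touch any instance's values list.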

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue13903>
___



[issue14205] Raise an error if a dict is modified during a lookup

2012-03-06 Thread Jim Jewett

Jim Jewett  added the comment:

Can't this be triggered by non-malicious code that just happened to have a 
python comparison and get hit with a thread switch?

I'm not sure how often it happens, but today it would not be visible to the 
user; after the patch, users will see a sporadic failure that they can't easily 
defend against.

Would it be worth adding a counter to lookdict, so that one modification is OK, 
but 5 aren't?
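A toy model of the counter idea, assuming a lookup that notices when the table was modified during a key comparison and restarts. ToyTable, TooManyMutations and max_restarts are hypothetical names; the real lookdict is C code with its own restart condition. Here a few restarts are tolerated, and only a table that keeps changing raises.

```python
_RESTART = object()  # sentinel: the table changed while a key was being compared


class TooManyMutations(RuntimeError):
    """Raised when the table keeps changing underneath a single lookup."""


class ToyTable:
    """Separate-chaining hash table with a mutation counter (illustration only)."""

    def __init__(self, nbuckets=8):
        self.buckets = [[] for _ in range(nbuckets)]
        self.version = 0  # bumped on every insert

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k is key or k == key:
                bucket[i] = (k, value)
                break
        else:
            bucket.append((key, value))
        self.version += 1

    def _scan(self, key):
        start = self.version
        for k, v in list(self._bucket(key)):
            equal = k is key or k == key  # may run arbitrary Python code
            if self.version != start:
                return _RESTART
            if equal:
                return v
        raise KeyError(key)

    def lookup(self, key, max_restarts=5):
        for _ in range(max_restarts + 1):
            result = self._scan(key)
            if result is not _RESTART:
                return result
        raise TooManyMutations("table kept changing during lookup")


class ChattyKey:
    """Equal to "a"; its first `mutations` comparisons also insert into the table."""

    def __init__(self, table, mutations):
        self.table, self.mutations = table, mutations

    def __hash__(self):
        return hash("a")

    def __eq__(self, other):
        if self.mutations > 0:
            self.mutations -= 1
            self.table.insert(object(), None)  # stands in for another thread's write
        return other == "a"
```

With the default limit, a key whose comparison triggers one modification is still found, while one that modifies the table on every comparison raises TooManyMutations.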

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue14205>
___



[issue14205] Raise an error if a dict is modified during a lookup

2012-03-06 Thread Jim Jewett

Jim Jewett  added the comment:

On Tue, Mar 6, 2012 at 11:56 AM, Mark Shannon wrote:

> Jim Jewett:
>> Can't this be triggered by non-malicious code that just happened
>> to have a python comparison and get hit with a thread switch?

> So, they are writing to a dict in one thread while reading from the
> same dict in another thread, without any external locks and with
> keys written in Python.

Correct.  For example, it could be a configuration manager, or a
cache, or even a worklist.  If they're just adding new keys, or even
deleting other (==> NOT the one being looked up) keys, why should that
keep them from finding the existing, unchanged keys?

>> I'm not sure how often it happens, but today it would not be visible
>> to the user; after the patch, users will see a sporadic failure that
>> they can't easily defend against.

> I suspect, they are already seeing sporadic failures.

How?

The chain terminates as soon as the dict doesn't resize; without
malicious intent, the odds of hitting several resizes in a row are so
minuscule that it probably hasn't even slowed them down.

--

___
Python tracker 
<http://bugs.python.org/issue14205>
___



[issue7652] Merge C version of decimal into py3k.

2012-03-06 Thread Jim Jewett

Jim Jewett  added the comment:

(1)  I think this module would benefit greatly from a map explaining what each 
file does, and perhaps from some reorganization. 

As best I can tell so far, there are about 130 files in over a dozen 
directories, but the only ones that directly affect the implementation are a 
subset (~33) of the *.c and *.h files in Modules/_decimal/ (not its subdirectories).  

Even files that do affect the implementation, such as mpdecimal.c, also seem to 
have functions thrown in just for testing small pieces of functionality, such 
as Newton Division.

There may also be some code that really isn't needed, except possibly for 
backwards compatibility, and could be #ifdef'ed or at least commented.  For 
example, the comments above io.c function _mpd_strneq(const char *s, const char 
*l, const char *u, size_t n) mention working around the Turkish un/dotted-i 
problem when lowercasing -- but why is a decimal library even worried about 
casing?

(2)  Is assembly allowed?  If not, please make it clear that vcdiv64.asm is 
just an optional speedup, and that the code doesn't rely upon it.

(3)  Are there parts of this library that provide functionality NOT in the 
current decimal library?  If so, this should be at least documented, and 
perhaps either removed or exposed.

--
nosy: +Jim.Jewett

___
Python tracker 
<http://bugs.python.org/issue7652>
___


