[issue2948] Unicode support for hashing algorithms

2008-05-23 Thread Marc-Andre Lemburg
Marc-Andre Lemburg <[EMAIL PROTECTED]> added the comment:
On 2008-05-23 05:38, Raymond Hettinger wrote:
> Raymond Hettinger <[EMAIL PROTECTED]> added the comment:
>
> I don't think this is the right thing to do. The hash algorithms are
> defined in terms of bytes, but Unicode is an abstracted

[issue2948] Unicode support for hashing algorithms

2008-05-22 Thread Martin v. Löwis
Martin v. Löwis <[EMAIL PROTECTED]> added the comment: I'm rejecting this idea, for the reasons already given by others: the same string might have different hash values, depending on which encoding is chosen. Users will have to be explicit when hashing, just as they need to be explicit when they
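To illustrate the point about encoding-dependent hash values, here is a minimal sketch. It assumes Python 3 and the standard `hashlib` module (the thread itself concerns Python 2.x); the sample string and the choice of SHA-256 are arbitrary and not taken from the discussion.

```python
import hashlib

# Sample text with one non-ASCII character (an arbitrary choice for illustration).
text = "L\u00f6wis"

# The same abstract string produces different byte sequences per encoding ...
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")
utf16_bytes = text.encode("utf-16-le")

# ... and therefore different digests, since the hash only ever sees bytes.
utf8_digest = hashlib.sha256(utf8_bytes).hexdigest()
latin1_digest = hashlib.sha256(latin1_bytes).hexdigest()
utf16_digest = hashlib.sha256(utf16_bytes).hexdigest()

digests_differ = len({utf8_digest, latin1_digest, utf16_digest}) == 3
print(digests_differ)
```

Two parties who want matching digests for the same text therefore have to agree on the encoding first, which is why the caller is asked to spell it out.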

[issue2948] Unicode support for hashing algorithms

2008-05-22 Thread Raymond Hettinger
Raymond Hettinger <[EMAIL PROTECTED]> added the comment: Only 2.6 should be marked. This is a feature request for an implicit conversion with a default encoding; it is not a bugfix. FWIW, here's a reference to an earlier discussion: http://mail.python.org/pipermail/python-list/2004-April/25863

[issue2948] Unicode support for hashing algorithms

2008-05-22 Thread Vasco Rodrigues
Changes by Vasco Rodrigues <[EMAIL PROTECTED]>: versions: +Python 2.4, Python 2.5

[issue2948] Unicode support for hashing algorithms

2008-05-22 Thread Vasco Rodrigues
Vasco Rodrigues <[EMAIL PROTECTED]> added the comment: You could just check for Unicode strings and call encode inside the hash function. I understand the byte abstraction, but if you call encode on a Unicode string containing only ASCII characters, it is converted to the same bytes as in ASCII, result
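The ASCII observation above can be checked with a short sketch. It assumes Python 3 and `hashlib`; the helper `digest_with` is a hypothetical name introduced only for this example, and the sample strings are arbitrary.

```python
import hashlib

def digest_with(text, encoding):
    """Hypothetical helper: encode text explicitly, then hash the bytes."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()

# For ASCII-only text, the ASCII and UTF-8 encodings yield identical bytes,
# so the digests match.
ascii_only = "hello"
same_for_ascii = digest_with(ascii_only, "ascii") == digest_with(ascii_only, "utf-8")

# Once a non-ASCII character appears, the ASCII codec cannot encode the text.
non_ascii = "h\u00e9llo"
try:
    digest_with(non_ascii, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

print(same_for_ascii, ascii_failed)
```

So the equivalence holds only while the text stays within ASCII; an implicit encode inside the hash function would still need some choice of encoding for everything else.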

[issue2948] Unicode support for hashing algorithms

2008-05-22 Thread Raymond Hettinger
Raymond Hettinger <[EMAIL PROTECTED]> added the comment: I don't think this is the right thing to do. The hash algorithms are defined in terms of bytes, but Unicode is abstracted from a byte-level encoding. It doesn't make sense to convert using an arbitrary encoding (such as UTF-8) becau

[issue2948] Unicode support for hashing algorithms

2008-05-22 Thread Vasco Rodrigues
New submission from Vasco Rodrigues <[EMAIL PROTECTED]>: The hashing algorithms don't support Unicode. Any Unicode text given to them is first converted to ASCII and then hashed. Not all strings are convertible to ASCII. Now that Unicode is becoming the default encoding, especially for the web
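For comparison with the implicit ASCII conversion described in the report, the sketch below shows what happens in Python 3, where, as far as I know, `hashlib` refuses text outright and accepts only bytes-like input. This is an illustration under that assumption, not a description of the 2.x behaviour being reported; `try_hash` is a hypothetical helper name.

```python
import hashlib

def try_hash(data):
    """Hypothetical helper: return the hex digest, or a note if hashing is refused."""
    try:
        return hashlib.sha256(data).hexdigest()
    except TypeError as exc:
        return "TypeError: %s" % exc

text = "caf\u00e9"

# Passing text directly is rejected; there is no implicit encoding step.
result_text = try_hash(text)

# Encoding explicitly gives the hash function the bytes it is defined over.
result_bytes = try_hash(text.encode("utf-8"))

print(result_text)
print(result_bytes)
```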