Re: unicode encoding usablilty problem

2005-02-25 Thread Vinay Sajip
> This will help in your code, but there is a big pile of modules in the stdlib
> that are not unicode-friendly. From my daily practice come shlex
> (the tokenizer works only with encoded strings) and logging (you can't
> specify an encoding for FileHandler).
You can, of course, pass in a stream opened usi
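The workaround mentioned in this message (handing logging a stream you opened yourself) can be sketched as follows. This is only an illustration in present-day Python 3, not the code from the truncated message; the helper names `make_utf8_logger` and `demo_logging` and the logger name are made up for the example.

```python
import logging
import os
import tempfile


def make_utf8_logger(path, name="demo_utf8_logger"):
    """Return (logger, stream); the stream is opened with an explicit encoding."""
    # The encoding is chosen when the stream is opened, not by the handler.
    stream = open(path, "a", encoding="utf-8")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # keep the demo output out of the root logger
    return logger, stream


def demo_logging():
    """Log one non-ASCII message and return the raw bytes written to disk."""
    fd, path = tempfile.mkstemp(suffix=".log")
    os.close(fd)
    logger, stream = make_utf8_logger(path)
    logger.info("Löwis and Dörwald")
    for handler in list(logger.handlers):
        handler.flush()
        logger.removeHandler(handler)
    stream.close()
    with open(path, "rb") as f:
        raw = f.read()
    os.remove(path)
    return raw
```

Reading the file back in binary mode shows the message stored as UTF-8 bytes, because the stream, not the handler, decides the encoding.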

Re: unicode encoding usablilty problem

2005-02-21 Thread "Martin v. Löwis"
aurora wrote:
> What is the process for getting a PEP worked out? Does the work and discussion take place on the python-dev mailing list? I would be glad to help out, especially on this particular issue.
See PEP 1 for the PEP process. The main point is that discussion is *not* carried out on any sp

Re: unicode encoding usablilty problem

2005-02-21 Thread Dieter Maurer
"Fredrik Lundh" <[EMAIL PROTECTED]> writes on Sat, 19 Feb 2005 18:44:27 +0100: > "aurora" <[EMAIL PROTECTED]> wrote: > > > I don't want to mix them. But how could I find them? How do I know this > > statement can be > > potential problem > > > > if a==b: > > > > where a and b can be instantia

Re: unicode encoding usablilty problem

2005-02-21 Thread Fredrik Lundh
"aurora" <[EMAIL PROTECTED]> wrote: >> if you don't know what a and b comes from, how can you be sure that >> your program works at all? how can you be sure they're both strings? > > a and b are both string. how do you know that? >> if you have unit tests, why don't they include Unicode tests?

Re: unicode encoding usablilty problem

2005-02-20 Thread aurora
On Sun, 20 Feb 2005 15:01:09 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote:
> Nick Coghlan wrote:
> > Having "", u"", and r"" be immutable, while b"" was mutable would seem rather inconsistent.
> Yes. However, this inconsistency might be desirable. It would, of course, mean that the literal canno

Re: unicode encoding usablilty problem

2005-02-20 Thread aurora
On Sat, 19 Feb 2005 18:44:27 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> "aurora" <[EMAIL PROTECTED]> wrote:
> > I don't want to mix them. But how could I find them? How do I know this statement can be a potential problem
> >
> > if a==b:
> >
> > where a and b can be instantiated individually far away from

Re: unicode encoding usablilty problem

2005-02-20 Thread Nick Coghlan
Martin v. Löwis wrote:
> People also argue that with such an approach, we could as well tell users to use array.array for the mutable type. But then, people complain that it doesn't have all the library support that strings have.
Indeed - I've got a data manipulating program that I figured I could ma

Re: unicode encoding usablilty problem

2005-02-20 Thread "Martin v. Löwis"
Nick Coghlan wrote:
> Having "", u"", and r"" be immutable, while b"" was mutable would seem rather inconsistent.
Yes. However, this inconsistency might be desirable. It would, of course, mean that the literal cannot be a singleton. Instead, it has to be a display (?), similar to list or dict displ

Re: unicode encoding usablilty problem

2005-02-20 Thread Nick Coghlan
Martin v. Löwis wrote:
> > How about b'' - 8bit string; '' unicode string and no automatic conversion.
> This has been proposed before, see PEP 332. The problem is that people often want byte strings to be mutable as well, so it is still unclear whether it is better to make the b prefix denote the cur
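For readers following the mutable-versus-immutable question: Python 3, which postdates this thread, provides both an immutable `bytes` type and a mutable `bytearray` type. The sketch below only illustrates that difference in Python 3; it is not what PEP 332 specified, and the helper names are invented for the example.

```python
def is_item_assignable(obj):
    """Return True if obj[0] = ... works, False if the type is immutable."""
    try:
        obj[0] = 0x47
    except TypeError:
        return False
    return True


def patch_first_byte(data, value):
    """Return a copy of the immutable bytes object with its first byte replaced."""
    buf = bytearray(data)  # mutable copy of the immutable input
    buf[0] = value
    return bytes(buf)      # freeze it again for callers that expect bytes
```

Here `patch_first_byte(b"XIOP", 0x47)` goes through a mutable copy, while the original `bytes` object is left untouched.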

Re: unicode encoding usablilty problem

2005-02-20 Thread "Martin v. Löwis"
aurora wrote:
> Lots of errors. Among them are gzip (binary?!) and strftime??
For gzip, this is not surprising. It contains things like self.fileobj.write('\037\213') which is not intended to denote characters.
> How about b'' - 8bit string; '' unicode string and no automatic conversion.
This has
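The two octal escapes quoted from gzip are the gzip magic number (0x1f 0x8b), which is why treating them as characters goes wrong. A small Python 3 sketch of the point, with the constant and function names chosen just for this example:

```python
import gzip

# The two bytes quoted in the message, written as a byte-string literal.
GZIP_MAGIC = b"\037\213"


def looks_like_gzip(blob):
    """Check whether a byte string starts with the gzip magic number."""
    return blob[:2] == GZIP_MAGIC


def magic_as_utf8_text():
    """Show what happens if the same escapes are treated as characters."""
    # U+001F and U+008B are characters here; encoding them as UTF-8
    # yields three bytes, no longer the two-byte magic number.
    return "\037\213".encode("utf-8")
```

Treated as text and encoded as UTF-8, the second character becomes two bytes, so the header would no longer be valid.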

Re: unicode encoding usablilty problem

2005-02-20 Thread "Martin v. Löwis"
Thomas Heller wrote:
> Is it possible to specify a byte string literal when running with the -U option?
Not literally. However, you can specify things like
    bytes = [0x47, 0x49, 0x4f, 0x50, 0x01, 0x00]
    bytes = ''.join((chr(x) for x in bytes))
Alternatively, you could rely on the 1:1 feature of Latin-1
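The "1:1 feature of Latin-1" refers to the fact that Latin-1 maps code points 0 to 255 onto the byte values 0 to 255 unchanged. A rough Python 3 rendering of the idea, using the same six values as the message (the function name is hypothetical):

```python
def ints_to_bytes_via_latin1(values):
    """Build a byte string from integers 0..255 by way of Latin-1 text."""
    text = "".join(chr(x) for x in values)  # one character per integer
    return text.encode("latin-1")           # code point N becomes byte N


def latin1_round_trip_is_lossless():
    """Every possible byte value survives decode/encode through Latin-1."""
    every_byte = bytes(range(256))
    return every_byte.decode("latin-1").encode("latin-1") == every_byte
```

In Python 3 the shorter spelling `bytes(values)` gives the same result; the Latin-1 route is shown only because it is the technique the message describes.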

Re: unicode encoding usablilty problem

2005-02-19 Thread Nick Coghlan
Thomas Heller wrote:
> "Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> > We have come up with a transition strategy, allowing existing libraries to widen their support from byte strings to character strings. This isn't a simple task, so many libraries still expect and return by

Re: unicode encoding usablilty problem

2005-02-19 Thread Fredrik Lundh
"aurora" <[EMAIL PROTECTED]> wrote: > I don't want to mix them. But how could I find them? How do I know this > statement can be > potential problem > > if a==b: > > where a and b can be instantiated individually far away from this line of > code that put them > together? if you don't kno

Re: unicode encoding usablilty problem

2005-02-19 Thread Alexander Schremmer
On Fri, 18 Feb 2005 21:43:52 +0100, Thomas Heller wrote:
>> Eventually, the primary string type should be the Unicode
>> string. If you are curious how far we are still off that goal,
>> just try running your program with the -U option.
>
> Not very far - can't even call functions ;-)
> def

Re: unicode encoding usablilty problem

2005-02-18 Thread aurora
On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote:
> I'd like to point out the historical reason: Python predates Unicode, so the byte string type has many convenience operations that you would only expect of a character string. We have come up with a transition strateg

Re: unicode encoding usablilty problem

2005-02-18 Thread aurora
On Fri, 18 Feb 2005 20:18:28 +0100, Walter Dörwald <[EMAIL PROTECTED]> wrote:
> aurora wrote:
> > [...] In Java they are distinct data types and the compiler would catch all incorrect usage. In Python, the interpreter seems to 'help' us to promote binary strings to unicode. Things work fine, u

Re: unicode encoding usablilty problem

2005-02-18 Thread Neil Hodgson
Martin v. Löwis:
> Eventually, the primary string type should be the Unicode
> string. If you are curious how far we are still off that goal,
> just try running your program with the -U option.
Tried both -U and sys.setdefaultencoding("undefined") on a couple of my most used programs and saw a

Re: unicode encoding usablilty problem

2005-02-18 Thread Thomas Heller
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> Eventually, the primary string type should be the Unicode
> string. If you are curious how far we are still off that goal,
> just try running your program with the -U option.
Not very far - can't even call functions ;-)
c:

Re: unicode encoding usablilty problem

2005-02-18 Thread Thomas Heller
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> We have come up with a transition strategy, allowing existing
> libraries to widen their support from byte strings to character
> strings. This isn't a simple task, so many libraries still expect
> and return byte strings, w

Re: unicode encoding usablilty problem

2005-02-18 Thread Jarek Zgoda
Walter Dörwald wrote:
> > Is there a scheme for Python developers to use so that they are safe from incorrect mixing?
> Put the following:
>     import sys
>     sys.setdefaultencoding("undefined")
> in a file named sitecustomize.py somewhere in your Python path and Python will complain whenever there's an impl
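The sitecustomize.py trick above applies to the Python 2 interpreters of the time. For comparison only: in Python 3, which postdates this thread, mixing the two string types is rejected without any configuration. A small sketch (function names are illustrative):

```python
def implicit_mix_fails(text="abc", raw=b"def"):
    """Return True if adding str and bytes raises instead of converting."""
    try:
        text + raw
    except TypeError:
        return True
    return False


def concat_explicit(text, raw, encoding="utf-8"):
    """Combine the two types by decoding the bytes explicitly first."""
    return text + raw.decode(encoding)
```

The explicit version forces the caller to name an encoding, which is the decision the implicit ASCII conversion used to hide.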

Re: unicode encoding usablilty problem

2005-02-18 Thread "Martin v. Löwis"
aurora wrote:
> Java has a much more usable model, with unicode used internally and encoding/decoding decisions needed only twice, when dealing with input and output.
In addition to Fredrik's comment (that you should use the same model in Python) and Walter's comment (that you can enforce it by s

Re: unicode encoding usablilty problem

2005-02-18 Thread Thomas Heller
Walter Dörwald <[EMAIL PROTECTED]> writes:
> aurora wrote:
> > [...]
>> In Java they are distinct data types and the compiler would catch all
>> incorrect usage. In Python, the interpreter seems to 'help' us to
>> promote binary strings to unicode. Things work fine, unit tes

Re: unicode encoding usablilty problem

2005-02-18 Thread Jarek Zgoda
Fredrik Lundh wrote:
> > This brings up another issue. Most references and books focus exclusively on entering unicode literals and using the encode/decode methods. The fallacy is that string is such a basic data type, used throughout the program, that you really don't want to make an individual decisio

Re: unicode encoding usablilty problem

2005-02-18 Thread Walter Dörwald
aurora wrote:
> [...] In Java they are distinct data types and the compiler would catch all incorrect usage. In Python, the interpreter seems to 'help' us to promote binary strings to unicode. Things work fine, unit tests pass, all until the first non-ASCII characters come in and then the prog

Re: unicode encoding usablilty problem

2005-02-18 Thread aurora
On Fri, 18 Feb 2005 19:24:10 +0100, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
> that's how you should do things in Python too, of course. a unicode string uses unicode internally. decode on the way in, encode on the way out, and things just work.
> the fact that you can mess things up by mixing u
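The "decode on the way in, encode on the way out" rule quoted here can be sketched in a few lines. This is a minimal Python 3 illustration; the function name and the choice of Latin-1 input and UTF-8 output are assumptions made for the example, not something stated in the thread.

```python
def shout(raw_in, in_encoding="latin-1", out_encoding="utf-8"):
    """Upper-case some encoded text, touching encodings only at the edges."""
    text = raw_in.decode(in_encoding)  # decode on the way in
    text = text.upper()                # all processing happens on unicode text
    return text.encode(out_encoding)   # encode on the way out
```

For instance, `shout("löwis".encode("latin-1"))` returns the UTF-8 encoding of "LÖWIS"; the middle step never needs to know which encodings were involved.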

Re: unicode encoding usablilty problem

2005-02-18 Thread Fredrik Lundh
anonymous coward <[EMAIL PROTECTED]> wrote:
> This brings up another issue. Most references and books focus exclusively on
> entering unicode literals and using the encode/decode methods. The fallacy is
> that string is such a basic data type, used throughout the program, that you
> really don't wa

unicode encoding usablilty problem

2005-02-18 Thread aurora
I have long found Python's default encoding of strict ASCII frustrating. For one thing, I would rather get garbage characters than an exception. But the biggest issue is that Unicode exceptions often pop up in unexpected places, and only when a non-ASCII or unicode character first finds its way into the
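The trade-off described in this post (an exception versus garbage characters) corresponds to the `errors` argument of the codec machinery. A short sketch in Python 3, with hypothetical helper names, showing both behaviours for the same non-ASCII input:

```python
def decode_strict(raw):
    """Strict ASCII decoding: returns None when a non-ASCII byte shows up."""
    try:
        return raw.decode("ascii")  # errors="strict" is the default
    except UnicodeDecodeError:
        return None


def decode_lenient(raw):
    """Lenient decoding: undecodable bytes become U+FFFD instead of raising."""
    return raw.decode("ascii", errors="replace")
```

With the input `b"caf\xe9"` the strict variant fails, while the lenient one returns the text with a replacement character in place of the last byte. Which is preferable depends on whether silent data loss or an early failure is the lesser problem.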