> This will help in your code, but there is a big pile of modules in the stdlib
> that are not Unicode-friendly. From my daily practice come shlex
> (the tokenizer works only with encoded strings) and logging (you can't
> specify an encoding for FileHandler).
You can, of course, pass in a stream opened usi
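A minimal sketch of the workaround mentioned in this reply: hand logging a stream that already knows its encoding. The preview is cut off before it names how the stream is opened, so `io.open` is an assumption, as are the logger name and the temporary log path. Written for Python 3.

```python
import io
import logging
import os
import tempfile

# Hypothetical log location, used only for this illustration.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")

# Open the file as a text stream with an explicit encoding (assumed: UTF-8).
stream = io.open(log_path, "a", encoding="utf-8")

logger = logging.getLogger("unicode_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False  # keep the demo output out of the root logger

# StreamHandler writes to whatever stream it is given, so the stream
# decides the encoding rather than the handler.
handler = logging.StreamHandler(stream)
logger.addHandler(handler)

logger.info(u"caf\xe9")

# Detach and close before reading the file back.
handler.flush()
logger.removeHandler(handler)
stream.close()

with io.open(log_path, "r", encoding="utf-8") as f:
    logged = f.read()
```

The point of the design is that the handler never has to guess an encoding: it is fixed once, where the stream is created.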
aurora wrote:
What is the process for getting a PEP worked out? Are the work and
discussion carried out on the python-dev mailing list? I would be glad to
help out, especially on this particular issue.
See PEP 1 for the PEP process. The main point is that discussion is
*not* carried out on any sp
"Fredrik Lundh" <[EMAIL PROTECTED]> writes on Sat, 19 Feb 2005 18:44:27 +0100:
> "aurora" <[EMAIL PROTECTED]> wrote:
>
> > I don't want to mix them. But how could I find them? How do I know this
> > statement can be a potential problem
> >
> > if a==b:
> >
> > where a and b can be instantia
"aurora" <[EMAIL PROTECTED]> wrote:
>> if you don't know where a and b come from, how can you be sure that
>> your program works at all? how can you be sure they're both strings?
>
> a and b are both strings.
how do you know that?
>> if you have unit tests, why don't they include Unicode tests?
On Sun, 20 Feb 2005 15:01:09 +0100, Martin v. Löwis <[EMAIL PROTECTED]>
wrote:
Nick Coghlan wrote:
Having "", u"", and r"" be immutable, while b"" was mutable would seem
rather inconsistent.
Yes. However, this inconsistency might be desirable. It would, of
course, mean that the literal canno
On Sat, 19 Feb 2005 18:44:27 +0100, Fredrik Lundh <[EMAIL PROTECTED]>
wrote:
"aurora" <[EMAIL PROTECTED]> wrote:
I don't want to mix them. But how could I find them? How do I know
this statement can be a potential problem
if a==b:
where a and b can be instantiated individually far away from
Martin v. Löwis wrote:
People also argue that with such an approach, we could as well
tell users to use array.array for the mutable type. But then,
people complain that it doesn't have all the library support that
strings have.
Indeed - I've got a data manipulating program that I figured I could ma
Nick Coghlan wrote:
Having "", u"", and r"" be immutable, while b"" was mutable would seem
rather inconsistent.
Yes. However, this inconsistency might be desirable. It would, of
course, mean that the literal cannot be a singleton. Instead, it has
to be a display (?), similar to list or dict displ
Martin v. Löwis wrote:
How about
b'' - 8bit string; '' unicode string
and no automatic conversion.
This has been proposed before, see PEP 332. The problem is that
people often want byte strings to be mutable as well, so it is
still unclear whether it is better to make the b prefix denote
the cur
aurora wrote:
Lots of errors. Among them are gzip (binary?!) and strftime??
For gzip, this is not surprising. It contains things like
self.fileobj.write('\037\213')
which is not intended to denote characters.
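To illustrate the point about gzip: the two characters in that literal are the gzip magic number, so they are byte values rather than text. A small sketch, assuming Python 3 where byte literals are spelled `b'...'`; the variable names are illustrative only.

```python
import gzip
import io

# Compress a few bytes into an in-memory buffer.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(b"hello")

data = buf.getvalue()

# The first two bytes of any gzip stream are the magic number.
magic = data[:2]

# The octal escapes from the quoted source line, written as bytes.
octal_form = b"\037\213"
```

Here `magic` and `octal_form` are the same two bytes, 0x1f and 0x8b, which is why treating the literal as a character string makes no sense for this module.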
How about
b'' - 8bit string; '' unicode string
and no automatic conversion.
This has
Thomas Heller wrote:
Is it possible to specify a byte string literal when running with the -U option?
Not literally. However, you can specify things like
bytes = [0x47, 0x49, 0x4f, 0x50, 0x01, 0x00]
bytes = ''.join((chr(x) for x in bytes))
Alternatively, you could rely on the 1:1 feature of Latin-1
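A short sketch of the Latin-1 idea, assuming Python 3 (where `chr` returns a text character). Latin-1 maps code points 0 to 255 onto the byte values 0 to 255 one for one, so encoding with it turns such a character string into the corresponding bytes. The variable names are illustrative.

```python
# The same byte values as in the example above.
values = [0x47, 0x49, 0x4f, 0x50, 0x01, 0x00]

# Build a text string whose code points are the byte values...
text = u"".join(chr(x) for x in values)

# ...and let Latin-1 map each code point to the byte with the same value.
raw = text.encode("latin-1")

# Decoding with Latin-1 reverses the mapping exactly.
round_trip = [ord(c) for c in raw.decode("latin-1")]
```

This only works for code points below 256; anything higher raises `UnicodeEncodeError` when encoded as Latin-1.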
Thomas Heller wrote:
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
We have come up with a transition strategy, allowing existing
libraries to widen their support from byte strings to character
strings. This isn't a simple task, so many libraries still expect
and return by
"aurora" <[EMAIL PROTECTED]> wrote:
> I don't want to mix them. But how could I find them? How do I know this
> statement can be a potential problem
>
> if a==b:
>
> where a and b can be instantiated individually far away from this line of
> code that puts them together?
if you don't kno
On Fri, 18 Feb 2005 21:43:52 +0100, Thomas Heller wrote:
>> Eventually, the primary string type should be the Unicode
>> string. If you are curious how far we are still off that goal,
>> just try running your program with the -U option.
>
> Not very far - can't even call functions ;-)
>
def
On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis <[EMAIL PROTECTED]>
wrote:
I'd like to point out the
historical reason: Python predates Unicode, so the byte string type
has many convenience operations that you would only expect of
a character string.
We have come up with a transition strateg
On Fri, 18 Feb 2005 20:18:28 +0100, Walter Dörwald <[EMAIL PROTECTED]>
wrote:
aurora wrote:
> [...]
In Java they are distinct data types and the compiler would catch all
incorrect usage. In Python, the interpreter seems to 'help' us to
promote a binary string to Unicode. Things work fine, u
Martin v. Löwis:
> Eventually, the primary string type should be the Unicode
> string. If you are curious how far we are still off that goal,
> just try running your program with the -U option.
Tried both -U and sys.setdefaultencoding("undefined") on a couple of my
most used programs and saw a
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> Eventually, the primary string type should be the Unicode
> string. If you are curious how far we are still off that goal,
> just try running your program with the -U option.
Not very far - can't even call functions ;-)
c:
"Martin v. Löwis" <[EMAIL PROTECTED]> writes:
> We have come up with a transition strategy, allowing existing
> libraries to widen their support from byte strings to character
> strings. This isn't a simple task, so many libraries still expect
> and return byte strings, w
Walter Dörwald wrote:
Is there a scheme for Python developers to use so that they are safe
from incorrect mixing?
Put the following:
import sys
sys.setdefaultencoding("undefined")
in a file named sitecustomize.py somewhere in your Python path and
Python will complain whenever there's an impl
aurora wrote:
Java
has a much more usable model, with Unicode used internally and
encoding/decoding decisions needed only twice, when dealing with input and
output.
In addition to Fredrik's comment (that you should use the same model
in Python) and Walter's comment (that you can enforce it by s
Walter Dörwald <[EMAIL PROTECTED]> writes:
> aurora wrote:
>
> > [...]
>> In Java they are distinct data types and the compiler would catch all
>> incorrect usage. In Python, the interpreter seems to 'help' us to
>> promote a binary string to Unicode. Things work fine, unit tes
Fredrik Lundh wrote:
This brings up another issue. Most references and books focus exclusively on entering Unicode
literals and using the encode/decode methods. The fallacy is that a string is such a basic data type,
used throughout the program, that you really don't want to make an individual decisio
aurora wrote:
> [...]
In Java they are distinct data types and the compiler would catch all
incorrect usage. In Python, the interpreter seems to 'help' us to
promote a binary string to Unicode. Things work fine, unit tests pass,
all until the first non-ASCII characters come in and then the prog
On Fri, 18 Feb 2005 19:24:10 +0100, Fredrik Lundh <[EMAIL PROTECTED]>
wrote:
that's how you should do things in Python too, of course. a unicode string
uses unicode internally. decode on the way in, encode on the way out, and
things just work.
the fact that you can mess things up by mixing u
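The "decode on the way in, encode on the way out" rule from this reply can be sketched as follows. This is only an illustration: the UTF-8 encoding, the sample bytes, and the `shout` helper are assumptions, not anything from the thread.

```python
def shout(text):
    """Hypothetical processing step that works purely on text."""
    return text.upper()

# Bytes as they might arrive from a file or socket (assumed to be UTF-8).
raw_in = b"caf\xc3\xa9"

# Decode once, at the input boundary.
text = raw_in.decode("utf-8")

# Everything in the middle of the program sees only text.
result = shout(text)

# Encode once, at the output boundary.
raw_out = result.encode("utf-8")
```

Because the encoding decision is made only at the two boundaries, the processing code never needs to know which encoding the outside world uses.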
anonymous coward <[EMAIL PROTECTED]> wrote:
> This brings up another issue. Most references and books focus exclusively on
> entering Unicode literals and using the encode/decode methods. The fallacy is
> that a string is such a basic data type, used throughout the program, that
> you really don't wa
I have long found the Python default encoding of strict ASCII frustrating.
For one thing, I prefer to get garbage characters rather than an exception. But the
biggest issue is that Unicode exceptions often pop up in unexpected places, and
only when a non-ASCII or Unicode character first finds its way into the