try: except :

2006-01-10 Thread Hallvard B Furuseth
I'd like an 'except <expression>:' statement
Is there a defined way to do that, for Python 2.2 and above?
'except None:' works for now, but I don't know if that's safe:

for ex in ZeroDivisionError, None:
    try:
        1/0
    except ex:
        print "Ignored first exception."

I could just use
except ZeroDivisionError:
    if not <condition>:
        raise
    print "Ignored first exception."
but the variant above gets a bit neater.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: try: except :

2006-01-10 Thread Hallvard B Furuseth
Paul Rubin writes:
>Hallvard B Furuseth <[EMAIL PROTECTED]> writes:
>> 'except None:' works for now, but I don't know if that's safe:
>>
>> for ex in ZeroDivisionError, None:
>>     try:
>>         1/0
>>     except ex:
>>         print "Ignored first exception."
>
> class NeverRaised(Exception): pass
>
> for ex in ZeroDivisionError, NeverRaised:

Heh.  Simple enough.  Unless some obstinate person raises it anyway...

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: try: except :

2006-01-13 Thread Hallvard B Furuseth
Thanks for the help.

Tom Anderson writes:
>> class NeverRaised(Exception):
>>    def __init__(self, *args):
>>        raise RuntimeError('NeverRaised should never be raised')
>
> Brilliant! Although I'd be tempted to define an UnraisableExceptionError
> to signal what's happened. Or ...

A package we are using has ProgrammingError (like AssertionError except
__debug__ doesn't disable it), and RealityError when something *really*
can't happen:-)
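
Roughly (my own sketch, not the package's actual definitions):

class ProgrammingError(Exception):
    """Like AssertionError, but not disabled when __debug__ is false."""

class RealityError(Exception):
    """For the cases that *really* cannot happen."""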

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python regular expression question!

2006-09-20 Thread Hallvard B Furuseth
"unexpected" <[EMAIL PROTECTED]> writes:

> I'm trying to do a whole word pattern match for the term 'MULTX-'
>
> Currently, my regular expression syntax is:
>
> re.search(('^')+(keyword+'\\b')

\b matches the beginning/end of a word (characters a-zA-Z_0-9).
So that regex will match e.g. MULTX-FOO but not MULTX-.

Incidentally, in case the keyword contains regex special characters
(like '*') you may wish to escape it: re.escape(keyword).
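
For example, something along these lines (untested sketch; the keyword
and test strings are made up):

import re

keyword = 'MULTX-'
# \b does not help after the trailing '-', so require whitespace or
# end-of-string after the keyword instead.
pattern = r'^' + re.escape(keyword) + r'(?=\s|$)'

print(bool(re.search(pattern, 'MULTX- 1 2 3')))  # True
print(bool(re.search(pattern, 'MULTX-FOO')))     # False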

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Sorting strings containing special characters (german 'Umlaute')

2007-03-02 Thread Hallvard B Furuseth
[EMAIL PROTECTED] writes:
> For sorting the letter "Ä" is supposed to be treated like "Ae",
> therefore sorting this list should yield
> l = ["Aber, "Ärger", "Beere"]

Are you sure?  Maybe I'm thinking of another language, but I thought Ä
should be sorted together with A, yet after A if the words are otherwise
equal.  E.g. Antwort, Ärger, Beere.  A proper strcoll handles that by
translating "Ärger" to e.g. ["Arger", <accent info>],
then it can sort first by the un-accentified name and then by the rest.
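
A rough sketch with locale collation (assuming a German locale is
installed under this name on your system):

import locale

locale.setlocale(locale.LC_COLLATE, 'de_DE.UTF-8')  # assumed locale name

words = ["Beere", "Ärger", "Aber", "Antwort"]
# strxfrm produces sort keys which follow the locale's collation rules
# instead of raw code points.
print(sorted(words, key=locale.strxfrm))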

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: class attrdict

2007-03-09 Thread Hallvard B Furuseth
Alex Martelli writes:
> (...)
>> class Namespace(object):
> (...)
> I might, if it weren't for the redundant "if" and the horribly buggy
> interference between separate instances -- which is why I wrote it,
> almost six years ago and without the bugs, as
>  .

Nice one.  I gotta dig up the Cookbook again.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: class attrdict

2007-03-09 Thread Hallvard B Furuseth
Alex Martelli writes:
> You make a good point.  I do like being able to say foo.bar=baz rather
> than foo['bar']=baz in certain cases -- not so much to save 3 chars, but
> to avoid excessive punctuation; however, I don't really need this AND
> all of dict's power at the same time, so, I don't inherit from dict:-).

Yes.  Attribute syntax looks nicer, in particular when one implements a
sort of private "variables collected in a dict" thing (e.g. SQL field
names) but still wants some dict functionality.

Another variant I thought of would be to prefix dict methods with '_'
(except those that already start with '__') and (if implemented as a
dict subtype) also override the original names with a "sorry, use
_" error method.

(Posting a bit sporadically currently, disappearing for a week again
now.)

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: LDAP/LDIF Parsing

2007-02-02 Thread Hallvard B Furuseth
Bruno Desthuilliers writes:
> class LdapObject(object):
>    (...)
>    def __getattr__(self, name):
>      try:
>        data = self._record[name]
>      except KeyError:
>        raise AttributeError(
>          "object %s has no attribute %s" % (self, name)
>        )

Note that LDAP attribute descriptions may be invalid Python
attribute names.  E.g.
{...
 'title;lang-en': ['The Boss']
 'title;lang-no': ['Sjefen']}
So you'd have to call getattr() explicitly to get at all the attributes
this way.

>      else:
>        # all LDAP attribs are multivalued by default,
>        # even when the schema says they are monovalued
>        if len(data) == 1:
>           return data[0]
>        else:
>           return data[:]

IMHO, this just complicates the client code since the client needs to
insert checks of isinstance(return value, list) all over the place.
Better to have a separate method which extracts just the first value of
an attribute, if you want that.
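
E.g. something like this (a sketch; the class and method names are
made up):

class LdapEntry(object):
    def __init__(self, record):
        self._record = record
    def values(self, name):
        """All values of the attribute, always as a (possibly empty) list."""
        return list(self._record.get(name, ()))
    def first(self, name, default=None):
        """Just the first value, or `default` if the attribute is absent."""
        values = self._record.get(name)
        if values:
            return values[0]
        return default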

-- 
Regards,
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: LDAP/LDIF Parsing

2007-02-02 Thread Hallvard B Furuseth
Bruno Desthuilliers writes:
>Hallvard B Furuseth a écrit :
>>>      else:
>>>        # all LDAP attribs are multivalued by default,
>>>        # even when the schema says they are monovalued
>>>        if len(data) == 1:
>>>           return data[0]
>>>        else:
>>>           return data[:]
>> IMHO, this just complicates the client code since the client needs to
>> insert checks of isinstance(return value, list) all over the place.
>> Better to have a separate method which extracts just the first value of
>> an attribute, if you want that.
>
> Most of the times, in a situation such as the one described by the OP,
> one knows by advance if a given LDAP attribute will be used as
> monovalued or multivalued. Well, this is at least my own experience...

But if the attribute is multivalued, you don't know if it will contain
just one value or not.  If you expect telephoneNumber to be multivalued,
but receive just one value '123',
  for value in foo.telephoneNumber: print value
will print
1
2
3

BTW, Cruelemort, remember that attribute names are case-insensitive.  If
you ask the server for attribute "cn", it might still return "CN".
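
If you cannot control what the server returns, a sketch of a
case-insensitive lookup (function name made up; `entry` is assumed to be
a dict of attribute name -> list of values):

def get_values(entry, name):
    wanted = name.lower()
    for key, values in entry.items():
        if key.lower() == wanted:
            return values
    return []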

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


class attrdict

2007-03-02 Thread Hallvard B Furuseth
Does this class need anything more?
Is there any risk of a lookup loop?
Seems to work...

class attrdict(dict):
    """Dict where d['foo'] also can be accessed as d.foo"""
    def __init__(self, *args, **kwargs):
        self.__dict__ = self
        dict.__init__(self, *args, **kwargs)
    def __repr__(self):
        return dict.__repr__(self).join(("attrdict(", ")"))

>>> a = attrdict([(1,2)], a=3, b=4)
>>> a
attrdict({'a': 3, 1: 2, 'b': 4})
>>> a = attrdict([(1,2)], b=3, c=4)
>>> a
attrdict({1: 2, 'c': 4, 'b': 3})
>>> a.b
3
>>> a.d = 5
>>> a['d']
5
>>> a.e
Traceback (most recent call last):
  File "", line 1, in ?
AttributeError: 'attrdict' object has no attribute 'e'
>>> a.__getattr__ = 'xyzzy'
>>> a.__getattribute__ = 'xyzzy'
>>> a.__setattr__ = 'xyzzy'
>>> a.__delattr__ = 'xyzzy'
>>> a.c
4
>>> a[1]
2
>>> del a.c
>>>

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Dispatching operations to user-defined methods

2006-05-02 Thread Hallvard B Furuseth
I'm wondering how to design this:

An API to let a request/response LDAP server be configured so a
user-defined Python module can handle and/or modify some or all
incoming operations, and later the outgoing responses (which are
generated by the server).  Operations have some common elements,
and some which are distinct to the operation type (search, modify,
compare, etc).  So do responses.

There are also some "operations" used internally by the server -
like checking if the current session has access to perform the
operation it requested.

The server will have various internal classes for operations and
data in operations, and methods for the user to access them.

One obvious implementation would be to provide a class Operation,
let the user define a single subclass of this, and have the server
call request_search(), response_search(), request_modify(),
check_access() etc in that subclass.

Then I suppose the server would turn individual operations into
instances of internal subclasses of the user's subclass - class
SearchOperation(), ModifyOperation()
etc.  Begins to look a bit messy now.

And I'd like to try several methods - first try if the user defined
response_search_entry() (one type of Search operation response),
then response_search(), then response(), and after that give up.
For that one, the Pythonic approach seems to be to define a class
hierarchy for these, let the user subclass the subclasses and define
request() and response() methods for them, and let Python handle the
search for which request() method to use for which operation.  And
the server must keep track of which subclasses the user defined.
This too feels a bit messy to me.
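
Roughly the kind of fallback chain I mean (a sketch only; the names are
invented):

def find_handler(user_obj, *names):
    # Try the most specific name first, e.g.
    # ('response_search_entry', 'response_search', 'response').
    for name in names:
        method = getattr(user_obj, name, None)
        if method is not None:
            return method
    return None   # nothing defined - fall back to the server's default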

Also there are plenty of operation parameters the user might wish to
dispatch on, e.g. maybe he can handle Entry but not Referral search
responses.  I imagine it's more efficient to dispatch in a Python C
module than to leave that to the user.  But I may be getting too
ambitious now.

Anyway, ideas?  Am I overlooking something obvious?

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dispatching operations to user-defined methods

2006-05-03 Thread Hallvard B Furuseth
I wrote:
> I'm wondering how to design this:
> (...)
> One obvious implementation would be to provide a class Operation,
> let the user define a single subclass of this, and have the server
> call request_search(), response_search(), request_modify(),
> check_access() etc in that subclass.
>
> Then I suppose the server would turn individual operations into
> instances of internal subclasses of the user's subclass - class
> SearchOperation(), ModifyOperation()
> etc.  Begins to look a bit messy now.
> (...)

  Or two classes - one operation class and one class
subclassed by the user.

Still need some way to let the user provide both general and more
specialized methods though.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Dispatching operations to user-defined methods

2006-05-08 Thread Hallvard B Furuseth
Michele Simionato writes:
> Apparently Guido fell in love with generic functions, so
> (possibly) in future Python versions you will be able to
> solve dispatching problems in an industrial strength way.

Looks interesting, I'll keep an eye on that.

> Sometimes however the simplest possible way is enough, and you
> can use something like this :
>
> class SimpleDispatcher(object):
> (...)

That doesn't make use of any subclass hierarchies the user defines
though.  But maybe it's just as well to scan his class for names
once he has defined it, and build the dispatch table myself.
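
Something like this, perhaps (untested sketch, names invented):

def build_dispatch_table(cls, prefixes=('request_', 'response_')):
    # Scan the user's class once and map e.g. ('response', 'search')
    # to the response_search method.
    table = {}
    for name in dir(cls):
        for prefix in prefixes:
            if name.startswith(prefix):
                key = (prefix.rstrip('_'), name[len(prefix):])
                table[key] = getattr(cls, name)
    return table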

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Modifying PyObject.ob_type

2006-05-08 Thread Hallvard B Furuseth
I've got some fixed-size types with identical object layout defined in C.
The only differences are: Which methods they have, the name, and some
are subtypes of others.

Can I modify the ob_type of their instances, to switch between which of
these types an object has?
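
For comparison, the Python-level version of the same trick, which works
for ordinary heap types (sketch; class names invented):

class Base(object):
    def who(self):
        return "Base"

class Derived(Base):
    def who(self):
        return "Derived"

obj = Base()
obj.__class__ = Derived   # roughly what assigning ob_type would do in C
print(obj.who())          # Derived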

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Embedding python in package using autoconf

2006-05-15 Thread Hallvard B Furuseth
I want to use Python as an extension language in a package
which uses autoconf.  That means its and Python's autoconf
#defines can conflict, so I can't safely #include both
<Python.h> and the package's own include files:-(

Does anyone have a safe way to #include at least <Python.h>
without <pyconfig.h>?  E.g. copy the files (and <pyport.h>'s
PyAPI_FUNC/PyAPI_DATA) and rename the autoconf macros - and
fail compilation if that wouldn't work?

Currently I have two sets of source files - one set which
only #includes Python.h and one set which only #includes
the package's files.
They communicate through a single .h file with a bunch of
enums for various functions and struct members, and wrapper
functions to use these by their enum value.  It has a few
cheats like declaring "struct _object;" (that's PyObject)
to make life simpler, but it's still rather tedious.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 - is this true?

2008-11-09 Thread Hallvard B Furuseth
Steven D'Aprano writes:
> How often do you care about equality ignoring order for lists containing 
> arbitrary, heterogeneous types?

Arbitrary, I never have.  Different types of my choice, a few times.
I was only interested in there being some _sort_ order (and the same in
different runs of the program), not in what the sort order was.

However string sorting is partly arbitrary too.  It depends on the
code points in the string's character set.  u'\u00c5' < u'\u00d8'
(u'Å' < u'Ø') is wrong in Norwegian.  '0' < 'B' < 'a' < '~' can be
wrong depending on the application, as can '10' < '5'.

So I don't quite buy the argument about arbitrary sorting.  If you have
different types, at least you likely know that you are doing something
weird.  OTOH it's quite common - even normal - to produce poor sort
results because one depends on the built-in arbitrary string sort order.


In any case, would it be possible to add a cmp= function which did more
or less what the current default cmp does now?  Then it's available when
someone wants it, and the "arbitrariness" of different types is not
inflicted on people who don't.  (In my case I _think_ it's just a "nice
to have" to me and nothing more, but I don't remember for sure.)

-- 
Hallvard
--
http://mail.python.org/mailman/listinfo/python-list


Re: Python 3.0 - is this true?

2008-11-09 Thread Hallvard B Furuseth
Terry Reedy writes:
> If you want to duplicate 2.x behavior, which does *not* work for all
> types...
>
> def py2key(item): return (str(type(item)), item)

Nope.
  sorted((-1, 2, True, False)) == [-1, False, True, 2]
  sorted((-1, 2, True, False), key=py2key) == [False, True, -1, 2]
Might often be good enough though.  But uses more memory.

-- 
Hallvard
--
http://mail.python.org/mailman/listinfo/python-list


Python package to read .7z archives?

2010-08-04 Thread Hallvard B Furuseth
Is there an equivalent of zipfile.py for .7z archives?
I have one which extracts an archive member by running 7z e -so,
but that's a *slow* way to read one file at a time.
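
What I do now is roughly this kind of thing (a sketch; assumes the 7z
binary is on $PATH):

import subprocess

def read_7z_member(archive, member):
    # "7z e -so <archive> <member>" writes the member to stdout.
    p = subprocess.Popen(['7z', 'e', '-so', archive, member],
                         stdout=subprocess.PIPE)
    data = p.communicate()[0]
    if p.returncode != 0:
        raise IOError("7z failed with status %d" % p.returncode)
    return data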

Google found me some python interfaces to lzma, but apparently they
only handle single compressed files, not .7z archives.

(Actually another archive format would be fine if it is competitive.
I'm just looking to compress my .zips better.  I need a Python module
to extract members reasonably fast, but slow compression would be OK.)

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python package to read .7z archives?

2010-08-04 Thread Hallvard B Furuseth
Giampaolo Rodolà  writes:
> 2010/8/4 Hallvard B Furuseth :
>> Is there an equivalent of zipfile.py for .7z archives?
>> I have one which extracts an archive member by running 7z e -so,
>> but that's a *slow* way to read one file at a time.
>>
>> Google found me some python interfaces to lzma, but apparently they
>> only handle single compressed files, not .7z archives.
>>
>> (Actually another archive format would be fine if it is competitive.
>> I'm just looking to compress my .zips better.  I need a Python module
>> to extract members reasonably fast, but slow compression would be OK.)
>
> http://bugs.python.org/issue5689

[For lzma/xz compressed tar archives]

Thanks, but extraction of individual members from .tar.xz looks
inherently slow.  To locate the member, you need to decompress
the entire portion of the archive preceding the member.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: No trees in the stdlib?

2009-06-26 Thread Hallvard B Furuseth
Stefan Behnel writes:
>João Valverde wrote:
>> Besides some interface glitches, like returning None
>> on delete if I recall correctly.
>
> That's actually not /that/ uncommon. Operations that change an object are
> not (side-effect free) functions, so it's just purity if they do not have a
> return value.

It's purity that they don't return the modified tree/dict/whatever.
They can still return the deleted element and remain pure.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [RELEASED] Python 3.1 final

2009-06-28 Thread Hallvard B Furuseth
Benjamin Peterson writes:
>Nobody writes:
>> On Sun, 28 Jun 2009 19:21:49 +, Benjamin Peterson wrote:
>> 1. Does Python offer any assistance in doing so, or do you have to
>> manually convert the surrogates which are generated for unrecognised bytes?
>
> fs_encoding = sys.getfilesystemencoding()
> bytes_argv = [arg.encode(fs_encoding, "surrogateescape") for arg in sys.argv]
>
>> 2. How do you do this for non-invertible encodings (e.g. ISO-2022)?
>
> What's a non-invertible encoding? I can't find a reference to the term.

Different ISO-2022 strings can map to the same Unicode string.
Thus you can convert back to _some_ ISO-2022 string, but it won't
necessarily match the original.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [RELEASED] Python 3.1 final

2009-06-29 Thread Hallvard B Furuseth
Nobody  writes:
>On Sun, 28 Jun 2009 14:36:37 +0200, Martin v. Löwis wrote:
>> See PEP 383.
>
> Okay, that's useful, except that it may have some bugs:
> (...)
> Assuming that this gets fixed, it should make most of the problems with
> 3.0 solvable. OTOH, it wouldn't have killed them to have added e.g.
> sys.argv_bytes and os.environ_bytes.

That's hopeless to keep track of across modules if something modifies
sys.argv or os.environ.

If the current scheme for recovering the original bytes proves
insufficient, what could work is a string type which can have an
attribute with the original bytes (if the source was bytes).  And/or
sys.argv and os.environ maintaining the correspondence when feasible.

Anyway, I haven't looked at whether any of this is a problem, so don't
mind me:-)  As long as it's definitely possible to tell python once
and for all not to apply locales and string conversions, instead of
having to keep track of an ever-expanding list of variables to tame
its bytes->character conversions (as happened with Emacs).

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help to find a regular expression to parse po file

2009-07-06 Thread Hallvard B Furuseth
gialloporpora writes:
> I would like to extract string from a PO file. To do this I have created
> a little python function to parse po file and extract string:
>
> import re
> regex=re.compile("msgid (.*)\\nmsgstr (.*)\\n\\n")
> m=r.findall(s)

I don't know the syntax of a po file, but this works for the
snippet you posted:

arg_re = r'"[^\\\"]*(?:\\.[^\\\"]*)*"'
arg_re = '%s(?:\s+%s)*' % (arg_re, arg_re)
find_re = re.compile(
r'^msgid\s+(' + arg_re + ')\s*\nmsgstr\s+(' + arg_re + ')\s*\n', re.M)

However, can \ quote a newline? If so, replace \\. with \\[\s\S] or
something.
Can there be other keywords between msgid and msgstr?  If so,
add something like (?:\w+\s+\s*\n)*? between them.
Can msgstr come before msgid? If so, forget using a single regexp.
Anything else to the syntax to look out for?  Single quotes, maybe?

Is it a problem if the regexp isn't quite right and doesn't match all
cases, yet doesn't report an error when that happens?

All in all, it may be a bad idea to squeeze this into a single regexp.
It gets ugly real fast.  Might be better to parse the file in a more
regular way, maybe using regexps just to extract each (keyword, "value")
pair.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


harmful str(bytes)

2010-10-07 Thread Hallvard B Furuseth
I've been playing a bit with Python3.2a2, and frankly its charset
handling looks _less_ safe than in Python 2.

The offender is bytes.__str__: str(b'foo') == "b'foo'".
It's often not clear from looking at a piece of code whether
some data is treated as strings or bytes, particularly when
translating from old code.  Which means one cannot see from
context if str(s) or "%s" % s will produce garbage.

With 2.*'s Unicode <-> string conversion, the equivalent operation did
not silently produce garbage: it raised UnicodeError instead.  With old
raw Python strings that was not a problem in applications which did not
need to convert any charsets; with Python 3 they can break.

I really wish bytes.__str__ would at least by default fail.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-08 Thread Hallvard B Furuseth
Arnaud Delobelle writes:
>Hallvard B Furuseth  writes:
>> I've been playing a bit with Python3.2a2, and frankly its charset
>> handling looks _less_ safe than in Python 2.
>> (...)
>> With 2.*'s Unicode <-> string conversion, the equivalent operation did
>> not silently produce garbage: it raised UnicodeError instead.  With old
>> raw Python strings that was not a problem in applications which did not
>> need to convert any charsets; with Python 3 they can break.
>>
>> I really wish bytes.__str__ would at least by default fail.
>
> I think you misunderstand the purpose of str().  It is to provide a
> (unicode) string representation of an object and has nothing to do with
> converting it to unicode:

That's not the point - the point is that for 2.* code which _uses_ str
vs unicode, the equivalent 3.* code uses str vs bytes.  Yet not the
same way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str.  So
upgraded old code will have to expect both str and bytes.

In 2.*, str<->unicode conversion failed or produced the equivalent
character/byte data.  Yes, there could be charset problems if the
defaults were set up wrong, but that's a smaller problem than in 3.*.
In 3.*, the bytes->str conversion always _silently_ produces garbage.

And lots of code use both, and need to convert back and forth.  In
particular code 3.* code converted from 2.*, or using modules converted
from 2.*.  There's a lot of such code, and will be for a long time.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-08 Thread Hallvard B Furuseth
Antoine Pitrou writes:
>Hallvard B Furuseth  wrote:
>> The offender is bytes.__str__: str(b'foo') == "b'foo'".
>> It's often not clear from looking at a piece of code whether
>> some data is treated as strings or bytes, particularly when
>> translating from old code.  Which means one cannot see from
>> context if str(s) or "%s" % s will produce garbage.
>
> This probably comes from overuse of str(s) and "%s". They can be useful
> to produce human-readable messages, but you shouldn't have to use them
> very often.

Maybe Python 3 has something better, but they could be hard to avoid in
Python 2.  And certainly our site has plenty of code using them, whether
we should have avoided them or not.

>> I really wish bytes.__str__ would at least by default fail.
>
> Actually, the implicit contract of __str__ is that it never fails, so
> that everything can be printed out (for debugging purposes, etc.).

Nope:

$ python2 -c 'str(u"\u1000")'
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeEncodeError: 'ascii' codec can't encode character u'\u1000' in position 
0: ordinal not in range(128)

And the equivalent:

$ python2 -c 'unicode("\xA0")'
Traceback (most recent call last):
  File "", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal 
not in range(128)

In Python 2, these two UnicodeEncodeErrors made our data safe from code
which used str and unicode objects without checking too carefully which
was which.  Code which didn't sort the types out carefully enough would fail.

In Python 3, that safety only exists for bytes(str), not str(bytes).

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-08 Thread Hallvard B Furuseth
Steven D'Aprano writes:
>On Fri, 08 Oct 2010 15:31:27 +0200, Hallvard B Furuseth wrote:
>> That's not the point - the point is that for 2.* code which _uses_ str
>> vs unicode, the equivalent 3.* code uses str vs bytes.  Yet not the same
>> way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str.  So
>> upgraded old code will have to expect both str and bytes.
>
> I'm sorry, this makes no sense to me. I've read it repeatedly, and I 
> still don't understand what you're trying to say.

OK, here is a simplified example after 2to3:

try:from urlparse import urlparse, urlunparse # Python 2.6
except: from urllib.parse import urlparse, urlunparse # Python 3.2a

foo, bar = b"/foo", b"bar" # Data from network, bar normally empty

# Statement inserted for 2.3 when urlparse below said TypeError
if isinstance(foo, bytes): foo = foo.decode("ASCII")

p = list(urlparse(foo))
if bar: p[3] = bar
print(urlunparse(p))

2.6 prints "/foo;bar", 3.2a prints "/foo;b'bar'"

You have a module which receives some strings/bytes, maybe data which
originates on the net or in a database.  The module _and its callers_
may date back to before the 'bytes' type, maybe before 'unicode'.
The module is supposed to work with this data and produce some 'str's
or bytes to output.  _Not_ a Python representation like "b'bar'".

The module doesn't always know which input is 'bytes' and which is
'str'.  Or the callers don't know what it expects, or haven't kept
track.  Maybe the input originated as bytes and were converted to
str at some point, maybe not.

Look at urllib.parse.py and its isinstance(<x>, <type>)
calls.  urlencode() looks particularly gross, though that one has code
which could be factored out.  They didn't catch everything either, I
posted this when a 2to3'ed module of mine produced URLs with "b'bar'".

In the pre-'unicode type' Python (was that early Python 2, or should
I have said Python 1?) that was a non-issue - it Just Worked, sans
possible charset issues.

In Python 2 with unicode, the module would get it right or raise an
exception.  Which helps the programmer fix any charset issues.

In Python 3, the module does not raise an exception, it produces
"b'bar'" when it was supposed to produce "bar".

>> In 2.*, str<->unicode conversion failed or produced the equivalent
>> character/byte data.  Yes, there could be charset problems if the
>> defaults were set up wrong, but that's a smaller problem than in 3.*. In
>> 3.*, the bytes->str conversion always _silently_ produces garbage.
>
> So you say, but I don't see it. Why is this garbage?

To the user of the module, stuff with Python syntax is garbage.  It
was supposed to be text/string data.

>>>> b = b'abc\xff'
>>>> str(b)
> "b'abc\\xff'"
>
> That's what I would expect from the str() function called with a bytes 
> argument. Since decoding bytes requires a codec, which you haven't given, 
> it can only return a string representation of the bytes.
>
> If you want to decode bytes into a string, you need to specify a codec:

Except I didn't intend to decode anything - I just intended to output
the contents of the string - which was stored in a 'bytes' object.
But __str__ got called because a lot of code does that.  It wasn't
even my code which did it.

There's often no obvious place to decide when to consider a stream of
data as raw bytes and when to consider it text, and no obvious time
to convert between bytes and str.  When writing a program, one simply
has to decide.  Such as network data (bytes) vs urllib URLs (str)
in my program.  And the decision is different from what one would
decide for when to use str and when to use unicode in Python 2.

In this case I'll bugreport urlunparse to python.org, but there'll be
a _lot_ of such code around.  And without an Exception getting raised,
it'll take time to find it.  So it looks like it'll be a long time
before I dare entrust my data to Python 3, except maybe with modules
written from scratch.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Python-ideas] [Python-Dev] Inclusive Range

2010-10-08 Thread Hallvard B Furuseth
Jed Smith  writes:
>>>> a = [1, 2, 3, 4, 5, 6]
>>>> a[::-1]
> [6, 5, 4, 3, 2, 1]

Nice.  Is there a trick to get a "-0" index too?
Other than doing 'i or len(L)' instead of 'i', that is.

>>> L = [1,2,3,4,5]
>>> L[2:-2], L[2:-1], L[2:-0]  # not quite right:-)
([3], [3, 4], [])

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Terry Reedy writes:
>On 10/8/2010 9:31 AM, Hallvard B Furuseth wrote:
>> That's not the point - the point is that for 2.* code which _uses_ str
>> vs unicode, the equivalent 3.* code uses str vs bytes.  Yet not the
>> same way - a 2.* 'str' will sometimes be 3.* bytes, sometimes str.  So
>> upgraded old code will have to expect both str and bytes.
>
> If you want to interconvert code between 2.6/7 and 3.x, use unicode and
> bytes in the 2.x code. Bytes was added to 2.6/7 as a synonym for str
> explicitly and only for conversion purposes.

That's what I did, see article .
And that's exactly what broke as described, because bytes.__str__
have different meanings in 2.x and 3.x: the raw contents vs the repr.
So a library function which did %s output a different result.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Antoine Pitrou writes:
>Hallvard B Furuseth  wrote:
>>Antoine Pitrou writes:
>>>Hallvard B Furuseth  wrote:
>>>> The offender is bytes.__str__: str(b'foo') == "b'foo'".
>>>> It's often not clear from looking at a piece of code whether
>>>> some data is treated as strings or bytes, particularly when
>>>> translating from old code.  Which means one cannot see from
>>>> context if str(s) or "%s" % s will produce garbage.
>>>
>>> This probably comes from overuse of str(s) and "%s". They can be useful
>>> to produce human-readable messages, but you shouldn't have to use them
>>> very often.
>> 
>> Maybe Python 3 has something better, but they could be hard to avoid in
>> Python 2.  And certainly our site has plenty of code using them, whether
>> we should have avoided them or not.
>
> It's difficult to answer more precisely without knowing what you're
> doing precisely.

I'd just posted an example in article :

urllib.parse.urlunparse(('', '', '/foo', b'bar', '', '')) returns
"/foo;b'bar'" instead of raising an exception or returning 2.6's correct
"/foo;bar".

> But if you already have str objects, you don't have to
> call str() or format them using "%s", so implicit __str__ calls are
> avoided.

Except it's quite normal to output strings with %s.  Above, a library
did it for me.  It's also normal to depend on the fact that str.__str__()
is a noop, so one can call str() just in case some variable needs to be
unpacked to a plain string.  urllib.parse is an example of that too.

>>> Actually, the implicit contract of __str__ is that it never fails, so
>>> that everything can be printed out (for debugging purposes, etc.).
>> 
>> Nope:
>> 
>> $ python2 -c 'str(u"\u1000")'
>> Traceback (most recent call last):
> [...]
>> $ python2 -c 'unicode("\xA0")'
>> Traceback (most recent call last):
>
> Sure, but so what?

So your statement above was wrong, which you made in response to my
suggested solution.

> This mainly shows that unicode support was broken in
> Python 2, because:

...because Python 2 was designed so there was no way to avoid poor
unicode support one way or other.  Python 3 has not fixed this, it has
just moved the problem elsewhere.

> 1) it tried to do implicit bytes<->unicode coercion by using some
> process-wide default encoding

I had completely forgotten that.  I've been lucky (with my sysadmins
maybe:-) and lived with ASCII default encoding.  Checking around I see
now Python2 site.py used my locale for the encoding, as if that had any
relevance for my data...

> 2) some unicode objects didn't have a succesful str()
>
> Python 3 fixes both these issues. Fixing 1) means there's no automatic
> coercion when trying to mix bytes and unicode.

Fine, so programs will have to do it themselves...

> (...)
> And fixing 2) means bytes object get a meaningful str() in all
> circumstances, which is much better for debug output.

Except str() on such data has a different meaning than it did before, so
equivalent programs *silently* produce different results.  Which is why
I started this thread.

> If you don't think that 2) is important, then perhaps you don't deal
> with non-ASCII data a lot. Failure to print out exception messages (or
> log entries, etc.) containing non-ASCII characters is a big annoyance
> with Python 2 for many people (including me).

I'm Norwegian.  I do deal with non-ASCII and I agree failures in error
messages are annoying.

OTOH if the same bug that previously caused an error in an error,
instead quietly munges my data, that's worse than annoying.  I've dealt
with that too, and the fix is to use another tool.  (Ironically, in one
case it meant moving from Perl to Python, and now Python has followed
Perl...)

>> In Python 2, these two UnicodeEncodeErrors made our data safe from code
>> which used str and unicode objects without checking too carefully which
>> was which.
>
> That's false, since implicit coercion can actually happen everywhere.

Right, it was true as long as my encoding was ASCII.

> And it only fails when there's non-ASCII data involved, meaning the
> unsuspecting Anglo-saxon developer doesn't understand why his/her users
> complain.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Terry Reedy writes:
>On 10/8/2010 9:45 AM, Hallvard B Furuseth wrote:
>>> Actually, the implicit contract of __str__ is that it never fails, so
>>> that everything can be printed out (for debugging purposes, etc.).
>>
>> Nope:
>>
>> $ python2 -c 'str(u"\u1000")'
>> Traceback (most recent call last):
>>File "", line 1, in ?
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\u1000' in 
>> position 0: ordinal not in range(128)
>
> This could be considered a design bug due to 'str' being used both to
> produce readable string representations of objects (perhaps one that
> could be eval'ed) and to convert unicode objects to equivalent string
> objects. which is not the same operation!

Indeed, the eager str() and the lack of a more narrow str function is
one root of the problem.  I'd put it more more generally: Converting an
object which represents a string, to an actual str.  *And* __str__ may
be intended for Python-independent representations like 23 -> "23".

I expect that's why quite a bit of code calls str() just in case, which
is another root of the problem.  E.g.  urlencode(), as I said.  The code
might not need to, but str('string') is a noop so it doesn't hurt.
Maybe that's why %s does too, instead of demanding that the user calls
str() if needed.

> The above really should have produced '\u1000'! (the equivalent of what
> str(bytes) does today). The 'conversion to equivalent str object' option
> should have required an explicit encoding arg rather than defaulting to
> the ascii codec. This mistake has been corrected in 3.x, so

Yep.

If there were a __plain_str__() method which was supposed to fail rather
than start to babble Python syntax, and if there were not plenty of
Python code around which invoked __str__, I'd agree.

As it is, this "correction" instead is causing code which previously
produced the expected non-Python-related string output, to instead
produce Pythonesque repr() stuff.  See below.

>> And the equivalent:
>>
>> $ python2 -c 'unicode("\xA0")'
>> Traceback (most recent call last):
>>File "", line 1, in ?
>> UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: 
>> ordinal not in range(128)
>
> This is an application bug: either bad string or missing decoding arg.

Exactly.  And Python 2 caught the bug.  (Since I had Ascii default
decoding, I'd forgotten Python could pick another default.)

For an app which handles Unicode vs. raw bytes, the equivalent Python 3
code is str(b"\xA0").  That's the *same* application bug, in equivalent
application code, and Python 3 does not catch it.  This time the bug is
spelled str() instead, which is much more likely than old unicode() to
happen somewhere thanks to the str()-related misdesign discussed above.

Article  in this thread has an example.


And that's the third root of the problem above.  Technically it's the
same problem that an application bug can do str(None) where it should be
using a string, and produce garbage text.  The difference is that Python
forces programs to deal with these two different character/octet string
types, sometimes swapping back and forth between them.  And it's not
necessarily obvious from the code which type is in use where.  Python 3
has not changed that, it has strengthened it by removing the default
conversion.

Yet while the programmer now needs to be _more_ careful about this than
before, Python 3 has removed the exception which caught this particular
bug instead of doing something to make it easier to find such bugs.

That's why I suggested making bytes.__str__ fail by default, annoying
as it would be.  But I don't know how annoying it'd be.  Maybe there
could be an option to disable it.
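
The behaviour I'm after, as a sketch one could at least use locally
(class name invented):

class strictbytes(bytes):
    def __str__(self):
        raise TypeError("implicit str() of a bytes value")

# str(strictbytes(b"bar")) and "%s" % strictbytes(b"bar") now fail loudly
# instead of quietly producing "b'bar'".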

>> In Python 2, these two UnicodeEncodeErrors made our data safe from code
>> which used str and unicode objects without checking too carefully which
>> was which.  Code which didn't sort the types out carefully enough would fail.
>>
>> In Python 3, that safety only exists for bytes(str), not str(bytes).
>
> If you prefer the buggy 2.x design (and there are *many* tracker bug
> reports that were fixed by the 3.x change), stick with it.

Bugs even with ASCII default encoding?  Looking closer at setencoding()
in site.py, it doesn't seem to do anything, it's "if 0"ed out.

As I think I've made clear, I certainly don't feel like entrusting
Python 3 with my raw string data just yet.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: harmful str(bytes)

2010-10-11 Thread Hallvard B Furuseth
Stefan Behnel writes:
>Hallvard B Furuseth, 11.10.2010 21:50:
>> Fine, so programs will have to do it themselves...
>
> Yes, they can finally handle bytes and Unicode data correctly and
> safely. Having byte data turn into Unicode strings unexpectedly makes
> the behaviour of your code hardly predictable and fairly error prone. In
> Python 3, it's now possible to do the conversion safely at well defined
> points in your code and rely on the runtime to bark at you when
> something slips through or is mistreated. Detecting errors early makes
> your code better.
>
> That's a huge improvement. It didn't come for free and the current
> Python 3 releases still have their rough edges. But there are few left
> and the situation is constantly improving. You can help out if you want.

I quite agree with most of that - just not about it being safe, see my
reply to Terry Reedy.  Hence my suggestion to change or disable
bytes.__str__.  And yes, I'll be submitting some fixes or bug reports.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: [Python-ideas] [Python-Dev] Inclusive Range

2010-10-12 Thread Hallvard B Furuseth
Steven D'Aprano writes:
> On Fri, 08 Oct 2010 22:10:35 +0200, Hallvard B Furuseth wrote:
>> Jed Smith  writes:
>>>>>> a = [1, 2, 3, 4, 5, 6]
>>>>>> a[::-1]
>>> [6, 5, 4, 3, 2, 1]
>> 
>> Nice.  Is there a trick to get a "-0" index too? Other than doing 'i or
>> len(L)' instead of 'i', that is.
>
> What exactly are you expecting? I don't understand why you think that 
> L[-0] and L[0] would be different, when -0 == 0.

I don't think that, and I expected just what happened.
As Arnaud Delobelle had answered:  I can use 'i or None'.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-12 Thread Hallvard B Furuseth
Seebs writes:
> http://github.com/wrpseudo/pseudo/blob/master/makewrappers

>        self.f = file(path, 'r')
>        if not self.f:
>            return None

No.  Failures tend to raise exceptions, not return error codes.
Except in os.path.exists() & co.

$ python
>>> open("nonesuch")   
Traceback (most recent call last):
  File "", line 1, in 
IOError: [Errno 2] No such file or directory: 'nonesuch'
>>> 

So,
import errno
...
try:
    self.f = file(path, 'r')
except IOError:
    if e.errno != errno.ENOENT: raise   # if you are picky
    return None

Nitpicks:

> if not section in self.sections:

  if section not in self.sections:

> list = map(lambda x: x.call(), self.args)
> return ', '.join(list)

  return ', '.join([x.call() for x in self.args])

> self.type, self.name = None, None

Actually you can write self.type = self.name = None,
though assignment statements are more limited than in C.
(And I think they're assigned left-to-right.)

>  match = re.match('(.*)\(\*([a-zA-Z0-9_]*)\)\((.*)\)', text)

Make a habit of using r'' for strings with lots of backslashes,
like regexps.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-12 Thread Hallvard B Furuseth
I wrote:
> except IOError:
>     if e.errno != errno.ENOENT: raise   # if you are picky

Argh, I meant "except IOError, e:".  That's for Python 2 but not
Python 3.  "except IOError as e:" works on Python 2.6 and above.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-13 Thread Hallvard B Furuseth
Ethan Furman writes:
>Seebs wrote:
>>On 2010-10-12, Hallvard B Furuseth  wrote:
>>>> self.type, self.name = None, None
>>
>>> Actually you can write self.type = self.name = None,
>>> though assignment statements are more limited than in C.
>>> (And I think they're assigned left-to-right.)
>
> Python 2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit
> (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
>
> --> a = 2
> --> b = 7
> --> c = 13
> --> a = b = c = 'right to left'
> --> a, b, c
> ('right to left', 'right to left', 'right to left')

Eek.  I just meant to remark that it's quite different from C, where it
means a=(b=(c='...')); in Python the assignments even happen in
left-to-right _order_.  In this Python version anyway.  Not that you'd
be setting a string to a variable:-)

   >>> class Foo(str):
   ...   def __setattr__(*args): print "%s.%s = %s" % args
   ... 
   >>> f, g = Foo("f"), Foo("g")
   >>> f.x = g.y = 3
   f.x = 3
   g.y = 3

>>>>  match = re.match('(.*)\(\*([a-zA-Z0-9_]*)\)\((.*)\)', text)
>>
>>> Make a habit of using r'' for strings with lots of backslashes,
>>> like regexps.
>>
>> Hmm.  There's an interesting question -- does this work as-is? I'm
>> assuming it must or it would have blown up on me pretty badly, so
>> presumably those backslashes are getting passed through untouched
>> already.  But if that's just coincidence (they happen not to be valid
>> \-sequences), I should definitely fix that.
>
> Unknown backslash sequences are passed through as-is.

Personally I don't want to need to remember, I'm already confusing the
backslash rules of different languges.  Often you can just ask Python
what it thinks of such things, as I did with open("nonesuch"), and
then either imitate the answer or use it to help you zoom in on it in
the doc now that you know the rough answer to look for.  So,

$ python
>>> '(.*)\(\*([a-zA-Z0-9_]*)\)\((.*)\)'
'(.*)\\(\\*([a-zA-Z0-9_]*)\\)\\((.*)\\)'

Thus I'd personally spell it that way or with r'':

r'(.*)\(\*([a-zA-Z0-9_]*)\)\((.*)\)'

Note that [a-zA-Z0-9_] is equivalent to \w or [\w] in Python 2 unless
you give a LOCALE or UNICODE flag, and in Python 3 if you give the
ASCII flag.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-14 Thread Hallvard B Furuseth
Seebs writes:

>> You can't really rely on the destructor __del__ being called.
>
> Interesting.  Do I just rely on files getting closed?

Sometimes, but that's not it.  Think Lisp, not C++.  __del__ is not that
useful. Python is garbage-collected and variables have dynamic lifetime,
so the class cannot expect __del__ to be called in a timely manner.
Destructors have several issues, see __del__ in the Python reference.

A class which holds an OS resource like a file, should provide a context
manager and/or a release function, the latter usually called in a
'finally:' block.  When the caller doesn't bother with either, the class
often might as well depend on the destructor in 'file'.
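
A sketch of what I mean (names made up):

class Wrapper(object):
    def __init__(self, path):
        self.f = open(path)
    def close(self):
        self.f.close()
    # Context manager support, so callers can say "with Wrapper(path) as w:"
    def __enter__(self):
        return self
    def __exit__(self, *exc_info):
        self.close()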

Still, open().read() is common.  open().write() is not.  The C
implementation of Python uses reference counting (with a cyclic GC on
top), so the file is closed immediately.  But this way, exceptions from
close() are lost.  Python cannot propagate them up the possibly-unrelated
call chain.


Some other points:

For long strings, another option is triple-quoting as you've seen in doc
strings: print """foo
bar""".


class SourceFile(object):
    def emit(self, template, func = None):
        # hey, at least it's not a global variable, amirite?
        self.file.write(SourceFile.copyright)
def main():
    SourceFile.copyright = copyright_file.read()

emit() can use self.copyright instead of SourceFile.copyright.

I've written such code, but I suppose the proper way is to use a
classmethod to set it, so you can see in the class how the copyright
gets there.  SourceFile.<classmethod>() and self.<classmethod>() both
get called with the class as 1st argument.

class SourceFile(object):
    def setup_copyright(cls, fname):
        cls.copyright = open(fname).read()
    setup_copyright = classmethod(setup_copyright)
    # In python >= 2.4 you can instead say @classmethod above the def.
def main():
    SourceFile.setup_copyright('guts/COPYRIGHT')


SourceFile.__repr__() looks like it should be a __str__().  I haven't
looked at how you use it though.  But __repr__ is supposed to
look like a Python expression to create the instance: repr([2]) = '[2]',
or a generic '<...>' form: repr(id) = '<built-in function id>'.


"How new are list comprehensions?"

Python 2.0, found as follows:
- Google python list comprehensions.
- Check the PEP (Python Enhancement Proposal) which shows up.  PEPs
  are the formal documents for info to the community, for the Python
  development process, etc.  :
Title:  List Comprehensions
Status: Final
Type:   Standards Track
Python-Version: 2.0


-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: My first Python program

2010-10-14 Thread Hallvard B Furuseth
Seebs writes:

>> For long strings, another option is triple-quoting as you've seen in doc
>> strings: print """foo
>> bar""".
>
> I assume that this inserts a newline, though, and in this case I don't
> want that.

True.
$ python
>>> """foo  
... bar"""
'foo\nbar'
>>> """foo\
... bar"""
'foobar'
>>> "foo\ 
... bar"
'foobar'
>>> "foo"  "bar"
'foobar'

>> SourceFile.__repr__() looks like it should be a __str__().  I haven't
>> looked at how you use it though.  But __repr__ is supposed to
>> look like a Python expression to create the instance: repr([2]) = '[2]',
>> or a generic '<...>' form: repr(id) = '<built-in function id>'.
>
> Ahh!  I didn't realize that.  I was using repr as the "expand on it enough
> that you can see what it is" form -- more for debugging than for
> something parsable.

No big deal, then, except in the "idiomatic Python" sense.
__str__ for the informal(?) string representation of the object,
  def __repr__(self):
      return "<%s object with %s>" % (self.__class__.__name__, <info>)
and you have a generic 2nd case, but looks like it'll be unusually long
in this case, or just define some ordinary member name like info() or
debug().

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


PEP 380 - the 'yield from' proposal

2010-10-15 Thread Hallvard B Furuseth
Regarding http://www.python.org/dev/peps/pep-0380/,
"Syntax for Delegating to a Subgenerator":

The first call can only be .next(), there's no way to provide an initial
value to .send().  That matches common use, but an initial .send() is
possible if .next() was called before "yield from".  So I suggest:

RESULT = yield from EXPR [with SEND_FIRST] # default SEND_FIRST=None

The code under Formal Semantics uses .throw and .close = None as
equivalent to absent attributes, which is not what the textual
description says.

I think the code should delete values when it no longer has use for
them, so they can be garbage-collected as quickly as possible.

So the formal semantics would be something like

i, snd, yld, throw = iter(EXPR), SEND_FIRST, None, None
res = absent = object() # Internal marker, never exposed
try:
    while res is absent:
        try:
            yld = (i.next() if snd is None else i.send(snd))
        except StopIteration as e:
            res = e.value
        else:
            snd = absent  # 'del snd', but that could break 'finally:'
            while yld is not absent:
                try:
                    snd = yield yld
                    yld = absent
                except:
                    del yld
                    if throw is None: # optional statement
                        throw = getattr(i, 'throw', absent)
                    if throw is absent:
                        getattr(i, 'close', bool)() # bool()=dummy
                        raise
                    x = sys.exc_info()
                    try:
                        yld = throw(*x)
                    except StopIteration as e:
                        if e is x[1] or isinstance(x[1], GeneratorExit):
                            raise
                        res = e.value
                    finally:
                        del x
finally:
    del i, snd, throw
RESULT = res
del res

Maybe it's excessive to specify all the 'del's, but I'm thinking the
'with' statement might have destroyed or set to None the 'as <var>'
variable today if the spec had taken care to specify deletes.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Python documentation too difficult for beginners

2010-11-03 Thread Hallvard B Furuseth
Steven D'Aprano writes:
> On Tue, 02 Nov 2010 03:42:22 -0700, jk wrote:
>> The former is difficult to find (try searching for 'open' in the search
>> box and see what you get).
>
> A fair point -- the built-in open comes up as hit #30, whereas searching 
> for open in the PHP page brings up fopen as hit #1. But the PHP search 
> also brings up many, many hits -- ten pages worth.
>
> But in any case, the Python search functionality could be smarter. If I 
> had a complaint about the docs, that would be it. Fortunately, I have 
> google :)

Actually that was one of the hair-tearing attitudes I heard a web search
guru complain about.  The smartest part of the search engine is the
people running it, so why not apply their brains directly?  Read the log
like you did, look for poor results (like "open"), put in exceptions by
hand.  This might be a fraction of the work it takes to program that
kind of smarts into the engine.  Or you might discover a group of
exceptions to put in - like all Python keywords.  That makes it at least
partially programmed, which may be preferable.

-- 
Hallvard
-- 
http://mail.python.org/mailman/listinfo/python-list