Re: [Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

2009-04-24 Thread Lino Mastrodomenico
2009/4/22 "Martin v. Löwis" :
> To convert non-decodable bytes, a new error handler "python-escape" is
> introduced, which decodes non-decodable bytes into a private-use
> character U+F01xx, which is believed to not conflict with private-use
> characters that currently exist in Python codecs.

Why not use U+DCxx for non-UTF-8 encodings too?
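
To make the U+DCxx idea concrete, here is a minimal sketch. It assumes
the error handler as it is spelled in current Python 3,
"surrogateescape", rather than the "python-escape" name used in the
draft quoted above. Each undecodable byte 0xNN becomes the lone
surrogate U+DCNN, and the same handler restores the byte on encoding:

```python
# Assumption: Python 3, where the handler is named "surrogateescape".
raw = b"caf\xe9"  # 0xE9 is not valid ASCII

# The undecodable byte 0xE9 is mapped to the lone surrogate U+DCE9.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same handler gives back the original bytes.
restored = text.encode("ascii", errors="surrogateescape")

# The same mapping applies when the codec is UTF-8.
utf8_text = b"\xff".decode("utf-8", errors="surrogateescape")
```

The round trip only works if the same error handler is used in both
directions.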

Overall I like the PEP: I think it's the best proposal so far that
doesn't put a heavy burden on applications that only want to do
simple things with the API.

-- 
Lino Mastrodomenico
--
http://mail.python.org/mailman/listinfo/python-list


Re: Code that ought to run fast, but can't due to Python limitations.

2009-07-05 Thread Lino Mastrodomenico
2009/7/5 Hendrik van Rooyen :
> I cannot see how you could avoid a python function call - even if he
> bites the bullet and implements my laborious scheme, he would still
> have to fetch the next character to test against, inside the current state.
>
> So if it is the function calls that are slowing him down, I cannot
> imagine a solution using less than one per character, in which
> case he is screwed no matter what he does.

A simple solution may be to read the whole input HTML file into a
string. This potentially requires lots of memory, but I suspect that
by far the most common use case for this parser is to build a DOM (or
DOM-like) tree of the whole document. This tree usually requires much
more memory than the HTML source itself.
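
As a rough sketch of why having the whole input in one string helps:
a single regular-expression pass can split it into tags and text runs
without a Python-level function call per character. The pattern and
the tokenize() helper are hypothetical and far too crude for real
HTML; they only illustrate the idea.

```python
import re

# Hypothetical pattern: either a complete "<...>" tag or a run of text.
TOKEN_RE = re.compile(r"<[^>]*>|[^<]+")

def tokenize(html):
    """Split an in-memory string into tags and text runs."""
    # findall scans the string in C instead of calling a Python
    # function for every character.
    return TOKEN_RE.findall(html)

tokens = tokenize("<p>Hello <b>world</b></p>")
```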

So, if the code duplication is acceptable, I suggest keeping this
implementation for cases where the input is extremely big *AND* the
whole program will work on it in a streaming fashion, not just the
parser itself.

Then write a simpler and faster parser for the more common case when
the data is not huge *OR* the user will keep the whole document in
memory anyway (e.g. in a tree).

Also: profile, profile a lot. HTML pages are very strange beasts and
the bottlenecks may be in innocent-looking places!
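
A minimal profiling sketch using the standard cProfile and pstats
modules. The parse() function is a hypothetical stand-in for the real
parser, which is not shown in this thread:

```python
import cProfile
import io
import pstats

def parse(text):
    # Hypothetical stand-in for the real parser's main loop.
    return [ch for ch in text if ch == "<"]

profiler = cProfile.Profile()
profiler.enable()
result = parse("<p>x</p>" * 1000)
profiler.disable()

# Collect the ten most expensive entries, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(10)
report = stream.getvalue()
```

Printing the report shows where the time actually goes, which is
often not where one would guess.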

-- 
Lino Mastrodomenico


Re: memoization module?

2009-07-05 Thread Lino Mastrodomenico
2009/7/5 kj :
> Is there a memoization module for Python?  I'm looking for something
> like Mark Jason Dominus' handy Memoize module for Perl.

Check out the "memoized" class example here:

  <http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize>
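
For a rough idea of the technique, here is a minimal sketch of a
memoizing decorator. It is not the code from the wiki page; the names
memoize, fib, and calls are made up for illustration, and it assumes
hashable positional arguments only:

```python
import functools

def memoize(func):
    """Cache results of func, keyed by its positional arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Call the wrapped function only on a cache miss.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

calls = []  # records every real (non-cached) invocation

@memoize
def fib(n):
    calls.append(n)
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(30)
```

With the cache in place, fib runs once for each n from 0 to 30
instead of more than a million times.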

-- 
Lino Mastrodomenico


Re: missing 'xor' Boolean operator

2009-07-16 Thread Lino Mastrodomenico
2009/7/16 Hendrik van Rooyen :
> "Hrvoje Niksic"  wrote:
>
>
>> Note that in Python A or B is in fact not equivalent to not(not A and
>> not B).
>
> De Morgan would turn in his grave.

If it makes him any happier, in Python (not (not a and not b)) *is*
equivalent to bool(a or b). (Modulo crazy things like redefining
"bool" or having a __bool__ with side effects.)

In the first expression you implicitly request a bool because you use
"not", in the second one you do this with an explicit "bool".
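
A quick check of this, as a sketch over a handful of arbitrarily
chosen truthy and falsy values:

```python
# Arbitrary sample of falsy and truthy objects.
values = [0, 1, "", "x", [], [0], None]

# "a or b" returns one of its operands, not necessarily a bool.
example = 0 or "x"

# The De Morgan form always yields a bool, equal to bool(a or b).
all_equal = all(
    (not (not a and not b)) == bool(a or b)
    for a in values
    for b in values
)
```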

-- 
Lino Mastrodomenico