richard wrote:
> Leon wrote:
> > example:
> >         s = ' ' ---> &nbsp;
> 
> That's technically not HTML encoding, that's replacing a perfectly valid
> space character with a *non-breaking* space character.

How can you tell? On screen these all look alike:

s = u'\xa0'    # non-breaking space
s = ' '        # normal space
s = u'\u2003'  # em-space
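
If you do need to tell them apart, one way is to ask the unicodedata
module for the character's name. A minimal sketch in Python 3 syntax;
describe() is just a helper name made up for this illustration:

```python
import unicodedata

def describe(ch):
    # Return the code point and the official Unicode name of one character.
    return 'U+%04X %s' % (ord(ch), unicodedata.name(ch))

# The three "spaces" from above, written as escapes so they survive
# any mail client or archive.
for ch in ('\u00a0', ' ', '\u2003'):
    print(describe(ch))
# U+00A0 NO-BREAK SPACE
# U+0020 SPACE
# U+2003 EM SPACE
```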

But you might want to do something like:

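import htmlentitydefs

def escapechar(s):
    # s must be a single unicode character
    n = ord(s)
    if n < 128:
        # plain ASCII passes through unchanged
        return s.encode('ascii')
    elif n in htmlentitydefs.codepoint2name:
        # use the named entity when HTML defines one
        return '&%s;' % htmlentitydefs.codepoint2name[n]
    else:
        # otherwise fall back to a numeric character reference
        return '&#%d;' % n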

This requires unicode strings, because in a byte string a non-ASCII
character may be spread over several bytes, and ord() only accepts a
single character. Demonstration:

>>> escapechar(u'ò')
'&ograve;'
>>> escapechar(u'ş')
'&#351;'
>>> escapechar(u's')
's'
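
The same idea can be stretched to a whole string. What follows is only a
sketch, under two assumptions: it targets Python 3, where htmlentitydefs
lives on as html.entities and every str is already unicode, and
escape_text() is a name invented here, not something from the standard
library:

```python
from html.entities import codepoint2name  # Python 3 home of htmlentitydefs

def escape_text(text):
    # Escape every non-ASCII character in a str. ASCII passes through
    # untouched, so '<', '>' and '&' are NOT escaped here; html.escape()
    # is the standard-library function for those.
    out = []
    for ch in text:
        n = ord(ch)
        if n < 128:
            out.append(ch)
        elif n in codepoint2name:
            # named entity, e.g. 233 -> &eacute;
            out.append('&%s;' % codepoint2name[n])
        else:
            # no named entity: numeric character reference
            out.append('&#%d;' % n)
    return ''.join(out)

print(escape_text('caf\u00e9 \u015f'))
# caf&eacute; &#351;
```

Going character by character keeps the logic identical to the
single-character version above; only the return type differs (str
rather than a byte string).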

yours,
Gerrit Holl.

-- 
Weather in Lulea / Kallax, Sweden 13/12 10:20:
        -15.0°C   wind 0.9 m/s NNW (34 m above NAP)
-- 
In the councils of government, we must guard against the acquisition of
unwarranted influence, whether sought or unsought, by the
military-industrial complex. The potential for the disastrous rise of
misplaced power exists and will persist.
    -Dwight David Eisenhower, January 17, 1961