Re: xhtml encoding question

2012-02-02 Thread Ulrich Eckhardt
On 02.02.2012 12:02, Peter Otten wrote: Ulrich Eckhardt wrote: >>> u'abc'.translate({u'a': u'A'}) u'abc' I would call this a chance to improve Python. According to the documentation, using a string [as key] is invalid, but it neither raises an exception nor does it do the obvious and acce
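The behaviour described in this message can still be reproduced. The following is a minimal sketch, assuming Python 3 (the thread itself uses Python 2 unicode literals); the variable names are illustrative only. It shows that a one-character string key is silently ignored, while an ordinal key, or a table built with `str.maketrans()`, performs the replacement.

```python
# Assumes Python 3; the thread itself uses Python 2 unicode literals.
text = "abc"

# A string key never matches, because translate() indexes the table
# with the integer code point of each character.
ignored = text.translate({"a": "A"})

# Keying the table by ordinal works.
by_ordinal = text.translate({ord("a"): "A"})

# str.maketrans() converts one-character string keys to ordinals.
by_maketrans = text.translate(str.maketrans({"a": "A"}))

print(ignored, by_ordinal, by_maketrans)  # abc Abc Abc
```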

Re: xhtml encoding question

2012-02-02 Thread Peter Otten
Ulrich Eckhardt wrote: > On 01.02.2012 10:32, Peter Otten wrote: >> It doesn't matter for the OP (see Stefan Behnel's post), but if you want >> to replace characters in a unicode string the best way is probably the >> translate() method: >> > print u"\xa9\u2122" >> ©™ > u"\xa9\u2122".tra

Re: xhtml encoding question

2012-02-01 Thread Stefan Behnel
Tim Arnold, 01.02.2012 19:15: > On 2/1/2012 3:26 AM, Stefan Behnel wrote: >> Tim Arnold, 31.01.2012 19:09: >>> I have to follow a specification for producing xhtml files. >>> The original files are in cp1252 encoding and I must reencode them to >>> utf-8. >>> Also, I have to replace certain charact

Re: xhtml encoding question

2012-02-01 Thread Tim Arnold
On 2/1/2012 3:26 AM, Stefan Behnel wrote: Tim Arnold, 31.01.2012 19:09: I have to follow a specification for producing xhtml files. The original files are in cp1252 encoding and I must reencode them to utf-8. Also, I have to replace certain characters with html entities.

Re: xhtml encoding question

2012-02-01 Thread Ulrich Eckhardt
On 01.02.2012 10:32, Peter Otten wrote: It doesn't matter for the OP (see Stefan Behnel's post), but if you want to replace characters in a unicode string the best way is probably the translate() method: print u"\xa9\u2122" ©™ u"\xa9\u2122".translate({0xa9: u"&copy;", 0x2122: u"&trade;"}) u'&copy;&trade;' Ye
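A minimal sketch of the translate() approach quoted here, assuming Python 3 and assuming the intended replacement values are the entity references `&copy;` and `&trade;`. The names `table`, `source` and `result` are hypothetical.

```python
# Assumes Python 3. Keys are integer code points, values are the
# replacement strings (here assumed to be HTML entity references).
table = {0xA9: "&copy;", 0x2122: "&trade;"}

source = "\xa9\u2122 Example"
result = source.translate(table)

print(result)  # &copy;&trade; Example
```

Characters without an entry in the table pass through unchanged, so one call covers the whole string regardless of how many mappings the table holds.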

Re: xhtml encoding question

2012-02-01 Thread Peter Otten
Ulrich Eckhardt wrote: > On 31.01.2012 19:09, Tim Arnold wrote: >> high_chars = { >> 0x2014:'&mdash;', # 'EM DASH', >> 0x2013:'&ndash;', # 'EN DASH', >> 0x0160:'&Scaron;',# 'LATIN CAPITAL LETTER S WITH CARON', >> 0x201d:'&rdquo;', # 'RIGHT DOUBLE QUOTATION MARK', >> 0x201c:'&ldquo;', # 'LEFT DOUBLE QUOTATI

Re: xhtml encoding question

2012-02-01 Thread Ulrich Eckhardt
On 31.01.2012 19:09, Tim Arnold wrote: high_chars = { 0x2014:'&mdash;', # 'EM DASH', 0x2013:'&ndash;', # 'EN DASH', 0x0160:'&Scaron;',# 'LATIN CAPITAL LETTER S WITH CARON', 0x201d:'&rdquo;', # 'RIGHT DOUBLE QUOTATION MARK', 0x201c:'&ldquo;', # 'LEFT DOUBLE QUOTATION MARK', 0x2019:"&rsquo;", # 'RIGHT SINGLE
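A table like the quoted high_chars dictionary does not have to be typed by hand. The sketch below is one possible alternative, not something proposed in the thread: it assumes Python 3 and derives the entity names from the standard library's `html.entities.codepoint2name` mapping. The names `code_points`, `entity_table` and `sample` are hypothetical.

```python
from html.entities import codepoint2name

# Code points taken from the quoted high_chars dictionary.
code_points = [0x2014, 0x2013, 0x0160, 0x201D, 0x201C, 0x2019]

# Build {code point: "&name;"} for every code point that has a named entity.
entity_table = {
    cp: "&{};".format(codepoint2name[cp])
    for cp in code_points
    if cp in codepoint2name
}

sample = "\u201cquoted\u201d \u2014 \u0160"
print(sample.translate(entity_table))  # &ldquo;quoted&rdquo; &mdash; &Scaron;
```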

Re: xhtml encoding question

2012-02-01 Thread Stefan Behnel
Tim Arnold, 31.01.2012 19:09: > I have to follow a specification for producing xhtml files. > The original files are in cp1252 encoding and I must reencode them to utf-8. > Also, I have to replace certain characters with html entities. > > I think I've got this right, but I'd like to hear if there

xhtml encoding question

2012-01-31 Thread Tim Arnold
I have to follow a specification for producing xhtml files. The original files are in cp1252 encoding and I must reencode them to utf-8. Also, I have to replace certain characters with html entities. I think I've got this right, but I'd like to hear if there's something I'm doing that is dangero
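The task described in this post (decode cp1252, replace selected characters with entities, write UTF-8) could be sketched as follows. This assumes Python 3 and is not the poster's code, which is cut off in this listing. The function name `reencode` and the two-entry table are hypothetical.

```python
def reencode(raw, table):
    """Decode cp1252 bytes, replace mapped characters, return UTF-8 bytes."""
    text = raw.decode("cp1252")
    return text.translate(table).encode("utf-8")


# Illustrative table only; a real specification would list more characters.
table = {0x2014: "&mdash;", 0x2019: "&rsquo;"}

# In cp1252, 0x92 is RIGHT SINGLE QUOTATION MARK and 0x97 is EM DASH.
raw = b"it\x92s caf\xe9 \x97 done"
converted = reencode(raw, table)

print(converted)  # b'it&rsquo;s caf\xc3\xa9 &mdash; done'
```

Characters that are not in the table, such as the é above, are emitted as ordinary UTF-8 bytes. The sketch does not escape markup characters such as `&` or `<`.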