"Shiao" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Hello,
I'm trying to build a regex in python to identify punctuation
characters in all the languages. Some regex implementations support an
extended syntax \p{P} that does just that. As far as I know, python re
doesn't. Any idea of a possible alternative?
Apart from manually including the punctuation character range for each
and every language, I don't see how this can be done.
Thank in advance for any suggestions.
John
You can always build your own pattern. Something like (Python 3.0rc2):
import unicodedata
Po=''.join(chr(x) for x in range(65536) if unicodedata.category(chr(x)) ==
'Po')
import re
r=re.compile('['+Po+']')
x='我是美國人。'
x
'我是美國人。'
r.findall(x)
['。']
-Mark
--
http://mail.python.org/mailman/listinfo/python-list