I am working on an extension of the genericwiki.py plugin for PyBlosxom and I have problems with UTF-8 and regular expressions. Given the following wiki line, it breaks the URL too early:

    [http://en.wikipedia.org/wiki/Petr_Chelcický Petr Chelcický's] work(s) into English.

and creates

    [<a href="http://en.wikipedia.org/wiki/Petr_Chel">http://en.wikipedia.org/wiki/Petr_Chel</a>cický Petr Chelcický's]

The RE genericwiki uses for parsing this:

    # WikiName pattern used in your wiki
    wikinamepattern = r'\b(([A-Z]\w+){2,})\b' # original
    mailurlpattern = r'mailto\:[\"[EMAIL PROTECTED]'
    newsurlpattern = r'news\:(?:\w+\.){1,}\w+'
    fileurlpattern = r'(?:http|https|file|ftp):[/-_.\w-]+[\/\w][?&+=%\w/-_.#]*'
    [...]
    # Turn '[xxx:address label]' into labeled link
    body = re.sub(r'\[(' + fileurlpattern + '|' + mailurlpattern + '|' + newsurlpattern + ')\s+(.+?)\]', r'<a href="\1">\2</a>', body,re.U)

I have tried to test RE and UTF-8 in Python generally and the results are even more confusing (done with locale cs_CZ.UTF-8 in konsole):

    >>> locale.getpreferredencoding()
    'UTF-8'
    >>> print re.sub("(\w*)","X","[Chelcický]",re.L)
    X[X?Xý]
    >>> print re.sub("(\w*)","X","[Chelcický]",re.UNICODE)
    X[X?X?X]X
    >>>

I would expect that both print commands should give just plain X, but apparently Python doesn't understand that. What's the problem?

Thanks for any reply,

Matej
--
http://mail.python.org/mailman/listinfo/python-list
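
A possible explanation, sketched under assumptions rather than verified against the plugin: the fourth positional parameter of `re.sub` is `count`, so a flag constant passed in that position may be read as a replacement limit instead of a flag, and matching on undecoded UTF-8 bytes lets `\w` see single bytes rather than whole characters. Note also that `\w*` can match the empty string and that the brackets are not word characters, so a plain `X` would not be the result in any case. The sketch below is written for Python 3 (the session above is Python 2); `link_pattern` and `to_link` are hypothetical names, and the URL pattern is a simplified stand-in, not the plugin's own.

```python
import re

# Hypothetical, simplified stand-in for the plugin's labeled-link pattern.
# Compiling with the flag avoids passing it where re.sub expects `count`.
link_pattern = re.compile(r"\[(https?://[\w/.:-]+)\s+(.+?)\]", re.UNICODE)


def to_link(text):
    """Turn '[url label]' into an HTML link; expects decoded text, not bytes."""
    return link_pattern.sub(r'<a href="\1">\2</a>', text)


raw = "[http://en.wikipedia.org/wiki/Petr_Chelcický Petr Chelcický's] work".encode("utf-8")

# Decode first so that \w sees whole characters instead of single bytes.
linked = to_link(raw.decode("utf-8"))
print(linked)

# For comparison: on undecoded bytes, \w+ ends its match before the
# first non-ASCII byte, while on decoded text it covers the whole word.
byte_words = re.sub(rb"\w+", b"X", "[Chelcický]".encode("utf-8"))
text_words = re.sub(r"\w+", "X", "[Chelcický]")
print(byte_words, text_words)
```

In the Python 2 setting of the original session, the analogous step would presumably be to decode the input to a `unicode` object before substitution and to compile the pattern with `re.UNICODE`.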