On Sep 29, 8:17 am, Eric Abrahamsen <[EMAIL PROTECTED]> wrote:
> Is it possible to use the re module to find runs of characters within
> a certain Unicode range?
>
> I'm writing a Markdown extension to go over text and wrap blocks of
> consecutive Chinese characters in <span class="char"></span> tags for
> nice styling in an HTML page. The available hooks appear to be a
> preprocessor (which is a "for line in lines" situation) or an inline
> pattern (which uses regular expressions). The regular expression
> solution would be much simpler and faster, but something tells me
> there's no way to use a regex to find character ranges... Chinese
> characters appear to fall between 19968 and 40959 using ord(), and I
> suppose I can go that route if necessary, but I think it would be ugly.
>
> Any hints or suggestions would be appreciated!
>
> Eric
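
For what it's worth, a character class in the re module can span a range of code points, so the regex route should work. Below is a minimal sketch, assuming "Chinese characters" means the CJK Unified Ideographs block U+4E00 to U+9FFF (the same 19968 to 40959 range quoted above). The names CJK_RUN and wrap_cjk are made up for illustration, and the code is written for Python 3; on Python 2 the pattern and the input would need to be unicode strings.

```python
import re

# Assumption: the range of interest is U+4E00..U+9FFF,
# i.e. ord() values 19968 through 40959.
CJK_RUN = re.compile("[\u4e00-\u9fff]+")


def wrap_cjk(text):
    """Wrap each run of consecutive CJK ideographs in a span tag.

    Hypothetical helper, not part of Markdown or any other library.
    """
    # \g<0> in the replacement stands for the entire matched run.
    return CJK_RUN.sub(r'<span class="char">\g<0></span>', text)
```

The same pattern string could probably be handed to an inline-pattern hook, but how that hook expects its pattern to be wrapped depends on the Markdown version, so treat that part as untested.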
Eric -

This sounds similar to what zhpy (http://pyparsing.wikispaces.com/WhosUsingPyparsing#Zhpy) does to extract Chinese words from code, to generate executable English Python. You might give that a look.

-- Paul

--
http://mail.python.org/mailman/listinfo/python-list