Re: Using re to find unicode ranges

Paul McGuire Mon, 29 Sep 2008 07:51:12 -0700

On Sep 29, 8:17 am, Eric Abrahamsen <[EMAIL PROTECTED]> wrote:
> Is it possible to use the re module to find runs of characters within  
> a certain Unicode range?
>
> I'm writing a Markdown extension to go over text and wrap blocks of  
> consecutive Chinese characters in <span class="char"></span> tags for  
> nice styling in an HTML page. The available hooks appear to be a
> preprocessor (which is a "for line in lines" situation) or an inline  
> pattern (which uses regular expressions). The regular expression  
> solution would be much simpler and faster, but something tells me  
> there's no way to use a regex to find character ranges... Chinese  
> characters appear to fall between 19968 and 40959 using ord(), and I  
> suppose I can go that route if necessary, but I think it would be ugly.
>
> Any hints or suggestions would be appreciated!
>
> Eric


Eric -

This sounds similar to what zhpy
(http://pyparsing.wikispaces.com/WhosUsingPyparsing#Zhpy) does to
extract Chinese words from code, to generate executable English Python.
You might give that a look.
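
For what it's worth, the re module can also match a range of code
points directly with a character class, so the inline-pattern route
may be workable. Below is a minimal sketch, not a tested Markdown
extension. It assumes "Chinese characters" means the range quoted
above (ord() 19968 to 40959, i.e. U+4E00 to U+9FFF), and the names
CJK_RUN and wrap_chinese are made up for illustration. It is written
for Python 3; on Python 2 both the pattern and the text would need to
be unicode strings.

```python
import re

# Assumption: the range of interest is U+4E00..U+9FFF, which is
# ord() 19968..40959 as quoted in the question.
CJK_RUN = re.compile("[\u4e00-\u9fff]+")


def wrap_chinese(text):
    """Wrap each run of consecutive in-range characters in a span tag."""
    return CJK_RUN.sub(
        lambda m: '<span class="char">%s</span>' % m.group(0),
        text,
    )
```

Calling wrap_chinese("abc \u4f60\u597d def") should wrap only the two
Chinese characters and leave the surrounding ASCII text alone. A
Markdown inline pattern would presumably reuse just the character
class rather than this helper.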

-- Paul
--
http://mail.python.org/mailman/listinfo/python-list
