> On Behalf Of Jason Evans
> Parsers typically deal with tokens rather than individual 
> characters, so the scanner that creates the tokens is the 
> main thing that Unicode matters to.  I have written 
> Unicode-aware scanners for use with Parsing-based parsers, 
> with no problems.  This is pretty easy to do, since Python 
> has built-in support for Unicode strings.

The only caveat being that since Chinese and Japanese scripts don't
typically delimit "words" with spaces, I think you'd have to pass the text
through a tokenizer (like ChaSen for Japanese) before using PyParsing.

Regards,
Ryan Ginstrom

-- 
http://mail.python.org/mailman/listinfo/python-list

Reply via email to