> On Behalf Of Jason Evans
> Parsers typically deal with tokens rather than individual
> characters, so the scanner that creates the tokens is the
> main thing that Unicode matters to. I have written
> Unicode-aware scanners for use with Parsing-based parsers,
> with no problems. This is pretty easy to do, since Python
> has built-in support for Unicode strings.
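
For illustration, here is a rough sketch of the kind of Unicode-aware scanner described in the quoted text. It is not Jason's code and does not use the Parsing module; it assumes Python 3 and only the standard `re` module, and the names `TOKEN` and `scan` are made up for this example.

```python
import re

# Hypothetical token pattern: a run of word characters, or any single
# non-space character. For str patterns in Python 3, \w is Unicode-aware,
# so accented letters, kana and kanji all count as word characters.
TOKEN = re.compile(r"\w+|\S")


def scan(text):
    """Return the tokens found in text, in order, skipping whitespace."""
    return TOKEN.findall(text)


# Space-delimited text splits into words and punctuation.
print(scan("na\u00efve caf\u00e9, gr\u00f6\u00dfer"))

# Japanese text with no spaces: the whole sentence body comes back as
# one token, followed by the full stop.
print(scan("\u79c1\u306f\u5b66\u751f\u3067\u3059\u3002"))
```
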

The only caveat is that Chinese and Japanese scripts don't typically
delimit "words" with spaces, so I think you'd have to pass the text
through a tokenizer (like ChaSen for Japanese) before using PyParsing.

Regards,
Ryan Ginstrom
-- 
http://mail.python.org/mailman/listinfo/python-list