I tested his code in a file test2.py:

# -*- coding: UTF-8 -*-
from pyparsing import Word

text = "Καλημέρα, κόσμε!".decode('utf-8')
alphas = u''.join(unichr(x) for x in xrange(0x386, 0x3ce))
greet = Word(alphas) + u',' + Word(alphas) + u'!'
greeting = greet.parseString(text)
print greeting

After running this, I got the following result:

[u'\u039a\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1', u',', u'\u03ba\u03cc\u03c3\u03bc\u03b5', u'!']

I use Windows XP SP2 (Simplified Chinese) and Python 2.4.1. My code is below:

# -*- coding: UTF-8 -*-
from pyparsing import CharsNotIn

text = u"简体中文测试, 繁體中文測試!"
greet = CharsNotIn(u',!') + u',' + CharsNotIn(u',!') + u'!'
greeting = greet.parseString(text)
for x in greeting:
    print x.encode("cp936")  # or x.encode("gbk")

And the result is:

简体中文测试
,
 繁體中文測試
!

Everything works correctly.

On 8/4/05, saddle <[EMAIL PROTECTED]> wrote:
> the code what posted by Rober Kern
>
> from pyparsing import Word
> text = "Καλημ?ρα, κ?σμε!".decode('utf-8')
> alphas = u''.join(unichr(x) for x in xrange(0x386, 0x3ce))
> greet = Word(alphas) + u',' + Word(alphas) + u'!'
> greeting = greet.parseString(text)
> print greeting
>
>
> my system default is cp936, Simp Chinese.
>
> On Thu, 4 Aug 2005 17:33:16 +0800
> could ildg <[EMAIL PROTECTED]> wrote:
>
> could.net> So what's you code?
> could.net> and what's you system default encoding?
> could.net>
> could.net> On 8/4/05, saddle <[EMAIL PROTECTED]> wrote:
> could.net> > hello, but i can't run the script. could u told me what's the trick pls?
> could.net> > here is the error output.
> could.net> >
> could.net> > D:\python\test>pyp
> could.net> > sys:1: DeprecationWarning: Non-ASCII character '\xce' in file D:\python\test\py
> could.net> > .py on line 3, but no encoding declared; see http://www.python.org/peps/pep-026
> could.net> > .html for details
> could.net> > Traceback (most recent call last):
> could.net> >   File "D:\python\test\pyp.py", line 9, in ?
> could.net> >     greeting = greet.parseString(text)
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 616, in parseString
> could.net> >     loc, tokens = self.parse( instring.expandtabs(), 0 )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 558, in parse
> could.net> >     loc,tokens = self.parseImpl( instring, loc, doActions )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 1387, in parseImpl
> could.net> >     loc, exprtokens = e.parse( instring, loc, doActions )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 562, in parse
> could.net> >     loc,tokens = self.parseImpl( instring, loc, doActions )
> could.net> >   File "C:\Python24\Lib\site-packages\pyparsing.py", line 873, in parseImpl
> could.net> >     raise exc
> could.net> > pyparsing.ParseException: Expected "," (at char 5), (line:1, col:6)
> could.net> > On Thu, 4 Aug 2005 17:24:23 +0800
> could.net> > could ildg <[EMAIL PROTECTED]> wrote:
> could.net> >
> could.net> > could.net> OK, I make it.
> could.net> > could.net> It's right, it can work fine with unicode.
> could.net> > could.net> pyparsing is great.
> could.net> > could.net> Thanks.
> could.net> > could.net>
> could.net> > could.net> On 8/4/05, could ildg <[EMAIL PROTECTED]> wrote:
> could.net> > could.net> > I want to parse some Chinese words.
> could.net> > could.net> > It seems that pyparsing doesn't work for me.
> could.net> > could.net> > Thank you.
> could.net> > could.net> > I have to use re directly, although it's harder, but it'll always work.
> could.net> > could.net> >
> could.net> > could.net> > On 8/4/05, Robert Kern <[EMAIL PROTECTED]> wrote:
> could.net> > could.net> > > could ildg wrote:
> could.net> > could.net> > > > pyparsing is very convenient to use. But I want to find some a py tool
> could.net> > could.net> > > > to parse non-English strings. Does pyparsing support UNICODE strings?
> could.net> > could.net> > > > If not, can someone tell me what py tool can do it? Thanks in advance.
> could.net> > could.net> > >
> could.net> > could.net> > > Try it!
> could.net> > could.net> > >
> could.net> > could.net> > > # vim:fileencoding=utf-8
> could.net> > could.net> > >
> could.net> > could.net> > > from pyparsing import Word
> could.net> > could.net> > >
> could.net> > could.net> > > text = "Καλημέρα, κόσμε!".decode('utf-8')
> could.net> > could.net> > > alphas = u''.join(unichr(x) for x in xrange(0x386, 0x3ce))
> could.net> > could.net> > >
> could.net> > could.net> > > greet = Word(alphas) + u',' + Word(alphas) + u'!'
> could.net> > could.net> > > greeting = greet.parseString(text)
> could.net> > could.net> > > print greeting
> could.net> > could.net> > >
> could.net> > could.net> > > --
> could.net> > could.net> > > Robert Kern
> could.net> > could.net> > > [EMAIL PROTECTED]
> could.net> > could.net> > >
> could.net> > could.net> > > "In the fields of hell where the grass grows high
> could.net> > could.net> > > Are the graves of dreams allowed to die."
> could.net> > could.net> > > -- Richard Harter
> could.net> > could.net> > >
> could.net> > could.net> > > --
> could.net> > could.net> > > http://mail.python.org/mailman/listinfo/python-list
> could.net> > could.net> > >
> could.net> > could.net> --
> could.net> > could.net> http://mail.python.org/mailman/listinfo/python-list
> could.net> >
> could.net> >
> could.net> >
> could.net> --
> could.net> http://mail.python.org/mailman/listinfo/python-list
>
>
>
--
http://mail.python.org/mailman/listinfo/python-list
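
P.S. For anyone reading this thread on Python 3, where unichr, xrange and the print statement are gone, here is a rough stdlib-only sketch of the same two checks. It uses re as a stand-in for the pyparsing grammar, so it is not pyparsing's API and does not reproduce its whitespace handling. The names GREEK_LETTER, tokenize_greeting and console_bytes are made up for illustration, and the sketch assumes the source file is saved as UTF-8.

```python
import re

# Same code point range as `alphas` in the thread:
# U+0386 up to, but not including, U+03CE.
GREEK_LETTER = "[\u0386-\u03cd]"


def tokenize_greeting(text, letter=None):
    """Split 'word, word!' into [word, ',', word, '!'].

    With `letter` (a regex character class), a word is a run of those
    characters, roughly like Word(alphas). Without it, a word is any run
    of characters other than ',' and '!', roughly like CharsNotIn(u',!').
    """
    word = "[^,!]+" if letter is None else letter + "+"
    pattern = re.compile(r"\s*(%s)\s*(,)\s*(%s)\s*(!)" % (word, word))
    match = pattern.match(text)
    if match is None:
        raise ValueError("text does not look like 'word, word!'")
    return list(match.groups())


def console_bytes(tokens, encoding="cp936"):
    """Encode each token for a console that expects `encoding`."""
    return [token.encode(encoding) for token in tokens]


greek = tokenize_greeting("Καλημέρα, κόσμε!", GREEK_LETTER)
chinese = tokenize_greeting("简体中文测试, 繁體中文測試!")
encoded = console_bytes(chinese)
```

In Python 3 every str is already Unicode, so the .decode('utf-8') step from the original code is not needed; the explicit encode is only for the case where bytes have to be written in a specific console encoding such as cp936.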