[issue14068] problem with re split

2012-02-21 Thread Ezio Melotti
Ezio Melotti added the comment:

As long as you don't mix str and unicode, everything works. With strings:

>>> s = '与清新。阿德莱'
>>> re.split('。', s)
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0', '\xe9\x98\xbf\xe5\xbe\xb7\xe8\x8e\xb1']
>>> s.split('。')
['\xe4\xb8\x8e\xe6\xb8\x85\xe6\x96\xb0', '\xe9\x98\xb
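The point about keeping both sides the same type can be sketched in Python 3, where the two kinds of string are str (text) and bytes. This is an illustration only, not code from the thread: the variable names are made up, and it assumes the sample string was stored as UTF-8, which matches the byte values in the output quoted in the comment.

```python
import re

# Sample string from the comment, as text (Python 3 str).
text = "与清新。阿德莱"
# The same data as UTF-8 bytes. Assumption: this is what the Python 2
# str literal held, since the source file declared "coding: utf-8".
raw = text.encode("utf-8")

# Text separator on text: re.split and str.split agree.
text_parts_re = re.split("。", text)
text_parts_str = text.split("。")

# Bytes separator on bytes: both agree as well.
sep = "。".encode("utf-8")
raw_parts_re = re.split(sep, raw)
raw_parts_str = raw.split(sep)

print(text_parts_re)
print(raw_parts_re)
```

In both cases the separator and the string being split have the same type, which is the condition the comment describes.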

2012-02-21 Thread harvey yang
harvey yang added the comment: I see. Thanks :)

2012-02-21 Thread Ramchandra Apte
Ramchandra Apte added the comment: The problem is not in re. It happens because you are passing '。' to re.split, and in Python 2.x that literal is actually passed as the byte string '\xe3\x80\x82'. You should pass u'。' to re.compile. Could we raise a SyntaxError when a program contains a unicode character inside a bytes string? Pytho
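For comparison, Python 3 behaves roughly the way this comment asks for: a bytes literal may only contain ASCII characters, so writing b'。' in source code is rejected at compile time. The sketch below checks that with the built-in compile() and also shows the UTF-8 encoding of the full stop. It is a Python 3 illustration, not something taken from the thread, and the helper name is hypothetical.

```python
def bytes_literal_is_rejected(source: str) -> bool:
    """Return True if compiling `source` raises SyntaxError.

    Hypothetical helper, used only for this illustration.
    """
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return True
    return False


# U+3002 IDEOGRAPHIC FULL STOP encodes to three bytes in UTF-8.
full_stop_utf8 = "。".encode("utf-8")
print(full_stop_utf8)

# A non-ASCII character inside a bytes literal does not compile in Python 3.
print(bytes_literal_is_rejected("b'。'"))

# The escaped form is pure ASCII source text and compiles normally.
print(bytes_literal_is_rejected(r"b'\xe3\x80\x82'"))
```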

2012-02-21 Thread Amaury Forgeot d'Arc
Amaury Forgeot d'Arc added the comment: When you used str.split, you had to pass a unicode string: content.split(u'。'). It's the same with re.split. When I use pattern = re.compile('。'), nothing is split; it works much better with pattern = re.compile(u'。').
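A minimal sketch of the same advice in Python 3 terms: keep the pattern as text and decode any byte input before splitting. The function name, the sample string, and the default encoding are assumptions made for illustration. Unlike Python 2, where the mismatched pattern silently failed to match, Python 3 raises TypeError when a bytes pattern is applied to a text string.

```python
import re

# Text pattern, the Python 3 counterpart of re.compile(u'。').
FULL_STOP = re.compile("。")


def split_sentences(content, encoding="utf-8"):
    """Split on the Chinese full stop, decoding bytes to text first.

    Hypothetical helper; the UTF-8 default is an assumption.
    """
    if isinstance(content, bytes):
        content = content.decode(encoding)
    return FULL_STOP.split(content)


sample = "与清新。阿德莱"

# Mixing a bytes pattern with a text string is an error in Python 3.
try:
    re.compile("。".encode("utf-8")).split(sample)
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(mixed_ok)
print(split_sentences(sample))
print(split_sentences(sample.encode("utf-8")))
```

Decoding once at the boundary means the rest of the code only ever deals with text, so the pattern and the input cannot drift apart in type.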

2012-02-21 Thread harvey yang
harvey yang added the comment: I am not using it to split on whitespace or newlines. I use it to split on the Chinese full stop, and the result is shown in the earlier message.

2012-02-21 Thread Ramchandra Apte
Ramchandra Apte added the comment: 启朗.杨, are you using text.split() in the second case? text.split() splits on any whitespace character, including newlines, while re.split(" ", text) splits only on a space.
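The difference between the two calls can be seen on a small made-up string. Note that re.split takes the pattern first and the string second. The sample text and variable names below are illustrative only.

```python
import re

text = "one two\nthree  four"

# str.split() with no argument splits on runs of any whitespace,
# so the newline and the double space each act as one separator.
by_whitespace = text.split()

# A single-space pattern splits at every individual space and nowhere
# else: the newline stays in place and the double space yields an
# empty string.
by_space = re.split(" ", text)

# The closest regex counterpart of text.split() for this input.
by_regex_whitespace = re.split(r"\s+", text)

print(by_whitespace)
print(by_space)
print(by_regex_whitespace)
```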

2012-02-20 Thread 启朗 杨
启朗 杨 added the comment: Sure, here is a simple string from news1.xml. Added file: http://bugs.python.org/file24589/news1.xml

2012-02-20 Thread Ezio Melotti
Ezio Melotti added the comment: Can you paste (or upload) a minimal working example (with a short sample string) that uses re.split and str.split and shows how re.split is failing?

2012-02-20 Thread 启朗 杨
启朗 杨 added the comment:

I use Python to process some strings. Here is my code:

# -*- coding: utf-8 -*-
from lxml import etree
import collectcorpus
import re

doc = etree.parse("news1.xml")
root = doc.getroot()
children = root.getchildren()
flag = 1
for child in children:
    if flag == len(chi

2012-02-20 Thread 启朗 杨
New submission from 启朗 杨:

I use Python to process some strings. Here is my code:

# -*- coding: utf-8 -*-
from lxml import etree
import collectcorpus
import re

doc = etree.parse("/home/harveyang/workspace/corpus/newsscrapy/news1.xml")
root = doc.getroot()
children = root.getchildren()
flag = 1