[web2py] Re: Something is wrong when IS_IN_SET contains Chinese characters

hywang Mon, 29 Mar 2010 20:13:05 -0700

Thanks for your kind help.


On Mar 30, 1:47 AM, Yarko Tymciurak <resultsinsoftw...@gmail.com>
wrote:
> On Mar 29, 10:54 am, Yarko Tymciurak <resultsinsoftw...@gmail.com>
> wrote:
>
> > Anyway, I am sure this is about decoding into unicode; someone who has
> > done this will hopefully add comments.
>
> For example, looking at
> http://docs.python.org/library/codecs.html#standard-encodings
>
> and searching for Chinese, I found two codecs that decode the string
> from your (pasted) example into a unicode result (that is, codecs that
> recognize the bytes):
>
> In [37]: value=r"老李"
> In [38]: value
> Out[38]: '\xe8\x80\x81\xe6\x9d\x8e'
> In [39]: value.decode('gbk')
> Out[39]: u'\u9470\u4f79\u6f55'
> In [40]: value.decode('gb18030')
> Out[40]: u'\u9470\u4f79\u6f55'
>
> IMPORTANT: both of these calls return a unicode object (i.e.
> u'xxxx').
>
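A quick check, written for Python 3 rather than the Python 2 session above: the six bytes shown for `value` are the UTF-8 encoding of 老李. Decoding them as UTF-8 gives back the two original characters, whereas the `gbk` decode succeeds but produces three different characters. The variable names are only illustrative.

```python
# Python 3 sketch: the bytes printed for `value` in the session above
raw = b"\xe8\x80\x81\xe6\x9d\x8e"

# Decoding with the codec that matches these bytes recovers the two characters
as_utf8 = raw.decode("utf-8")
print(as_utf8, [hex(ord(c)) for c in as_utf8])  # 老李 ['0x8001', '0x674e']

# GBK also accepts these bytes, but maps them to three unrelated characters
as_gbk = raw.decode("gbk")
print(len(as_gbk), as_gbk == as_utf8)  # 3 False

# Encoding the text as UTF-8 reproduces the original bytes exactly
assert "老李".encode("utf-8") == raw
```

On Python 2, as used in the thread, the equivalent call is `value.decode('utf-8')`.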
> I am not sure whether you need to set the LOCALE for your environment /
> browser for the regular expression to work as it is. With this
> encoding, it correctly produces the unicode match when called with the
> re.UNICODE flag, and that is without any locale set. (Off the top of my
> head, I am not sure of the proper way to call setlocale within an
> interpreter to test this...)
>
> In [44]: val=value.decode('gbk')
> In [45]: re.compile(r"[\w\-:]+",re.U).findall(val)
> Out[45]: [u'\u9470\u4f79\u6f55']
>
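In Python 3 the distinction is carried by the types themselves: a `str` pattern matches Unicode word characters by default, so the character class from this thread matches 老李 once it is text, while a `bytes` pattern treats `\w` as ASCII-only and finds nothing in the UTF-8 bytes. A minimal sketch, with no locale set:

```python
import re

# The character class discussed in this thread
WORD_PATTERN = r"[\w\-:]+"

text = "老李"                 # decoded text (str)
data = text.encode("utf-8")  # the same name as UTF-8 bytes

# str patterns are Unicode-aware by default in Python 3
text_matches = re.findall(WORD_PATTERN, text)

# bytes patterns treat \w as ASCII-only, so the encoded name yields no match
byte_matches = re.findall(WORD_PATTERN.encode("ascii"), data)

print(text_matches)  # ['老李']
print(byte_matches)  # []
```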
> I hope this helps point the way: all the strings in your app need to
> be converted to unicode (one way or another), and your locale set
> (normally provided by the browser, in the request).
>
> - Yarko
>
>
>
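To illustrate the advice above that every string in the app should be unicode, here is a Python 3 sketch built around a hypothetical `check_in_set` helper. It is not web2py's IS_IN_SET code, and the thread does not confirm that this is the cause of the error. A multi-value membership test accepts the selection when the options and the submitted values are both text, and rejects every value while the submitted values are still encoded bytes.

```python
# Hypothetical helper (not web2py's IS_IN_SET): every submitted value
# must be one of the allowed options
def check_in_set(values, options):
    allowed = set(options)
    rejected = [v for v in values if v not in allowed]
    return not rejected, rejected

OPTIONS = ["Jim", "小长", "老李"]
checked = ["小长", "老李"]  # two boxes ticked

# Options and submitted values are both text: accepted
text_ok, _ = check_in_set(checked, OPTIONS)

# Submitted values still UTF-8 bytes: bytes never compare equal to str
raw_checked = [v.encode("utf-8") for v in checked]
bytes_ok, bytes_rejected = check_in_set(raw_checked, OPTIONS)

# Decoding at the boundary, before validation, restores the match
decoded_ok, _ = check_in_set([v.decode("utf-8") for v in raw_checked], OPTIONS)

print(text_ok, bytes_ok, decoded_ok)  # True False True
```

The design choice is simply to decode once, where data enters the app, so that every later comparison is text against text.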
> > On Mar 29, 10:04 am, Yarko Tymciurak <resultsinsoftw...@gmail.com>
> > wrote:
>
> > > On Mar 29, 8:33 am, hywang <why00...@163.com> wrote:
>
> > > > -------model file is like this ---------------------
> > > > db.define_table('options_contain_chinease',
> > > >     Field('student_name', requires = IS_IN_SET(["Jim","小长","老李"],
> > > > multiple=True)),
>
> > > Using this last string from your IS_IN_SET example (I hope my
> > > copy/paste did this correctly into IPython!):
>
> > > In [31]: value=r"老李"
> > > In [32]: value
> > > Out[32]: '\xe8\x80\x81\xe6\x9d\x8e'
> > > In [33]: str(value)
> > > Out[33]: '\xe8\x80\x81\xe6\x9d\x8e'
> > > In [34]: re.compile(r"[\w\-:]+").findall(value)
> > > Out[34]: []
> > > In [35]: re.compile(r"[\w\-:]+").findall(value, re.U)
> > > Out[35]: []
> > > In [36]: re.compile(r"[\w\-:]+",re.U).findall(value)
> > > Out[36]: ['\xe8', '\xe6']
> > > In [37]: re.compile(r"[\w\-:]+",re.U).findall(value,re.U)
> > > Out[37]: []
>
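One detail in the session above: for a compiled pattern, the second positional argument of `findall` is the start position `pos`, not a flags argument. The calls of the form `findall(value, re.U)` therefore start scanning at index 32 (the integer value of `re.U`) and return an empty list; flags have to be passed to `re.compile`. Sketched in Python 3:

```python
import re

name = "老李"

# Flags go to compile(); re.U is redundant for str patterns in Python 3 but harmless
pattern = re.compile(r"[\w\-:]+", re.U)

# Correct call: scan the whole string
whole = pattern.findall(name)

# Pattern.findall(string, pos, endpos): here re.U is taken as pos == 32,
# which is past the end of the 2-character string, so nothing is found
misused = pattern.findall(name, re.U)

print(int(re.U), whole, misused)  # 32 ['老李'] []
```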
> > > --->
>
> > > So it would seem you may need to set up something with LOCALE; I have
> > > played around with this for just a little bit, but am not sure what it
> > > takes (zh-CN? zh-cn? zh_CN.gb2312? etc.)
>
> > > Maybe others can add to this...
>
> > > Regards,
> > > - Yarko
>
> > > > )
> > > > db.options_contain_chinease.student_name.widget = CheckboxesWidget.widget
>
> > > > ------controller file is like this ---------------------
> > > > def options_contain_chinease():
> > > >     form = SQLFORM(db.options_contain_chinease)
> > > >     if form.accepts(request.vars, session):
> > > >         pass
> > > >     return dict(form=form)
>
> > > > If I check one item and submit, everything is OK. However, when I
> > > > check more than one item and submit the form, an error occurs.
> > > > Is it a bug?
>
> > > > Thanks!

-- 
You received this message because you are subscribed to the Google Groups 
"web2py-users" group.
To post to this group, send email to web...@googlegroups.com.
To unsubscribe from this group, send email to 
web2py+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/web2py?hl=en.
