On Wed, Oct 10, 2012 at 12:52 AM, 郎咸武 <langxian...@gmail.com> wrote:
>
> *2)To put a Object to <<"user1">> bucket. The data is utf8 format.*
>
> (trends@jason-lxw)123> f(O), O=riakc_obj:new(<<"user1">>,
> <<"jason5">>,list_to_binary(mochijson:encode({struct, [{name,
> binary_to_list(unicode:characters_to_binary("爱"))},{sex,"male"}]})),
> "application/json").
> {riakc_obj,<<"user1">>,<<"jason5">>,undefined,[],
> {dict,1,16,16,8,80,48,
> {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
> {{[],[],[],[],[],[],[],[],[],[],[[<<...>>|...]],[],[],...}}},
> <<"{\"name\":\"\\u00e7\\u0088\\u00b1\",\"sex\":\"male\"}">>}
> (trends@jason-lxw)124> riakc_pb_socket:put(Pid, O).
> ok

First, let's start with your data and make sure it's getting stored properly.

3> UC = unicode:characters_to_binary("爱").
<<231,136,177>>

Okay, so Erlang properly encoded this as a 3-byte UTF-8 sequence. What does mochijson2 think? (I noticed you are using mochijson; I recommend using mochijson2.)

4> mochijson2:encode({struct, [{name, UC}]}).
[123,[34,"name",34],58,[34,"\\u7231",34],125]

Good, mochijson2 properly interpreted this as \u7231. A quick lookup on the web verifies this is correct: http://www.fileformat.info/info/unicode/char/7231/index.htm.

But notice that in your code you call binary_to_list on the binary before passing it to mochi. Let's see what happened.

15> binary_to_list(UC).
[231,136,177]

Okay, so the integers are correct. But Erlang treats lists differently from binaries. To Erlang this is just a list of integers, so each byte is taken as a separate character rather than as part of one UTF-8 sequence.

16> io:format("~ts~n",[binary_to_list(UC)]).
ç±
ok

This is why mochi converted it to 3 characters: \\u00e7\\u0088\\u00b1

To make a proper unicode list, the unicode:characters_to_list function must be used.

17> UCS = unicode:characters_to_list("爱").
[29233]
18> io:format("~ts~n", [UCS]).
爱
ok

Let's try encoding again, this time with mochijson2 and leaving out both binary_to_list and list_to_binary.

19> riakc_obj:new(<<"user1">>, <<"jason5">>, mochijson2:encode({struct, [{name, unicode:characters_to_binary("爱")}]}), "application/json").
{riakc_obj,<<"user1">>,<<"jason5">>,undefined,[],
 {dict,1,16,16,8,80,48,
 {[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],...},
 {{[],[],[],[],[],[],[],[],[],[],[[<<...>>|...]],[],[],...}}},
 [123,[34,"name",34],58,[34,"\\u7231",34],125]}

And there we go. A properly encoded unicode character.

-Z
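
P.S. For reference, here is a minimal sketch of the byte-versus-code-point difference as a standalone module. The module name utf8_demo and the function compare/0 are made up for illustration; the only real calls are binary_to_list/1 and unicode:characters_to_list/1 from OTP. The UTF-8 bytes are written out explicitly so the result does not depend on how the shell or source file is encoded.

```erlang
-module(utf8_demo).
-export([compare/0]).

%% Hypothetical helper, for illustration only.
%% Returns {Bytes, CodePoints} for the UTF-8 encoding of U+7231.
compare() ->
    %% The three UTF-8 bytes of U+7231, as returned by
    %% unicode:characters_to_binary/1 in the session above.
    UC = <<231,136,177>>,
    %% binary_to_list/1 yields one integer per byte, so a JSON encoder
    %% that treats list elements as characters sees three of them.
    Bytes = binary_to_list(UC),
    %% unicode:characters_to_list/1 decodes the UTF-8 binary into
    %% code points, giving a single integer for the single character.
    CodePoints = unicode:characters_to_list(UC),
    {Bytes, CodePoints}.
```

Calling utf8_demo:compare() should return {[231,136,177],[29233]}, which matches steps 15 and 17 above.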