Using UUID as keys is problematic for Riak Search
I'm using UUIDs for keys in Riak -- converted to bytes, not UTF-8 strings. (I'd rather spend 16 bytes for each key, not 36.)

As I understand it, Yokozuna maps the Riak key to _yz_id.

Here is the suggested schema from the documentation:

Would you expect this to work with Riak Search? I would hope so. (Or must keys be UTF-8 strings?)

I get this error, which does not surprise me, given that the _yz_id is defined as a string:

==> log/error.log <==
2014-08-10 18:24:16.221 [error] <0.610.0>@yz_kv:index:206 failed to index object {<<"test-0001">>,<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>} with error {ucs,{bad_utf8_character_code}} because [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},{mochijson2,json_encode_string,2,[{file,"src/mochijson2.erl"},{line,186}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]}]

I don't think changing the schema.xml type for _yz_id to "solr.UUIDField" is a good idea.

What can I do?

Thanks,
David

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
Re: Using UUID as keys is problematic for Riak Search
You're correct that Yokozuna only supports UTF-8, because the Solr interface only supports UTF-8 (note that the failure happens when attempting to build a non-UTF-8 JSON add document command). There's not much we can do here at the moment, since we don't yet (and may never) support a custom interface to Solr that accepts arbitrary binary values. In the meantime, to use Yokozuna, you'll have to encode your keys to UTF-8.

Eric Redmond, Engineer @ Basho
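To see why the JSON encoder rejects the object, it can help to check the raw bytes directly. The following is a minimal sketch in Python, not Yokozuna's actual code path: it takes the 16-byte value from the log above and tests whether it decodes as UTF-8. The names RIAK_KEY and is_valid_utf8 are made up for illustration.

```python
# The 16-byte value copied from the error log above.
RIAK_KEY = bytes([94, 143, 33, 35, 45, 180, 78, 164,
                  151, 237, 72, 81, 56, 13, 28, 250])


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a well-formed UTF-8 string."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # The raw UUID bytes are rejected: the second byte (143, 0x8F) is a
    # continuation byte that does not follow a lead byte.
    print(is_valid_utf8(RIAK_KEY))
    # A plain ASCII value such as "test-0001" is fine.
    print(is_valid_utf8(b"test-0001"))
```

Any key that fails a check like this will presumably hit the same bad_utf8_character_code error when the add document command is built.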
Re: Using UUID as keys is problematic for Riak Search
I like UUIDs for everything as well, although I expected compatibility issues with something. Base64 encoding the binary value is a nice compromise for me, and takes 22 characters (if you drop the padding) instead of the usual 36 for the hyphenated hex format.

It would still require re-encoding all the keys, but it's a partial solution.
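For reference, the size trade-off can be sketched in Python as below. The helper names uuid_to_key and key_to_uuid are hypothetical, and the URL-safe Base64 alphabet is an assumption on my part (it avoids "+" and "/" in keys that may end up in URLs); the thread itself only mentions dropping the padding.

```python
import base64
import uuid


def uuid_to_key(u: uuid.UUID) -> str:
    """Encode a UUID's 16 raw bytes as unpadded URL-safe Base64 (22 chars)."""
    return base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")


def key_to_uuid(key: str) -> uuid.UUID:
    """Reverse uuid_to_key by restoring the stripped '=' padding first."""
    padding = "=" * (-len(key) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(key + padding))


if __name__ == "__main__":
    # Same 16 bytes as the key in the error log, in hyphenated hex form.
    u = uuid.UUID("5e8f2123-2db4-4ea4-97ed-4851380d1cfa")
    key = uuid_to_key(u)
    # Raw bytes, unpadded Base64 and hyphenated hex lengths: 16, 22, 36.
    print(len(u.bytes), len(key), len(str(u)))
    print(key, key_to_uuid(key) == u)
```

The encoded key is plain ASCII, so it is also valid UTF-8, at a cost of 6 extra bytes per key compared with the raw form.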
Riak solr query range issue
Hi,

I have a simple solr document like this:

Oslo Bob 34

As "id" I use an email address, added to a 'users' index. I've defined age as an Int field.

I can now query the index with:

# search-cmd search users "name:Bob"
:: Searching for 'name:Bob' / '' in users...
--
index/id: users/f...@bar.org
p -> [0]
score -> 0.35355339059327373
--
:: Found 1 results.

If I do

search-cmd search users "age:34"

it works. If I try the range

search-cmd search users "age:[20 TO 40]"

it just prints the usage of "search-cmd" again, without saying whether the syntax is wrong.

Using the HTTP Solr interface is basically the same; I get no results:

curl "http://localhost:8098/solr/users/select?q=age:%7B20 TO 50%7D"
curl: (52) Empty reply from server

curl 'http://localhost:8098/solr/users/select?q=age:{20 TO 50}'
curl: (52) Empty reply from server

What am I doing wrong? Please note I have other data in the index.

My Riak version is 1.4.10.

Thanks
:tele
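One thing that can be ruled out cheaply is URL encoding: both curl commands above put literal spaces in the URL, which is not a well-formed request line. The sketch below, in Python, only shows how to build a fully percent-encoded query string for the same endpoint; it is not a confirmed fix for the empty reply, and the helper name solr_select_url is made up for illustration.

```python
from urllib.parse import urlencode


def solr_select_url(base: str, index: str, query: str) -> str:
    """Build a select URL with the q parameter fully percent-encoded."""
    params = urlencode({"q": query})
    return base + "/solr/" + index + "/select?" + params


if __name__ == "__main__":
    # Host, port and index name are taken from the curl commands above.
    print(solr_select_url("http://localhost:8098", "users", "age:[20 TO 40]"))
```

The resulting URL can be passed to curl as-is. If the server still returns an empty reply, the problem lies elsewhere (for example in how the age field is indexed).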
Re: Using UUID as keys is problematic for Riak Search
Thanks for the quick responses.

Eric: I don't understand. Why does Solr have the UUIDField (http://lucene.apache.org/solr/4_7_0/solr-core/org/apache/solr/schema/UUIDField.html) if it isn't indexable? What is the nature of the limitation?

Jason: Thanks, I will consider Base64 encoding.
Re: Using UUID as keys is problematic for Riak Search
You're expected to base 64 encode it. UUID is simply the kind of value it expects, like a date or an integer.

Eric Redmond, Engineer @ Basho
Re: Using UUID as keys is problematic for Riak Search
I'm at my laptop now so I can talk a bit more about it.

Don't conflate the value type with the encodings. UUID is a field type, just like how dates or integers are field types. They explain to the Solr indexer how to reason about the value it gets. The string "20140810" is encoded differently than the integer value 20140810 or the date "20140810". This is important for the queries you can build, as a date range query is different than an integer or string range.

That said, in Solr, the UUID is usually generated on the backend, such as with UUIDUpdateProcessorFactory. Even so, you can no more send a binary UUID than you can a binary date value.

There are two encodings you have to think about when dealing with Solr. Anything that's binary needs to be converted to a String that Solr can understand. Base64 is one way to convert a binary value to a string value. So in the case of your key (in Erlang):

1> base64:encode(<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>).
<<"Xo8hIy20TqSX7UhROA0c+g==">>

Base64 encoding libs exist in any language.

Once you have this key string in Base64, internally, Yokozuna will assume that string is valid UTF8.

I was probably a bit hasty when I said "yokozuna only supports UTF8". What I should have said is that "yokozuna assumes types/buckets/keys are UTF8 and encodes values appropriately."

So in summation:

UUID: Solr field type
Base64: Encode binary values to a string
UTF8: The assumed string encoding

Does that help?
Eric
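For anyone not working in Erlang, the same conversion can be sketched in Python. This is an illustrative equivalent of the base64:encode call above, assuming the standard Base64 alphabet with padding; the names RAW_KEY and to_solr_safe_key are hypothetical.

```python
import base64

# The binary key from the Erlang example above.
RAW_KEY = bytes([94, 143, 33, 35, 45, 180, 78, 164,
                 151, 237, 72, 81, 56, 13, 28, 250])


def to_solr_safe_key(raw: bytes) -> str:
    """Base64-encode binary data; the result is ASCII, hence valid UTF-8."""
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    key = to_solr_safe_key(RAW_KEY)
    print(key)  # Xo8hIy20TqSX7UhROA0c+g==
    # Encoding the string as UTF-8 succeeds, which is all Yokozuna assumes.
    print(key.encode("utf-8"))
```

The output matches the Erlang shell result, so keys written by clients in either language should line up.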
Re: Using UUID as keys is problematic for Riak Search
Yes, that clarifies it -- much appreciated.