Yes, that clarifies it -- much appreciated.
On Sun, Aug 10, 2014 at 8:43 PM, Eric Redmond <eredm...@basho.com> wrote:

> I'm at my laptop now so I can talk a bit more about it.
>
> Don't conflate the value type with the encodings. UUID is a field type,
> just like how dates or integers are field types. They explain to the Solr
> indexer how to reason about the value it gets. The field type string
> "20140810" is encoded differently than the integer value 20140810 or Date
> "20140810". This is important for the queries you can build, as a date
> range query is different than an integer or string range.
>
> That said, in Solr, usually UUID is generated on the backend, such as
> with UUIDUpdateProcessorFactory. Even so, you can no more send a binary
> UUID than you can a binary date value.
>
> There are two encodings you have to think about when dealing with Solr.
> Anything that's binary needs to be converted to a String that Solr can
> understand. Base64 is how you convert a binary value to a string value. So
> in the case of your key (in Erlang):
>
> 1> base64:encode(<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>).
> <<"Xo8hIy20TqSX7UhROA0c+g==">>
>
> base64 encoding libs exist in any language.
>
> Once you have this key string in base64, internally, Yokozuna will assume
> that string is valid UTF8.
>
> I was probably a bit hasty when I said "yokozuna only supports UTF8". What
> I should have said is that "yokozuna assumes types/buckets/keys are UTF8
> and encodes values appropriately."
>
> So in summation:
>
> UUID: Solr field type
> Base64: Encode binary values to a string
> UTF8: The assumed string encoding
>
> Does that help?
> Eric
>
>
> On Aug 10, 2014, at 5:03 PM, David James <davidcja...@gmail.com> wrote:
>
> Thanks for the quick responses.
>
> Eric: I don't understand. Why does Solr have the UUIDField (
> http://lucene.apache.org/solr/4_7_0/solr-core/org/apache/solr/schema/UUIDField.html)
> if it were not indexable? What is the nature of the limitation?
>
> Jason: Thanks, I will consider Base 64 encoding.
>
>
> On Sun, Aug 10, 2014 at 7:19 PM, Jason Campbell <xia...@xiaclo.net> wrote:
>
>> I like UUIDs for everything as well, although I expected compatibility
>> issues with something. Base 64 encoding the binary value is a nice
>> compromise for me, and takes 22 characters (if you drop the padding)
>> instead of the usual 36 for the hyphenated hex format.
>>
>> It would still require re-encoding all the keys, but it's a partial
>> solution.
>>
>> *From: *Eric Redmond
>> *Sent: *Monday, 11 August 2014 9:15 AM
>> *To: *David James
>> *Cc: *riak-users
>> *Subject: *Re: Using UUID as keys is problematic for Riak Search
>>
>> You're correct that yokozuna only supports utf8, because the Solr
>> interface only supports utf8 (note that the failure happens when attempting
>> to build a non-utf8 JSON add document command). There's not much we can do
>> here at the moment, since we've yet to (if ever) support a custom interface
>> to Solr that accepts arbitrary binary values. In the meantime, to use
>> yokozuna, you'll have to encode your keys to utf8.
>>
>> Eric Redmond, Engineer @ Basho
>>
>> On Sun, Aug 10, 2014 at 4:01 PM, David James <davidcja...@gmail.com>
>> wrote:
>>
>> I'm using UUIDs for keys in Riak -- converted to bytes, not UTF-8
>> strings. (I'd rather spend 16 bytes for each key, not 36.)
>>
>> As I understand it, Yokozuna maps the Riak key to _yz_id.
>>
>> Here is the suggested schema from the documentation:
>>
>> <!-- schema.xml -->
>> <field name="_yz_id" type="_yz_str" indexed="true" stored="true"
>> multiValued="false" required="true"/>
>> <fieldType name="_yz_str" class="solr.StrField" sortMissingLast="true"/>
>>
>> Would you expect this to work with Riak Search? I would hope so.
>>
>> (Or must keys be UTF-8 strings?)
>>
>> I get this error, which does not surprise me, given that the _yz_id is
>> defined as a string:
>>
>> ==> log/error.log <==
>>
>> 2014-08-10 18:24:16.221 [error] <0.610.0>@yz_kv:index:206 failed to index
>> object
>> {<<"test-0001">>,<<94,143,33,35,45,180,78,164,151,237,72,81,56,13,28,250>>}
>> with error {ucs,{bad_utf8_character_code}} because
>> [{xmerl_ucs,from_utf8,1,[{file,"xmerl_ucs.erl"},{line,185}]},{mochijson2,json_encode_string,2,[{file,"src/mochijson2.erl"},{line,186}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]},{mochijson2,'-json_encode_proplist/2-fun-0-',3,[{file,"src/mochijson2.erl"},{line,167}]},{lists,foldl,3,[{file,"lists.erl"},{line,1248}]},{mochijson2,json_encode_proplist,2,[{file,"src/mochijson2.erl"},{line,170}]}]
>>
>> I don't think changing the schema.xml type for _yz_id to "solr.UUIDField"
>> is a good idea.
>>
>> What can I do?
>>
>> Thanks,
>> David
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
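
For anyone following along outside Erlang: the Base64 step Eric describes can be sketched in Python roughly as below. This is only an illustration using the standard library. The helper names (is_valid_utf8, encode_key, encode_key_compact, decode_key_compact) are made up for this example and are not part of Riak or Yokozuna. The URL-safe, unpadded variant is an assumption on my part (it is not discussed in the thread beyond Jason's note about dropping the padding); it avoids the "+" that shows up in the standard encoding of this particular key.

```python
import base64
import uuid

# The 16 raw key bytes from the error log quoted in this thread.
RAW_KEY = bytes([94, 143, 33, 35, 45, 180, 78, 164,
                 151, 237, 72, 81, 56, 13, 28, 250])


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def encode_key(raw: bytes) -> str:
    """Standard Base64: the result is plain ASCII, so it is also valid UTF-8."""
    return base64.b64encode(raw).decode("ascii")


def encode_key_compact(raw: bytes) -> str:
    """URL-safe Base64 with the '=' padding dropped (hypothetical variant)."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_key_compact(text: str) -> bytes:
    """Restore the padding that encode_key_compact removed, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


if __name__ == "__main__":
    print(is_valid_utf8(RAW_KEY))            # the raw bytes are not valid UTF-8
    print(encode_key(RAW_KEY))               # Xo8hIy20TqSX7UhROA0c+g==
    print(encode_key_compact(RAW_KEY))       # Xo8hIy20TqSX7UhROA0c-g
    print(len(str(uuid.UUID(bytes=RAW_KEY))))  # length of the hyphenated hex form
```

The standard encoding matches the Erlang output above. The compact form is 22 characters versus 36 for the hyphenated hex string, which is the trade-off Jason mentions; the cost is that existing binary keys would still need to be rewritten under the new string keys.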