Hi, Fink.

Thank you for your reply. Here are some inline comments.

On Tue, Oct 12, 2010 at 9:42 PM, Bryan Fink <br...@basho.com> wrote:
> On Tue, Oct 12, 2010 at 3:16 AM, Dmitry Demeshchuk <demeshc...@gmail.com> wrote:
>> 1. I tried to put some Erlang terms into Riak bucket that is being
>> indexed by Riak Search. I hoped that key-value lists like this
> ...snip...
>> Is there a way to send Erlang proplists into Riak and process them
>> using Riak Search?
>
> Hi, Dmitry. We've filed a bug for doing exactly this:
>
> https://issues.basho.com/show_bug.cgi?id=788
>
> In the meantime, you could also write your own extractor. See the
> "Other Data Encodings" section of using_search.org:
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/doc/using_search.org#cl-985
>
> Or on the wiki:
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Indexing+and+Querying+Riak+KV+Data#RiakSearch-IndexingandQueryingRiakKVData-OtherDataEncodings
>
>> 2. Is there a way to query Erlang buckets indexes using any other APIs
>> than REST API? The only way to query the bucket I found was
>>
>> /solr/some_bucket/select
>>
>> and my attempts of using Riak Search shell and Erlang API just failed.
>
> If you could post details about the ways in which your attempts
> failed (error messages, etc.), we might be able to help you
> troubleshoot them.
>
> The other main way of querying Search indexes is using the map/reduce
> Search input. The "Querying via HTTP/Curl" section has an example of
> how to hook this up:
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/doc/using_search.org#cl-783
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Querying#RiakSearch-Querying-QueryingviaHTTP%2FCurl
>
> And it's also possible to specify the same map/reduce input using any
> of the Erlang clients (native, protocol buffer, or http). Though
> there is a small bug with the non-streaming native Erlang client at
> the moment (https://issues.basho.com/show_bug.cgi?id=803).
> For an example of using that syntax, have a look at the Wriaki project:
>
> http://bitbucket.org/basho/wriaki/src/d2334be214ce/apps/wriaki/src/wiki_resource.erl#cl-267

I worked it out. Both the shell and command-line search work well. It seems
I had been doing something wrong before.

>
>> 3. Is there a way to write custom analyzers in non-java languages? I
>> saw the same question and found an answer that analyzer automatically
>> tries to start JVM for its needs. The problem is that we don't have
>> good Java and JVM developers so it would be better to use some other
>> solutions (like OCaml or even C, for example). Also, I'm kinda
>> suspicious about Java analyzers performance.
>
> At the moment, the only non-Java language supported for custom
> analyzers is Erlang. You can specify an Erlang analyzer by adding an
> "analyzer_factory" entry to your schema, of the form:
>
>    {analyzer_factory, {erlang, my_module, my_function}}
>
> Other formats for the analyzer_factory setting are:
>
>    {erlang, my_module, my_function, Arguments}
>    {java, FullyQualifiedClassNameAsString}
>    {java, FullyQualifiedClassNameAsString, Arguments}
>    FullyQualifiedClassNameAsString
>
> The last format is demonstrated in the "Defining a Schema" section of the
> docs:
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/doc/using_search.org#cl-193
>
> http://wiki.basho.com/display/RIAK/Riak+Search+-+Schema#RiakSearch-Schema-DefiningaSchema
>
> Unfortunately, we haven't written much documentation about what an
> analyzer is expected to do, but hopefully between the comments in
> qilr_analyzer, and the default Erlang analyzer,
> text_analyzers:default_analyzer_factory/2, you'll be able to work out
> some of what you need.
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/apps/qilr/src/qilr_analyzer.erl#cl-53
>
> http://bitbucket.org/basho/riak_search/src/d1f10b876cae/apps/qilr/src/text_analyzers.erl
>
>> 4. Do you have any tips and advice about working with Unicode in Riak Search?
>
> Encode everything in UTF-8.
> There may still be a few bugs we need to
> work out, but our intended goal is to have everything in that
> department "just work" once you're using UTF-8 everywhere.

I'm not sure if I'm doing everything right, but here's a step-by-step
description of my actions:

1. curl -v -d "{\"title\":\"Статья 1\", \"tags\":\"псто, лытдыбр\", \"body\":\"Я что-то здесь написал\"}" -H "Content-Type: application/json" http://127.0.0.1:8098/riak/posts

   (Note that there are Cyrillic characters.)

2. curl -X POST -H "content-type: application/json" http://localhost:8098/mapred -d '{"inputs":"posts", "query":[{"map":{"language":"javascript","source":"Riak.mapValues", "keep":true}}]}'

   The result is:

   ["{\"title\":\"\u0421\u0442\u0430\u0442\u044c\u044f 1\", \"tags\":\"\u043f\u0441\u0442\u043e, \u043b\u044b\u0442\u0434\u044b\u0431\u0440\", \"body\":\"\u042f \u0447\u0442\u043e-\u0442\u043e \u0437\u0434\u0435\u0441\u044c \u043d\u0430\u043f\u0438\u0441\u0430\u043b\"}"]

   So the Cyrillic strings were encoded properly by Riak itself (not sure
   if it's at the mochiweb level or somewhere else).

3. curl -X POST -H "content-type: application/json" http://localhost:8098/mapred -d '{"inputs":{"module":"riak_search", "function":"mapred_search", "arg": ["posts", "title:Статья*"]}, "query":[{"map":{"language":"javascript","source":"Riak.mapValues", "keep":true}}]}'

   This is a map-reduce Riak Search request. It's expected to return the
   previously posted document. However, it returns an empty list.

4. I tried both the shell and command-line search, with the same result.

5. If I try to reproduce the same using Latin characters, everything works
   fine. The JSON data may be partially Cyrillic; in that case search
   works on the Latin fields only.

Am I doing something wrong? Should I encode characters somehow before I
send them into Riak Search?

Thanks.
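For anyone trying to reproduce this, here is a minimal Python sketch of how the step 3 request body could be built with an explicit UTF-8 encoding, and how the \uXXXX escapes returned in step 2 can be checked against the original title. The helper names build_search_mapred and decode_json_string are hypothetical, chosen only for illustration. The sketch builds and inspects the payload without contacting Riak, so it only rules out client-side encoding problems and says nothing about why the search itself returns an empty list.

```python
import json

# Values copied from steps 1 and 2 above.
TITLE = "Статья 1"
# The title exactly as it appears (escaped) in the step 2 result.
ESCAPED_TITLE = "\\u0421\\u0442\\u0430\\u0442\\u044c\\u044f 1"


def build_search_mapred(bucket: str, query: str) -> bytes:
    """Build the step 3 map/reduce body and encode it as UTF-8 bytes.

    Hypothetical helper: it mirrors the JSON used in the curl command,
    nothing more.
    """
    body = {
        "inputs": {
            "module": "riak_search",
            "function": "mapred_search",
            "arg": [bucket, query],
        },
        "query": [
            {
                "map": {
                    "language": "javascript",
                    "source": "Riak.mapValues",
                    "keep": True,
                }
            }
        ],
    }
    # ensure_ascii=False keeps the Cyrillic characters literal, so the
    # explicit .encode("utf-8") decides which bytes go on the wire.
    return json.dumps(body, ensure_ascii=False).encode("utf-8")


def decode_json_string(escaped: str) -> str:
    """Decode the inside of a JSON string literal (e.g. \\u0421\\u0442...)."""
    return json.loads('"' + escaped + '"')


if __name__ == "__main__":
    payload = build_search_mapred("posts", "title:Статья*")
    # The raw bytes that would be sent as the POST body to /mapred.
    print(payload)
    # Whether the escaped title from step 2 decodes to the original title.
    print(decode_json_string(ESCAPED_TITLE) == TITLE)
```

If the second check holds, the stored document matches what was posted, which would suggest looking at how the query term is analyzed on the search side rather than at the client encoding.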
>
> -Bryan
>

--
Best regards,
Dmitry Demeshchuk

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com