Hi Dmitry,

On Wed, Oct 13, 2010 at 11:11 PM, Dmitry Demeshchuk <demeshc...@gmail.com> wrote:
> Greetings.
>
> I have a couple of questions regarding the analyzers, mainly the Java ones.
>
> 1. Which platform is preferable for use: OpenJDK or Sun's Java? Say, I
> won't have any other uses for the JVM, so it will be used just for the analyzers.
>

We have not seen any appreciable difference between the platforms; either one should be fine. Search isn't relying on the JVM to do anything overly complicated.

> 2. Could you please give a brief description of the differences between
> the analyzers?
>

Sure:

*com.basho.search.analysis.DefaultAnalyzerFactory* uses Lucene's StandardTokenizer, filters out words shorter than 3 characters, converts tokens to lower case, and filters out the stopwords listed in Lucene's StopAnalyzer.java (http://www.koders.com/java/fid5FBD7DCAFB544D74598A9B1D82A341CD648DA47F.aspx?s=java).

*com.basho.search.analysis.WhitespaceAnalyzerFactory* uses Lucene's WhitespaceTokenizer.

*com.basho.search.analysis.IntegerAnalyzerFactory* parses the field as an integer and, by default, pads it to 10 places.

*{erlang, text_analyzers, default_analyzer_factory}* treats words as runs of the characters 0-9, a-z, and A-Z, filters out words shorter than 3 characters, converts tokens to lower case, and filters out the same list of stopwords as DefaultAnalyzerFactory.

Two things to note:

- You can create your own analyzers in Java or Erlang; see the source code under apps/qilr/java_src.
- Due to a regression bug, field-level analyzer settings are not used when running a query. Whatever default analyzer you set for the schema is used for all fields.

> 3. I guess you have already made some benchmarks regarding the
> analyzers, haven't you?
>

We have made some rudimentary benchmarks, which show that the Erlang analyzers are currently faster than the Java-based analyzers due to the overhead of communicating with the JVM. We will be working on this in future iterations.

> I remember that you are going to add a special page to the wiki about
> the subject. Hope this will also help you to gather up the information
> a bit.
>

Absolutely, we will continue to update the wiki with more information about Search going forward.

Hope that helps!

Best,
Rusty
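To get a side-by-side feel for the analyzers described above, here is a rough plain-Java sketch of their token rules. It is not the Riak Search or Lucene source: the class and method names (`AnalyzerSketch`, `defaultAnalyze`, `whitespaceAnalyze`, `integerAnalyze`) are hypothetical. Several assumptions apply. The stopword list is assumed to match Lucene's English stop words, so check it against the linked StopAnalyzer.java. `defaultAnalyze` follows the Erlang `default_analyzer_factory` rule (words are runs of 0-9, a-z, A-Z) rather than Lucene's more elaborate StandardTokenizer grammar. Zero-padding with `%010d` is a guess at what "pads to 10 places" means.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical, simplified approximations of the analyzer rules described above.
 * Not the actual Riak Search or Lucene implementation.
 */
public class AnalyzerSketch {

    // Assumed to match Lucene's English stop word set; verify against StopAnalyzer.java.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "a", "an", "and", "are", "as", "at", "be", "but", "by", "for", "if", "in",
        "into", "is", "it", "no", "not", "of", "on", "or", "such", "that", "the",
        "their", "then", "there", "these", "they", "this", "to", "was", "will", "with"));

    /** Erlang default_analyzer_factory style: words are runs of 0-9, a-z, A-Z. */
    static List<String> defaultAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.split("[^0-9a-zA-Z]+")) {
            if (word.length() < 3) {
                continue; // drop short words (this also skips empty strings from split)
            }
            String lower = word.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(lower)) {
                tokens.add(lower); // keep lower-cased non-stopwords
            }
        }
        return tokens;
    }

    /** WhitespaceAnalyzerFactory style: split on whitespace only; no other filtering assumed. */
    static List<String> whitespaceAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                tokens.add(word); // case and punctuation are left untouched
            }
        }
        return tokens;
    }

    /** IntegerAnalyzerFactory style: parse as an integer, then zero-pad to 10 places (assumed). */
    static String integerAnalyze(String text) {
        return String.format(Locale.ROOT, "%010d", Long.parseLong(text.trim()));
    }

    public static void main(String[] args) {
        String text = "The QUICK brown fox is in a box, isn't it?";
        System.out.println(defaultAnalyze(text));
        System.out.println(whitespaceAnalyze(text));
        System.out.println(integerAnalyze("42"));
    }
}
```

With the sample sentence in `main`, the default-style sketch prints `[quick, brown, fox, box, isn]`. "The" is dropped as a stopword after lower-casing, and "is", "in", "a", "it", and the "t" split off from "isn't" fall to the 3-character filter. The whitespace-style sketch keeps "The", "QUICK", "box,", and "isn't" exactly as written, and `integerAnalyze("42")` yields `0000000042`, which keeps padded integers in numeric order when compared as strings.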
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com