Hi Scott,

Good points. We deviate from Lucene's Standard tokenizer in a few key ways. I'll add more description to the wiki. Thanks for the input.

Best,
Rusty

On Mon, Jan 24, 2011 at 2:14 PM, Scott Gonyea <sc...@aitrus.org> wrote:

> One concern from me is calling it standard_analyzer_factory... That name
> is semi-in-use by Solr:
>
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.StandardTokenizerFactory
>
> And did not have the same behavior as the (previously) Default Tokenizer.
> That'll have a lot of potential to confuse people coming from Solr. I'd
> suggest calling it something like Generic Analyzer Factory--or at least
> sticking some scary wording around it in the wiki.
>
> Scott
>
> On Monday, January 24, 2011 at 10:53 AM, Rusty Klophaus wrote:
>
> Hello Riak Users,
>
> We are excited to announce the release of Riak Search version 0.14!
>
> Pre-built installations and source tarballs are available at:
> http://downloads.basho.com/
>
> Release notes are at (also copied below):
> https://github.com/basho/riak_search/raw/riak_search-0.14.0/releasenotes/riak_search-0.14.0.txt
>
> Thanks,
> Basho
>
> -------------------------------
> Riak Search 0.14.0 Release Notes
> --------------------------------
>
> The majority of effort during development of Riak Search 0.14 went
> toward rewriting the query parsing and planning system. This fixes all
> known query planning bugs. We also managed to add quite a few new
> features and performance improvements. See the highlights below for
> details.
>
> Important Configuration and Interface Changes:
>
> - The system now uses the 'whitespace_analyzer_factory' by
>   default. (It previously used the 'default_analyzer_factory', which
>   has been renamed to 'standard_analyzer_factory'.)
>
> - Indexing and searching will fail with an error message if the
>   analyzer_factory configuration setting is not set at either a schema
>   or field level.
>
> - The method signature for custom Erlang and Javascript extractors has
>   changed.
>
> Highlights:
>
> - Fixed the query parser to properly respect field-level analyzer
>   settings.
>
> - Fixed the query parser to correctly handle escaped special
>   characters and terms within single-quotes and double-quotes.
>
> - Fixed the query parser's interpretation of inclusive and exclusive
>   ranges, allowing an inclusive range on one side, and an exclusive
>   range on the other (mimicking Lucene).
>
> - Fixed the execution engine to significantly speed up proximity
>   searches and phrase searches. (678)
>
> - By default new installations use all Erlang-based extractors, and
>   the JVM is not started. Setting the analysis_port in etc/app.config
>   will cause the JVM to start and allow the use of Java Lucene-based
>   analyzers.
>
> - System now aborts queries that would queue up too many documents in
>   a result set. This is controlled by a 'max_search_results' setting
>   in riak_search. Note that this only affects the Solr
>   interface. Searches through the Riak Client API that feed into a
>   Map/Reduce job are still allowed to execute because the system
>   streams those results.
>
> - Change handoff of Search data stored in merge_index to be more
>   memory efficient.
>
> - Added "*_date", "*_int", "*_text", and "*_txt" dynamic fields to the
>   default schema.
>
> ------------
> Improvements
> ------------
>
> 414 - ETS backend now fully functional (415, 795)
> 592 - Make parser multi-schema aware
> 783 - Pass Search Props as KeyData to Map/Reduce Query
> 788 - Add support for indexing Erlang terms / proplists
> 839 - Create a way to globally clear schema cache
> 925 - Change search-cmd commands (set_schema, etc.) to use dashes.
>
> ----------
> Fixed Bugs
> ----------
>
> 186 - Qilr fails when parsing ISO8601 dates
> 311 - Qilr does not correctly parse negative numbers
> 363 - Range queries broken for negative numbers
> 369 - Range queries broken for ALL integer fields
> 405 - Update search:index_dir/N to de-index old documents first
> 411 - Our handling of NOT is different from Solr - "NOT X", "AND NOT X", "AND
>       (NOT X)"
> 609 - Calling search:search or search:explain with a binary hangs shell
> 611 - Error in inclusive/exclusive range building
> 612 - Single term queries shouldn't include proximity clauses
> 622 - schema and schemachange test fail after new parser
> 711 - Update new #range operator to support negative integers
> 729 - Make Qilr use analyzer specified in schema
> 732 - Word Position is thrown off by Stopwords
> 764 - The function search:delete_doc/2 blocks if run after search:index_dir/2
> 797 - Ranges with quoted terms do not return correct results
> 801 - Anonymous javascript extractors stored in Bucket/Keys validate but are
>       not implemented
> 802 - Schema allows default field that is not defined, but breaks when
>       analyzing
> 803 - Cannot use search m/r with riak_client:mapred
> 832 - Query parser fails on escaped special characters
> 833 - Proximity searching is currently broken for Whitespace Analyzer
> 836 - Integer padding is ignored for dynamic fields
> 837 - The parser interprets hyphens as negations (NOT)
> 840 - JSON and raw extractors assumes a default field of "value"
> 849 - Default Erlang Analyzer misses 'that' and 'then' as stop words
> 850 - text_analyzers module is not tail-recursive
> 864 - Solr output chokes on dates
> 885 - Coordinating node exits if result set exceeds available memory
> 886 - Query parser error when searching on terms that contain the @ symbol
> 935 - Change merge_index fold to be unordered
> 956 - Error when setting rs_extractfun through Curl/JSON
>
> ------------
> Known Issues
> ------------
>
> 362 - Sorting broken on negative numbers
> 399 - Handoff can potentially lead to extraneous postings pointing to a
>       missing or changed document
> 790 - Indexing data too quickly can exhaust the ETS table limit
> 814 - text_analyzer:default_analyzer_factory skips unicode code points beyond
>       0x7f
> 861 - merge_index throws errors when data path contains a period
> 866 - Sorting positions may change between Solr Searches
> 867 - Solr "rows" and "start" parameters are applied too early
> 908 - Solr q.op parameter is ignored (Regression)
> 955 - Range searching and wildcards across UTF-8 data is broken
> 957 - Error when viewing bucket properties with a set rs_extractfun
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
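
For readers comparing the two analyzers named in the release notes, here is a rough Python sketch of how a whitespace analyzer and a standard-style analyzer can produce different terms from the same text. It is not Riak Search's implementation: the function names, the stop-word list, and the three-character minimum token length are all assumptions made for illustration.

```python
import re

# Hypothetical stop-word list, for illustration only.
# It is NOT the list used by Riak Search or Lucene.
STOP_WORDS = {"a", "an", "and", "of", "that", "the", "then", "to"}


def whitespace_analyze(text):
    """Split on runs of whitespace only; case and punctuation are preserved."""
    return text.split()


def standard_like_analyze(text, min_len=3):
    """Lowercase, split on non-alphanumeric characters, then drop
    short tokens and stop words. min_len=3 is an assumed value."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if len(t) >= min_len and t not in STOP_WORDS]


sample = "The Quick-Brown fox, then the DOG"
print(whitespace_analyze(sample))
print(standard_like_analyze(sample))
```

Under these assumptions the same input yields different index terms (for example "Quick-Brown" versus "quick" and "brown"), which is why a field has to be indexed and queried with the same analyzer.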