Re: solr and analyzers module

Chris Hostetter Wed, 19 May 2010 13:58:03 -0700

: 1. Solr, like Lucene, should be able to work with an older analyzers
: module for backwards compatibility purposes.


While I don't disagree with you, Solr "philosophy" has generally 
discouraged the use of "Analyzer" classes in favor of more discrete 
Tokenizer & TokenFilter pieces -- direct support for Analyzers is mainly a 
result of wanting to allow an easy transition for people that already 
have custom analyzers they wrote for direct use in Lucene.  The more 
fine-grained analysis chain approach that Solr encourages makes it easier 
for people to debug what is going on, and allows for more customization of 
the individual stages of the "Analyzer" that gets built on the fly.

That said: if we can make it easier to use Analyzers, I'm all for it -- I 
just don't want to set things up in a way that people choose to use 
XyzAnalyzer from an analyzer module, when they could get the exact same 
behavior by chaining together XTokenizer, YTokenFilter, and ZTokenFilter 
(from the same module) and in the latter case have more transparent 
debugging and fine-grained configuration controls.
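
To illustrate the debugging point, here is a minimal sketch in plain Java. 
It is a toy model, not the Lucene Tokenizer/TokenFilter API: the class and 
method names (AnalysisChainSketch, whitespaceTokenize, lowerCase, 
removeStopWords, analyzeWithTrace) are hypothetical stand-ins for 
XTokenizer, YTokenFilter, and ZTokenFilter. It only shows why a chain of 
separate stages is easier to inspect than a single opaque Analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Toy model only: these are NOT the Lucene Tokenizer/TokenFilter classes.
final class AnalysisChainSketch {

    // Hypothetical stand-in for "XTokenizer": split on whitespace.
    static List<String> whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Hypothetical stand-in for "YTokenFilter": lowercase every token.
    static List<String> lowerCase(List<String> tokens) {
        return tokens.stream()
                .map(t -> t.toLowerCase(Locale.ROOT))
                .collect(Collectors.toList());
    }

    // Hypothetical stand-in for "ZTokenFilter": drop configured stop words.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        return tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }

    // Runs the chain and records the token list after each stage, which is
    // the kind of per-stage visibility a monolithic analyzer does not give.
    static List<List<String>> analyzeWithTrace(String text, Set<String> stopWords) {
        List<List<String>> trace = new ArrayList<>();
        List<String> tokens = whitespaceTokenize(text);
        trace.add(tokens);
        tokens = lowerCase(tokens);
        trace.add(tokens);
        tokens = removeStopWords(tokens, stopWords);
        trace.add(tokens);
        return trace;
    }

    public static void main(String[] args) {
        // Prints one line per stage:
        // [The, Quick, Fox]
        // [the, quick, fox]
        // [quick, fox]
        for (List<String> stage : analyzeWithTrace("The Quick Fox", Set.of("the"))) {
            System.out.println(stage);
        }
    }
}
```

Because each stage is a separate step, any one of them can be swapped or 
reconfigured (a different stop word set, say) without touching the others.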

: So with this idea, analyzers are just a Solr plugin, and the default
: Solr install includes the ones it does today, so most users would not
: see the difference. But if a user wants Polish, Smart Chinese, or
: improved Unicode support, they would be able to drop in one of the
: additional analyzer modules easily.
: 
: The factories for Solr serve as a buffer to hide the implementation
: details, and I think they should be part of these analyzer modules, so

Just to be clear: what you are suggesting is that the 
module-analyzer-XXX.jar artifact of modules/analyzers/XXX should not only 
contain the Tokenizers & TokenFilters that relate to XXX, but also the 
Factories Solr expects to initialize them -- so a user only needs to add 
that module-analyzer-XXX.jar to their Solr lib dir to get all the 
functionality, instead of needing module-analyzer-XXX.jar plus some 
solr-analyzer-XXX-glue.jar

        ...am i understanding that correctly?

I'm all in favor of this -- anticipating that some of the stuff in 
IndexSchema might eventually get "promoted" up into a Lucene 
contrib/module is the key reason why we made sure a few years back to 
prevent letting FieldTypes/TokenizerFactories/TokenFilterFactories be 
"aware" of the SolrCore or the IndexSchema classes -- instead all they are 
allowed to know about is the concept of a "ResourceLoader" for accessing 
external file resources (ie: via a classpath and/or effective directory).

So refactoring the factory APIs + the ResourceLoader into a new module 
should be relatively straightforward (knock on wood)
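
As a rough illustration of that decoupling, the sketch below shows a 
factory whose only outside dependency is a loader for named resources. 
Everything in it is hypothetical (SketchResourceLoader, InMemoryLoader, 
SketchStopFilterFactory, the "words" argument); it does not reproduce the 
actual Solr ResourceLoader or factory signatures, only the shape of the 
idea: no reference to SolrCore or IndexSchema is needed.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical minimal loader: the only thing a factory may depend on.
interface SketchResourceLoader {
    InputStream openResource(String name) throws IOException;
}

// In-memory loader for the demo; a real one might read from a classpath
// or a config directory instead.
final class InMemoryLoader implements SketchResourceLoader {
    private final Map<String, String> files;

    InMemoryLoader(Map<String, String> files) {
        this.files = files;
    }

    @Override
    public InputStream openResource(String name) throws IOException {
        String content = files.get(name);
        if (content == null) {
            throw new IOException("No such resource: " + name);
        }
        return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
    }
}

// Hypothetical factory: configured from string arguments plus the loader,
// with no knowledge of any core or schema object.
final class SketchStopFilterFactory {
    private final Set<String> stopWords = new HashSet<>();

    SketchStopFilterFactory(Map<String, String> args, SketchResourceLoader loader) {
        String resource = args.get("words");
        if (resource == null) {
            throw new IllegalArgumentException("Missing required argument: words");
        }
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(loader.openResource(resource), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                // Skip blank lines and '#' comments (a convention assumed here).
                if (!line.isEmpty() && !line.startsWith("#")) {
                    stopWords.add(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies the configured stop word list to a token list.
    List<String> filter(List<String> tokens) {
        return tokens.stream()
                .filter(t -> !stopWords.contains(t))
                .collect(Collectors.toList());
    }
}

final class FactorySketchDemo {
    public static void main(String[] args) {
        SketchResourceLoader loader =
                new InMemoryLoader(Map.of("stopwords.txt", "# comment\nthe\na\n"));
        SketchStopFilterFactory factory =
                new SketchStopFilterFactory(Map.of("words", "stopwords.txt"), loader);
        System.out.println(factory.filter(List.of("the", "quick", "fox"))); // prints [quick, fox]
    }
}
```

Since the factory sees nothing but the loader interface, it could live in 
the same jar as the filter it builds, regardless of what hosts it.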

: 2. example schema definitions (even snippets) for Solr users as a
: documentation artifact, so they know how to use this stuff.

+1


-Hoss

