RE: Hints on constructing/running Solr analyzer chains standalone

Uwe Schindler Sat, 12 Jul 2014 11:24:55 -0700

The factories are part of Lucene; Solr is just using them. To list the available 
factories (those on the classpath), use the 
(Tokenizer|TokenFilter|CharFilter)Factory.availableXxxxx() methods, which return 
their names. You can instantiate them using the corresponding forName() method and 
build an Analyzer from them. The latter has to be done manually; there is no 
general, simple equivalent of Solr's chains. But that is quite easy to implement 
(if you really need an Analyzer instance). To just build a TokenStream for 
analysis, the factories are all you need (in fact, Solr's chain just calls the 
factories in order and returns the result as TokenStreamComponents).
You don't need to deal with SPI yourself: just make the factories available on 
the classpath and Lucene finds them automatically.
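
To illustrate the "look up by name, then call in order" idea, here is a 
minimal, library-free sketch. It does not use Lucene: ChainSketch, 
StageFactory, availableStages() and the two registered stages are hypothetical 
stand-ins for the factory classes and their availableXxxxx()/forName() 
methods, and the whitespace split stands in for a tokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Library-free model of "look factories up by name, then call them in order". */
final class ChainSketch {
    /** Hypothetical stand-in for a filter factory: creates one token-list stage. */
    interface StageFactory {
        Function<List<String>, List<String>> create();
    }

    // Hypothetical hand-filled registry; Lucene discovers its factories on the classpath.
    private static final Map<String, StageFactory> REGISTRY = new LinkedHashMap<>();
    static {
        REGISTRY.put("lowercase", () -> tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
            return out;
        });
        REGISTRY.put("stop", () -> tokens -> {
            List<String> out = new ArrayList<>(tokens);
            out.removeAll(Arrays.asList("the", "a", "of"));
            return out;
        });
    }

    /** Names of all registered stages (models the availableXxxxx() methods). */
    static Set<String> availableStages() {
        return REGISTRY.keySet();
    }

    /** Looks a factory up by name (models forName()). */
    static StageFactory forName(String name) {
        StageFactory factory = REGISTRY.get(name);
        if (factory == null) throw new IllegalArgumentException("Unknown stage: " + name);
        return factory;
    }

    /** Splits on whitespace, then applies each named stage in the order given. */
    static List<String> analyze(String text, String... stageNames) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        for (String name : stageNames) {
            tokens = forName(name).create().apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(availableStages());
        System.out.println(analyze("The Quick Fox", "lowercase", "stop"));
    }
}
```

Order matters in this sketch: with "stop" before "lowercase", the capitalised 
"The" is not yet lowercased when the stop stage runs, so it survives.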


For loading resources, use Lucene's ResourceLoader, which gets passed to the 
factory's inform() method. You only *need* to pass one if the factory 
implements ResourceLoaderAware. There are several ResourceLoaders available. 
Solr has its own, very complicated one, but the default Lucene ones are 
ClasspathResourceLoader and FilesystemResourceLoader.
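
The inform() hand-off can be modelled without Lucene as well. In the sketch 
below, Loader and LoaderAware are local stand-ins that only mirror the roles 
of ResourceLoader and ResourceLoaderAware; StopWordsFactory, its "words" 
parameter, init() and the in-memory loader are assumptions made up for 
illustration, not Lucene classes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Library-free model of passing a resource loader to factories that ask for one. */
final class InformSketch {
    /** Stand-in for a resource loader: resolves a resource name to a stream. */
    interface Loader {
        InputStream openResource(String name) throws IOException;
    }

    /** Stand-in for the interface implemented by factories that need resources. */
    interface LoaderAware {
        void inform(Loader loader) throws IOException;
    }

    /** Hypothetical factory configured with a "words" file name, as in a schema. */
    static final class StopWordsFactory implements LoaderAware {
        private final String wordsFile;
        private final Set<String> stopWords = new HashSet<>();

        StopWordsFactory(Map<String, String> args) {
            this.wordsFile = args.get("words");
        }

        /** Reads one word per line, skipping blank lines and '#' comments. */
        @Override
        public void inform(Loader loader) throws IOException {
            try (BufferedReader in = new BufferedReader(new InputStreamReader(
                    loader.openResource(wordsFile), StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    line = line.trim();
                    if (!line.isEmpty() && !line.startsWith("#")) stopWords.add(line);
                }
            }
        }

        boolean isStopWord(String token) {
            return stopWords.contains(token);
        }
    }

    /** Calls inform() only if the factory implements LoaderAware. */
    static <T> T init(T factory, Loader loader) {
        if (factory instanceof LoaderAware) {
            try {
                ((LoaderAware) factory).inform(loader);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return factory;
    }

    /** Hypothetical loader backed by a map of file name to file content. */
    static Loader inMemoryLoader(Map<String, String> files) {
        return name -> {
            String content = files.get(name);
            if (content == null) throw new IOException("Resource not found: " + name);
            return new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
        };
    }

    public static void main(String[] args) {
        Map<String, String> files = new HashMap<>();
        files.put("stopwords.txt", "# comment line\nthe\nof\n");
        Map<String, String> factoryArgs = new HashMap<>();
        factoryArgs.put("words", "stopwords.txt");
        StopWordsFactory factory = init(new StopWordsFactory(factoryArgs), inMemoryLoader(files));
        System.out.println(factory.isStopWord("the"));
    }
}
```

Swapping the in-memory loader for one that reads from the classpath or from a 
directory changes where the word list comes from without touching the factory.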

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [email protected]


> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:[email protected]]
> Sent: Saturday, July 12, 2014 7:17 PM
> To: [email protected]
> Subject: Re: Hints on constructing/running Solr analyzer chains standalone
> 
> I don't want to read the schema.xml, but I do want to create factories using
> the same parameters they use in schema. So, it looks like I need to play
> around with ResourceLoaders and maybe SPI loaders, so things like wordlists
> get loaded.
> 
> Starting from FieldAnalyzer turned out to be a dead-end because it was using
> pre-initialized field definitions. But starting again from the test cases
> seems to be somewhat more productive.
> 
> The idea for the project is to give a web UI where a user can quickly put one
> or more analyzer stacks together and see how it/they perform against text
> (multiple texts). A bit similar to FieldAnalyzer, but allowing multiple
> stacks side-by-side and NOT needing to reload the core to add new ones.
> Then, generate the XML definition, ready for pasting in. That's the target
> anyway.
> 
> Regards,
>    Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources: http://www.solr-start.com/ and @solrstart
> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> 
> 
> On Sat, Jul 12, 2014 at 11:34 PM, Uwe Schindler <[email protected]> wrote:
> > Hi,
> >
> >
> >> Hmmmm, I think it's reasonably straightforward to construct what is
> >> implied by a Solr analysis chain in Lucene, would that do? Or do you
> >> want to read a schema.xml file outside Solr?
> >>
> >> If the former, then you can pretty much skip the Solr code entirely.
> >
> > Read this:
> >
> > http://lucene.apache.org/core/4_9_0/core/org/apache/lucene/analysis/package-summary.html#package_description
> >
> > To do analysis, Solr is not needed at all, unless you want to read
> > schema.xml files. If you want to do this, that is quite easy using the
> > IndexSchema class. You can then get the analyzer from the field type or field
> > name. How to use the analyzer is described above and unrelated to Solr.
> >
> > Uwe
> >
> >> On Sat, Jul 12, 2014 at 6:59 AM, Alexandre Rafalovitch <[email protected]> wrote:
> >> > Hello,
> >> >
> >> > I am interested in creating and running Solr analyzer chains
> >> > outside of normal process (no live Solr). Just construct a chain,
> >> > feed it tokens and see what happens.
> >> >
> >> > I would appreciate any hints on what that takes and whether there
> >> > are any hidden/weird dependencies (e.g. for resource discoveries).
> >> > I tried tracing through FieldAnalysis calls, but can't actually
> >> > seem to find the point where the actual analysis is done. Just
> >> > getting lost in sets of NamedList<NamedList<... all alike.
> >> >
> >> > Regards,
> >> >    Alex.
> >> > Personal: http://www.outerthoughts.com/ and @arafalov
> >> > Solr resources: http://www.solr-start.com/ and @solrstart
> >> > Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [email protected]
> >> > For additional commands, e-mail: [email protected]
> >> >
