elguillelmo wrote:

Kai_testing Middleton wrote:
The nutch analyzer is NutchDocumentAnalyzer.  Does anyone know how to add
this to the Luke classpath?  I tried this kind of thing but it didn't work


I'm trying to work out the same thing, to no avail. Would anybody be able to
detail how to add Nutch's Analyzer to Luke's classpath?

What I'm doing at the moment is:

java -classpath lukeall-0.8.1.jar:/path/to/nutchAnalyzer.jar org.getopt.luke.Luke

Well ... It could be done, but not easily.

First, NutchDocumentAnalyzer depends on other Nutch classes (so you need nutch-${version}.jar), which in turn depend on Hadoop (so you need hadoop-core*.jar), which itself depends on a dozen or so other jars. All of these need to be added to the classpath.

Second, this analyzer doesn't have a no-args constructor; it needs a Hadoop Configuration argument. Luke can handle only no-args or single-String-argument constructors. I would have to change the way Analyzers are instantiated in Luke so that you can pass an existing instance (e.g. one that you created in the scripting plugin context).
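
To illustrate the constructor problem, here is a minimal sketch of reflection-based instantiation that tries a no-args constructor first and then a single-String one. This is not Luke's actual code; the class names (AnalyzerInstantiationSketch, NoArgAnalyzer, FakeConfiguration, ConfiguredAnalyzer) and the tryInstantiate helper are hypothetical stand-ins that use only the JDK, so the example runs without Lucene, Nutch or Hadoop on the classpath.

```java
import java.lang.reflect.Constructor;

public class AnalyzerInstantiationSketch {

    // Hypothetical stand-in for an analyzer with a public no-args constructor.
    public static class NoArgAnalyzer {
        public NoArgAnalyzer() {}
    }

    // Hypothetical stand-in for a Hadoop-style Configuration object.
    public static class FakeConfiguration {}

    // Hypothetical stand-in for an analyzer that, like NutchDocumentAnalyzer,
    // can only be built from a configuration object.
    public static class ConfiguredAnalyzer {
        public ConfiguredAnalyzer(FakeConfiguration conf) {}
    }

    /**
     * Tries the no-args constructor, then the single-String constructor.
     * Returns null when the class offers neither of the two.
     */
    public static Object tryInstantiate(Class<?> cls, String arg) {
        try {
            try {
                Constructor<?> noArgs = cls.getConstructor();
                return noArgs.newInstance();
            } catch (NoSuchMethodException noDefault) {
                Constructor<?> oneString = cls.getConstructor(String.class);
                return oneString.newInstance(arg);
            }
        } catch (NoSuchMethodException unsupported) {
            // Neither supported constructor shape exists.
            return null;
        } catch (ReflectiveOperationException e) {
            // The constructor exists but could not be invoked.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tryInstantiate(NoArgAnalyzer.class, "x") != null);
        System.out.println(tryInstantiate(ConfiguredAnalyzer.class, "x") != null);
    }
}
```

With a scheme like this, the second class cannot be created at all, which is why an already-built instance would have to be handed in from outside.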

Third, NutchDocumentAnalyzer uses CommonGrams, which in turn _requires_ the presence of a common-grams.utf8 resource on the classpath.
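
A quick way to check whether a resource such as common-grams.utf8 is visible is to ask the class loader for it. The sketch below assumes the resource is looked up by its plain name at the classpath root, which may not match exactly how CommonGrams locates it; ResourceCheckSketch and isOnClasspath are hypothetical names used only for illustration.

```java
import java.net.URL;

public class ResourceCheckSketch {

    /** Returns true if the named resource can be found by this class's loader. */
    public static boolean isOnClasspath(String resourceName) {
        ClassLoader loader = ResourceCheckSketch.class.getClassLoader();
        // A null loader means the bootstrap loader; fall back to the system lookup.
        URL url = (loader != null)
                ? loader.getResource(resourceName)
                : ClassLoader.getSystemResource(resourceName);
        return url != null;
    }

    public static void main(String[] args) {
        // Assumption: the resource name is taken as-is, relative to the classpath root.
        String name = args.length > 0 ? args[0] : "common-grams.utf8";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Run it with the same -classpath value you pass to Luke, so that it sees the same jars and directories.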

To summarize: unless you want to get your hands dirty with Luke internals, it can't be done.

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

