Re: Indexing and search questions

Ahmet Arslan Tue, 20 Apr 2010 14:27:54 -0700

> I'd like to use Lucene to search text documents for the existence of a
> large list of search terms. I have a file that contains thousands of
> entries, one word per line. I was thinking about writing a specialized
> analyzer that tokenizes the document by looking up each token in the
> source document in my list of words and returns terms for words that
> exist in my file. I'm hoping that using this approach the index file
> will contain only items that exist in my document.


Sounds like KeepWordFilter [1][2] is what you are looking for. keepwords.txt
would be your file that contains thousands of entries, one word per line.
And just as you guessed, with this approach the index will contain only
terms that are listed in that file (keepwords.txt).

I can share the code to use this TokenFilter in Lucene if you want.
Alternatively, you can simply copy and paste KeepWordFilter.java.
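
To make the idea concrete, here is a minimal plain-Java sketch of what a
keep-word filter does. It is not the KeepWordFilter source and it does not
use the Lucene API. The names KeepWordsSketch, loadKeepWords and keepOnly are
made up for this example, and the lowercasing and the split on
non-alphanumeric characters are assumptions; in a real analyzer chain the
Tokenizer in front of the filter decides what a token is.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration only: mimics the effect of a keep-word filter
// with plain JDK classes, without any Lucene or Solr dependency.
public class KeepWordsSketch {

    // Reads one word per line (the keepwords.txt layout), skipping blank lines.
    public static Set<String> loadKeepWords(Reader source) throws IOException {
        Set<String> words = new HashSet<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (!word.isEmpty()) {
                    words.add(word);
                }
            }
        }
        return words;
    }

    // Splits the text into tokens and keeps only those found in keepWords.
    // Assumption: tokens are lowercased runs of letters and digits.
    public static List<String> keepOnly(String text, Set<String> keepWords) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            String normalized = token.toLowerCase(Locale.ROOT);
            if (keepWords.contains(normalized)) {
                kept.add(normalized);
            }
        }
        return kept;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a FileReader on keepwords.txt.
        Set<String> keep = loadKeepWords(new StringReader("lucene\nindex\n"));
        // prints [lucene, index]
        System.out.println(keepOnly("Lucene builds an inverted index.", keep));
    }
}
```

Only the tokens that survive this step would reach the index, which is why
the index ends up holding nothing but words from your list.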

[1]http://lucene.apache.org/solr/api/org/apache/solr/analysis/KeepWordFilter.html

[2]http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.KeepWordFilterFactory






