I'd like to use Lucene to search text documents for the presence of a large list of search terms. I have a file containing thousands of entries, one word per line. I was thinking of writing a specialized analyzer that tokenizes each document, looks each token up in my word list, and emits terms only for the words that appear in my file. My hope is that with this approach the index will contain only the words from my list that actually occur in the documents, so once the index is built I can ask it for all of its terms and whatever comes back is exactly the list of items I'm interested in.
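Roughly, this is the kind of analyzer I had in mind (just a sketch, assuming a recent Lucene such as 8.x/9.x, where the built-in `KeepWordFilter` from the analysis-common module does the "keep only listed words" step; the class name and file path are placeholders):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.CharArraySet;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.miscellaneous.KeepWordFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;

public class KeepListAnalyzer extends Analyzer {

    private final CharArraySet keepWords;

    public KeepListAnalyzer(CharArraySet keepWords) {
        this.keepWords = keepWords;
    }

    /** Loads my one-word-per-line file into a case-insensitive set. */
    public static CharArraySet loadWordList(String path) throws IOException {
        return new CharArraySet(Files.readAllLines(Paths.get(path)), true);
    }

    @Override
    protected TokenStreamComponents createComponents(String fieldName) {
        Tokenizer source = new StandardTokenizer();
        TokenStream stream = new LowerCaseFilter(source);
        // Drop every token that is not in the word list, so only
        // words from my list ever make it into the index.
        stream = new KeepWordFilter(stream, keepWords);
        return new TokenStreamComponents(source, stream);
    }
}
```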
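And for pulling the terms back out of the finished index, I was picturing something along these lines (again just a sketch against Lucene 8.x/9.x; "content" and "index" are placeholder field and directory names):

```java
import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiTerms;
import org.apache.lucene.index.Terms;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.BytesRef;

public class DumpMatchedTerms {
    public static void main(String[] args) throws Exception {
        try (IndexReader reader = DirectoryReader.open(FSDirectory.open(Paths.get("index")))) {
            Terms terms = MultiTerms.getTerms(reader, "content");
            if (terms == null) {
                return; // field not present in the index
            }
            TermsEnum it = terms.iterator();
            for (BytesRef term = it.next(); term != null; term = it.next()) {
                // docFreq() = number of documents the word occurred in
                System.out.println(term.utf8ToString() + "\t" + it.docFreq());
            }
        }
    }
}
```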
I'm new to Lucene, so I'm not sure whether I'm going about this the right way. Lastly, I want to be able to run this process over thousands of documents and store the matches (and their offsets) in a database, so it needs to be fairly efficient. I appreciate any comments. TIA, FR