You need to invert the process, and Lucene may not be the best option for that. Treat your document as the key into an index of keywords. I've done the same thing, but not with Lucene. Pass through the document and, for each word (token), look it up in some index (a hashtable) to find the keywords or phrases that start with that word, see which ones match, and then continue through the document. You may be able to use Lucene to do this, but I am not sure it is the right tool.
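Here is a minimal sketch of that lookup loop in Java. It is not my actual code: the class name KeywordMatcher, the method names, and the simple lowercase/split tokenizer are all made up for illustration, and it assumes "length" means the number of words in a keyword. It builds a hashtable keyed by the first word of each keyword, tries the longest candidates first, and resumes scanning after the end of a match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KeywordMatcher {
    // First word of a keyword -> every keyword starting with that word,
    // stored as token lists and sorted longest first (by word count).
    private final Map<String, List<List<String>>> index = new HashMap<>();

    public KeywordMatcher(List<String> keywords) {
        for (String keyword : keywords) {
            List<String> tokens = tokenize(keyword);
            if (tokens.isEmpty()) {
                continue;
            }
            index.computeIfAbsent(tokens.get(0), k -> new ArrayList<>()).add(tokens);
        }
        for (List<List<String>> candidates : index.values()) {
            candidates.sort(Comparator.comparingInt((List<String> c) -> c.size()).reversed());
        }
    }

    // Hypothetical tokenizer: lowercase, then split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Returns the matched keywords in document order.
    public List<String> findMatches(String document) {
        List<String> doc = tokenize(document);
        List<String> matches = new ArrayList<>();
        int i = 0;
        while (i < doc.size()) {
            int matchedLength = 0;
            List<List<String>> candidates =
                    index.getOrDefault(doc.get(i), Collections.emptyList());
            for (List<String> candidate : candidates) {
                int end = i + candidate.size();
                if (end <= doc.size() && doc.subList(i, end).equals(candidate)) {
                    matches.add(String.join(" ", candidate));
                    matchedLength = candidate.size();
                    break; // longest candidate wins
                }
            }
            // Skip past the match, or move on one token if nothing matched.
            i += Math.max(matchedLength, 1);
        }
        return matches;
    }

    public static void main(String[] args) {
        KeywordMatcher matcher = new KeywordMatcher(Arrays.asList(
                "apple computer company", "apple computer inc",
                "apple computer", "apple pie", "apple"));
        // Prints [apple pie, apple computer]
        System.out.println(matcher.findMatches("I love apple pie, but not apple computer"));
    }
}
```

Note that this only matches exact tokens, so "red balls" would not match the keyword "red ball" unless the tokenizer also stems or otherwise normalizes the words.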
What I did was create a hashtable keyed by the first word of each keyword I want to match. The value for each hashtable entry is a list of all keywords starting with that word, sorted by length in descending order. When I see a word in the document that has one or more keywords starting with it, I enumerate the sorted list, attempting to match the longer ones first. On a match, I continue processing the document from the end of that match. I only have about 1 million keywords in total, so I just load the entire thing into an in-memory hashtable and use my own document tokenizer. I don't use Lucene at all for that.

For example, I might have these keywords:

Key: "apple"
List (sorted by length in descending order):
  "apple computer company"
  "apple computer inc"
  "apple computer"
  "apple pie"
  "apple"

Then if I have a document:

  "I love apple pie, but not apple computer"

It finds "apple" twice, but the first occurrence matches "apple pie", the second matches "apple computer", etc.

If you need to use Lucene, you could try parsing your document into a query, issuing that query (as one big Boolean OR query) against a Lucene index containing your keywords, and then enumerating the matches. But unless you have a lot of keywords to index, it probably doesn't make sense to use Lucene for that.

-----Original Message-----
From: Ryan Detzel [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 23, 2008 3:31 PM
To: java-user@lucene.apache.org
Subject: Using lucene to search a bunch of keywords?

Everything I've read and seen about Lucene is about searching for keywords in documents; I want to do the reverse. I have a huge list of keywords ("big boy", "red ball", "computer") and I have phrases that I want to check for those keywords. For example, using the small keyword list above (stored as documents in Lucene), what's the best approach to pass in a query "the girl likes red balls" and have it match the keyword "red ball"? Thanks.
---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]