: The problem that I am trying to solve is: how to index phrases (rather
: than phrase querying)? I have a Questions/Answers corpus; the
: architecture I am using for IR creates one index for questions and
: another one for answers (based on single terms) and then matches between
: them. I want to index phrases in addition to single terms (for both
: questions and answers) and then make a search for all terms and phrases
: in the questions index.

can you elaborate a little on what you mean by "index phrases" ... specifically, what is it you want to be able to do that you don't think you can do with a PhraseQuery?

my best guess, reading between the lines, is that you want to discover documents in your "answers" index that might correlate to documents in your "questions" index based on a high overlap of phrases -- i'm also guessing (reading between the lines) that you realize you can use things like TermEnum and TermDocs to find terms in common between both indexes, and which documents contain those terms.

if my guesses are correct, indexing using ShingleFilter might be of use to you -- Shingling is (lucene specific?) vernacular for word based ngrams, and by indexing in this way you can get "terms" consisting of multiple successive "words" when indexing, and then match things up that way.

as someone else mentioned, you can also use other custom Tokenization if you have a better definition of a "phrase" than just a sequence of successive words (ie: index whole sentences as a single term, etc...)

-Hoss
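
for anyone unsure what word based ngrams look like in practice, here is a minimal plain-Java sketch. it does not use Lucene or ShingleFilter; the class name ShingleSketch and the method shingles are hypothetical names made up for illustration, and splitting on whitespace is an assumption standing in for a real analyzer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: NOT Lucene's ShingleFilter, just the same idea in plain Java.
public class ShingleSketch {

    // Hypothetical helper: returns every run of minSize..maxSize successive words,
    // each run joined by a single space so it can be treated as one "term".
    public static List<String> shingles(String text, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out; // no words, no shingles
        }
        // Assumption: whitespace splitting stands in for real tokenization.
        String[] words = trimmed.split("\\s+");
        for (int start = 0; start < words.length; start++) {
            for (int n = minSize; n <= maxSize && start + n <= words.length; n++) {
                String[] run = Arrays.copyOfRange(words, start, start + n);
                out.add(String.join(" ", run));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Single words plus two-word shingles for a short question.
        System.out.println(shingles("how to index phrases", 1, 2));
    }
}
```

with sizes 1 to 2 you get the single words as well as every pair of adjacent words, so a two-word "phrase" becomes an ordinary term that can be compared across the two indexes the same way single terms are.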