Hi Erick,

Any comments about this requirement?

2010/6/29 a peng <zhoudengp...@gmail.com>

> Hi Erick,
>
> Thanks for your reply, now I get the point why I can not get the search
> result. But can you guide me on how I can use Lucene to implement the
> following search feature:
> Basically we can call this feature "fuzzy phrase search", which means the
> search phrase may contain more words or fewer words compared to the
> sentences indexed in the document. Again I want to use the sample I posted
> to demo it:
>
> The document contains a sentence "This is a test", and the search phrase is
> "This is a formal test". As there is only one word of difference between the
> two sentences, how can I get the "This is a test" phrase as the search
> result?
>
> Thanks.
>
> 2010/6/29 Erick Erickson <erickerick...@gmail.com>
>
>> No, I don't think so. The critical bit is that the indexed text
>> does NOT contain the word "formal". So searching for
>> any phrase that DOES contain "formal" should fail no matter
>> what the slop.
>>
>> Phrase queries are something like "find all the words in this
>> search string, ignoring some number of intervening tokens not in the
>> search string". There's nothing in there about "find only some of the
>> words in the search string"....
>>
>> I'm guessing that the original post had a typo in the success case,
>> because it's contradicted by "a peng's" second post. It's always
>> possible that I'm experiencing a brain short......
>>
>> Best
>> Erick
>>
>> On Mon, Jun 28, 2010 at 11:32 AM, tarun sapra <t.sapr...@gmail.com> wrote:
>>
>> > Hey Erick
>> >
>> > Thanks mate!
>> >
>> > So I guess my explanation in the mail chain above was correct!
>> >
>> > On Mon, Jun 28, 2010 at 6:20 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>> >
>> > > I think you're misunderstanding the intent of PhraseQueries and slop.
>> > > Slop is the number of intervening tokens that may exist between the
>> > > words you're looking for. However, all the words you're looking for
>> > > MUST exist.
>> > >
>> > > So,
>> > >
>> > > <<< whenever the search phrase contains a word that doesn't
>> > > exist in the document, the search result will be empty >>>
>> > >
>> > > is exactly how this is intended to work.
>> > >
>> > > HTH
>> > > Erick
>> > >
>> > > On Mon, Jun 28, 2010 at 9:09 AM, a peng <zhoudengp...@gmail.com> wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > My test result is that whenever the search phrase contains a word
>> > > > that doesn't exist in the document, the search result will be empty
>> > > > no matter how big a slop factor I set. Is this a bug in Lucene, or
>> > > > does it work as designed?
>> > > >
>> > > > 2010/6/28 tarun sapra <t.sapr...@gmail.com>
>> > > >
>> > > > > Hi,
>> > > > >
>> > > > > I think I have been able to understand what's happening here...
>> > > > >
>> > > > > Indexed content: "This is a test".
>> > > > > Your search phrase: "This is a formal test"
>> > > > > You're setting the slop factor to 2. If your slop factor is 3 it
>> > > > > should work, because "is" and "a" are stop words, thus the words
>> > > > > "This" and "test" are 2 slop factor apart, but in your search
>> > > > > phrase "This is a formal test" the words "This" and "test" are 3
>> > > > > slop factor apart, that's why it's not working.
>> > > > > Now in the search phrase "This is formal test" the words "This"
>> > > > > and "test" are 2 slop factor apart, that's why this phrase is
>> > > > > working.
>> > > > >
>> > > > > On Mon, Jun 28, 2010 at 11:37 AM, a peng <zhoudengp...@gmail.com> wrote:
>> > > > >
>> > > > > > Hi,
>> > > > > >
>> > > > > > I am using StandardAnalyzer(Version.LUCENE_30);
>> > > > > >
>> > > > > > 2010/6/27 tarun sapra <t.sapr...@gmail.com>
>> > > > > >
>> > > > > > > which analyzer are you using?
>> > > > > > >
>> > > > > > > On Sun, Jun 27, 2010 at 7:12 AM, a peng <zhoudengp...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi,
>> > > > > > > >
>> > > > > > > > I know the indexed content contains the following text:
>> > > > > > > > "This is a test". The search phrase I used is "This is a
>> > > > > > > > formal test", and I set the slop of the PhraseQuery to 2
>> > > > > > > > with setSlop(2), but I found that I can not get a search
>> > > > > > > > result. If I set the search phrase to "This is formal
>> > > > > > > > test", then I can get the search result.
>> > > > > > > >
>> > > > > > > > So what is the problem here? Thanks in advance.
>> > > > > > > >
>> > > > > > > > Attached is the Javadoc for the setSlop method:
>> > > > > > > >
>> > > > > > > > public void setSlop(int s)
>> > > > > > > >
>> > > > > > > > Sets the number of other words permitted between words in
>> > > > > > > > query phrase. If zero, then this is an exact phrase search.
>> > > > > > > > For larger values this works like a WITHIN or NEAR operator.
>> > > > > > > >
>> > > > > > > > The slop is in fact an edit-distance, where the units
>> > > > > > > > correspond to moves of terms in the query phrase out of
>> > > > > > > > position. For example, to switch the order of two words
>> > > > > > > > requires two moves (the first move places the words atop one
>> > > > > > > > another), so to permit re-orderings of phrases, the slop
>> > > > > > > > must be at least two.
>> > > > > > > >
>> > > > > > > > More exact matches are scored higher than sloppier matches,
>> > > > > > > > thus search results are sorted by exactness.
>> > > > > > > >
>> > > > > > > > The slop is zero by default, requiring exact matches.
>> > > > > > > >
>> > > > > > >
>> > > > > > > --
>> > > > > > > Thanks & Regards
>> > > > > > > Tarun Sapra
>> > > > > > >
>> > > > >
>> > > > > --
>> > > > > Thanks & Regards
>> > > > > Tarun Sapra
>> > > > >
>> >
>> > --
>> > Thanks & Regards
>> > Tarun Sapra
>> >
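
To make the distinction discussed in this thread concrete, here is a minimal sketch in plain Java. It does not use the Lucene API. It only models the rule Erick describes (every word of the phrase must occur in the document, whatever the slop) next to the relaxed rule a peng is asking for (tolerate a limited number of query words that are missing from the document). The class name FuzzyPhraseSketch, its methods, and the simple tokenizer are all hypothetical, and the tokenizer is an assumption that does not reproduce StandardAnalyzer (for instance, it removes no stop words). Word order and positions are ignored here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; none of these names come from Lucene.
final class FuzzyPhraseSketch {

    // Assumed tokenizer: lowercases and splits on anything that is not a-z.
    // Unlike StandardAnalyzer, it performs no stop-word removal.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // Phrase-style precondition: every query term must occur in the document.
    // Slop never relaxes this; it only affects how far apart the terms may be.
    static boolean allTermsPresent(String doc, String query) {
        Set<String> docTerms = new HashSet<>(tokenize(doc));
        return docTerms.containsAll(tokenize(query));
    }

    // Relaxed rule: accept the document if at most maxMissing query terms
    // are absent from it. Term positions are not considered.
    static boolean matchesAllowingMissing(String doc, String query, int maxMissing) {
        Set<String> docTerms = new HashSet<>(tokenize(doc));
        long missing = tokenize(query).stream()
                .filter(t -> !docTerms.contains(t))
                .count();
        return missing <= maxMissing;
    }

    public static void main(String[] args) {
        String doc = "This is a test";
        String query = "This is a formal test";
        System.out.println("all terms present: " + allTermsPresent(doc, query));
        System.out.println("one missing term allowed: "
                + matchesAllowingMissing(doc, query, 1));
    }
}
```

In Lucene itself, one possible way to approximate the relaxed rule would be a BooleanQuery made of optional (SHOULD) TermQuery clauses combined with setMinimumNumberShouldMatch. That is a suggestion rather than something confirmed in this thread, and like the sketch above it gives up the positional constraint that a PhraseQuery enforces.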