"the guys" really helped me understand the issues with wildcards, it's harder than you think <G>. Try looking over the searchable archive for a thread titled "I just don't get wildcards at all" from a couple of hears ago. Note: Lucene has advanced significantly since then, but the underlying combinatorial complexity of wildcards is still there.
Some people have found joy with n-gram representations for this kind of problem. A lot depends upon how big your corpus is and what exactly your requirements are. Search the archive for ngram (or n-gram) for lots of discussion on that topic.

Anyway, something to think about.

Best
Erick

On Mon, May 4, 2009 at 12:48 PM, Huntsman84 <tpgarci...@gmail.com> wrote:

>
> My aim is to handle *phrases*, as you say, but I don't know how to build a
> WildcardQuery for that purpose... I read in the documentation that those
> kinds of queries can't start with '*' (e.g. *phrase*), so I tried
> MultiPhraseQuery instead.
>
> Forgive me if I am too much of a newbie; 10 days ago I didn't know this tool
> existed...
>
>
> Erick Erickson wrote:
> >
> > Why are you using MultiPhraseQuery? It appears (warning,
> > I haven't really used it) to be designed to handle *phrases*.
> > Your problem statement isn't looking at phrases at all,
> > just wildcarded single terms. And you're supposed to
> > call the first MPQ.add with, say, the first word of the
> > *phrase*, not a term vector. So what happens when your
> > first add is an array of Terms I have no clue......
> >
> > So I'd instead use RegexTermEnum (or possibly
> > WildcardTermEnum, don't know how this latter
> > works with leading wildcards, you'll have to check)
> > to enumerate the first 200 matching terms, and
> > add them clause by clause to a BooleanQuery,
> > and then use the BooleanQuery to search......
> >
> > But I'd also have to ask how users will feel about
> > getting some small partial match of the "real" data
> > set. By taking the first 200 Terms you find, it looks
> > to the user either arbitrary or incomplete. Think about
> > using, say, WildcardTermEnum to construct a Filter
> > that you then pass to your search. Constructing
> > Filters is quite fast, although you lose the wildcarded
> > terms' contributions to the score (don't worry about this
> > last IMO).
> >
> > Best
> > Erick
> >
> > On Mon, May 4, 2009 at 11:21 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
> >
> >> Hi
> >>
> >> I've tried this with MultiPhraseQuery, but it always returns all the
> >> documents in the index, no matter what expression I use.
> >>
> >> I've read that if you add a set of terms that all start with the entered
> >> query (e.g. "str"), the search works as if the "*" symbol were used
> >> (e.g. "str*"), so I tried that.
> >>
> >> My code is like this:
> >>
> >> // prompt the user
> >> System.out.println("Enter query: ");
> >>
> >> String line = in.readLine(); // if I put "str", I want all matches like "string", "strong", "astray"....
> >>
> >> line = line.trim();
> >>
> >> // I thought this would be ok...
> >> MultiPhraseQuery mpquery = new MultiPhraseQuery();
> >> TermEnum te = reader.terms(new Term("contents", line));
> >>
> >> /* Home-made conversion from TermEnum to Term[]... I get just 200 matches,
> >>    I don't want my machine busy... */
> >> Term[] terms = new Term[200];
> >> int j = 0;
> >>
> >> while (te.next() && j < 200) {
> >>     terms[j] = te.term();
> >>     j++;
> >> }
> >>
> >> mpquery.add(terms);
> >>
> >> Query query = parser.parse(line);
> >> System.out.println("Searching for: " + query.toString(field));
> >>
> >> Date start = new Date();
> >>
> >> Hits hits = searcher.search(mpquery);
> >> for (int k = 0; k < 100; k++) {
> >>     // when I print the results of the search, none of the values match the query...
> >>     System.out.println(hits.doc(k).getField("contents").stringValue());
> >> }
> >>
> >>
> >> Ian Lea wrote:
> >> >
> >> > Hi
> >> >
> >> > This is possible. There is an entry on wildcards in the FAQ. See
> >> > also RegexQuery and search the mailing lists for ngrams.
> >> >
> >> > Depending on your setup and requirements you may need to be aware of
> >> > the performance implications of wildcard searching, particularly
> >> > leading wildcards, as will be required for the example you give. See
> >> > the FAQ and javadocs for WildcardQuery.
> >> >
> >> > --
> >> > Ian.
> >> >
> >> > On Thu, Apr 30, 2009 at 11:46 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
> >> >>
> >> >> Hello,
> >> >>
> >> >> I am new to Lucene, and I don't know if it is possible to obtain results
> >> >> by providing only part of the keyword.
> >> >>
> >> >> For example, if I search for "in", it should return all matches such as
> >> >> "string", "meaning", "trinity"...
> >> >>
> >> >> Am I expecting too much?
> >> >>
> >> >> Thank you so much!
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23370618.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
> --
> View this message in context:
> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23372180.html
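
P.S. On the n-gram idea mentioned at the top of this message, here is a rough, self-contained sketch of the principle in plain Java. It is not Lucene code and not necessarily how the archived threads do it: the class name, the gram length of 2, and the in-memory map are all assumptions made for illustration. Each term is broken into overlapping character grams, the grams are indexed, and a fragment is looked up by intersecting the postings of its own grams. Because two grams can both occur in a term without being adjacent, the candidates are checked against the fragment at the end.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration of character n-gram lookup; not the Lucene API.
class NGramSketch {
    static final int N = 2; // gram length, an arbitrary choice for this sketch

    // All overlapping character grams of length N in s.
    static Set<String> grams(String s) {
        Set<String> out = new TreeSet<String>();
        for (int i = 0; i + N <= s.length(); i++) {
            out.add(s.substring(i, i + N));
        }
        return out;
    }

    // Map each gram to the set of terms that contain it.
    static Map<String, Set<String>> buildIndex(Collection<String> terms) {
        Map<String, Set<String>> index = new HashMap<String, Set<String>>();
        for (String t : terms) {
            for (String g : grams(t)) {
                Set<String> posting = index.get(g);
                if (posting == null) {
                    posting = new TreeSet<String>();
                    index.put(g, posting);
                }
                posting.add(t);
            }
        }
        return index;
    }

    // Terms containing the fragment; fragments shorter than N return nothing here.
    static Set<String> search(Map<String, Set<String>> index, final String fragment) {
        Set<String> result = null;
        for (String g : grams(fragment)) {
            Set<String> posting = index.get(g);
            if (posting == null) return new TreeSet<String>();
            if (result == null) result = new TreeSet<String>(posting);
            else result.retainAll(posting);
        }
        if (result == null) return new TreeSet<String>();
        // Grams may co-occur without being adjacent ("trust" has "tr" and "st"),
        // so verify each candidate against the whole fragment.
        result.removeIf(t -> !t.contains(fragment));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> index = buildIndex(Arrays.asList(
                "string", "meaning", "trinity", "strong", "astray", "trust"));
        System.out.println(search(index, "in"));  // prints [meaning, string, trinity]
        System.out.println(search(index, "str")); // prints [astray, string, strong]
    }
}
```

The trade-off is a larger index in exchange for lookups that no longer need to scan every term, which is why the corpus size and the exact requirements matter so much.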