Re: Searching for partial matches

Erick Erickson Mon, 04 May 2009 10:31:23 -0700

I have no clue, but that would really surprise me. Did you import

org.apache.lucene.search.regex.RegexQuery

??

Have fun
Erick
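
A note for anyone following along: the enumerate-and-cap idea quoted further down can be sketched without Lucene at all. The sketch below assumes the term dictionary behaves like a sorted set (a TreeSet stands in for it), and the names TermScanSketch, seekFrom and scanMatching are made up for illustration; none of them are Lucene API. It shows two things: seeking to "str" and reading onward yields every later term, not just the ones that start with "str", and finding "str" anywhere inside a term means walking the whole dictionary and testing each term.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Illustration only: a TreeSet plays the role of a sorted term dictionary.
class TermScanSketch {

    // Mimics "seek to a term, then keep reading": returns every term >= start,
    // up to limit, with no check that the term actually relates to start.
    static List<String> seekFrom(TreeSet<String> dict, String start, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : dict.tailSet(start, true)) {
            if (out.size() >= limit) {
                break;
            }
            out.add(term);
        }
        return out;
    }

    // Walks the whole dictionary and keeps terms in which the pattern is found,
    // stopping once limit terms have been collected.
    static List<String> scanMatching(TreeSet<String> dict, Pattern pattern, int limit) {
        List<String> out = new ArrayList<>();
        for (String term : dict) {
            if (out.size() >= limit) {
                break;
            }
            if (pattern.matcher(term).find()) {
                out.add(term);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> dict = new TreeSet<>(
                Arrays.asList("astray", "string", "strong", "table", "zebra"));

        // Seeking alone drags in unrelated terms that merely sort after "str".
        System.out.println(seekFrom(dict, "str", 200));
        // prints [string, strong, table, zebra]

        // A full scan with a containment test finds the partial matches.
        System.out.println(scanMatching(dict, Pattern.compile(Pattern.quote("str")), 200));
        // prints [astray, string, strong]
    }
}
```

In a real index, each surviving term would then become one clause of a BooleanQuery (or feed a Filter), as described in the quoted message. The cap keeps the clause count bounded, at the cost of possibly dropping matches that sort later in the dictionary.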



On Mon, May 4, 2009 at 1:19 PM, Huntsman84 <tpgarci...@gmail.com> wrote:

>
> Ok, I will try that, just one more question.
>
> Do you know why there is a class called "RegexQuery" that appears in the API
> documentation but doesn't exist in the lucene-core-2.4.1.jar? I think that
> class would be very useful for my problem...
>
> Thank you so much!!
>
>
> Erick Erickson wrote:
> >
> > "the guys" really helped me understand the issues with wildcards,
> > it's harder than you think <G>. Try looking over the searchable
> > archive for a thread titled "I just don't get wildcards at all" from a
> > couple of years ago. Note: Lucene has advanced significantly
> > since then, but the underlying combinatorial complexity of
> > wildcards is still there.
> >
> > Some people have found joy with n-gram representations for
> > this kind of problem, a lot depends upon how big your corpus is
> > and what exactly your requirements are. Search the archive for
> > ngram (or n-gram) for lots of discussion on that topic.
> >
> > Anyway, something to think about.
> >
> > Best
> > Erick
> >
> > On Mon, May 4, 2009 at 12:48 PM, Huntsman84 <tpgarci...@gmail.com> wrote:
> >
> >>
> >> My aim is to handle * phrases *, as you say, but I don't know how to
> >> build a WildcardQuery for that purpose... I read in the documentation
> >> that that kind of query can't start with '*' (e.g. * phrase *), so I
> >> tried MultiPhraseQuery instead.
> >>
> >> Forgive me if I am too newbie, 10 days ago I didn't know this tool
> >> existed...
> >>
> >>
> >> Erick Erickson wrote:
> >> >
> >> > Why are you using MultiPhraseQuery? It appears (warning,
> >> > I haven't really used it) to be designed to handle *phrases*.
> >> > Your problem statement isn't looking at phrases at all,
> >> > just wildcarded single terms. And you're supposed to
> >> > call the first MPQ.add with, say, the first word of the
> >> > *phrase*, not a term vector. So what happens when your
> >> > first add is an array of Terms I have no clue......
> >> >
> >> > So I'd instead use RegexTermEnum (or possibly
> >> > WildcardTermEnum, don't know how this latter
> >> > works with leading wildcards, you'll have to check)
> >> > to enumerate the first 200 matching terms, and
> >> > add them clause by clause to a BooleanQuery,
> >> > and then use the BooleanQuery to search......
> >> >
> >> > But I'd also have to ask how users will feel about
> >> > getting some small partial match of the "real" data
> >> > set. By taking the first 200 Terms you find, it looks
> >> > to the user either arbitrary or incomplete. Think about
> >> > using, say, WildcardTermEnum to construct a Filter
> >> > that you then pass to your search. Constructing
> >> > Filters is quite fast, although you lose the wildcarded
> >> > terms' contributions to the score (don't worry about this
> >> > last IMO).
> >> >
> >> >
> >> > Best
> >> > Erick
> >> >
> >> > On Mon, May 4, 2009 at 11:21 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
> >> >
> >> >>
> >> >> Hi
> >> >>
> >> >> I've tried this with MultiPhraseQuery, but it always returns all the
> >> >> documents in the index, no matter what expression I use.
> >> >>
> >> >> I've read that if you add a set of terms whose values all start with
> >> >> the entered query (e.g. "str"), the search works like the "*" symbol
> >> >> (e.g. "str*"), so I tried that.
> >> >>
> >> >> My code is like this:
> >> >>
> >> >> // prompt the user
> >> >> System.out.println("Enter query: ");
> >> >>
> >> >> String line = in.readLine(); // if I put "str", I want all matches like
> >> >> // "string", "strong", "astray"....
> >> >>
> >> >> line = line.trim();
> >> >>
> >> >> //I thought this would be ok...
> >> >> MultiPhraseQuery mpquery = new MultiPhraseQuery();
> >> >> TermEnum te = reader.terms(new Term("contents",line));
> >> >>
> >> >> /* Home-made conversion from TermEnum to Term[]... I get just 200 matches,
> >> >>    I don't want my machine busy... */
> >> >> Term[] terms = new Term[200];
> >> >> int j=0;
> >> >>
> >> >> while(te.next() && j<200){
> >> >>    terms[j] = te.term();
> >> >>    j++;
> >> >> }
> >> >>
> >> >> mpquery.add(terms);
> >> >>
> >> >> Query query = parser.parse(line);
> >> >> System.out.println("Searching for: " + query.toString(field));
> >> >>
> >> >> Date start = new Date();
> >> >>
> >> >> Hits hits = searcher.search(mpquery);
> >> >> for(int k = 0; k<100; k++){
> >> >>    // when I print the results of the search, none of the values match
> >> >>    // with the query...
> >> >>    System.out.println(hits.doc(k).getField("contents").stringValue());
> >> >> }
> >> >>
> >> >>
> >> >>
> >> >> Ian Lea wrote:
> >> >> >
> >> >> > Hi
> >> >> >
> >> >> >
> >> >> > This is possible.  There is an entry on wildcards in the FAQ.  See
> >> >> > also RegexQuery and search the mailing lists for ngrams.
> >> >> >
> >> >> > Depending on your setup and requirements you may need to be aware of
> >> >> > the performance implications of wild card searching, particularly
> >> >> > leading wildcards as will be required for the example you give.  See
> >> >> > the FAQ and javadocs for WildcardQuery.
> >> >> >
> >> >> >
> >> >> > --
> >> >> > Ian.
> >> >> >
> >> >> > On Thu, Apr 30, 2009 at 11:46 AM, Huntsman84 <tpgarci...@gmail.com> wrote:
> >> >> >>
> >> >> >> Hello,
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> I am new to Lucene, and I don't know if it is possible to obtain
> >> >> >> results providing part of the keyword.
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> For example, if I search for "in", it should return all matches
> >> >> >> such as "string", "meaning", "trinity"...
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Am I expecting too much?
> >> >> >>
> >> >> >>
> >> >> >>
> >> >> >> Thank you so much!
> >> >> >
> >> >> >
> >> >> > ---------------------------------------------------------------------
> >> >> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> >> > For additional commands, e-mail: java-user-h...@lucene.apache.org
> >> >> >
> >> >> >
> >> >> >
> >> >>
> >> >> --
> >> >> View this message in context:
> >> >> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23370618.html
> >> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >> >>
> >> >>
> >> >
> >> >
> >>
> >> --
> >> View this message in context:
> >> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23372180.html
> >> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
> >>
> >>
> >
> >
>
> --
> View this message in context:
> http://www.nabble.com/Searching-for-partial-matches-tp23313810p23372686.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>
>
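
On the n-gram suggestion quoted above, here is a rough sketch of why it helps with "in" matching "string", "meaning" and "trinity". It is plain Java with made-up names (NGramSketch, add, containing) and a HashMap standing in for the index. In Lucene this would normally be done at analysis time by an n-gram tokenizer rather than by hand, so treat it as an illustration of the idea only. Each term is broken into overlapping n-character pieces, and a fragment is looked up by its own pieces instead of by scanning every term.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustration only: maps each n-gram to the terms that contain it.
class NGramSketch {
    private final int n;
    private final Map<String, Set<String>> gramToTerms = new HashMap<>();

    NGramSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        this.n = n;
    }

    // Indexes a term under every n-character substring it contains.
    void add(String term) {
        for (int i = 0; i + n <= term.length(); i++) {
            gramToTerms
                    .computeIfAbsent(term.substring(i, i + n), k -> new TreeSet<>())
                    .add(term);
        }
    }

    // Returns the indexed terms that contain the fragment.
    // Fragments shorter than n cannot be looked up with this structure.
    Set<String> containing(String fragment) {
        if (fragment.length() < n) {
            throw new IllegalArgumentException("fragment is shorter than n");
        }
        Set<String> candidates = null;
        for (int i = 0; i + n <= fragment.length(); i++) {
            Set<String> hits = gramToTerms.getOrDefault(
                    fragment.substring(i, i + n), Collections.<String>emptySet());
            if (candidates == null) {
                candidates = new TreeSet<>(hits);
            } else {
                candidates.retainAll(hits);
            }
        }
        // Sharing every n-gram does not prove the pieces are adjacent and in
        // order, so verify each candidate against the full fragment.
        candidates.removeIf(term -> !term.contains(fragment));
        return candidates;
    }

    public static void main(String[] args) {
        NGramSketch index = new NGramSketch(2);
        for (String term : new String[] {"string", "meaning", "trinity", "table"}) {
            index.add(term);
        }
        System.out.println(index.containing("in"));
        // prints [meaning, string, trinity]
    }
}
```

The trade-off is the one mentioned in the thread: lookups no longer depend on leading wildcards, but the index grows with every extra n-gram, so corpus size and the minimum fragment length you need to support matter a lot.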
