Re: wildcarded phrase queries

Paul Elschot Wed, 06 Apr 2005 00:39:52 -0700

On Wednesday 06 April 2005 08:19, Chuck Williams wrote:
> Erik Hatcher writes (4/5/2005 5:57 PM):
> 
> > I have a need to implement wildcarded phrase queries, such as this:
> >
> >     "apach? luc*"
> >
> > which would match "apache lucene", for example.  This needs to also 
> > support ordered and unordered proximity like SpanNearQuery does:
> >
> >     "apach? luc*"~10
> >
> > I presume I'm going to have to key off of SpanQuery with a some 
> > specialized subclasses.
> >
> > What approach do you recommend for implementing something like this?
> 
> Hi Erik,
> 
> Might it be as easy as creating a SpanWilcardQuery that transforms into 
> a SpanOrQuery of SpanTermQuery's, and then use a SpanNearQuery of 
> SpanWildcardQuery's?  You could use a WildcardTermEnum.to generate the 
> list of terms for the SpanOrQuery.  This would have some issues like 
> computing the idf as the sum of all the pattern-matched terms, but it 
> looks like that issue still exists with WildcardQuery too.  I haven't 
> done much with SpanQuery's so this might not work out so simply, or be 
> acceptably efficient.


I did something very much like this in the surround query language.
I used a SpanNearClauseFactory for use during rewrite of a truncated
term with the following operations:
- create a SpanNearClauseFactory from a field name and an indexreader.
- add a weighted Term.
  This internally adds a corresponding SpanTermQuery, or
  increases the weight of an already added one.
- add a weighted SpanNearQuery as a subquery (for nesting, but
  that probably doesn't fit your syntax above).
- create a clause from the things added above.
  This  normally produces a SpanOrQuery over the added terms
  and subqueries.

Like the boolean clauses,  it's advisable to set a maximum to the
total number of SpanNearTerms used for a query.

Regards,
Paul Elschot.


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Re: wildcarded phrase queries

Reply via email to