Sure. Convert your simple queries into span queries (which are also
relatively simple). Then, when you index everything in the "all" field,
subclass your analyzer so that getPositionIncrementGap returns a large value.
Explaining how this works in words is awkward, so here's an example:

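// Lucene 2.x API; "writer" is an IndexWriter created with the gap analyzer
Document doc = new Document();
doc.add(new Field("all", "one two three", Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("all", "four five six", Field.Store.NO, Field.Index.TOKENIZED));
doc.add(new Field("all", "seven eight nine", Field.Store.NO, Field.Index.TOKENIZED));
writer.addDocument(doc);  // index the document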

Assume you've implemented an analyzer that returns 1000 for
getPositionIncrementGap.

Now, the term positions in the single document will be
one - 0
two - 1
three - 2
four - 1003
five - 1004
six - 1005
seven - 2006
eight - 2007
nine - 2008
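
If it helps to see where those numbers come from, here is a rough sketch of the arithmetic in plain Java. It is not Lucene code, and the class and method names (PositionGapSketch, positions) are made up for illustration. It assumes the running position simply carries on from one value of the field to the next, with the gap added in between.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model of how positions are assigned across several values of
// one field. Not Lucene code; the class and method names are hypothetical.
public class PositionGapSketch {

    // Returns term -> position for the given field values, adding `gap`
    // to the running position between consecutive values.
    public static Map<String, Integer> positions(String[] values, int gap) {
        Map<String, Integer> result = new LinkedHashMap<String, Integer>();
        int position = 0;
        for (int i = 0; i < values.length; i++) {
            if (i > 0) {
                position += gap; // the position increment gap between values
            }
            for (String term : values[i].split("\\s+")) {
                result.put(term, position); // every term in this example is unique
                position++;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] values = { "one two three", "four five six", "seven eight nine" };
        for (Map.Entry<String, Integer> e : positions(values, 1000).entrySet()) {
            System.out.println(e.getKey() + " - " + e.getValue());
        }
    }
}
```

Running main prints one line per term in the same "term - position" form as the list above.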

Now, if you use SpanNearQuery with a slop of 900 (i.e. "one nine"~900) you
won't get a match because the "distance" between one and nine is more than
900. But "one three"~900 will match.
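
The same check can be sketched in plain Java for the two-term case. Again this is not Lucene code: SlopSketch and withinSlop are hypothetical names, and it assumes the slop counts the positions lying between the two terms, which is my understanding of how SpanNearQuery measures it.

```java
// Simplified two-term model of the slop check. Not Lucene code; the class
// and method names are hypothetical.
public class SlopSketch {

    // True when the number of positions lying strictly between the two
    // terms is no more than the slop. Order of the terms does not matter.
    public static boolean withinSlop(int posA, int posB, int slop) {
        int between = Math.abs(posA - posB) - 1;
        return between <= slop;
    }

    public static void main(String[] args) {
        // Positions taken from the list above.
        System.out.println(withinSlop(0, 2008, 900)); // one, nine   -> false
        System.out.println(withinSlop(0, 2, 900));    // one, three  -> true
        System.out.println(withinSlop(2, 1003, 900)); // three, four -> false
    }
}
```

The last case is the one that matters: "three" and "four" are adjacent in the concatenated text, but the gap keeps them from matching because they came from different values of the field.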

It's possible to transform any query into a set of span queries. See the
thread "Multiword Highlighting" in which Mark Miller and I were exchanging
ideas recently. Be aware that the code we were discussing needs a
modification when used on a "regular" index: it has to pay attention to the
document that each sub-clause comes from. The code, as written, assumes
you're using a MemoryIndex for one and only one document. So unless you need
complex queries, I'd just think about rewriting simple queries with ANDs as
a SpanNearQuery.

Best
Erick

On 2/19/07, Kainth, Sachin <[EMAIL PROTECTED]> wrote:

Hi All,

I want to be able to do a search for a term in all fields in a document.


One way this can be done is to put every element of a document in the
default field (or I guess any other single named field) as well as
separate fields in which those elements belong.  So for example if for
my documents I had the following fields:

A, B, C, D and E

If I then set up a field called

All

Then, for every document I process, as well as putting its elements in
A, B, C, D and E, I would also put their concatenation into All.

One problem with this is that if for a particular document I had these
values for my five fields:

A -> Hello
B -> How
C -> Are
D -> You
E -> Mate
(All -> Hello How Are You Mate)

Then a search for "How Are You" in All would return true even though no
single field contains this string, which is not ideal.

Another problem with this is that it would double the size of the index
(unless Lucene does something clever here).

A way to solve the original issue is to convert the search for "How Are
You" into this:

A:"How Are You" OR B:"How Are You" OR C:"How Are You" OR D:"How Are You" OR
E:"How Are You"

This solves both the problems of the solution where we set up the All
field (viz. increasing the size of the index and  bringing back more
results than we should).

However, this solution also has its drawback, which is that we have
now gone from a simple query to a complex ORing of all fields in the
document.

My question is this: is there a third way?

Cheers

Sachin

