Perhaps the most straight-forward way would be to index a known unique value
for each document that would have had a null entry. Conceptually, when a
field would be null, index the value "nothinghere". Then you can just search
on documents where the value is equal to "nothinghere".

Alternatively, you could create a filter using TermEnum/TermDocs for each
document that had an entry in the field in question and then invert it to
get your real filter. The trick here is to build your term something like
new Term("field", ""); which enumerates all the terms in a field. Since a
filter is just a bitset, there are operations for inverting it. You can even
store these filters somewhere and reuse them (note, they will have to be
re-built when you change your index). For 2 million documents, they would be
quite small, somewhere around 250K bytes. Building these filters is quite
fast, so the first thing I would try is building them dynamically. If you
use a CachingWrapperFilter, they will be cached automatically for the
duration of your program.


But I'd go with the simple way first, that of indexing a unique value for
the null fields and searching on that....

Best
Erick

On 12/6/06, Supriya Kumar Shyamal <[EMAIL PROTECTED]> wrote:

Hi,

I have some question regarding the search. In a document we can have
several fields but not all fields have the value in all documents i.e.
some fields in a document can have null or empty string. But how to
search for a null field value in a document using the IndexSearcher? Any
idea will be very greaful. Since I have index with 2 million documents
adn Ic an't see thorgh all documents usign Luke.

Thanks,
supriya

--
Mit freundlichen Grüßen / Regards

Supriya Kumar Shyamal

Software Developer
tel +49 (30) 443 50 99 -22
fax +49 (30) 443 50 99 -99
email [EMAIL PROTECTED]
___________________________
artnology GmbH
Milastr. 4
10437 Berlin
___________________________

http://www.artnology.com
__________________________________________________________________________

News / Aktuelle Projekte:
* artnology gewinnt Ausschreibung des Bundesministeriums des Innern:
   Softwarelösung für die Verwaltung der Sammlung zeitgenössischer
   Kunstwerke zur kulturellen Repräsentation des Bundes.

Projektreferenzen:
* Globaler eShop und Corporate-Site für Springer: www.springeronline.com
* E-Detailing-Portal für Novartis: www.interaktiv.novartis.de
* Service-Center-Plattform für Biogen: www.ms-life.de
* eCRM-System für Grünenthal: www.gruenenthal.com


___________________________________________________________________________


---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to