Sorry for resurrecting an old thread, but how would one go about writing a Lucene query similar to this?
SELECT * FROM patient WHERE first_name = 'Zed' OR allergies IS NULL

The AND case would be easy, since one would just use a simple TermQuery with a FieldValueFilter added, but what about the other boolean cases? Admittedly, this is a contrived example, but the point is that filters seem to always be applied to results after they are returned, so how would one go about making the null-ness of a field part of the query logic itself?

On Thu, Feb 16, 2012 at 1:45 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> I already mentioned that pseudo NULL term, but the user asked for another
> solution...
> --
> Uwe Schindler
> H.-H.-Meier-Allee 63, 28213 Bremen
> http://www.thetaphi.de
>
> Jamie Johnson <jej2...@gmail.com> wrote:
>
> Another possible solution is while indexing insert a custom token
> which is impossible to show up in the index otherwise, then do the
> filter based on that token.
>
> On Thu, Feb 16, 2012 at 4:41 PM, Uwe Schindler <u...@thetaphi.de> wrote:
> > As the documentation states:
> > Lucene is an inverted index that does not have per-document fields. It
> > only knows terms pointing to documents. The query you are searching is
> > a query that returns all documents which have no term. To execute this
> > query, it will get the term index and iterate all terms of a field,
> > mark those in a bitset and negates that. The filter/query I told you
> > uses the FieldCache to do this. Since 3.6 (also in 3.5, but there it is
> > buggy/API different) there is another fieldcache that returns exactly
> > that bitset. The filter mentioned only uses that bitset from this new
> > fieldcache. Fieldcache is populated on first access and keeps alive as
> > long as the underlying index segment is open (means as long as
> > IndexReader is open and the parts of the index is not refreshed).
> > If you are also sorting against your fields or doing other queries
> > using FieldCache, there is no overhead, otherwise the bitset is
> > populated on first access to the filter.
> >
> > Lucene 3.5 has no easy way to implement that filter, a "NULL" pseudo
> > term is the only solution (and also much faster on the first access in
> > Lucene 3.6). Later accesses hitting the cache in 3.6 will be faster, of
> > course.
> >
> > Another hacky way to achieve the same results is (works with almost any
> > Lucene version):
> > BooleanQuery consisting of: MatchAllDocsQuery() as MUST clause and
> > PrefixQuery(field, "") as MUST_NOT clause. But the PrefixQuery will do a
> > full term index scan without caching :-). You may use
> > CachingWrapperFilter with PrefixFilter instead.
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: u...@thetaphi.de
> >
> >
> >> -----Original Message-----
> >> From: Tim Eck [mailto:tim...@gmail.com]
> >> Sent: Thursday, February 16, 2012 10:14 PM
> >> To: java-user@lucene.apache.org
> >> Subject: RE: query for documents WITHOUT a field?
> >>
> >> Thanks for the fast response. I'll certainly have a look at the upcoming
> >> 3.6.x release. What is the expected performance for using a negated
> >> filter? In particular does it defeat the index in any way and require a
> >> full index scan? Is it different between regular fields and numeric
> >> fields?
> >>
> >> For 3.5 and earlier though, is there any suggestion other than magic
> >> values?
> >>
> >> -----Original Message-----
> >> From: Uwe Schindler [mailto:u...@thetaphi.de]
> >> Sent: Thursday, February 16, 2012 1:07 PM
> >> To: java-user@lucene.apache.org
> >> Subject: RE: query for documents WITHOUT a field?
> >>
> >> Lucene 3.6 will have a FieldValueFilter that can be negated:
> >>
> >> Query q = new ConstantScoreQuery(new FieldValueFilter("field", true));
> >>
> >> (see http://goo.gl/wyjxn)
> >>
> >> Lucene 3.5 does not yet have it, you can download 3.6 snapshots from
> >> Jenkins:
> >> http://goo.gl/Ka0gr
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: u...@thetaphi.de
> >>
> >>
> >> > -----Original Message-----
> >> > From: Tim Eck [mailto:t...@terracottatech.com]
> >> > Sent: Thursday, February 16, 2012 9:59 PM
> >> > To: java-user@lucene.apache.org
> >> > Subject: query for documents WITHOUT a field?
> >> >
> >> > My apologies if this answer is readily available someplace, I've
> >> > searched around and not found a definitive answer.
> >> >
> >> > I'd like to run a query for documents that _do not_ contain particular
> >> > indexed fields to implement something like a SQL-like query where a
> >> > column is null.
> >> >
> >> > I understand I could possibly use a magic value to represent "null",
> >> > but the data I'm searching doesn't lend itself to reserving a value
> >> > for null.
> >> > I also understand I could add an extra field to hold this boolean
> >> > isNull state but would love a better solution :-)
> >> >
> >> > TIA
> >>
> >> _____________________________________________
> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> >> For additional commands, e-mail: java-user-h...@lucene.apache.org
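For anyone following the thread, the mechanism Uwe describes (walk every term of a field, mark the documents each term points to in a bitset, then negate that bitset) can be sketched in plain Java. This is a toy inverted index and not Lucene's API: the class MissingFieldSketch and all of its method names are hypothetical, made up purely for illustration. The sketch also shows how the resulting "field is missing" bitset could be OR-ed with an ordinary term match, which is the shape of the SQL example at the top.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Toy inverted index for illustration only; this is NOT Lucene's API.
class MissingFieldSketch {
    // field name -> term -> ids of the documents containing that term
    private final Map<String, Map<String, BitSet>> postings = new HashMap<>();
    private int maxDoc = 0;

    // Adds a document given as field/value pairs and returns its doc id.
    int addDocument(String... fieldValuePairs) {
        int docId = maxDoc++;
        for (int i = 0; i + 1 < fieldValuePairs.length; i += 2) {
            postings.computeIfAbsent(fieldValuePairs[i], f -> new HashMap<>())
                    .computeIfAbsent(fieldValuePairs[i + 1], t -> new BitSet())
                    .set(docId);
        }
        return docId;
    }

    // Documents that contain the given term in the given field.
    BitSet termDocs(String field, String term) {
        BitSet result = new BitSet();
        Map<String, BitSet> terms = postings.get(field);
        if (terms != null && terms.get(term) != null) {
            result.or(terms.get(term));
        }
        return result;
    }

    // Documents with no term at all in the field: iterate every term of
    // the field, mark its documents, then negate over all known doc ids.
    BitSet docsMissingField(String field) {
        BitSet hasValue = new BitSet();
        Map<String, BitSet> terms = postings.get(field);
        if (terms != null) {
            for (BitSet docs : terms.values()) {
                hasValue.or(docs);
            }
        }
        hasValue.flip(0, maxDoc);
        return hasValue;
    }

    // Union of "field = term" and "nullableField has no value".
    BitSet termOrMissing(String field, String term, String nullableField) {
        BitSet result = termDocs(field, term);
        result.or(docsMissingField(nullableField));
        return result;
    }

    public static void main(String[] args) {
        MissingFieldSketch idx = new MissingFieldSketch();
        idx.addDocument("first_name", "Zed", "allergies", "peanuts"); // doc 0
        idx.addDocument("first_name", "Amy");                         // doc 1
        idx.addDocument("first_name", "Bob", "allergies", "latex");   // doc 2
        idx.addDocument("first_name", "Zed");                         // doc 3
        // first_name = 'Zed' OR allergies IS NULL
        System.out.println(idx.termOrMissing("first_name", "Zed", "allergies"));
        // prints {0, 1, 3}
    }
}
```

With the four sample documents in main, the union is documents 0, 1 and 3. In Lucene 3.6 itself the analogous construction would presumably be a BooleanQuery with two SHOULD clauses, a TermQuery plus the ConstantScoreQuery(new FieldValueFilter(...)) quoted above, but that is an assumption and is not verified here.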