Re: Exact match on entire field

Karl Wettin Wed, 06 May 2009 09:04:32 -0700

You should probably tell us why you need this functionality.

Given that you only load the stored comparison field for the first hit, it doesn't really have to be that expensive. If you know that the first hit was not a perfect match, then you know that any matching document with a lower score isn't a perfect match either. Stemming etc. could however mess things up for you.
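
A minimal sketch of that first-hit check, in plain Java with no Lucene dependency. The `normalize` helper is a hypothetical stand-in for whatever analyzer you use (it only lowercases and splits on non-alphanumeric characters, with no stemming), and the list of stored field values is assumed to be ordered best score first:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TopHitExactMatch {

    // Hypothetical stand-in for an analyzer: lowercase, then split on
    // anything that is not a letter or digit. No stemming, no stop words.
    static List<String> normalize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // storedValuesByScore: stored field values of the hits, best score
    // first (an assumption of this sketch). Only the first one is loaded.
    static boolean topHitIsExactMatch(String query, List<String> storedValuesByScore) {
        if (storedValuesByScore.isEmpty()) {
            return false;
        }
        return normalize(query).equals(normalize(storedValuesByScore.get(0)));
    }

    public static void main(String[] args) {
        List<String> hits = Arrays.asList("Bloemendaal", "Adele Bloemendaal");
        System.out.println(topHitIsExactMatch("bloemendaal", hits));
        System.out.println(topHitIsExactMatch("adele", Arrays.asList("Adele Bloemendaal")));
    }
}
```

Whether the top hit really is the best candidate depends on your scoring, so treat this as a shortcut to verify against your own corpus.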

There is nothing in Lucene that tells you whether the query yielded a perfect match or not, only how much greater precision one hit has compared to another. Depending on your needs and your corpus, it's possible to use this information to solve the problem.

You could try to find a delta score threshold that tells you where perfect matches begin and end in the results. With some luck the length normalization built into Lucene is enough to find this. If not, you can look at more expensive solutions that increase the score of perfect matches by adding BOL and EOL token markers to your index and to a (0-slop) query:


index:
"^", "bloemendaal", "$"
"^", "adele", "bloemendaal", "$"

query:
("bloemendaal")
OR ("^", "bloemendaal")
OR ("bloemendaal", "$")
OR ("^", "bloemendaal", "$")
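
To illustrate why the marker tokens favour whole-field matches, here is a rough sketch in plain Java (no Lucene). It does not compute Lucene scores; it only counts how many of the four clauses above would match a field as a 0-slop phrase. The class and method names are made up for this example:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class BoundaryMarkers {
    static final String BOL = "^";
    static final String EOL = "$";

    // Wrap the analyzed field tokens in begin/end markers, as in the index listing.
    static List<String> withMarkers(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add(BOL);
        out.addAll(tokens);
        out.add(EOL);
        return out;
    }

    // 0-slop phrase check: the phrase tokens must appear contiguously and in order.
    static boolean containsPhrase(List<String> fieldTokens, List<String> phrase) {
        return Collections.indexOfSubList(fieldTokens, phrase) >= 0;
    }

    // Count how many of the four clauses (bare, BOL-prefixed, EOL-suffixed,
    // fully wrapped) match the indexed tokens.
    static int matchingClauses(List<String> indexedTokens, List<String> queryTokens) {
        List<String> withBol = new ArrayList<>();
        withBol.add(BOL);
        withBol.addAll(queryTokens);

        List<String> withEol = new ArrayList<>(queryTokens);
        withEol.add(EOL);

        List<List<String>> clauses = Arrays.asList(
                queryTokens, withBol, withEol, withMarkers(queryTokens));

        int matches = 0;
        for (List<String> clause : clauses) {
            if (containsPhrase(indexedTokens, clause)) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> exact = withMarkers(Arrays.asList("bloemendaal"));
        List<String> partial = withMarkers(Arrays.asList("adele", "bloemendaal"));
        List<String> query = Arrays.asList("bloemendaal");
        System.out.println(matchingClauses(exact, query));   // 4
        System.out.println(matchingClauses(partial, query)); // 2
    }
}
```

The whole-field document satisfies all four clauses while the partial one satisfies only two, which is the gap the boosts are meant to widen.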

You could use either span queries or shingles, and you'll probably have to fiddle around with boosts on the clauses.

Be aware that it's rather expensive to search for tokens that exist in all documents, so it's probably a lot speedier to use shingles and skip the single BOL/EOL tokens in the index that span queries require. But shingles will make your index explode in size. And lots of BOL/EOL tokens can mess with the idf(t).
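
A rough sketch of the shingle variant, again in plain Java rather than with Lucene's own shingle filter. The `shingles` helper is hypothetical: it joins every run of up to `maxSize` adjacent tokens into one term and skips the single "^" and "$" tokens, so a whole-field match becomes a lookup of one term such as "^ bloemendaal $":

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MarkerShingles {

    // Build all shingles of 1..maxSize adjacent tokens, joined by a space.
    // Lone BOL/EOL markers are skipped, since they would occur in every document.
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int size = 1; size <= maxSize; size++) {
            for (int start = 0; start + size <= tokens.size(); start++) {
                List<String> window = tokens.subList(start, start + size);
                boolean loneMarker = size == 1
                        && (window.get(0).equals("^") || window.get(0).equals("$"));
                if (!loneMarker) {
                    out.add(String.join(" ", window));
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> exact = shingles(Arrays.asList("^", "bloemendaal", "$"), 3);
        List<String> partial = shingles(Arrays.asList("^", "adele", "bloemendaal", "$"), 4);
        System.out.println(exact);
        System.out.println(partial.size());
        System.out.println(exact.contains("^ bloemendaal $"));
        System.out.println(partial.contains("^ bloemendaal $"));
    }
}
```

Even in this toy case a two-word field turns into eight terms, which hints at the index growth mentioned above.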

There has been a bit of talk about adding functionality to retrieve which queries matched a specific document. If this were in place you could simply check whether the ("^", "bloemendaal", "$") clause matched, and you'd know it was a perfect match. At the current rate such a patch might be available a few months from now. You are of course more than welcome to implement and contribute such a patch if you have the time.



I hope this helped,

     karl


On 6 May 2009 at 10:50, Laura Hollink wrote:

Hi,
I am trying to distinguish between a document that matches the query because the query *appears* in one of the fields, and a document that matches the query because the query equals the complete field. I do want to use an Analyzer for case and punctuation normalization. For example:
The query "bloemendaal" matches the complete field "Bloemendaal" in a document in my result list. The query "adele" only partly matches the field "Adele Bloemendaal" in another document.
What is the best way to do this?
I currently solve it by first searching in the normal way, and then using the QueryParser on both the query and the relevant field in the documents in my result list. Finally, I simply compare the parsed query and the parsed field.
        QueryParser parser = new QueryParser(field, new StandardAnalyzer());
        Query query = parser.parse(q);
        Hits hits = is.search(query);
        ...
        Document doc = hits.doc(i);
        Query myfield = parser.parse(doc.get("skos:prefLabel"));
        if (myfield.equals(query)) System.out.println("Query exactly matches the entire field.");
        else System.out.println("The field contains the query.");

Is there a better way?

Thanks,
Laura

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org


