Frode Bjerkholt <[EMAIL PROTECTED]> wrote on 05/10/2006 01:10:43:

> My intention is to give different terms in a field different boost values.
> The queries, from a user's perspective, will be one fulltext input field.
> The following code illustrates this:
>
> Field f1 = new Field("name", "John", Field.Store.NO, Field.Index.TOKENIZED);
> Field f2 = new Field("name", "Doe", Field.Store.NO, Field.Index.TOKENIZED);
>
> f1.setBoost(1.0f);
> f2.setBoost(2.0f);
>
> doc.add(f1);
> doc.add(f2);
>
> In the current version of Lucene, as far as I know, this does not work,
> although it would have been a very powerful feature.

To support this, additional info would need to be stored along with each
index token, i.e. along with each occurrence of each index term in each
indexed document. There are discussions on adding token "payloads" (in a
future "flexible" index structure). If/when this is added, and if it is as
flexible and general as desired, such a per-token boost could be stored
there and then used at scoring time. For more info on this, search for
"payloads" in the dev mailing list.

Notice however that even so, without separating into distinct fields, when
searching for "Doe", both its occurrences as "name" and as "last name" would
be collected, and there would be no way to look for only matches of it as,
say, "last name".

> The current solution is to make a firstname field and a lastname field, and
> then make a complex query like this:
>
> Input: Eric Doe
>
> (firstname:Eric OR lastname:Eric^2) AND (firstname:Doe OR lastname:Doe^2)
>
> The performance of such a query is quite slow, and it becomes even worse when
> you have more than two fields and/or more words in the input string.
>
> My questions:
>
> 1. Is there a better/faster solution to accomplish such a query?
>

I think one way (which I don't like, but you may think otherwise) would be to
insert two tokens for a boosted one at indexing time, so that your indexing
code would look like:

  Field f1 = new Field("mixed", "John Doe", Field.Store.NO, Field.Index.TOKENIZED);
  doc.add(f1);
  Field f2 = new Field("mixed", "Doe", Field.Store.NO, Field.Index.TOKENIZED);
  doc.add(f2);

This would enlarge the index. You might also need to adjust the position gap
(between f1 and f2) to avoid false phrase matches. But your query would be
simpler and should be faster.

> 2. Would it be possible to implement the described feature in a
> future version of Lucene?
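
As a rough illustration of the two-token idea above, here is a minimal sketch in plain Java (no Lucene calls) of what ends up in the "mixed" field. The class MixedFieldSketch and its two methods are hypothetical names made up for this example, and the lowercase whitespace split is only a stand-in for whatever analyzer you actually use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: it only prepares the strings that
// would be passed to separate Field instances of the "mixed" field.
class MixedFieldSketch {

    // Returns the values to index: the full text once, followed by the
    // boosted term repeated extraCopies times as separate values.
    static List<String> mixedFieldValues(String fullText, String boostedTerm, int extraCopies) {
        List<String> values = new ArrayList<String>();
        values.add(fullText);
        for (int i = 0; i < extraCopies; i++) {
            values.add(boostedTerm);
        }
        return values;
    }

    // Counts term occurrences across all values, using a naive lowercase
    // whitespace tokenizer as a stand-in for a real analyzer.
    static Map<String, Integer> termFrequencies(List<String> values) {
        Map<String, Integer> freq = new LinkedHashMap<String, Integer>();
        for (String value : values) {
            for (String token : value.trim().toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue;
                }
                Integer old = freq.get(token);
                freq.put(token, old == null ? 1 : old + 1);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        List<String> values = mixedFieldValues("John Doe", "Doe", 1);
        System.out.println(values);
        System.out.println(termFrequencies(values));
    }
}
```

With the values "John Doe" and "Doe", the term "doe" is counted twice and "john" once, which is the effect the extra token is meant to have. Keep in mind that with Lucene's default similarity the term-frequency factor grows with the square root of the count, and the extra tokens also affect the field's length norm, so one extra copy is not equivalent to a boost of 2.0.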