Hello,

I would like to use Lucene to index a set of articles, where several different titles may belong to a single article. Currently I use one field for the article as well as a multi-valued field for the titles.
My problem is:

- If I index only one of the titles, I won't get a match when someone searches for one of the other titles. Of course part of the content may match too, but since the title is shorter, a match there gets a higher score.
- If I index all of the possible titles in a multi-valued field, this introduces noise and therefore bad results. The reason is that Lucene concatenates all the values of a multi-valued field when searching it. A single one of these values may be a perfect match on its own, but it no longer is once the alternative titles are indexed alongside it.

I have come up with some (hackish) solutions to this problem, such as indexing the alternative titles as whole new documents (together with the content), or using a different field name for each title (e.g. title01, title02, ...) and using a BooleanQuery to search across all of the title fields.

What I'm basically looking for is a way to get the maximum score over the values of a multi-valued field rather than the mean score. Is there a more elegant way to implement this?

I've thought of things like indexing multiple terms at the same position, but then there would still be the problem that the titles differ in length, and this would also produce wrong combinations of terms from different titles.

Any suggestions on how to solve this problem?

bye,
/gst
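
P.S. To make the effect concrete, here is a toy sketch in plain Java. It does not use Lucene and does not reproduce Lucene's real scoring. The class TitleScoreSketch, the scoring formula (number of matching query terms divided by the square root of the field length), and the example titles are all made up for illustration. It only shows why one concatenated title field scores lower than the best single title.

```java
import java.util.Arrays;
import java.util.List;

class TitleScoreSketch {

    // Split on whitespace and lowercase; a stand-in for a real analyzer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().split("\\s+"));
    }

    // Toy score (an assumption, not Lucene's formula): the number of query
    // terms present in the field, damped by 1/sqrt(number of field terms).
    static double score(List<String> fieldTerms, List<String> queryTerms) {
        if (fieldTerms.isEmpty()) {
            return 0.0;
        }
        long hits = queryTerms.stream().filter(fieldTerms::contains).count();
        return hits / Math.sqrt(fieldTerms.size());
    }

    // All titles treated as one long field, as with a multi-valued field.
    static double concatenatedScore(List<String> titles, String query) {
        return score(tokenize(String.join(" ", titles)), tokenize(query));
    }

    // Each title scored on its own; only the best one counts.
    static double maxScore(List<String> titles, String query) {
        List<String> queryTerms = tokenize(query);
        double best = 0.0;
        for (String title : titles) {
            best = Math.max(best, score(tokenize(title), queryTerms));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList(
                "Apache Lucene",
                "Full text search library for Java");
        String query = "apache lucene";

        // The concatenated field has 8 terms, the best single title only 2,
        // so the second number printed is the larger one.
        System.out.println("concatenated: " + concatenatedScore(titles, query));
        System.out.println("max of titles: " + maxScore(titles, query));
    }
}
```

With these two titles the query matches the first title perfectly, yet the concatenated variant gets only half the score of the per-title maximum, because the second title inflates the field length. That second number is the behaviour I am after.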