Hello, suppose a user enters ‘box of shoes’ in my search box. I have two documents titled ‘box of clothes’ and ‘box of socks’. I’ve figured out through a separate algorithm that ‘socks’ is more similar to ‘shoes’ than ‘clothes’ is.
I even have a numeric score for the similarity: for ‘socks’ it’s 0.8 and for ‘clothes’ it’s 0.65. How can I feed this info to Lucene to help it rank ‘socks’ higher than ‘clothes’? I still want the usual tf-idf rules to apply, i.e. ‘box’ and ‘of’ occur in a lot of documents but ‘socks’ and ‘clothes’ are rarer, so they should be given more importance. So I don’t want to have to override the Similarity class. I just want to be able to pass in the info that ‘socks’ and ‘clothes’ are both kind of like synonyms for ‘shoes’, but ‘socks’ is more similar to ‘shoes’ than ‘clothes’ is. Maybe I could create a boost from the similarity score, one that doesn’t artificially boost frequent / less important terms. If I just provided them as regular synonyms, then they would both be considered equal in weight.
Thanks.
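To make the idea concrete, here is a rough sketch of the kind of boost I have in mind. It assumes the titles are indexed in a field called `title` (a made-up name) and that the query goes through Lucene's classic query parser, where `term^boost` multiplies that clause's score by the given factor. The class and method names are hypothetical, and the code only builds the query string; it doesn't call Lucene itself.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class WeightedSynonymQuery {

    // Builds a query-parser clause for one user term plus its related terms.
    // Each related term gets its similarity score as a boost (term^score),
    // so a closer synonym contributes more than a weaker one.
    // Assumption: "field" is whatever field the titles are indexed in.
    static String build(String field, String original, Map<String, Double> related) {
        StringBuilder sb = new StringBuilder();
        sb.append(field).append(':').append(original);
        for (Map.Entry<String, Double> e : related.entrySet()) {
            sb.append(" OR ")
              .append(field).append(':').append(e.getKey())
              .append('^')
              .append(String.format(Locale.ROOT, "%.2f", e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Scores from my separate similarity algorithm.
        Map<String, Double> related = new LinkedHashMap<>();
        related.put("socks", 0.8);
        related.put("clothes", 0.65);

        // 'box' and 'of' are left unboosted; only 'shoes' is expanded.
        String query = "title:box title:of (" + build("title", "shoes", related) + ")";
        System.out.println(query);
    }
}
```

This prints `title:box title:of (title:shoes OR title:socks^0.80 OR title:clothes^0.65)`. As far as I understand, the boost is only a multiplier on top of the normal term score, so rare terms should still outweigh common ones like ‘box’ and ‘of’, but I'd like to know whether this is a sensible way to do it or whether there is a better mechanism.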