What about boosting documents of the Brand type? You can statically boost those documents with a log() function or something similar ...
On Dec 14, 2007 8:24 PM, Doron Cohen <[EMAIL PROTECTED]> wrote: > It seems that documents having less fields satisfying > the query worth more than those satisfying more fields > of the query, because the first ones are more "to > the point". > > At least it seems like it in the example. > > If this makes sense I would try to compose a top level > boolean query out of the one-field queries, and make it use > a similarity that implements this strange logic of coord. > (Strange, because usually coord punishes for under > matching, while here you want to reward for just that.) > > A tricky part would be to make the sub-queries use > the "regular" coord logic, but first let's see if this is > a valid direction at all. > > Doron > > On Dec 14, 2007 8:03 PM, Paul Elschot <[EMAIL PROTECTED]> wrote: > > > Karl, > > > > This might work for you: > > https://issues.apache.org/jira/browse/LUCENE-293 > > > > Regards, > > Paul Elschot > > > > On Friday 14 December 2007 18:06:01 Karl Wettin wrote: > > > I have an index that contains three sorts of documents: > > > > > > Car brand > > > Tire brand > > > Tire pressure > > > > > > (Please bear with me, the real index has nothing to do with cars. I > > > just try to explain the problem in an alternative domain to avoid NDA > > > conflicts.) > > > > > > There is a heirarchial composite relationship between these sort of > > > documents. A document describing "tire pressure" also contains "tire > > > brand" and "car brand". A document describing "tire brand" also > > > contains information about "car brand". A document describing "car > > > brand" contains only that. > > > > > > The requirement is that the consumer of the API should not have to > > > specify what fields they are searching in. There is no time (nor > > > training data) to implement a hidden markov model (HMM) tokenizer or > > > something along that path in order to extract possible attributes from > > > the query string. Instead the query string is tokenized once per field > > > and assebled to one huge query. Normally this works fairly well. > > > > > > Here are some example documents: > > > > > > Volvo > > > Volvo, Michelin > > > Volvo, Nokian > > > Volvo, Nokian, 2.2 bars > > > Volvo, Firestone, 2.4 bars > > > > > > Saab > > > Saab, Michelin > > > Saab, Nokian > > > Saab, Nokian, 2.1 bars > > > Saab, Firestone > > > Saab, Firestone, 2.4 bars > > > Saab, Firestone, 2.5 bars > > > > > > If I search for Saab the top result will be the document representing > > > the car brand "Saab". The query would look like this: "car:saab > > > tire:saab preasure:saab" > > > > > > But lets say Saab starts manufacturing tires too: > > > > > > Saab > > > Saab, Saab tires > > > Saab, Saab tires, 1.9 bars > > > Saab, Saab tires, 1.8 bars > > > > > > If I search for "Saab" I still want the top result to be Saab the car > > > brand. But it not longer is, the match for "Saab, Saab tires" now > > > have a greater score than "Saab", of course. > > > > > > My idea is to work along the line of indexing "Saab" in the tire brand > > > and tire pressure field too. Now searching for Saab will yeild a > > > result where the car brand "Saab" is the top result. > > > > > > However, this will not work as I have different tokenization > > > strategies for each field (stemming and what not). Tokenizing the > > > query string Saab for the field "tire brand" in Swedish might end up > > > as "saa" and will thus not find the token Saab inserted for the > > > document describing the car brand Saab. > > > > > > I have a couple of experiments in my head I need to try out, starting > > > with tokezining query strings per field and using the tokens generated > > > for the field car brand as query in the tire brand and tire pressure > > > too. And vice versus. > > > > > > Any brilliant ideas that might work? Hacky solutions are OK. > > > > > > > > > > > > > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: [EMAIL PROTECTED] > > For additional commands, e-mail: [EMAIL PROTECTED] > > > > > -- Regards, Shai Erera