Good point on phrase/span queries, Hostetter.
: Assuming you have the following "phrase synonym" (and code that
: can find them during Analysis)...
:
:   [CyberCafe] => [Cyber] [Cafe]
:
: ...
: the only thing that's ever occurred to me is to set the position increment
: of all the words to "0" (but that will still result in false positives in
: the "cyber cafe" example) or to pick some high default position increment
: (bigger than the longest multi-word synonym) and use that normally, and
: reserve increments of "1" for words in a multi-word synonym.
A good suggestion; however, it does have a small side effect. If I understand you correctly, that strategy would create the following token stream for "CyberCafe Inc.", assuming a default increment of, say, 10:
[cybercafe, 1] [cyber, 1] [cafe, 2] [inc, 10]
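To make the position arithmetic concrete, here is a minimal Python sketch of that scheme. It is not Lucene code: `assign_positions`, the `synonyms` table, and the choice to start the first token at position 1 are all assumptions for illustration. Because the default increment is added after the last synonym part, "inc" comes out at position 12 here rather than exactly 10; the relative layout is what matters.

```python
DEFAULT_GAP = 10  # assumed default position increment between ordinary tokens


def assign_positions(tokens, synonyms, gap=DEFAULT_GAP):
    """Return (term, position) pairs for a pre-tokenized input.

    synonyms maps a lowercased compound term to its parts,
    e.g. {"cybercafe": ["cyber", "cafe"]} (hypothetical table).
    """
    stream = []
    pos = 0
    for n, token in enumerate(tokens):
        term = token.lower()
        # Assumption: the first token sits at position 1, later ones jump by `gap`.
        pos += 1 if n == 0 else gap
        stream.append((term, pos))
        for i, part in enumerate(synonyms.get(term, [])):
            # The first part shares the compound's position (increment 0);
            # each following part advances by exactly 1.
            if i > 0:
                pos += 1
            stream.append((part, pos))
    return stream


synonyms = {"cybercafe": ["cyber", "cafe"]}
stream = assign_positions(["CyberCafe", "Inc"], synonyms)
for term, position in stream:
    print(f"[{term}, {position}]")
```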
In that case, a search for the phrase "cybercafe cafe inc" would return a match (given enough slop to bridge the gap before "inc"). Here that is acceptable, albeit a bit strange to the user, but then again, searching for "cybercafe cafe" IS a bit strange. However, situations can be constructed where the result would be a genuine false positive. We could also end up with no match for phrase queries if the slop factor is too low (e.g. 0): "Cybercafe inc" would not be found unless the same analysis algorithm is applied to both the document and the query. Ranking could also be adversely affected.
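A toy model of phrase matching shows both effects. This is a simplified sketch, not Lucene's actual phrase scorer: `phrase_matches` is a hypothetical helper that accepts a phrase when the query terms can be lined up with token positions that deviate from strict adjacency by at most `slop`. The positions are the ones from the token stream above.

```python
from itertools import product


def phrase_matches(index, phrase, slop=0):
    """index maps term -> list of positions; phrase is a list of query terms."""
    if any(term not in index for term in phrase):
        return False
    candidates = [index[term] for term in phrase]
    for combo in product(*candidates):
        # For an exact phrase, position minus query offset is constant.
        shifts = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifts) - min(shifts) <= slop:
            return True
    return False


# Positions taken from the token stream for "CyberCafe Inc."
index = {"cybercafe": [1], "cyber": [1], "cafe": [2], "inc": [10]}

odd_hit = phrase_matches(index, ["cybercafe", "cafe"])              # adjacent positions 1, 2
missed = phrase_matches(index, ["cybercafe", "inc"])                # gap too wide for slop 0
found_with_slop = phrase_matches(index, ["cybercafe", "inc"], slop=8)
print(odd_hit, missed, found_with_slop)
```

The odd query "cybercafe cafe" matches with no slop at all, while the natural query "cybercafe inc" only matches once the slop covers the artificial gap.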
Is there no such concept as a 2-dimensional term vector?

[CyberCafe Inc] => [[cybercafe], [[cyber] [cafe]]] [inc]

(In theory it would have to be a directed acyclic graph (DAG), I guess.)
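One way to picture such a structure is a small graph whose nodes are positions and whose edges carry terms, so that "cybercafe" spans the same stretch as "cyber" followed by "cafe". The sketch below only illustrates the idea under that assumption; the edge list and the helper names are hypothetical and do not correspond to any existing Lucene API.

```python
def build_graph(edges):
    """edges is a list of (from_node, to_node, term) triples."""
    graph = {}
    for start, end, term in edges:
        graph.setdefault(start, []).append((end, term))
    return graph


def paths(graph, node, end):
    """Yield every term sequence leading from node to end."""
    if node == end:
        yield []
        return
    for nxt, term in graph.get(node, []):
        for rest in paths(graph, nxt, end):
            yield [term] + rest


def phrase_in_graph(graph, phrase, start, end):
    """True if phrase occurs contiguously along at least one path."""
    n = len(phrase)
    for path in paths(graph, start, end):
        if any(path[i:i + n] == phrase for i in range(len(path) - n + 1)):
            return True
    return False


# "CyberCafe Inc": the compound edge 0->2 runs parallel to 0->1->2.
graph = build_graph([
    (0, 2, "cybercafe"),
    (0, 1, "cyber"),
    (1, 2, "cafe"),
    (2, 3, "inc"),
])
all_paths = list(paths(graph, 0, 3))
print(all_paths)
print(phrase_in_graph(graph, ["cybercafe", "inc"], 0, 3))
print(phrase_in_graph(graph, ["cybercafe", "cafe"], 0, 3))
```

In this representation both "cybercafe inc" and "cyber cafe inc" match without slop, while the mixed phrase "cybercafe cafe" does not, because no single path contains both terms.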