Good point on phrase/span queries, Hostetter.

:Assuming you have the following "phrase synonym" (and code that
:can find them during Analysis)...
:
:  [CyberCafe] => [Cyber] [Cafe]

...
: the only thing that's ever occurred to me is to set the position increment
: of all the words to "0" (but that will still result in false positives in
: the "cyber cafe" example) or to pick some high default position increment
: (bigger than the longest multi-word synonym) and use that normally, and
: reserve increments of "1" for words in a multi-word synonym.

A good suggestion, but it does have a small side effect. If I understand you correctly, that strategy will create the following token stream for "CyberCafe Inc.", assuming a default increment of, say, 10:
[cybercafe, 1] [cyber, 1] [cafe, 2] [inc, 10]


In that case, a search for the phrase "cybercafe cafe inc" would return a match. Here that is acceptable, if a bit strange to the user, but then again, searching for "cybercafe cafe" IS a bit strange. However, situations can be constructed where the result would be a genuine false positive. Also, we could end up with no match for phrase queries if the slop factor is too low (e.g. 0): "Cybercafe inc" would not be found unless the same analysis algorithm is applied to both the document and the query.
Ranking could also be adversely affected.
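
To make the slop-0 problem concrete, here is a minimal Python sketch of my own (not Lucene code). It assumes each token carries a position increment: 0 for the stacked token "cyber", 1 inside the multi-word synonym, and 10 otherwise. The names to_positions and phrase_matches are hypothetical, and the matching is a simplified model of an exact phrase query, not Lucene's actual scoring.

```python
def to_positions(tokens):
    """Turn (term, increment) pairs into (term, absolute position) pairs."""
    pos = 0
    placed = []
    for term, increment in tokens:
        pos += increment
        placed.append((term, pos))
    return placed


def phrase_matches(doc_tokens, query_tokens):
    """Exact (slop 0) match: every query term must sit at the same
    relative position in the document as it does in the query."""
    doc = to_positions(doc_tokens)
    query = to_positions(query_tokens)
    if not query:
        return False
    index = {}
    for term, pos in doc:
        index.setdefault(term, set()).add(pos)
    first_term, first_pos = query[0]
    for start in index.get(first_term, ()):
        shift = start - first_pos
        if all(pos + shift in index.get(term, ()) for term, pos in query):
            return True
    return False


# Assumed increments for the document "CyberCafe Inc." (default increment 10).
DOC = [("cybercafe", 1), ("cyber", 0), ("cafe", 1), ("inc", 10)]

# Query analyzed naively: consecutive positions, no synonym handling.
NAIVE_QUERY = [("cybercafe", 1), ("inc", 1)]

# Query run through the same (assumed) analysis as the document.
SAME_ANALYSIS_QUERY = [("cybercafe", 1), ("cyber", 0), ("cafe", 1), ("inc", 10)]

# The "strange" phrase that matches because "cafe" directly follows "cybercafe".
STRANGE_QUERY = [("cybercafe", 1), ("cafe", 1)]
```

Under these assumptions the naive query fails, the identically analyzed query succeeds, and the odd phrase "cybercafe cafe" still matches.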


Is there no such concept as a 2-dimensional term vector?
[CyberCafe Inc] => [[cybercafe], [[cyber] [cafe]]] [inc]
(in theory it would have to be a directed, acyclic graph (DAG), I guess)
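
As a rough illustration of what I mean by a DAG, here is a small Python sketch. The graph layout, the node numbers, and the paths helper are all hypothetical; I am not claiming any such structure exists in Lucene. Each edge is a term leading to the next node, and every path from start to end is one legal reading of the text.

```python
def paths(graph, node, end):
    """Return every term sequence that leads from node to end."""
    if node == end:
        return [[]]
    found = []
    for term, next_node in graph.get(node, []):
        for rest in paths(graph, next_node, end):
            found.append([term] + rest)
    return found


# "CyberCafe Inc" as a token graph: node 0 is the start, node 3 the end.
GRAPH = {
    0: [("cybercafe", 2), ("cyber", 1)],  # one-word form, or first half of the split
    1: [("cafe", 2)],                     # second half of the split form
    2: [("inc", 3)],
}

READINGS = paths(GRAPH, 0, 3)
```

With this layout the only readings are "cybercafe inc" and "cyber cafe inc"; the mixed phrase "cybercafe cafe inc" is not a path, so it could not match.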
