Hi, I am trying to build a search utility that looks for 'similarities' between documents. In other words, for every document returned in the results for a search phrase, I want to be able to list documents that are similar to it (but that do not necessarily match the same search criteria). For example, if my search for "Tomcat" returned "Tomcat installation guide", I want the utility to find all similar installation guides, whether or not they are related to Tomcat.
One approach I am considering is to use term vectors. The algorithm: first extract the top X terms from the current document's term vector and build a Boolean query from those terms. Then run that query against the contents of the other documents (I will probably have to remove commonly used terms manually?). The resulting documents should be similar to the original one.

I am wondering whether something like this already exists, or whether someone has a better algorithm or solution. Or am I heading in the wrong direction with this algorithm? Your advice is highly appreciated.

Thanks,
Hareesh
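The term-vector approach described above can be sketched in a few lines. What follows is a minimal, self-contained Python illustration, not Lucene code, and it makes several assumptions. It works on a small in-memory corpus: the `DOCS` dict and its contents are hypothetical. It ranks the "top X" terms by tf-idf as a stand-in for reading a real term vector. It treats the Boolean query as an OR over those terms and scores each other document by the summed weights of the query terms it contains. The `STOP_WORDS` set is a hypothetical stand-in for removing common terms by hand.

```python
import math
import re
from collections import Counter

# Hypothetical stop list standing in for "remove commonly used terms manually".
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "for", "in", "on", "is", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def top_terms(doc_id, docs, x):
    """Return up to x (term, tf-idf weight) pairs for docs[doc_id], highest first.

    Stands in for "the top X entries of the document's term vector".
    """
    n = len(docs)
    df = Counter()
    for text in docs.values():
        df.update(set(tokenize(text)))
    tf = Counter(tokenize(docs[doc_id]))
    # A term found in every document has idf log(1) = 0 and is filtered out below.
    weights = {t: c * math.log(n / df[t]) for t, c in tf.items()}
    ranked = sorted(
        ((t, w) for t, w in weights.items() if w > 0),
        key=lambda tw: (-tw[1], tw[0]),  # highest weight first, ties alphabetical
    )
    return ranked[:x]


def similar_docs(doc_id, docs, x=5):
    """Rank other documents with an OR-style query over the source doc's top terms.

    A document's score is the sum of the weights of the query terms it contains.
    """
    query = top_terms(doc_id, docs, x)
    scores = {}
    for other, text in docs.items():
        if other == doc_id:
            continue  # skip the source document itself
        terms = set(tokenize(text))
        score = sum(w for t, w in query if t in terms)
        if score > 0:
            scores[other] = score
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical example corpus.
DOCS = {
    "tomcat-install": "Tomcat installation guide: download Tomcat, unpack, run the installer",
    "jboss-install": "JBoss installation guide: download JBoss, unpack, run the installer",
    "tomcat-tuning": "Tomcat performance tuning tips",
    "recipes": "Chocolate cake recipe",
}

# Prints jboss-install first, then tomcat-tuning; recipes shares no query term.
for doc, score in similar_docs("tomcat-install", DOCS):
    print(doc, round(score, 3))
```

Here the JBoss installation guide ranks above the Tomcat tuning page even though it never mentions Tomcat, which matches the kind of result the question asks for. Because idf is zero for a term that appears in every document, universally common terms drop out on their own; a stop list mainly catches words that are frequent but not universal.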