This sounds like my university project. Some years ago I did something more or less similar.
Instead of removing some words, I would just rate each word. For instance, count each word in your descriptions and give it a coefficient like 1 / (count * count). Then, for the similarity, sum the coefficients of each word common to the two texts. This way the most common words will barely count in your rating, and the uncommon words (product name, code, utility, ... for instance) will be rated very highly.

Marcos Rebelo

-----Original Message-----
From: Paul D. Kraus [mailto:[EMAIL PROTECTED]]
Sent: Friday, October 17, 2003 3:05 PM
To: [EMAIL PROTECTED]
Subject: Description Search

Ok, here is the scenario. I have two price lists that contain

itemcode description cost list ....

The important two are itemcode and description. One is our list; the other is the price list of our major competitor. Of course they use different itemcodes to identify their items than we do.

I need a way to search via the description that will let me find the most relevant item in our system that matches their descriptions.

Ideas I had: Separate out each word of their description. Ignore common words /and the of .../. Then have it search our descriptions, and based on how many of those words are found together in our description, give the item a rating. Then print out their item and description followed by the top 5 most likely candidates.

This would be step one anyway. I have never done anything even remotely close to this, so any insights, modules, or ideas that you may have will be most welcome.

Thanks in advance.

Paul Kraus
=-=-=-=-=-=-=-=-=-=-=
PEL Supply Company
Network Administrator
216.267.5775 Voice
216.267.6176 Fax
800.321.1263 Toll Free
www.pelsupply.com
=-=-=-=-=-=-=-=-=-=-=

--
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
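
For what it is worth, the weighting idea from the reply above (a coefficient of 1 / (count * count) per word, summed over the words two descriptions share) could be sketched roughly like this. It is only an illustration under several assumptions: it is written in Python, the function names are made up, the item codes and descriptions are invented sample data, and the word counts are taken across our own descriptions only.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase a description and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_weights(descriptions):
    """Weight each word by 1 / count**2, counting occurrences over all descriptions."""
    counts = Counter(word for d in descriptions for word in tokenize(d))
    return {word: 1.0 / (n * n) for word, n in counts.items()}


def similarity(a, b, weights):
    """Sum the weights of the distinct words that both descriptions contain."""
    common = set(tokenize(a)) & set(tokenize(b))
    return sum(weights.get(word, 0.0) for word in common)


def top_candidates(their_description, our_items, weights, n=5):
    """Return up to n (score, itemcode, description) tuples, best score first."""
    scored = [
        (similarity(their_description, desc, weights), code, desc)
        for code, desc in our_items.items()
    ]
    scored.sort(key=lambda entry: entry[0], reverse=True)
    return scored[:n]


# Invented sample data: our price list, keyed by our own item codes.
our_items = {
    "A100": "steel hammer with fiberglass handle",
    "A200": "claw hammer with wood handle",
    "B300": "adjustable wrench with steel jaw",
}
weights = word_weights(our_items.values())

# One invented description from the competitor's list.
ranked = top_candidates("hammer, fiberglass handle and steel head", our_items, weights)
for score, code, desc in ranked:
    print(f"{score:.2f}  {code}  {desc}")
```

A frequent word such as "with" gets a small weight here, while a word that appears only once, such as "fiberglass", dominates the score, so no explicit list of common words to ignore is needed.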