This sounds like my university project.

Some years ago I did something more or less similar.

Instead of removing some words, I would just rate each word. For instance,
count each word in your descriptions and give it a coefficient like 1 /
(count * count). Then, for the similarity, you can sum the coefficients of
each word that the two texts have in common.

This way the most common words will barely count in your rating, and the
uncommon words (product name, code, utility, ... for instance) will be
rated very highly.
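
A minimal sketch of that weighting in Python, just to make the idea
concrete. It assumes the counts are taken over all the descriptions in one
list; the function names and the sample data are made up for illustration.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase a description and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def word_weights(descriptions):
    """Weight each word by 1 / (count * count), counting over all descriptions."""
    counts = Counter(word for d in descriptions for word in tokenize(d))
    return {word: 1.0 / (n * n) for word, n in counts.items()}

def similarity(text_a, text_b, weights):
    """Sum the weights of the words the two texts have in common."""
    common = set(tokenize(text_a)) & set(tokenize(text_b))
    return sum(weights.get(word, 0.0) for word in common)

# Made-up sample data, for illustration only.
ours = [
    "Hammer with steel handle",
    "Hammer with wood handle",
    "Torque wrench with case",
]
weights = word_weights(ours)

# Score one (made-up) competitor description against each of our items.
for description in ours:
    print(description, similarity("Steel hammer and handle", description, weights))
```

Here "with" appears three times and gets a weight of 1/9, while "steel"
appears once and gets a weight of 1, so a match on a rare word dominates
the score.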


Marcos Rebelo

-----Original Message-----
From: Paul D. Kraus [mailto:[EMAIL PROTECTED]
Sent: Friday, October 17, 2003 3:05 PM
To: [EMAIL PROTECTED]
Subject: Description Search


Ok, here is the scenario. I have two price lists that contain itemcode
description cost list .... The important two are itemcode and description.
One is our list; the other is the price list of our major competitor. Of
course they use different itemcodes to identify their items than we do. I
need a way to search via the description that will let me find the most
relevant item in our system that matches their descriptions.

Ideas I had: separate out each word of their description. Ignore common
words /and the of .../ . Then have it search our descriptions and, based on
how many of those words are found together in our description, give the
item a rating. Then print out their item and description followed by the
top 5 most likely candidates.
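
A rough sketch of those steps in Python, under some assumptions: the stop
word list, the function names, the item codes and the descriptions below
are all made up for illustration, and the rating is simply the number of
shared words.

```python
import re

# Assumed stop word list; a real one would be longer.
STOP_WORDS = {"and", "the", "of", "a", "with", "for"}

def keywords(description):
    """Split a description into lowercase words and drop the stop words."""
    words = re.findall(r"[a-z0-9]+", description.lower())
    return {w for w in words if w not in STOP_WORDS}

def top_candidates(their_description, our_items, limit=5):
    """Return up to `limit` (score, itemcode, description) tuples, best first."""
    wanted = keywords(their_description)
    scored = []
    for itemcode, description in our_items.items():
        # The rating is the number of their keywords found in our description.
        score = len(wanted & keywords(description))
        if score > 0:
            scored.append((score, itemcode, description))
    # Highest score first; ties are broken by itemcode.
    scored.sort(key=lambda row: (-row[0], row[1]))
    return scored[:limit]

# Made-up sample data, for illustration only.
our_items = {
    "A100": "Steel hammer with wood handle",
    "A200": "Rubber mallet",
    "A300": "Hammer drill and case",
}
for score, itemcode, description in top_candidates("Hammer, steel, with handle", our_items):
    print(score, itemcode, description)
```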

This would be step one anyways. I have never done anything even remotely 
close to this so any insights, modules, or ideas that you may have will be 
most welcome.

Thanks in advance.

Paul Kraus
=-=-=-=-=-=-=-=-=-=-=
PEL Supply Company
Network Administrator
216.267.5775 Voice
216.267.6176 Fax
800.321.1263 Toll Free
www.pelsupply.com
=-=-=-=-=-=-=-=-=-=-=

-- 
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

