Hi Grant! Thanks for the reply. I'll look into the links you suggested. Just curious though, what did you do to implement this--if you can spill some of the beans ;-) You think what you did was better than the FuzzyQuery approach? Was it a custom algorithm or did you utilize some framework for this? I basically don't want to reinvent the wheel when doing this name matching issue. Thanks in advance!
-los Grant Ingersoll-6 wrote: > > It's like deja vu all over again. I literally just finished up a > similar task (about 2 hours ago). I didn't use Lucene for it, > although I suppose I could have. Lucene does have the FuzzyQuery > (http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/ > javadoc/org/apache/lucene/search/FuzzyQuery.html) that uses > Levenshtein as a place to start. > > There are other string matching algorithms as well that are used in > various approaches. See http://en.wikipedia.org/wiki/Edit_distance. > Googling record linkage may help. From there, you can pretty much > knock yourself out with all the different approaches > > On Apr 5, 2007, at 3:58 PM, moraleslos wrote: > >> >> I was wondering if anyone has done people name matching using >> Lucene. For >> example, I have a name coming from some external source that I >> would like to >> match with the one I have in my DB. Lets say my DB contains the >> name "John >> Smith". If the external source has something like "Smith John", >> "Smith, >> John", "J. Smith", etc., I would like to rate this matching based >> on some % >> of closeness for review later. I've searched around a bit for >> algorithms >> and I kept seeing the Levenshtein distance algorithm which I'm sure >> Lucene >> uses under the hood. So I trying to guage if Lucene is useful for >> doing >> something specific as this, or are there better algorithms and/or >> software >> out there that does name matching. Thanks in advance! >> >> -los >> -- >> View this message in context: http://www.nabble.com/Lucene-for-name- >> matching-tf3533454.html#a9862342 >> Sent from the Lucene - Java Users mailing list archive at Nabble.com. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> > > -------------------------- > Grant Ingersoll > Center for Natural Language Processing > http://www.cnlp.org > > Read the Lucene Java FAQ at http://wiki.apache.org/jakarta-lucene/ > LuceneFAQ > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > > > -- View this message in context: http://www.nabble.com/Lucene-for-name-matching-tf3533454.html#a9863587 Sent from the Lucene - Java Users mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]