Unfortunately, that would require modifications to the database,
which I try to avoid due to the downtime they require.
Why would that be an issue of consequence?
You add some columns to a table. The rest of the software can
ignore them. (Unless you use select * or other black arts, said
rest might never see the new columns.)
Yeah, there are black arts involved. My framework has some generic
code that expects to be able to process every field in certain types
of tables. If there is no corresponding code for each field in
certain subclasses, errors are thrown. Probably a poor design choice
in retrospect, but the choice was made in 2004 and rectifying it
would require a huge amount of work.
Thus when I make data changes involving certain tables, I have to
lock everybody out not only for the time it takes to make those
changes, but to update everybody's copy of the software. There are
over 100 users at this site, many of whom use the software almost
constantly, and there are other sites that also use the software.
One of those tables would be the table that contains the names of people.
I can work around that by storing the pre-processed data in a
separate table with a one-to-one relationship to the names table
(assuming that I don't need backlinks in the names table), but that
seems inefficient to me. My principle for organizing a table of
people is that it should contain every required aspect of the
person's identity of which the person can have only one. A soundex
value or Levenshtein weight for the person's name would qualify as such.
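For what it's worth, the pre-processing itself is small. Here's a sketch of the classic American Soundex coding, in Python purely for illustration (a VFP equivalent would use the same letter mapping, and if memory serves VFP's built-in SOUNDEX() should produce the same codes for plain ASCII names):

```python
def soundex(name: str) -> str:
    # Classic American Soundex: keep the first letter, code the rest,
    # collapse adjacent duplicate codes, pad/truncate to 4 characters.
    # Assumes a non-empty, ASCII-alphabetic name (an assumption of this sketch).
    codes = {c: str(d) for d, letters in enumerate(
        ("bfpv", "cgjkqsxz", "dt", "l", "mn", "r"), start=1) for c in letters}
    name = name.lower()
    out = []
    prev = codes.get(name[0])   # the first letter's code still blocks a duplicate
    for c in name[1:]:
        if c in "hw":
            continue            # h and w do not separate duplicate codes
        code = codes.get(c)     # vowels map to None and reset prev
        if code and code != prev:
            out.append(code)
        prev = code
    return (name[0].upper() + "".join(out) + "000")[:4]
```

That value would be the one-per-person column you'd store (or compute on the fly, per the UDF discussion below).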
No, I don't have a loader program. I would have to introduce internet
download capabilities in order to provide that, and that's a headache
I don't need.
I'm looking for suggestions on how to produce results that include
close matches on last names without requiring pre-processing.
I cannot see that the preprocessing would be very involved.
I agree, it would not. It's the necessity of providing storage for
the results that causes the problem.
Well, there's a separate problem--the cost of having to apply one or
two UDFs that would have to run on every name-search query. I was
hoping, though, to get some suggestions for UDFs that I could at
least test and see if they aren't impossibly slow.
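On the speed worry: a bounded edit-distance check can give up on a row as soon as no match within the threshold remains possible, which makes non-matches (the vast majority of rows in a name search) cheap. A Python sketch of the idea, which a VFP UDF could mirror with the same two-row dynamic program (the name levenshtein_within is mine):

```python
def levenshtein_within(a: str, b: str, max_dist: int) -> bool:
    """True if the Levenshtein edit distance between a and b is <= max_dist."""
    if abs(len(a) - len(b)) > max_dist:
        return False            # length gap alone already exceeds the threshold
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j - 1] + (ca != cb)))  # substitution/match
        if min(cur) > max_dist:
            return False        # row minimum never decreases, so bail out early
        prev = cur
    return prev[-1] <= max_dist
```

The early exits mean a query like "names within distance 2 of 'Dibble'" pays the full O(len(a) * len(b)) cost only on near-misses, so it's at least worth benchmarking before assuming it's impossibly slow.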
I suppose there are no easy answers, but if anyone has an algorithm
for this kind of thing that they would be willing to share, I'd be grateful.
There are not, because different languages assign different
values to the Roman alphabet characters. You are going to have
to decide on language trade-offs.
I'm liking the Levenshtein stuff, and I could use it in my utility
for detecting and removing duplicate records. That wouldn't involve
pre-processing. I think it could really speed that up.
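Since duplicate detection compares names pair-wise anyway, the dedup pass might look something like the sketch below. It's Python for illustration, and it uses the standard library's difflib similarity ratio as a stand-in for a true Levenshtein UDF; the 0.8 threshold and the sample names are made-up values to tune against real data:

```python
import difflib

def likely_duplicates(names, threshold=0.8):
    # Compare each pair of names once; flag pairs whose similarity
    # ratio meets the threshold as candidate duplicates for review.
    dupes = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ratio = difflib.SequenceMatcher(
                None, names[i].lower(), names[j].lower()).ratio()
            if ratio >= threshold:
                dupes.append((names[i], names[j], round(ratio, 2)))
    return dupes
```

The pairwise loop is O(n^2), so for a large names table you'd want to block on something cheap first (same Soundex code, same first letter) and only run the expensive comparison within each block.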
Thanks.
Ken Dibble
www.stic-cil.org
_______________________________________________
Post Messages to: [email protected]
Subscription Maintenance: http://mail.leafe.com/mailman/listinfo/profox
OT-free version of this list: http://mail.leafe.com/mailman/listinfo/profoxtech
Searchable Archive: http://leafe.com/archives/search/profox
This message:
http://leafe.com/archives/byMID/profox/2E.80.21189.632BFE85@cdptpa-omsmta03
** All postings, unless explicitly stated otherwise, are the opinions of the
author, and do not constitute legal or medical advice. This statement is added
to the messages for those lawyers who are too stupid to see the obvious.