Dear Gregg,

Check this out:

library(fuzzyjoin)
?stringdist_left_join

Best Regards,
Ashim

On Wed, Jun 15, 2022 at 8:28 PM Gregg Powell via R-help <r-help@r-project.org> wrote:
>
> Have data sets where there are names, in the first column, client names in
> the second, and Client start date in the third.
>
> There are thousands of these records with thousands of names/clients/client
> start dates. The name is entered each time the person begins with a new
> client such that each person has many entries in the name column. Often the
> names were not entered in a consistent way. With and without middle initial,
> middle name, or various abbreviations such as ",RN" at the end of the name.
>
> Is there a package that can do fuzzy name matching so that the names in name
> column get replaced with a "standardized" format - where some type of machine
> learning can pick the most common spelling of each repeat name and replace
> the different variations with the common spelling?
>
> I included an example below. First table includes the names with the various
> spellings. Second table depicts what I hope to achieve.
>
> Again - this is on a large scale - there are something like 10,000 records
> with names that need to be standardized.
>
> Name               Client     Client Start Date
> John Good          Client 1   1/1/2020
> Joe Jackson        Client 2   6/1/2020
> Bob A. Barker      Client 3   8/1/2020
> John B. Good       Client 4   10/1/2020
> Joe J. Jackson     Client 5   12/1/2020
> Bob Allen Barker   Client 6   1/1/2021
> John Good          Client 7   5/1/2021
> Joe Jack Jackson   Client 8   8/1/2021
> Bob Barker         Client 9   12/1/2021
>
> Name               Client     Client Start Date
> John Good          Client 1   1/1/2020
> Joe J. Jackson     Client 2   6/1/2020
> Bob A. Barker      Client 3   8/1/2020
> John Good          Client 4   10/1/2020
> Joe J. Jackson     Client 5   12/1/2020
> Bob A. Barker      Client 6   1/1/2021
> John Good          Client 7   5/1/2021
> Joe J. Jackson     Client 8   8/1/2021
> Bob A. Barker      Client 9   12/1/2021
>
> THANKS!
>
> Gregg Powell
> Arizona, USA
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
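
For reference, here is a rough sketch of how stringdist_left_join() from fuzzyjoin might be applied to the example names in the quoted message. It rests on assumptions that are not in the thread: the `canonical` lookup table of preferred spellings is hypothetical and built by hand (the join matches names against a list, it does not pick the preferred spelling for you), and the method ("osa") and the threshold (max_dist = 4) were chosen only to fit these nine rows.

```r
library(fuzzyjoin)

# Example rows from the quoted message (start dates omitted for brevity).
records <- data.frame(
  Name = c("John Good", "Joe Jackson", "Bob A. Barker",
           "John B. Good", "Joe J. Jackson", "Bob Allen Barker",
           "John Good", "Joe Jack Jackson", "Bob Barker"),
  Client = paste("Client", 1:9),
  stringsAsFactors = FALSE
)

# ASSUMPTION: a hand-made table of preferred spellings (hypothetical;
# in practice this list would have to be built or reviewed separately).
canonical <- data.frame(
  StdName = c("John Good", "Joe J. Jackson", "Bob A. Barker"),
  stringsAsFactors = FALSE
)

# Keep every record and attach the preferred spelling whose edit
# distance to Name is at most max_dist; the distance goes in `dist`.
matched <- stringdist_left_join(
  records, canonical,
  by = c("Name" = "StdName"),
  method = "osa",        # ASSUMPTION: edit-distance method chosen for the sketch
  max_dist = 4,          # ASSUMPTION: threshold tuned to this small example
  distance_col = "dist"
)

print(matched[, c("Name", "StdName", "dist", "Client")])
```

In this small example the furthest variant from its preferred spelling is "Bob Allen Barker" (4 edits from "Bob A. Barker"), which is why the threshold is 4. On the full data, a threshold that loose could merge different people with similar names, and a record with no spelling within the threshold gets NA in StdName, so the matches would need checking before overwriting the Name column.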