On 08/24/2010 07:27 AM, Doran, Harold wrote:
> There is the stringMatch function in the MiscPsycho package.
>
>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no')
> [1] 8
>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'yes')
> [1] 0.4285714
>
> It uses Levenshtein distance to tell you how much they differ by, either
> normalized or not. So, the first call above tells you the first string differs
> from the second string by 8 insertions/deletions/substitutions. The second
> call normalizes the comparison such that 1 denotes perfect agreement and 0
> denotes no agreement.
>
> Examples of an exact match are below.
>
>> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'yes')
> [1] 1
>> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'n')
> [1] 0
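For readers without MiscPsycho installed, the two numbers quoted above can be reproduced in base R. This is a minimal sketch, not the package's implementation: it uses adist from the utils package for the Levenshtein distance, and it assumes the normalized score is 1 minus the distance divided by the length of the longer string, an assumption chosen because it matches the outputs shown. The helper name string_similarity is hypothetical.

```r
# Hypothetical helper; NOT part of MiscPsycho.
# Assumption: normalized score = 1 - distance / length of the longer string.
string_similarity <- function(a, b, normalize = FALSE) {
  # adist() returns a 1 x 1 matrix of Levenshtein distances for scalar inputs
  d <- as.numeric(utils::adist(a, b))
  if (!normalize) return(d)
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)  # two empty strings agree perfectly
  1 - d / longest
}

raw   <- string_similarity("Hadley", "Hadley Wickham")
norm  <- string_similarity("Hadley", "Hadley Wickham", normalize = TRUE)
exact <- string_similarity("Hadley Wickham", "Hadley Wickham", normalize = TRUE)

print(raw)
print(norm)
print(exact)
```

Under this assumed formula the score falls between 0 (every character must change) and 1 (identical strings).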
You're probably looking for something lighter weight, but Bioconductor
Biostrings has pairwiseAlignment.

> library(Biostrings)
> pairwiseAlignment("Hadley Wickham", "Hadley Hamwick")
Global PairwiseAlignedFixedSubject (1 of 1)
pattern: [1] Hadley W---ick
subject: [1] Hadley Hamwick
score: 29.5102
> pairwiseAlignment("Hadley Hamwick", "Hadley Wickham")
Global PairwiseAlignedFixedSubject (1 of 1)
pattern: [1] Hadley Hamwick
subject: [1] Hadley W---ick
score: 29.5102
> aln <- pairwiseAlignment("Hadley Hamwick", "Haderley Hamwich")
> consensusMatrix(aln)["-",]
 [1] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0

Martin

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Hadley Wickham
> Sent: Tuesday, August 24, 2010 10:17 AM
> To: R-help
> Subject: [R] Comparing/diffing strings
>
> Hi all,
>
> all.equal is generally very useful when you want to find the
> differences between two objects. It breaks down however, when you
> have two long strings to compare:
>
>> all.equal(a, b)
> [1] "1 string mismatch"
>
> Does any one know of any good text diffing tools implemented in R?
>
> Thanks,
>
> Hadley
>

--
Martin Morgan
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109
Location: Arnold Building M1 B861
Phone: (206) 667-2793
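The alignment example in this reply depends on Biostrings. As a lighter-weight sketch in base R, adist from the utils package can return the edit script between two strings when called with counts = TRUE. This is a different technique from pairwiseAlignment (plain Levenshtein edits, no scoring matrix), and the variable names below are illustrative only.

```r
x <- "Hadley Hamwick"
y <- "Haderley Hamwich"

# counts = TRUE attaches a "trafos" attribute: one letter per edit step,
# M = match, I = insertion, D = deletion, S = substitution (turning x into y)
res <- utils::adist(x, y, counts = TRUE)
distance <- as.numeric(res)

# Split the edit script into single operations and tally them
ops <- strsplit(attr(res, "trafos")[1, 1], "")[[1]]
op_counts <- table(factor(ops, levels = c("M", "I", "D", "S")))

print(distance)
print(paste(ops, collapse = ""))
print(op_counts)
```

The positions of the I, D and S letters in the edit script show where the two strings diverge, which is more informative than the "1 string mismatch" message from all.equal.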