One kind of ugly solution:

> d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
> d.f[["nMismatch"]] <- with(d.f, {
+     m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
+     colSums(m)
+ })

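The same idea can be wrapped in a small helper, sketched here under some assumptions: the name count_mismatches is hypothetical (it is not from the original post or from any package), and each pair of sequences is assumed to have the same length, which the function checks explicitly. The as.character() calls are there so the helper also accepts factor columns, as produced by data.frame() when stringsAsFactors is TRUE.

```r
# Hypothetical helper (not part of the original post): count position-wise
# mismatches between two character vectors of equal-length sequences.
count_mismatches <- function(a, b) {
  a <- as.character(a)  # also accepts factor columns
  b <- as.character(b)
  # Assumption: the vectors pair up one-to-one and each pair has equal length.
  stopifnot(length(a) == length(b), all(nchar(a) == nchar(b)))
  # Split every sequence into single characters, compare each pair
  # position by position, and count the positions that differ.
  per_pair <- mapply(function(x, y) sum(x != y),
                     strsplit(a, ""), strsplit(b, ""),
                     SIMPLIFY = FALSE)
  as.integer(unlist(per_pair))
}

# Example data taken from the question
seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")

d.f <- data.frame(seq1, seq2, stringsAsFactors = FALSE)
d.f$nMismatch <- count_mismatches(d.f$seq1, d.f$seq2)
d.f$nMismatch
# 1 2 0
```

Summing inside the mapply() call, rather than building a matrix and calling colSums(), keeps the result a plain vector even when there is only one row, at the cost of a slightly longer function.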
Check out the Bioconductor Biostrings package, especially the version available with the development version of R, for DNA string algorithms.

Martin

joseph wrote:
> Hello
> I have 2 columns of short sequences that I would like to compare and count
> the number of mismatches and record the number of mismatches in a new
> column. The sequences are part of a data frame that looks like this:
> seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
> "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
> "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> d.f=data.frame(seq1, seq2)
> thank you for your help
> Joseph

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.