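Here is one somewhat ugly solution. It assumes every sequence has the same
length, so that mapply() simplifies its result to a matrix with one column
per row of the data frame: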

 > d.f=data.frame(seq1, seq2, stringsAsFactors=FALSE)
 > d.f[["nMismatch"]] <- with(d.f, {
+   m <- mapply("!=", strsplit(seq1, ""), strsplit(seq2, ""))
+   colSums(m)
+ })
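
If the sequences might differ in length, mapply() returns a list rather than a matrix and colSums() fails. A minimal sketch of a per-pair alternative follows; count_mismatches is a hypothetical helper name of my own, and it assumes plain single-byte characters such as A, C, G, T:

```r
# Example data from the original question
seq1 <- c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
seq2 <- c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC",
          "CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT",
          "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
d.f <- data.frame(seq1, seq2, stringsAsFactors = FALSE)

# Hypothetical helper (not from the original post): count the positions
# at which two strings of equal length differ.
count_mismatches <- function(a, b) {
  # A position-wise comparison only makes sense for equal lengths
  stopifnot(nchar(a) == nchar(b))
  # utf8ToInt() converts one string into a vector of character codes
  sum(utf8ToInt(a) != utf8ToInt(b))
}

# Apply the helper to each (seq1, seq2) pair and store the counts
d.f$nMismatch <- mapply(count_mismatches, d.f$seq1, d.f$seq2,
                        USE.NAMES = FALSE)
d.f$nMismatch
```

For the three example pairs this should give 1, 2 and 0. The stopifnot() check makes a length mismatch fail loudly for the offending pair instead of silently recycling.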

Check out the Bioconductor Biostrings package, especially the version 
available with the development version of R, for DNA string algorithms.

Martin

joseph wrote:
> Hello,
> I have 2 columns of short sequences that I would like to compare,
> counting the number of mismatches and recording that number in a new
> column. The sequences are part of a data frame that looks like this:
> seq1=c("CGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CGGTGGTCAGTCTGGGACCTGGGCAGCAGGCT", 
> "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> seq2=c("AGGTGTAGAGGAAAAAAAGGAAACAGGAGTTC","CAGTGGTCAGTCTGGGACCTGGGCATCAGGCT", 
> "CGGGCCTCTCGGCCTGCAGCCCCCAACAGCCA")
> d.f=data.frame(seq1, seq2)
> Thank you for your help,
> Joseph
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
