Hello!
I need your help. I am trying to calculate the pairwise differences between
sequences from several fasta files.
For each of my DNA alignments (FASTA files), I would like to calculate the
pairwise differences and then:
1. combine the data from all files into a single dataset and plot one
histogram (mismatch distribution);
2. calculate, for each number of pairwise differences, the mean across all
files, and again plot a mismatch distribution.
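Both goals can be sketched in base R once each alignment's pairwise differences are available as a numeric vector. This is a minimal sketch under stated assumptions: `dist_list` (one vector per file) and the helper names are hypothetical, and goal 2 is read here as "the mean count of each number of differences k across files".

```r
# Hypothetical input: 'dist_list' is a list with one numeric vector of
# pairwise differences per alignment (e.g. built from dist.gene() output).

# Count how many pairs show k = 0, 1, ..., max_k differences.
mismatch_counts <- function(d, max_k) {
  tabulate(d + 1, nbins = max_k + 1)  # shift by 1: tabulate() bins start at 1
}

mismatch_summary <- function(dist_list) {
  max_k <- max(unlist(dist_list))
  # Goal 1: pool the differences of all files into one distribution.
  pooled <- mismatch_counts(unlist(dist_list), max_k)
  # Goal 2: one column of counts per file, then the mean of each row (each k).
  per_file <- do.call(cbind, lapply(dist_list, mismatch_counts, max_k = max_k))
  data.frame(k = 0:max_k, pooled = pooled, mean_per_file = rowMeans(per_file))
}

# Toy example with two made-up "files" (illustration only).
res <- mismatch_summary(list(c(1, 2, 1), c(2)))
# res$pooled is 0 2 2; res$mean_per_file is 0 1 1
```

Because the number of differences is discrete, `barplot(res$pooled, names.arg = res$k)` is one way to draw either distribution. If the files contain different numbers of sequences, dividing each column of `per_file` by its sum before averaging would compare proportions rather than raw counts.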
Here is the script that I wrote:
library("pegas")
library("seqinr")
library("ggplot2")

Files <- list.files(pattern="fas")
nb_files <- length(Files)

for (i in 1:nb_files) {
  Dist <- as.numeric(dist.gene(read.dna(Files[i], "fasta"), method = "pairwise",
                               pairwise.deletion = FALSE, variance = FALSE))

  Data <- merge(Data, Dist, by=c("x"), all=T)
}

hist(Data, prob=TRUE)
lines(density(Data), col="blue", lwd=2)
However, the script does not work, and I do not know what to change to make
it work.
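A minimal corrected sketch of the loop, assuming the two problems are that `Data` is never created before the first iteration and that `merge()` is used to stack values: `merge()` joins data frames on key columns, whereas appending one file's differences to a vector only needs `c()`. It keeps the original `dist.gene()` call from ape (which pegas attaches); the function name `pairwise_diffs_from_files` and the `"\\.fas$"` file pattern are assumptions.

```r
library(ape)  # read.dna() and dist.gene(); also attached by library("pegas")

# Hypothetical helper: return the pooled pairwise differences of all files.
pairwise_diffs_from_files <- function(files) {
  all_diffs <- numeric(0)  # create the accumulator before the loop
  for (f in files) {
    aln <- read.dna(f, format = "fasta")
    # Number of differing sites for each pair of sequences, flattened from
    # a "dist" object to a plain numeric vector.
    d <- as.numeric(dist.gene(aln, method = "pairwise",
                              pairwise.deletion = FALSE, variance = FALSE))
    all_diffs <- c(all_diffs, d)  # append instead of merge()
  }
  all_diffs
}

# Usage, assuming the alignments sit in the working directory:
# Files <- list.files(pattern = "\\.fas$")  # anchored so only *.fas matches
# Data  <- pairwise_diffs_from_files(Files)
# hist(Data, prob = TRUE)
# lines(density(Data), col = "blue", lwd = 2)
```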
Thanks in advance for your help.
Myriam
--
Myriam Croze, PhD
Postdoctoral researcher
Division of EcoScience,
Ewha Womans University
Seoul, South Korea
______________________________________________
R-help mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.