> 3x3 subset used
>         Locus1  Locus2  Locus3
> Samp1   GG      <NA>    GG
> Samp2   AG      CA      GA
> Samp3   AG      CA      GG
> 
> The Euclidean distance function is defined as sqrt(sum((x_i - y_i)^2)). My
> assumption was that the difference between x_i and y_i would be the number
> of allelic differences at each base pair site between samples.

Base R does not share your assumption, which (from a general-purpose statistics
point of view) would be a completely outlandish interpretation of the data. As
far as base R is concerned, these are just arbitrary character strings,
represented (by default) as factors. Factors are, internally, integers assigned
(by default) in increasing lexical order to the levels present. So if you apply
dist() to factors constructed from allele data, you will usually get complete
nonsense in genetic terms.
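A minimal illustration with one made-up locus (hypothetical data, not the
poster's): "AG" and "GA" are the same unphased heterozygote, and each differs
from "GG" by one allele, but the lexically assigned factor codes say otherwise.

```r
# One locus, three samples; "AG" and "GA" are the same unphased heterozygote,
# and each differs from "GG" by exactly one allele
g <- factor(c("AG", "GA", "GG"))

# Levels are sorted lexically, so the internal codes are 1, 2, 3
levels(g)        # "AG" "GA" "GG"
as.integer(g)    # 1 2 3

# dist() only ever sees those integer codes
d <- dist(as.integer(g))
d
```

The two identical genotypes end up 1 apart, and "AG" comes out twice as far
from "GG" as "GA" does, purely because of alphabetical order.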

You should probably look at something like dist.gene() in the ape package; see
https://www.rdocumentation.org/packages/ape/versions/5.0/topics/dist.gene
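As I read its documentation, dist.gene()'s default "pairwise" method counts the
loci at which two individuals differ rather than the alleles. If per-allele
counts are what you want, here is a rough base-R sketch. The helpers
allele_diff() and allele_dist() are hypothetical names, and the sketch assumes
unphased diploid calls written as two-letter strings, with NA for a missing call.

```r
# The quoted 3x3 subset, typed in by hand (NA marks the missing call)
geno <- data.frame(
  Locus1 = c("GG", "AG", "AG"),
  Locus2 = c(NA,   "CA", "CA"),
  Locus3 = c("GG", "GA", "GG"),
  row.names = c("Samp1", "Samp2", "Samp3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of alleles not shared by two unphased genotype
# strings (0, 1 or 2 for diploids); NA if either call is missing
allele_diff <- function(a, b) {
  if (is.na(a) || is.na(b)) return(NA_integer_)
  x <- strsplit(a, "")[[1]]   # e.g. "AG" -> c("A", "G")
  y <- strsplit(b, "")[[1]]
  shared <- 0L
  for (allele in unique(x)) {
    # alleles shared, counting multiplicity, so "AG" vs "GA" shares 2
    shared <- shared + min(sum(x == allele), sum(y == allele))
  }
  length(x) - shared
}

# Hypothetical helper: pairwise allele-difference counts, samples in rows
allele_dist <- function(geno) {
  geno <- as.matrix(geno)     # character matrix (factor columns become character)
  n <- nrow(geno)
  d <- matrix(0, n, n, dimnames = list(rownames(geno), rownames(geno)))
  for (i in seq_len(n)) {
    for (j in seq_len(i - 1)) {
      diffs <- mapply(allele_diff, geno[i, ], geno[j, ])
      d[i, j] <- sum(diffs, na.rm = TRUE)   # loci with a missing call are skipped
    }
  }
  as.dist(d)                  # uses the lower triangle filled above
}

d <- allele_dist(geno)
d
```

Loci with a missing call are simply dropped here, whereas dist() rescales the
sum for excluded columns; which is more sensible depends on your data. For a
Euclidean-style version, sqrt(sum(diffs^2, na.rm = TRUE)) could replace the
plain sum.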

S Ellison



______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
