Re: [R] Extracting metadata information to corresponding dissimilarity matrix

Rune Grønseth Thu, 18 May 2017 12:38:05 -0700

Brilliant, David, thank you so much!

Cheers,


Rune 

> On 16 May 2017, at 18:44, David L Carlson <dcarl...@tamu.edu> wrote:
> 
> Fixing a typo in the original, adding a simplification, and using 
> dissimilarity instead of similarity:
> 
> set.seed(42)
> dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
>     age=sample.int(75, 7))
> dsim <- dist(dta$age) # distance, already lower triangular
> dsim
> 
> dta1 <- dta
> names(dta1) <- paste0(names(dta), "1") # generalizes to more than 3 columns
> dta2 <- dta
> names(dta2) <- paste0(names(dta), "2")
> 
> dta12 <- merge(dta2, dta1) # order is important
> dta12 <- dta12[dta12$ID1 < dta12$ID2, ] # get rid of duplicates
> 
> dta12 <- data.frame(dta12, dsim=as.vector(dsim)) # Typo was here
> dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2",
>     "dsim")]
> dta12
> 
> David C
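There is a quick way to confirm that the row order produced by merge() lines up with as.vector(dsim): use a case where the right answer is known. For a single numeric variable, dist() is just the absolute difference, so every dsim should equal abs(age1 - age2). The sketch below assumes the ages and genders shown in the printed output further down the thread. They are typed in by hand rather than regenerated with set.seed(42), because sample() can return different values in different R versions.

```r
# Ages and genders as printed in the example output of this thread, typed in
# so the check does not depend on what sample() returns in a given R version.
dta <- data.frame(ID = 1:7,
                  gender = c("F", "F", "M", "F", "F", "F", "F"),
                  age = c(11, 49, 52, 33, 73, 66, 18))
dsim <- dist(dta$age)

dta1 <- dta
names(dta1) <- paste0(names(dta), "1")
dta2 <- dta
names(dta2) <- paste0(names(dta), "2")

dta12 <- merge(dta2, dta1)               # Cartesian product, ID2 varies fastest
dta12 <- dta12[dta12$ID1 < dta12$ID2, ]  # keep each unordered pair once
dta12 <- data.frame(dta12, dsim = as.vector(dsim))

# For a one-variable Euclidean distance, dsim must equal |age1 - age2| on
# every row; a misaligned merge order would make this FALSE.
aligned <- isTRUE(all.equal(dta12$dsim, abs(dta12$age1 - dta12$age2)))
print(aligned)
```

With the arguments swapped, as in merge(dta1, dta2), ID1 varies fastest, the surviving pairs come out as (1,2), (1,3), (2,3), (1,4), ..., and this check fails.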
> 
> 
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of David L 
> Carlson
> Sent: Tuesday, May 16, 2017 11:21 AM
> To: Rune Grønseth <nielsenr...@me.com>; r-help@r-project.org
> Subject: Re: [R] Extracting metadata information to corresponding 
> dissimilarity matrix
> 
> I think this is what you are trying to do. I've created a data set with 7 
> rows and a similarity matrix based on age:
> 
> set.seed(42)
> dta <- data.frame(ID=1:7, gender=sample(c("M", "F"), 7, replace=TRUE),
>     age=sample.int(75, 7))
> sim <- max(dist(dta$age)) - dist(dta$age) # already lower triangular
> sim
> 
> #    1  2  3  4  5  6
> # 2 24               
> # 3 21 59            
> # 4 40 46 43         
> # 5  0 38 41 22      
> # 6  7 45 48 29 55   
> # 7 55 31 28 47  7 14
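The comment "already lower triangular" refers to how a "dist" object stores its values. It keeps only the lower triangle, read column by column. Element k of as.vector(d) is therefore the distance for the k-th pair in the sequence (1,2), (1,3), ..., (1,n), (2,3), and so on. The minimal sketch below checks that ordering against combn(). It uses the ages from the printed example, typed in by hand (an assumption, to avoid depending on the random seed).

```r
# Ages as printed in the example output (typed in, independent of the seed).
age <- c(11, 49, 52, 33, 73, 66, 18)
d <- dist(age)
n <- attr(d, "Size")                     # number of observations (7)

# combn(n, 2) lists the pairs (1,2), (1,3), ..., (1,n), (2,3), ... which is
# the same column-wise order in which a "dist" object stores its lower triangle.
pairs <- t(combn(n, 2))
colnames(pairs) <- c("i", "j")

# Element k of the dist vector should be the distance for the k-th pair.
expected <- abs(age[pairs[, "i"]] - age[pairs[, "j"]])
same_order <- isTRUE(all.equal(as.vector(d), expected))
print(head(cbind(pairs, dist = as.vector(d))))
```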
> 
> # Now duplicate dta:
> dta1 <- dta
> names(dta1) <- c("ID1", "gender1", "age1")
> dta2 <- dta
> names(dta2) <- c("ID2", "gender2", "age2")
> 
> # Now merge and eliminate unneeded rows
> dta12 <- merge(dta2, dta1) # order is important
> dta12 <- dta12[dta12$ID1 < dta12$ID2, ]
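The "order is important" comment is about how merge() behaves when the two data frames share no column names. In that case it returns their Cartesian product, and the rows of the first argument vary fastest. With dta2 first, ID2 cycles within each ID1. After the rows with ID1 >= ID2 are dropped, the remaining pairs come out in the same column-wise order that dist() uses. A toy illustration with made-up three-row tables:

```r
# Two hypothetical tables with no column names in common.
a <- data.frame(ID2 = 1:3)
b <- data.frame(ID1 = 1:3)

# No shared columns, so merge() returns the Cartesian product (9 rows);
# the rows of the first argument (a) vary fastest.
ab <- merge(a, b)
print(ab)

# Keeping ID1 < ID2 leaves (1,2), (1,3), (2,3): the order dist() stores pairs.
kept <- ab[ab$ID1 < ab$ID2, ]
print(kept)
```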
> 
> # Finally combine the similarities with the combined data and rearrange
> # the variable names
> dta12 <- data.frame(dta12mod, sim=as.vector(sim))
> dta12 <- dta12[, c("ID1", "ID2", "gender1", "gender2", "age1", "age2", "sim")]
> dta12
> 
> #    ID1 ID2 gender1 gender2 age1 age2 sim
> # 2    1   2       F       F   11   49  24
> # 3    1   3       F       M   11   52  21
> # 4    1   4       F       F   11   33  40
> # 5    1   5       F       F   11   73   0
> # 6    1   6       F       F   11   66   7
> # 7    1   7       F       F   11   18  55
> # 10   2   3       F       M   49   52  59
> # 11   2   4       F       F   49   33  46
> # 12   2   5       F       F   49   73  38
> # 13   2   6       F       F   49   66  45
> # 14   2   7       F       F   49   18  31
> # 18   3   4       M       F   52   33  43
> # 19   3   5       M       F   52   73  41
> # 20   3   6       M       F   52   66  48
> # 21   3   7       M       F   52   18  28
> # 26   4   5       F       F   33   73  22
> # 27   4   6       F       F   33   66  29
> # 28   4   7       F       F   33   18  47
> # 34   5   6       F       F   73   66  55
> # 35   5   7       F       F   73   18   7
> # 42   6   7       F       F   66   18  14
> 
> -------------------------------------
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77840-4352
> 
> -----Original Message-----
> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Rune Grønseth
> Sent: Tuesday, May 16, 2017 4:31 AM
> To: r-help@r-project.org
> Subject: [R] Extracting metadata information to corresponding dissimilarity 
> matrix
> 
> Hi,
> I am an R beginner. I've tried googling and reading, but this might be too 
> simple to be found in the documentation. 
> 
> I have a dissimilarity index (symmetric matrix) from which I have extracted 
> the unique values using the ecodist package command "lower". There are 14 
> observations, so there are 91 unique comparisons.
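For n observations there are n(n - 1)/2 unique pairs, so 14 observations give 14 * 13 / 2 = 91. If the dissimilarities are in a full symmetric matrix, base R can extract the same unique values without an add-on package. The sketch below uses a randomly generated 14 x 14 matrix as a stand-in for the real data. It assumes, without verifying here, that ecodist's lower() returns its values in this same column-wise order.

```r
# A hypothetical 14 x 14 symmetric dissimilarity matrix (random values,
# for illustration only).
set.seed(1)
n <- 14
m <- matrix(0, n, n)
m[lower.tri(m)] <- runif(n * (n - 1) / 2)
m <- m + t(m)                          # symmetric, zero diagonal

n_pairs <- choose(n, 2)                # 14 * 13 / 2 = 91 unique comparisons

# Base R: the below-diagonal values, read column by column.
low <- m[lower.tri(m)]

# as.dist() stores the same values in the same order.
same_as_dist <- isTRUE(all.equal(low, as.vector(as.dist(m))))
print(c(n_pairs = n_pairs, extracted = length(low)))
```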
> 
> After this I'd like to extract corresponding metadata from a separate data 
> frame (the 14 observations organized in rows identified by a 
> samplenumber vector, plus other variables such as gender, age, et cetera). 
> The aim is to have a new data frame with 91 rows and metadata vectors giving 
> me the value of the dissimilarity index and the gender of each of the two 
> observations that are compared by the dissimilarity metric. So if I'm 
> looking for gender differences, I need 5 vectors in the data frame: 
> samplenumber1, samplenumber2, gender1, gender2 and the dissimilarity metric.
> 
> Does anyone have suggestions for, or experience with, reformatting data in 
> this manner? This is just a test dataset. My full dataset has more than 100 
> observations, so I need more general code, if that is possible.
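A more general version can be sketched under a few assumptions. The rows of the metadata data frame must be in the same order as the observations used to compute the distances. The helper name pairwise_metadata() and the output column dissim are made up for illustration. Instead of merge(), it indexes the metadata with combn(n, 2), which yields the pairs in the order a "dist" object stores them. It should therefore work unchanged for 14, 100, or more observations and for any number of metadata columns.

```r
# Hypothetical helper: one row per unique pair, with every metadata column
# repeated for both members of the pair ("...1" and "...2") plus the dissimilarity.
pairwise_metadata <- function(d, meta) {
  d <- as.dist(d)                        # also accepts a full symmetric matrix
  n <- attr(d, "Size")
  stopifnot(nrow(meta) == n)             # metadata rows must match observations
  p <- combn(n, 2)                       # pairs in dist()'s storage order
  left <- meta[p[1, ], , drop = FALSE]
  right <- meta[p[2, ], , drop = FALSE]
  names(left) <- paste0(names(meta), "1")
  names(right) <- paste0(names(meta), "2")
  out <- data.frame(left, right, dissim = as.vector(d))
  rownames(out) <- NULL
  out
}

# Usage with the example values from this thread (typed in by hand).
meta <- data.frame(ID = 1:7,
                   gender = c("F", "F", "M", "F", "F", "F", "F"),
                   age = c(11, 49, 52, 33, 73, 66, 18))
res <- pairwise_metadata(dist(meta$age), meta)
print(head(res))

# Compare dissimilarity between same-gender and mixed-gender pairs.
res$pair_type <- ifelse(res$gender1 == res$gender2, "same", "different")
by_type <- aggregate(dissim ~ pair_type, data = res, FUN = mean)
print(by_type)
```

Indexing by combn() also avoids building the full n x n Cartesian product that merge() creates before filtering. That hardly matters at 100 observations, but it keeps memory proportional to the number of pairs.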
> 
> With great appreciation of any help.
> 
> Rune Grønseth 
> 
> ---
> 
> Rune Grønseth, MD, PhD, postdoctoral fellow
> Department of Thoracic Medicine
> Haukeland University Hospital
> N-5021 Bergen
> Norway
> 
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
