Hi,

I have a large dataset with info on individuals (B) that have been involved in projects (A) during multiple years (C). The dataset contains three columns: A, B, C. Example:

  A B    C
1 1 a 1999
2 1 b 1999
3 1 c 1999
4 1 d 1999
5 2 c 2001
6 2 d 2001
7 3 a 2004
8 3 c 2004
9 3 d 2004
I am interested in how well the individuals in a project know each other. To calculate this team familiarity measure, I want to sum the familiarity between all individual pairs in a team. The familiarity between a pair is the sum, over every prior project in which the two appeared together, of 1 divided by that project's total number of team members.

So the team familiarity in project 3 = (1/4+1/4) + (1/4+1/4+1/2) + (1/4+1/4+1/2) = 2.5, because:

a has been in project 1 (of size 4) with c and d: 1/4+1/4
c has been in project 1 (of size 4) with a and d: 1/4+1/4, and in project 2 (of size 2) with d: 1/2
d has been in project 1 (of size 4) with a and c: 1/4+1/4, and in project 2 (of size 2) with c: 1/2

(Each pair is therefore counted once from each member's side.)

I think the best way to do this is to transform the data into an edge list (one pair per row, in two columns) and then create two additional columns: one for the strength of the familiarity and one for the year of the project in which the pair was active. The problem is that I am already stuck at the first step.

So the question is: how do I go from the current data structure to a list of projects and the familiarity of their team members?

Your help is very much appreciated. Thanks!
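For what it is worth, the edge-list route described above could be sketched in base R roughly as follows. This is only a sketch under some assumptions that are not stated in the post: "prior" is taken to mean a strictly earlier year, each project has a single year, and each individual appears at most once per project. The names dat, edges, past and fam are made up for illustration.

```r
# Example data from the post: project (A), individual (B), year (C).
dat <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  B = c("a", "b", "c", "d", "c", "d", "a", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004),
  stringsAsFactors = FALSE
)

# Team size of the project each row belongs to.
dat$size <- ave(dat$A, dat$A, FUN = length)

# Step 1: edge list. Merging the data with itself on the project gives
# every ordered pair of members within a project; self-pairs are dropped.
edges <- merge(dat, dat, by = c("A", "C", "size"),
               suffixes = c(".from", ".to"))
edges <- edges[edges$B.from != edges$B.to, ]
names(edges) <- c("project", "year", "size", "from", "to")

# Step 2: strength of one co-appearance = 1 / team size of that project.
edges$strength <- 1 / edges$size

# Step 3: match each pair in a project with the same pair's appearances
# in other projects, keeping only strictly earlier years (assumption).
past <- edges[, c("from", "to", "year", "strength")]
names(past)[names(past) == "year"] <- "prior_year"
past <- merge(edges[, c("project", "year", "from", "to")], past,
              by = c("from", "to"))
past <- past[past$prior_year < past$year, ]

# Step 4: team familiarity = sum of prior strengths per project.
fam <- aggregate(strength ~ project, data = past, FUN = sum)
names(fam)[2] <- "familiarity"

# Projects whose members share no history get a familiarity of 0.
fam <- merge(unique(dat["A"]), fam, by.x = "A", by.y = "project",
             all.x = TRUE)
names(fam)[1] <- "project"
fam$familiarity[is.na(fam$familiarity)] <- 0

print(fam)
```

On the example data this gives 0 for project 1, 0.5 for project 2 and 2.5 for project 3, the last of which matches the hand calculation in the post. Because the edge list holds ordered pairs, each pair contributes from both members' sides. The self-merge grows with the square of the team size, so on a large dataset with big teams it may need a more memory-friendly approach.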