With the new example, what is the full output, and what do you need instead? Was it correct for the previous example?
Matthew

"mathijsdevaan" <mathijsdev...@gmail.com> wrote in message
news:1298372018181-3318939.p...@n4.nabble.com...
>
> Hi Matthew, thanks for your help. There are some things going wrong still.
> Consider this (slightly extended) example:
>
> library(data.table)
> DT = data.table(read.table(textConnection("  A B C
> 1 1 a 1999
> 2 1 b 1999
> 3 1 c 1999
> 4 1 d 1999
> 5 2 c 2001
> 6 2 d 2001
> 7 3 a 2004
> 8 3 b 2004
> 9 3 d 2004
> 10 4 c 2001
> 11 4 d 2001"),head=TRUE,stringsAsFactors=FALSE))
> firststep = DT[,cbind(A,expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
> firststep
>       C A Var1 Var2         v
> 1  1999 1    b    a 0.2500000
> 2  1999 1    c    a 0.2500000
> 3  1999 1    d    a 0.2500000
> 4  1999 1    a    b 0.2500000
> 5  1999 1    c    b 0.2500000
> 6  1999 1    d    b 0.2500000
> 7  1999 1    a    c 0.2500000
> 8  1999 1    b    c 0.2500000
> 9  1999 1    d    c 0.2500000
> 10 1999 1    a    d 0.2500000
> 11 1999 1    b    d 0.2500000
> 12 1999 1    c    d 0.2500000
> 13 2001 2    b    a 0.2500000
> 14 2001 4    b    a 0.2500000
> 15 2001 2    a    b 0.2500000
> 16 2001 4    a    b 0.2500000
> 17 2001 2    b    a 0.2500000
> 18 2001 4    b    a 0.2500000
> 19 2001 2    a    b 0.2500000
> 20 2001 4    a    b 0.2500000
> 21 2004 3    b    a 0.3333333
> 22 2004 3    c    a 0.3333333
> 23 2004 3    a    b 0.3333333
> 24 2004 3    c    b 0.3333333
> 25 2004 3    a    c 0.3333333
> 26 2004 3    b    c 0.3333333
>
> Following "firststep", project 2 and 4 involved individuals a and b, while
> actually c and d were involved. It seems that there is something going wrong
> in transforming the data.
>
> Then going to the final result, a list is generated of years and sums of v,
> rather than a list of projects and sums of v. Probably I haven't been clear
> enough: I want to produce a list of all projects and the familiarity of all
> project members involved right before the start of the project.
>
> Example
> project_id familiarity
> 4          0.25
>
> Members c and d were jointly involved in 3 projects: 1,2,4. Project 4 took
> place in 2001, so only project 1 took place before that (1999 (project 2
> took place in the same year and is therefore not included). The average
> familiarity between the members in project 1 was 1/4, so:
>
> project_id familiarity
> 4          0.25
>
> Thanks!
>
>
> Matthew Dowle wrote:
>>
>>
>> Thanks for the attempt and required output. How about this?
>>
>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>> setkey(firststep,Var1,Var2,C)
>> firststep = firststep[,transform(.SD,cv=cumsum(v)),by=list(Var1,Var2)]
>> setkey(firststep,Var1,Var2,C)
>> DT[, {x=data.table(expand.grid(B,B),C[1]-1L)
>>       firststep[x,roll=TRUE,nomatch=0][,sum(cv)] # prior familiarity
>>      },by=C]
>>         C  V1
>> [1,] 1999 0.0
>> [2,] 2001 0.5
>> [3,] 2004 2.5
>>
>> I think you may have said you have large data. If so, this
>> method should be fast. Please let us know how you get on.
>>
>> HTH
>> Matthew
>>
>>
>>
>> On Thu, 17 Feb 2011 23:07:19 -0800, mathijsdevaan wrote:
>>
>>> OK, for the last step I have tried this (among other things):
>>> library(data.table)
>>> DT = data.table(read.table(textConnection("  A B C
>>> 1 1 a 1999
>>> 2 1 b 1999
>>> 3 1 c 1999
>>> 4 1 d 1999
>>> 5 2 c 2001
>>> 6 2 d 2001
>>> 7 3 a 2004
>>> 8 3 b 2004
>>> 9 3 d 2004"),head=TRUE,stringsAsFactors=FALSE))
>>>
>>> firststep = DT[,cbind(expand.grid(B,B),v=1/length(B)),by=C][Var1!=Var2]
>>> setkey(firststep,Var1,Var2)
>>> list1<-firststep[J(expand.grid(DT$B,DT$B),v=1/length(DT$B)),nomatch=0][,sum(v)]
>>> list1
>>> #27
>>>
>>> What I would like to get:
>>> list
>>> 1 0
>>> 2 0.5
>>> 3 2.5
>>>
>>> Thanks!
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Re-Transforming-relational-data-tp3307449p3318939.html
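
To pin down the requested per-project output, here is a minimal base R sketch of one possible reading of "familiarity right before the start of the project". It is not the data.table approach from the thread, and it rests on assumptions that the thread does not confirm: every unordered pair of project members is counted once, each strictly earlier project (by year C) that contains both members contributes 1/(size of that project), and the contributions are summed. The names `members`, `year` and `prior_familiarity` are made up for this illustration.

```r
# Example data from the thread: A = project id, B = member, C = year.
DT <- data.frame(
  A = c(1, 1, 1, 1, 2, 2, 3, 3, 3, 4, 4),
  B = c("a", "b", "c", "d", "c", "d", "a", "b", "d", "c", "d"),
  C = c(1999, 1999, 1999, 1999, 2001, 2001, 2004, 2004, 2004, 2001, 2001),
  stringsAsFactors = FALSE
)

# Members and year of each project, keyed by project id (as character names).
members <- split(DT$B, DT$A)
year <- sapply(split(DT$C, DT$A), function(x) x[1])

# Assumed definition: for every unordered pair of members in project p,
# add 1/size of each strictly earlier project in which both took part.
prior_familiarity <- sapply(names(members), function(p) {
  m <- members[[p]]
  if (length(m) < 2) return(0)
  pairs <- combn(m, 2)            # one column per unordered pair
  total <- 0
  for (k in seq_len(ncol(pairs))) {
    for (q in names(members)) {
      if (year[[q]] < year[[p]] && all(pairs[, k] %in% members[[q]])) {
        total <- total + 1 / length(members[[q]])
      }
    }
  }
  total
})

result <- data.frame(project_id = names(prior_familiarity),
                     familiarity = unname(prior_familiarity))
print(result)
```

Under these assumptions project 4 gets 0.25, matching the example in the quoted message: only project 1 (four members, 1999) precedes it, and project 2 is excluded because it falls in the same year. The nested loops are quadratic in the number of projects, so this only serves to check a definition on small data and is not a replacement for a keyed data.table join on large data.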