Hi Arun,

Thanks for your reply. Unfortunately, the comments are just free text in the real data, so there is no way to differentiate them based on the value of the Comments column. I guess that is why I couldn't get your solution to work properly. Do you think I could try it for a more general case, where we don't merge/split the comments based on their values?
Thanks for your help, I appreciate it!

On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink...@yahoo.com> wrote:

> Hi,
> I am not sure about your DataN1 column. If there is any identifier to
> differentiate the comments (in this case 1, 2, 3), then it will be easier to
> place them in the correct column.
> My previous solution is not helpful in situations like these:
>
> dat2<-read.table(text="
> Row_ID_N, Src_Row_ID, DataN1
> 1a, 1, This is comment 1
> 2a, 1, This is comment 2
> 3a, 2, This is comment 2
> 4a, 1, This is comment 3
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
> dat3<-read.table(text="
> Row_ID_N, Src_Row_ID, DataN1
> 1a, 1, This is comment 1
> 2a, 1, This is comment 2
> 3a, 2, This is comment 3
> 4a, 1, This is comment 3
> 5a, 2, This is comment 2
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
> library(stringr)
> library(plyr)
> fun1<- function(data1,data2){
>   # trim the whitespace left around the comment text by read.table
>   data2$DataN1<- str_trim(data2$DataN1)
>   res<- merge(data1,data2,by.x=1,by.y=2)
>   # drop Row_ID_N (column 5 after the merge)
>   res1<- res[,-5]
>   res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>   Mx1<- max(sapply(res2[,5],length))
>   res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>     # the number embedded in each comment decides which column it goes to
>     indx<- as.numeric(gsub("[[:alpha:]]","",x))
>     x[match(seq(Mx1),indx)]
>   })),stringsAsFactors=FALSE)
>   colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>   res3
> }
>
> fun1(dat1,dat2)
> #  Row_ID_CR Data1 Data2 Data3      DataComment1
> #1         1    aa    bb    cc This is comment 1
> #2         2    dd    ee    ff              <NA>
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2 This is comment 2              <NA>
>
> fun1(dat1,dat3)
> #  Row_ID_CR Data1 Data2 Data3      DataComment1
> #1         1    aa    bb    cc This is comment 1
> #2         2    dd    ee    ff              <NA>
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2 This is comment 2 This is comment 3
>
> A.K.
>
>
> ----- Original Message -----
> From: arun <smartpink...@yahoo.com>
> To: Shreya Rawal <rawal.shr...@gmail.com>
> Cc: R help <r-help@r-project.org>
> Sent: Monday, June 10, 2013 6:41 PM
> Subject: Re: [R] Combining CSV data
>
> Hi,
> Try this:
>
> dat1<-read.table(text="
> Row_ID_CR, Data1, Data2, Data3
> 1, aa, bb, cc
> 2, dd, ee, ff
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
> dat2<-read.table(text="
> Row_ID_N, Src_Row_ID, DataN1
> 1a, 1, This is comment 1
> 2a, 1, This is comment 2
> 3a, 2, This is comment 1
> 4a, 1, This is comment 3
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
> library(stringr)
> dat2$DataN1<-str_trim(dat2$DataN1)
> res<- merge(dat1,dat2,by.x=1,by.y=2)
> res1<-res[,-5]
> library(plyr)
> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> res2
> #  Row_ID_CR Data1 Data2 Data3
> #1         1    aa    bb    cc
> #2         2    dd    ee    ff
> #                                                   DataN1
> #1 This is comment 1, This is comment 2, This is comment 3
> #2                                       This is comment 1
>
> res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> res3
> #  Row_ID_CR Data1 Data2 Data3      DataComment1
> #1         1    aa    bb    cc This is comment 1
> #2         2    dd    ee    ff This is comment 1
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2              <NA>              <NA>
>
> A.K.
>
>
> ----- Original Message -----
> From: Shreya Rawal <rawal.shr...@gmail.com>
> To: r-help@r-project.org
> Cc:
> Sent: Monday, June 10, 2013 4:38 PM
> Subject: [R] Combining CSV data
>
> Hello R community,
>
> I am trying to combine two CSV files that look like this:
>
> File A
>
> Row_ID_CR, Data1, Data2, Data3
> 1, aa, bb, cc
> 2, dd, ee, ff
>
> File B
>
> Row_ID_N, Src_Row_ID, DataN1
> 1a, 1, This is comment 1
> 2a, 1, This is comment 2
> 3a, 2, This is comment 1
> 4a, 1, This is comment 3
>
> The output I am looking for, matching on the values of Row_ID_CR and
> Src_Row_ID, is:
>
> Output
>
> ROW_ID_CR, Data1, Data2, Data3, DataComment1, DataComment2, DataComment3
> 1, aa, bb, cc, This is comment 1, This is comment 2, This is comment 3
> 2, dd, ee, ff, This is comment 1
>
> I am a novice R user. I am able to replicate a left join, but I need a bit
> more in the final result.
>
> Thanks!!
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
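
For the more general case raised at the top of this thread, where the comments are free text and carry no identifier, one possible approach is to number the comments by their order of appearance within each Src_Row_ID and spread them into columns by that number. The sketch below uses base R only. It is not the solution posted in the thread; the helper column CommentNo and the sample comment texts are made up for illustration, and it assumes the comment text itself contains no commas.

```r
# File A from the original question
dat1 <- read.table(text = "
Row_ID_CR, Data1, Data2, Data3
1, aa, bb, cc
2, dd, ee, ff
", sep = ",", header = TRUE, stringsAsFactors = FALSE, strip.white = TRUE)

# File B, with free-text comments (illustrative values, no embedded numbers)
dat2 <- read.table(text = "
Row_ID_N, Src_Row_ID, DataN1
1a, 1, Free text A
2a, 1, Another remark
3a, 2, Something else entirely
4a, 1, Last note
", sep = ",", header = TRUE, stringsAsFactors = FALSE, strip.white = TRUE)

# Number the comments 1, 2, 3, ... within each Src_Row_ID, in file order.
# CommentNo is a hypothetical helper column, not part of the original data.
dat2$CommentNo <- ave(seq_along(dat2$Src_Row_ID), dat2$Src_Row_ID,
                      FUN = seq_along)

# Spread the comments into one column per CommentNo
wide <- reshape(dat2[, c("Src_Row_ID", "CommentNo", "DataN1")],
                idvar = "Src_Row_ID", timevar = "CommentNo",
                direction = "wide", sep = "")
names(wide) <- sub("^DataN1", "DataComment", names(wide))

# Left join onto File A so rows without any comment are kept
res <- merge(dat1, wide, by.x = "Row_ID_CR", by.y = "Src_Row_ID",
             all.x = TRUE)
res
```

With this sample data, row 1 ends up with three filled DataComment columns, while row 2 has its single comment in DataComment1 and NA in the other two. Because the position comes from the row order in File B rather than from the comment text, the same comment can land in different columns for different rows.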