Great, thanks Arun, but I seem to be running into this error. Not sure what I missed.
> result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3)
Error: unexpected symbol in "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames"

On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink...@yahoo.com> wrote:

> HI,
> You could use:
> result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
> colnames(result3)[5:7]<- paste0("DataComment",1:3)
> A.K.
>
> ________________________________
> From: Shreya Rawal <rawal.shr...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Sent: Tuesday, June 11, 2013 4:22 PM
> Subject: Re: [R] Combining CSV data
>
> Hey Arun,
>
> I guess you could guide me with this a little bit. I have been working on
> the solution Jim suggested (and also because that I could understand it
> with my little knowledge of R :))
>
> So with these commands I am able to get the data in this format:
>
> > fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3
> + 1, aa, bb, cc
> + 2, dd, ee, ff", as.is = TRUE)
> >
> > fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1
> + 1a, 1, This is comment 1
> + 2a, 1, This is comment 2
> + 3a, 2, This is comment 1
> + 4a, 1, This is comment 3", as.is = TRUE)
> >
> > # get rid of leading/trailing blanks on comments
> > fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
> >
> > # merge together
> > result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
> >
> > # now partition by Row_ID_CR and aggregate the comments
> > result2 <- do.call(rbind,
> + lapply(split(result, result$Row_ID_CR), function(.grp){
> + cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
> + })
> + )
>
>   Row_ID_CR Data1 Data2 Data3                                                 comment
> 1         1    aa    bb    cc This is comment 1| This is comment 2| This is comment 3
> 2         2    dd    ee    ff                    This is comment 1| This is Comment 2
>
> I can even split the last column by this:
> strsplit(as.character(result2$comment), split='\\|')
>
> [[1]]
> [1] "This is comment 1" "This is comment 2" " This is comment 3"
>
> [[2]]
> [1] "This is comment 1" "This is comment 2"
>
> but now I am not sure how to combine everything together. I guess by now
> you must have realized how new I am to R :)
>
> Thanks!!
> Shreya
>
> On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink...@yahoo.com> wrote:
>
> > Hi,
> >If the dataset is like this with the comments in the order:
> >
> >dat2<-read.table(text="
> >Row_ID_N, Src_Row_ID, DataN1
> >1a, 1, This is comment 1
> >2a, 1, This is comment 2
> >3a, 2, This is comment 1
> >4a, 1, This is comment 3
> >",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >
> >dat3<-read.table(text="
> >Row_ID_N, Src_Row_ID, DataN1
> >1a, 1, This is comment 1
> >2a, 1, This is comment 2
> >3a, 2, This is comment 1 #
> >4a, 1, This is comment 3
> >5a, 2, This is comment 2 #
> >",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >
> >library(stringr)
> >library(plyr)
> >fun1<- function(data1,data2){
> >  data2$DataN1<- str_trim(data2$DataN1)
> >  res<- merge(data1,data2,by.x=1,by.y=2)
> >  res1<- res[,-5]
> >  res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >  Mx1<- max(sapply(res2[,5],length))
> >  res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> >    c(x,rep(NA,Mx1-length(x)))
> >  })),stringsAsFactors=FALSE)
> >  colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >  res3
> >}
> >
> >fun1(dat1,dat2)
> >#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >#1         1    aa    bb    cc This is comment 1
> >#2         2    dd    ee    ff This is comment 1
> >#       DataComment2      DataComment3
> >#1 This is comment 2 This is comment 3
> >#2              <NA>              <NA>
> >
> >fun1(dat1,dat3)
> >#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >#1         1    aa    bb    cc This is comment 1
> >#2         2    dd    ee    ff This is comment 1
> >#       DataComment2      DataComment3
> >#1 This is comment 2 This is comment 3
> >#2 This is comment 2              <NA>
> >
> >Otherwise, you need to provide an example that matches the real dataset.
> >A.K.
> >
> >________________________________
> >From: Shreya Rawal <rawal.shr...@gmail.com>
> >To: arun <smartpink...@yahoo.com>
> >Cc: R help <r-help@r-project.org>
> >Sent: Tuesday, June 11, 2013 12:22 PM
> >Subject: Re: [R] Combining CSV data
> >
> >Hi Arun,
> >
> >Thanks for your reply. Unfortunately the Comments are just text in the
> >real data. There is no way to differentiate based on the value of the
> >Comments column. I guess because of that reason I couldn't get your
> >solution to work properly. Do you think I can try it for a more general
> >case where we don't merger/split the comments based on the values?
> >
> >Thanks for your help, I appreciate!
> >
> >On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink...@yahoo.com> wrote:
> >
> >HI,
> >>I am not sure about your DataN1 column. If there is any identifier to
> >>differentiate the comments (in this case 1,2,3), then it will easier to
> >>place that in the correct column.
> >> My previous solution is not helpful in situations like these:
> >>
> >>dat2<-read.table(text="
> >>Row_ID_N, Src_Row_ID, DataN1
> >>1a, 1, This is comment 1
> >>2a, 1, This is comment 2
> >>3a, 2, This is comment 2
> >>4a, 1, This is comment 3
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>dat3<-read.table(text="
> >>Row_ID_N, Src_Row_ID, DataN1
> >>1a, 1, This is comment 1
> >>2a, 1, This is comment 2
> >>3a, 2, This is comment 3
> >>4a, 1, This is comment 3
> >>5a, 2, This is comment 2
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>
> >>library(stringr)
> >>library(plyr)
> >>fun1<- function(data1,data2){
> >>  data2$DataN1<- str_trim(data2$DataN1)
> >>  res<- merge(data1,data2,by.x=1,by.y=2)
> >>  res1<- res[,-5]
> >>  res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >>  Mx1<- max(sapply(res2[,5],length))
> >>  res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> >>    indx<- as.numeric(gsub("[[:alpha:]]","",x))
> >>    x[match(seq(Mx1),indx)]
> >>  })),stringsAsFactors=FALSE)
> >>
> >>  colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>  res3
> >>}
> >>fun1(dat1,dat2)
> >>
> >>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>#1         1    aa    bb    cc This is comment 1
> >>#2         2    dd    ee    ff              <NA>
> >>
> >>#       DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>#2 This is comment 2              <NA>
> >>fun1(dat1,dat3)
> >>
> >>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>#1         1    aa    bb    cc This is comment 1
> >>#2         2    dd    ee    ff              <NA>
> >>
> >>#       DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>#2 This is comment 2 This is comment 3
> >>
> >>A.K.
> >>
> >>----- Original Message -----
> >>From: arun <smartpink...@yahoo.com>
> >>To: Shreya Rawal <rawal.shr...@gmail.com>
> >>Cc: R help <r-help@r-project.org>
> >>Sent: Monday, June 10, 2013 6:41 PM
> >>Subject: Re: [R] Combining CSV data
> >>
> >>Hi,
> >>Try this:
> >>
> >>dat1<-read.table(text="
> >>Row_ID_CR, Data1, Data2, Data3
> >>1, aa, bb, cc
> >>2, dd, ee, ff
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>
> >>dat2<-read.table(text="
> >>Row_ID_N, Src_Row_ID, DataN1
> >>1a, 1, This is comment 1
> >>2a, 1, This is comment 2
> >>3a, 2, This is comment 1
> >>4a, 1, This is comment 3
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>library(stringr)
> >>dat2$DataN1<-str_trim(dat2$DataN1)
> >>res<- merge(dat1,dat2,by.x=1,by.y=2)
> >>res1<-res[,-5]
> >>library(plyr)
> >>res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >>res2
> >>#  Row_ID_CR Data1 Data2 Data3
> >>#1         1    aa    bb    cc
> >>#2         2    dd    ee    ff
> >>#                                                   DataN1
> >>#1 This is comment 1, This is comment 2, This is comment 3
> >>#2                                       This is comment 1
> >>
> >>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
> >>colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>res3
> >>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>#1         1    aa    bb    cc This is comment 1
> >>#2         2    dd    ee    ff This is comment 1
> >>#       DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>#2              <NA>              <NA>
> >>
> >>A.K.
> >>
> >>----- Original Message -----
> >>From: Shreya Rawal <rawal.shr...@gmail.com>
> >>To: r-help@r-project.org
> >>Cc:
> >>Sent: Monday, June 10, 2013 4:38 PM
> >>Subject: [R] Combining CSV data
> >>
> >>Hello R community,
> >>
> >>I am trying to combine two CSV files that look like this:
> >>
> >>File A
> >>
> >>Row_ID_CR, Data1, Data2, Data3
> >>1, aa, bb, cc
> >>2, dd, ee, ff
> >>
> >>File B
> >>
> >>Row_ID_N, Src_Row_ID, DataN1
> >>1a, 1, This is comment 1
> >>2a, 1, This is comment 2
> >>3a, 2, This is comment 1
> >>4a, 1, This is comment 3
> >>
> >>And the output I am looking for is, comparing the values of Row_ID_CR and
> >>Src_Row_ID
> >>
> >>Output
> >>
> >>ROW_ID_CR, Data1, Data2, Data3, DataComment1, DataComment2, DataComment3
> >>1, aa, bb, cc, This is comment1, This is comment2, This is comment 3
> >>2, dd, ee, ff, This is comment1
> >>
> >>I am a novice R user, I am able to replicate a left join but I need a bit
> >>more in the final result.
> >>
> >>Thanks!!
> >>
> >>        [[alternative HTML version deleted]]
> >>
> >>______________________________________________
> >>R-help@r-project.org mailing list
> >>https://stat.ethz.ch/mailman/listinfo/r-help
> >>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>and provide commented, minimal, self-contained, reproducible code.
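
The "unexpected symbol" error reported at the top of this thread is consistent with the two suggested statements having been pasted onto a single line: the parser reaches `)colnames` with no newline or `;` between the assignment and the `colnames<-` call. The pasted command also spells the object two ways (`final_ouput` and `final_output`), which would likely fail next with an "object not found" error. Below is a minimal sketch of the corrected step. It assumes `final_output` has the same shape as the `result2` shown earlier in the thread; the small data frame here is a stand-in built for illustration, not the poster's real data.

```r
# Stand-in for the poster's final_output (assumed shape: four data columns
# plus one "comment" column holding "|"-separated comments).
final_output <- data.frame(
  Row_ID_CR = 1:2,
  Data1 = c("aa", "dd"),
  Data2 = c("bb", "ee"),
  Data3 = c("cc", "ff"),
  comment = c("This is comment 1|This is comment 2|This is comment 3",
              "This is comment 1"),
  stringsAsFactors = FALSE
)

# Statement 1: split the comment column on "|" and bind the pieces to the
# remaining columns. fill = TRUE pads rows that have fewer comments, and
# na.strings = "" is there so the padded empty fields are read as NA.
result <- data.frame(
  final_output[, -5],
  read.table(text = as.character(final_output$comment),
             sep = "|", fill = TRUE, na.strings = "",
             stringsAsFactors = FALSE),
  stringsAsFactors = FALSE
)

# Statement 2: rename the new columns. This must start on its own line
# (or follow a ";"), otherwise R reports "unexpected symbol".
colnames(result)[5:7] <- paste0("DataComment", 1:3)

print(result)
```

With real comment text that may contain quote characters or `#`, passing `quote = ""` and `comment.char = ""` to `read.table` is probably safer, since the defaults treat those characters specially. The hard-coded `5:7` also assumes at most three comments per row, as in the example data.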