Ah that makes sense. This looks perfect now. Thanks for your help on this!
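
For anyone reading this thread later, here is a minimal self-contained sketch of the whole sequence, pieced together from Jim's and Arun's code quoted below. Several details are my own assumptions and not part of their replies: `strip.white = TRUE` stands in for the gsub() trimming step, `header = FALSE`, `quote = ""` and `comment.char = ""` are added so that read.table() does not guess a header row or treat apostrophes or `#` inside a comment specially, and `seq_len(ncol(parts))` replaces the hard-coded `5:7`. It also assumes that no comment contains a `|`. The point that tripped me up is that the data.frame() assignment and the colnames() call are two separate statements and must go on separate lines (or be separated by `;`).

```r
# Sample data copied from the thread. strip.white = TRUE is an assumption
# here, used in place of the gsub() trimming step.
fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3
1, aa, bb, cc
2, dd, ee, ff", as.is = TRUE, strip.white = TRUE)

fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1
1a, 1, This is comment 1
2a, 1, This is comment 2
3a, 2, This is comment 1
4a, 1, This is comment 3", as.is = TRUE, strip.white = TRUE)

# Join the comments onto the main rows.
result <- merge(fileA, fileB, by.x = "Row_ID_CR", by.y = "Src_Row_ID")

# One row per Row_ID_CR, with that row's comments pasted together with "|".
# Columns 5 and 6 of the merged frame are Row_ID_N and DataN1.
result2 <- do.call(rbind, lapply(split(result, result$Row_ID_CR), function(.grp) {
  cbind(.grp[1L, -c(5, 6)], comment = paste(.grp$DataN1, collapse = "|"))
}))

# Statement 1: split the "|"-joined comments into columns, padding short
# rows with NA. header, quote and comment.char are my additions (see above).
parts <- read.table(text = as.character(result2$comment), sep = "|",
                    header = FALSE, fill = TRUE, na.strings = "",
                    quote = "", comment.char = "", stringsAsFactors = FALSE)
result3 <- data.frame(result2[, -5], parts, stringsAsFactors = FALSE)

# Statement 2, on its own line: rename the new columns.
colnames(result3)[4 + seq_len(ncol(parts))] <-
  paste0("DataComment", seq_len(ncol(parts)))

result3
```

With the sample data this should give the same layout as the result3 table in Arun's reply: one row per Row_ID_CR, with NA in the comment columns a row does not use.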

On Wed, Jun 12, 2013 at 9:10 AM, arun <smartpink...@yahoo.com> wrote:

> HI Shreya,
> #Looks like you run the two line code as a single line.
>
> result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
> colnames(result3)[5:7]<- paste0("DataComment",1:3)
> result3
> #  Row_ID_CR Data1 Data2 Data3      DataComment1
> #1         1    aa    bb    cc This is comment 1
> #2         2    dd    ee    ff This is comment 1
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2              <NA>              <NA>
>
> A.K.
>
> ________________________________
> From: Shreya Rawal <rawal.shr...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>; jim holtman <jholt...@gmail.com>
> Sent: Wednesday, June 12, 2013 8:58 AM
> Subject: Re: [R] Combining CSV data
>
> Great, thanks Arun, but I seem to be running into this error. Not sure what did I miss.
>
> > result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3)
> Error: unexpected symbol in "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames"
>
> On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink...@yahoo.com> wrote:
>
> >HI,
> >You could use:
> >result3<- data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
> >colnames(result3)[5:7]<- paste0("DataComment",1:3)
> >
> >A.K.
> >________________________________
> >From: Shreya Rawal <rawal.shr...@gmail.com>
> >To: arun <smartpink...@yahoo.com>
> >Sent: Tuesday, June 11, 2013 4:22 PM
> >Subject: Re: [R] Combining CSV data
> >
> >Hey Arun,
> >
> >I guess you could guide me with this a little bit. I have been working on the solution Jim suggested (and also because that I could understand it with my little knowledge of R :))
> >
> >So with these commands I am able to get the data in this format:
> >
> >> fileA <- read.csv(text = "Row_ID_CR, Data1, Data2, Data3
> >+ 1, aa, bb, cc
> >+ 2, dd, ee, ff", as.is = TRUE)
> >>
> >> fileB <- read.csv(text = "Row_ID_N, Src_Row_ID, DataN1
> >+ 1a, 1, This is comment 1
> >+ 2a, 1, This is comment 2
> >+ 3a, 2, This is comment 1
> >+ 4a, 1, This is comment 3", as.is = TRUE)
> >>
> >> # get rid of leading/trailing blanks on comments
> >> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
> >>
> >> # merge together
> >> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
> >>
> >> # now partition by Row_ID_CR and aggregate the comments
> >> result2 <- do.call(rbind,
> >+     lapply(split(result, result$Row_ID_CR), function(.grp){
> >+         cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse = '|'))
> >+     })
> >+ )
> >
> >  Row_ID_CR Data1 Data2 Data3                                                 comment
> >1         1    aa    bb    cc This is comment 1| This is comment 2| This is comment 3
> >2         2    dd    ee    ff                    This is comment 1| This is Comment 2
> >
> >I can even split the last column by this: strsplit(as.character(result2$comment), split='\\|')
> >
> >[[1]]
> >[1] "This is comment 1" "This is comment 2" " This is comment 3"
> >
> >[[2]]
> >[1] "This is comment 1" "This is comment 2"
> >
> >but now I am not sure how to combine everything together. I guess by now you must have realized how new I am to R :)
> >
> >Thanks!!
> >Shreya
> >
> >On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink...@yahoo.com> wrote:
> >
> >>Hi,
> >>If the dataset is like this with the comments in the order:
> >>
> >>dat2<-read.table(text="
> >>Row_ID_N, Src_Row_ID, DataN1
> >>1a, 1, This is comment 1
> >>2a, 1, This is comment 2
> >>3a, 2, This is comment 1
> >>4a, 1, This is comment 3
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>
> >>dat3<-read.table(text="
> >>Row_ID_N, Src_Row_ID, DataN1
> >>1a, 1, This is comment 1
> >>2a, 1, This is comment 2
> >>3a, 2, This is comment 1 #
> >>4a, 1, This is comment 3
> >>5a, 2, This is comment 2 #
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>
> >>library(stringr)
> >>library(plyr)
> >>fun1<- function(data1,data2){
> >>  data2$DataN1<- str_trim(data2$DataN1)
> >>  res<- merge(data1,data2,by.x=1,by.y=2)
> >>  res1<- res[,-5]
> >>  res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >>  Mx1<- max(sapply(res2[,5],length))
> >>  res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> >>    c(x,rep(NA,Mx1-length(x)))
> >>  })),stringsAsFactors=FALSE)
> >>  colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>  res3
> >>}
> >>
> >>fun1(dat1,dat2)
> >>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>#1         1    aa    bb    cc This is comment 1
> >>#2         2    dd    ee    ff This is comment 1
> >>#       DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>#2              <NA>              <NA>
> >>
> >>fun1(dat1,dat3)
> >>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>#1         1    aa    bb    cc This is comment 1
> >>#2         2    dd    ee    ff This is comment 1
> >>#       DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>#2 This is comment 2              <NA>
> >>
> >>Otherwise, you need to provide an example that matches the real dataset.
> >>A.K.
> >>
> >>________________________________
> >>From: Shreya Rawal <rawal.shr...@gmail.com>
> >>To: arun <smartpink...@yahoo.com>
> >>Cc: R help <r-help@r-project.org>
> >>Sent: Tuesday, June 11, 2013 12:22 PM
> >>Subject: Re: [R] Combining CSV data
> >>
> >>Hi Arun,
> >>
> >>Thanks for your reply. Unfortunately the Comments are just text in the real data. There is no way to differentiate based on the value of the Comments column. I guess because of that reason I couldn't get your solution to work properly. Do you think I can try it for a more general case where we don't merger/split the comments based on the values?
> >>
> >>Thanks for your help, I appreciate!
> >>
> >>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink...@yahoo.com> wrote:
> >>
> >>>HI,
> >>>I am not sure about your DataN1 column. If there is any identifier to differentiate the comments (in this case 1,2,3), then it will easier to place that in the correct column.
> >>>My previous solution is not helpful in situations like these:
> >>>
> >>>dat2<-read.table(text="
> >>>Row_ID_N, Src_Row_ID, DataN1
> >>>1a, 1, This is comment 1
> >>>2a, 1, This is comment 2
> >>>3a, 2, This is comment 2
> >>>4a, 1, This is comment 3
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>
> >>>dat3<-read.table(text="
> >>>Row_ID_N, Src_Row_ID, DataN1
> >>>1a, 1, This is comment 1
> >>>2a, 1, This is comment 2
> >>>3a, 2, This is comment 3
> >>>4a, 1, This is comment 3
> >>>5a, 2, This is comment 2
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>
> >>>library(stringr)
> >>>library(plyr)
> >>>fun1<- function(data1,data2){
> >>>  data2$DataN1<- str_trim(data2$DataN1)
> >>>  res<- merge(data1,data2,by.x=1,by.y=2)
> >>>  res1<- res[,-5]
> >>>  res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >>>  Mx1<- max(sapply(res2[,5],length))
> >>>  res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> >>>    indx<- as.numeric(gsub("[[:alpha:]]","",x))
> >>>    x[match(seq(Mx1),indx)]
> >>>  })),stringsAsFactors=FALSE)
> >>>  colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>>  res3
> >>>}
> >>>
> >>>fun1(dat1,dat2)
> >>>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>>#1         1    aa    bb    cc This is comment 1
> >>>#2         2    dd    ee    ff              <NA>
> >>>#       DataComment2      DataComment3
> >>>#1 This is comment 2 This is comment 3
> >>>#2 This is comment 2              <NA>
> >>>
> >>>fun1(dat1,dat3)
> >>>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>>#1         1    aa    bb    cc This is comment 1
> >>>#2         2    dd    ee    ff              <NA>
> >>>#       DataComment2      DataComment3
> >>>#1 This is comment 2 This is comment 3
> >>>#2 This is comment 2 This is comment 3
> >>>
> >>>A.K.
> >>>
> >>>----- Original Message -----
> >>>From: arun <smartpink...@yahoo.com>
> >>>To: Shreya Rawal <rawal.shr...@gmail.com>
> >>>Cc: R help <r-help@r-project.org>
> >>>Sent: Monday, June 10, 2013 6:41 PM
> >>>Subject: Re: [R] Combining CSV data
> >>>
> >>>Hi,
> >>>Try this:
> >>>
> >>>dat1<-read.table(text="
> >>>Row_ID_CR, Data1, Data2, Data3
> >>>1, aa, bb, cc
> >>>2, dd, ee, ff
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>
> >>>dat2<-read.table(text="
> >>>Row_ID_N, Src_Row_ID, DataN1
> >>>1a, 1, This is comment 1
> >>>2a, 1, This is comment 2
> >>>3a, 2, This is comment 1
> >>>4a, 1, This is comment 3
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>library(stringr)
> >>>dat2$DataN1<-str_trim(dat2$DataN1)
> >>>res<- merge(dat1,dat2,by.x=1,by.y=2)
> >>>res1<-res[,-5]
> >>>library(plyr)
> >>>res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize, DataN1=list(DataN1))
> >>>res2
> >>>#  Row_ID_CR Data1 Data2 Data3
> >>>#1         1    aa    bb    cc
> >>>#2         2    dd    ee    ff
> >>>#                                                   DataN1
> >>>#1 This is comment 1, This is comment 2, This is comment 3
> >>>#2                                       This is comment 1
> >>>
> >>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
> >>>colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>>res3
> >>>#  Row_ID_CR Data1 Data2 Data3      DataComment1
> >>>#1         1    aa    bb    cc This is comment 1
> >>>#2         2    dd    ee    ff This is comment 1
> >>>#       DataComment2      DataComment3
> >>>#1 This is comment 2 This is comment 3
> >>>#2              <NA>              <NA>
> >>>
> >>>A.K.
> >>>
> >>>----- Original Message -----
> >>>From: Shreya Rawal <rawal.shr...@gmail.com>
> >>>To: r-help@r-project.org
> >>>Cc:
> >>>Sent: Monday, June 10, 2013 4:38 PM
> >>>Subject: [R] Combining CSV data
> >>>
> >>>Hello R community,
> >>>
> >>>I am trying to combine two CSV files that look like this:
> >>>
> >>>File A
> >>>
> >>>Row_ID_CR, Data1, Data2, Data3
> >>>1, aa, bb, cc
> >>>2, dd, ee, ff
> >>>
> >>>File B
> >>>
> >>>Row_ID_N, Src_Row_ID, DataN1
> >>>1a, 1, This is comment 1
> >>>2a, 1, This is comment 2
> >>>3a, 2, This is comment 1
> >>>4a, 1, This is comment 3
> >>>
> >>>And the output I am looking for is, comparing the values of Row_ID_CR and Src_Row_ID
> >>>
> >>>Output
> >>>
> >>>ROW_ID_CR, Data1, Data2, Data3, DataComment1, DataComment2, DataComment3
> >>>1, aa, bb, cc, This is comment1, This is comment2, This is comment 3
> >>>2, dd, ee, ff, This is comment1
> >>>
> >>>I am a novice R user, I am able to replicate a left join but I need a bit more in the final result.
> >>>
> >>>Thanks!!
> >>>
> >>>______________________________________________
> >>>R-help@r-project.org mailing list
> >>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> >>>and provide commented, minimal, self-contained, reproducible code.