Re: [R] Combining CSV data

Shreya Rawal Wed, 12 Jun 2013 06:18:20 -0700

Ah that makes sense. This looks perfect now. Thanks for your help on this!



On Wed, Jun 12, 2013 at 9:10 AM, arun <smartpink...@yahoo.com> wrote:

> HI Shreya,
> #Looks like you run the two line code as a single line.
>
> result3<-
>
> data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
>
>
> colnames(result3)[5:7]<- paste0("DataComment",1:3)
>
>  result3
> #  Row_ID_CR                 Data1        Data2        Data3
> DataComment1
> #1         1                    aa           bb           cc This is
> comment 1
> #2         2                    dd           ee           ff This is
> comment 1
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2              <NA>              <NA>
>
>
>
> A.K.
>
>
> ________________________________
> From: Shreya Rawal <rawal.shr...@gmail.com>
> To: arun <smartpink...@yahoo.com>
> Cc: R help <r-help@r-project.org>; jim holtman <jholt...@gmail.com>
> Sent: Wednesday, June 12, 2013 8:58 AM
> Subject: Re: [R] Combining CSV data
>
>
>
> Great, thanks Arun, but I seem to be running into this error. Not sure
> what did I miss.
>
> >
> result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames(result)[5:7]<-paste0("DataComment",1:3)
> Error: unexpected symbol in
> "result<-data.frame(final_ouput[,-5],read.table(text=as.character(final_output$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)colnames"
>
>
>
> On Tue, Jun 11, 2013 at 5:09 PM, arun <smartpink...@yahoo.com> wrote:
>
>
> >
> >
> >HI,
> >You could use:
> >result3<-
> data.frame(result2[,-5],read.table(text=as.character(result2$comment),sep="|",fill=TRUE,na.strings=""),stringsAsFactors=FALSE)
> >colnames(result3)[5:7]<- paste0("DataComment",1:3)
> >
> >A.K.
> >________________________________
> >From: Shreya Rawal <rawal.shr...@gmail.com>
> >To: arun <smartpink...@yahoo.com>
> >Sent: Tuesday, June 11, 2013 4:22 PM
> >
> >Subject: Re: [R] Combining CSV data
> >
> >
> >
> >Hey Arun,
> >
> >I guess you could guide me with this a little bit. I have been working on
> the solution Jim suggested (and also because that I could understand it
> with my little knowledge of R :))
> >
> >So with these commands I am able to get the data in this format:
> >
> >> fileA <- read.csv(text = "Row_ID_CR,   Data1,    Data2,    Data3
> >+ 1,                   aa,          bb,          cc
> >+ 2,                   dd,          ee,          ff", as.is = TRUE)
> >>
> >> fileB <- read.csv(text = "Row_ID_N,   Src_Row_ID,   DataN1
> >+ 1a,               1,                   This is comment 1
> >+ 2a,               1,                   This is comment 2
> >+ 3a,               2,                   This is comment 1
> >+ 4a,               1,                   This is comment 3", as.is =
> TRUE)
> >>
> >> # get rid of leading/trailing blanks on comments
> >> fileB$DataN1 <- gsub("^ *| *$", "", fileB$DataN1)
> >>
> >> # merge together
> >> result <- merge(fileA, fileB, by.x = 'Row_ID_CR', by.y = "Src_Row_ID")
> >>
> >> # now partition by Row_ID_CR and aggregate the comments
> >> result2 <- do.call(rbind,
> >+     lapply(split(result, result$Row_ID_CR), function(.grp){
> >+         cbind(.grp[1L, -c(5,6)], comment = paste(.grp$DataN1, collapse
> = '|'))
> >+     })
> >+ )
> >
> >Row_ID_CR                 Data1        Data2        Data3
>                                 comment
> >1         1                    aa           bb           cc
>                                This is comment 1| This is comment 2| This
> is comment 3
> >2         2                    dd           ee           ff
>                                  This is comment 1| This is Comment 2
> >
> >I can even split the last column by
> this: strsplit(as.character(result2$comment), split='\\|')
> >
> >[[1]]
> >[1] "This is comment 1" "This is comment 2" " This is comment 3"
> >
> >[[2]]
> >[1] "This is comment 1" "This is comment 2"
> >
> >
> >but now I am not sure how to combine everything together. I guess by now
> you must have realized how new I am to R :)
> >
> >Thanks!!
> >Shreya
> >
> >
> >
> >
> >
> >
> >On Tue, Jun 11, 2013 at 1:02 PM, arun <smartpink...@yahoo.com> wrote:
> >
> >Hi,
> >>If the dataset is like this with the comments in the order:
> >>
> >>dat2<-read.table(text="
> >>Row_ID_N,  Src_Row_ID,  DataN1
> >>1a,              1,                  This is comment 1
> >>2a,              1,                  This is comment 2
> >>3a,              2,                  This is comment 1
> >>4a,              1,                  This is comment 3
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>
> >>dat3<-read.table(text="
> >>Row_ID_N,  Src_Row_ID,  DataN1
> >>1a,              1,                  This is comment 1
> >>2a,              1,                  This is comment 2
> >>3a,              2,                  This is comment 1   #
> >>
> >>4a,              1,                  This is comment 3
> >>5a,         2,                  This is comment 2  #
> >>
> >>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>
> >>
> >>library(stringr)
> >>library(plyr)
> >>fun1<- function(data1,data2){
> >>    data2$DataN1<- str_trim(data2$DataN1)
> >>        res<- merge(data1,data2,by.x=1,by.y=2)
> >>    res1<- res[,-5]
> >>    res2<-
> ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >>    Mx1<- max(sapply(res2[,5],length))
> >>    res3<-
> data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> >>                                  c(x,rep(NA,Mx1-length(x)))
> >>
> >>                                  })),stringsAsFactors=FALSE)
> >>    colnames(res3)[grep("X",colnames(res3))]<-
> paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>    res3
> >>    }
> >>
> >>
> >>fun1(dat1,dat2)
> >>#  Row_ID_CR                Data1        Data2        Data3
> DataComment1
> >>#1         1                   aa           bb           cc This is
> comment 1
> >>
> >>#2         2                   dd           ee           ff This is
> comment 1
> >>#       DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>#2              <NA>              <NA>
> >>
> >> fun1(dat1,dat3)
> >>#  Row_ID_CR                Data1        Data2        Data3
> DataComment1
> >>#1         1                   aa           bb           cc This is
> comment 1
> >>
> >>#2         2                   dd           ee           ff This is
> comment 1
> >> #      DataComment2      DataComment3
> >>#1 This is comment 2 This is comment 3
> >>
> >>#2 This is comment 2              <NA>
> >>
> >>
> >>Otherwise, you need to provide an example that matches the real dataset.
> >>A.K.
> >>
> >>________________________________
> >>From: Shreya Rawal <rawal.shr...@gmail.com>
> >>To: arun <smartpink...@yahoo.com>
> >>Cc: R help <r-help@r-project.org>
> >>Sent: Tuesday, June 11, 2013 12:22 PM
> >>
> >>Subject: Re: [R] Combining CSV data
> >>
> >>
> >>
> >>Hi Arun,
> >>
> >>Thanks for your reply. Unfortunately the Comments are just text in the
> real data. There is no way to differentiate based on the value of the
> Comments column. I guess because of that reason I couldn't get your
> solution to work properly. Do you think I can try it for a more general
> case where we don't merger/split the comments based on the values?
> >>
> >>Thanks for your help, I appreciate!
> >>
> >>
> >>
> >>On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink...@yahoo.com> wrote:
> >>
> >>HI,
> >>>I am not sure about your DataN1 column.  If there is any identifier to
> differentiate the comments (in this case 1,2,3), then it will easier to
> place that in the correct column.
> >>>  My previous solution is not helpful in situations like these:
> >>>
> >>>dat2<-read.table(text="
> >>>Row_ID_N,  Src_Row_ID,  DataN1
> >>>1a,              1,                  This is comment 1
> >>>2a,              1,                  This is comment 2
> >>>3a,              2,                  This is comment 2
> >>>4a,              1,                  This is comment 3
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>dat3<-read.table(text="
> >>>
> >>>Row_ID_N,  Src_Row_ID,  DataN1
> >>>1a,              1,                  This is comment 1
> >>>2a,              1,                  This is comment 2
> >>>3a,              2,                  This is comment 3
> >>>4a,              1,                  This is comment 3
> >>>5a,         2,                  This is comment 2
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>
> >>>
> >>>library(stringr)
> >>>library(plyr)
> >>>fun1<- function(data1,data2){
> >>>    data2$DataN1<- str_trim(data2$DataN1)
> >>>        res<- merge(data1,data2,by.x=1,by.y=2)
> >>>    res1<- res[,-5]
> >>>    res2<-
> ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
> >>>    Mx1<- max(sapply(res2[,5],length))
> >>>    res3<-
> data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
> >>>                                  indx<-
> as.numeric(gsub("[[:alpha:]]","",x))
> >>>                                  x[match(seq(Mx1),indx)]
> >>>                                  })),stringsAsFactors=FALSE)
> >>>
> >>>    colnames(res3)[grep("X",colnames(res3))]<-
> paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>>    res3
> >>>    }
> >>>fun1(dat1,dat2)
> >>>
> >>>#  Row_ID_CR                Data1        Data2        Data3
> DataComment1
> >>>#1         1                   aa           bb           cc This is
> comment 1
> >>>#2         2                   dd           ee
> ff              <NA>
> >>>
> >>>#       DataComment2      DataComment3
> >>>#1 This is comment 2 This is comment 3
> >>>#2 This is comment 2              <NA>
> >>> fun1(dat1,dat3)
> >>>
> >>>#  Row_ID_CR                Data1        Data2        Data3
> DataComment1
> >>>#1         1                   aa           bb           cc This is
> comment 1
> >>>#2         2                   dd           ee
> ff              <NA>
> >>>
> >>>#       DataComment2      DataComment3
> >>>#1 This is comment 2 This is comment 3
> >>>#2 This is comment 2 This is comment 3
> >>>
> >>>
> >>>
> >>>A.K.
> >>>
> >>>
> >>>----- Original Message -----
> >>>
> >>>From: arun <smartpink...@yahoo.com>
> >>>To: Shreya Rawal <rawal.shr...@gmail.com>
> >>>Cc: R help <r-help@r-project.org>
> >>>Sent: Monday, June 10, 2013 6:41 PM
> >>>Subject: Re: [R] Combining CSV data
> >>>
> >>>Hi,
> >>>Try this:
> >>>
> >>>dat1<-read.table(text="
> >>>Row_ID_CR,  Data1,    Data2,    Data3
> >>>1,                  aa,          bb,          cc
> >>>2,                  dd,          ee,          ff
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>
> >>>dat2<-read.table(text="
> >>>Row_ID_N,  Src_Row_ID,  DataN1
> >>>1a,              1,                  This is comment 1
> >>>2a,              1,                  This is comment 2
> >>>3a,              2,                  This is comment 1
> >>>4a,              1,                  This is comment 3
> >>>",sep=",",header=TRUE,stringsAsFactors=FALSE)
> >>>library(stringr)
> >>>dat2$DataN1<-str_trim(dat2$DataN1)
> >>>res<- merge(dat1,dat2,by.x=1,by.y=2)
> >>> res1<-res[,-5]
> >>>library(plyr)
> >>> res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,
> DataN1=list(DataN1))
> >>> res2
> >>> # Row_ID_CR                Data1        Data2        Data3
> >>>#1         1                   aa           bb           cc
> >>>#2         2                   dd           ee           ff
> >>>#                                                   DataN1
> >>>#1 This is comment 1, This is comment 2, This is comment 3
> >>>#2                                       This is comment 1
> >>>
> >>>
> >>>
> >>>res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x)
> {x[duplicated(x)]<-NA;x})))
> >>> colnames(res3)[grep("X",colnames(res3))]<-
> paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> >>>res3
> >>>#  Row_ID_CR                Data1        Data2        Data3
> DataComment1
> >>>#1         1                   aa           bb           cc This is
> comment 1
> >>>#2         2                   dd           ee           ff This is
> comment 1
> >>>#       DataComment2      DataComment3
> >>>#1 This is comment 2 This is comment 3
> >>>#2              <NA>              <NA>
> >>>
> >>>A.K.
> >>>
> >>>
> >>>----- Original Message -----
> >>>From: Shreya Rawal <rawal.shr...@gmail.com>
> >>>To: r-help@r-project.org
> >>>Cc:
> >>>Sent: Monday, June 10, 2013 4:38 PM
> >>>Subject: [R] Combining CSV data
> >>>
> >>>Hello R community,
> >>>
> >>>I am trying to combine two CSV files that look like this:
> >>>
> >>>File A
> >>>
> >>>Row_ID_CR,   Data1,    Data2,    Data3
> >>>1,                   aa,          bb,          cc
> >>>2,                   dd,          ee,          ff
> >>>
> >>>
> >>>File B
> >>>
> >>>Row_ID_N,   Src_Row_ID,   DataN1
> >>>1a,               1,                   This is comment 1
> >>>2a,               1,                   This is comment 2
> >>>3a,               2,                   This is comment 1
> >>>4a,               1,                   This is comment 3
> >>>
> >>>And the output I am looking for is, comparing the values of Row_ID_CR
> and
> >>>Src_Row_ID
> >>>
> >>>Output
> >>>
> >>>ROW_ID_CR,    Data1,    Data2,    Data3,    DataComment1,
> >>>DataComment2,          DataComment3
> >>>1,                      aa,         bb,         cc,        This is
> >>>comment1,    This is comment2,     This is comment 3
> >>>2,                      dd,          ee,         ff,          This is
> >>>comment1
> >>>
> >>>
> >>>I am a novice R user, I am able to replicate a left join but I need a
> bit
> >>>more in the final result.
> >>>
> >>>
> >>>Thanks!!
> >>>
> >>>    [[alternative HTML version deleted]]
> >>>
> >>>______________________________________________
> >>>R-help@r-project.org mailing list
> >>>https://stat.ethz.ch/mailman/listinfo/r-help
> >>>PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> >>>and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>>
> >>
> >
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Combining CSV data

Reply via email to