Hi Arun,

Thanks for your reply. Unfortunately, the comments are just free text in the
real data, so there is no way to differentiate them based on the value of the
Comments column. I guess that is why I couldn't get your solution to work
properly. Do you think I can try it for a more general case where we don't
merge/split the comments based on their values?
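
To make the general case concrete, here is a rough sketch of what I have in
mind, using base R only: number the comments within each Src_Row_ID in order
of appearance and use that position (not the comment text) to choose the
column. The comment strings and the helper column "pos" are made up for
illustration, and I have not tried this on the real files.

```r
# Hypothetical sample data shaped like File A and File B in this thread;
# the comment strings are invented free text with no embedded identifier.
dat1 <- data.frame(Row_ID_CR = c(1, 2),
                   Data1 = c("aa", "dd"),
                   Data2 = c("bb", "ee"),
                   Data3 = c("cc", "ff"),
                   stringsAsFactors = FALSE)
dat2 <- data.frame(Row_ID_N = c("1a", "2a", "3a", "4a"),
                   Src_Row_ID = c(1, 1, 2, 1),
                   DataN1 = c("free text A", "another note",
                              "something else", "last remark"),
                   stringsAsFactors = FALSE)

# Number the comments within each Src_Row_ID by order of appearance.
# "pos" is an assumed helper column, not part of the original data.
dat2$pos <- ave(seq_along(dat2$Src_Row_ID), dat2$Src_Row_ID, FUN = seq_along)

# Spread the comments into one column per position.
wide <- reshape(dat2[, c("Src_Row_ID", "pos", "DataN1")],
                idvar = "Src_Row_ID", timevar = "pos",
                direction = "wide", sep = "")
names(wide) <- sub("^DataN1", "DataComment", names(wide))

# Left join onto File A so rows without comments are kept.
res <- merge(dat1, wide, by.x = "Row_ID_CR", by.y = "Src_Row_ID", all.x = TRUE)
res
```

With this approach the column a comment lands in depends only on the row
order within File B, so IDs with fewer comments are padded with NA on the
right.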

Thanks for your help, I appreciate it!


On Mon, Jun 10, 2013 at 10:14 PM, arun <smartpink...@yahoo.com> wrote:

> Hi,
> I am not sure about your DataN1 column.  If there is any identifier to
> differentiate the comments (in this case 1,2,3), then it will be easier to
> place each comment in the correct column.
>   My previous solution is not helpful in situations like these:
> dat2<-read.table(text="
> Row_ID_N,  Src_Row_ID,  DataN1
> 1a,              1,                  This is comment 1
> 2a,              1,                  This is comment 2
> 3a,              2,                  This is comment 2
> 4a,              1,                  This is comment 3
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
> dat3<-read.table(text="
> Row_ID_N,  Src_Row_ID,  DataN1
> 1a,              1,                  This is comment 1
> 2a,              1,                  This is comment 2
> 3a,              2,                  This is comment 3
> 4a,              1,                  This is comment 3
> 5a,         2,                  This is comment 2
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
>
> library(stringr)
> library(plyr)
> fun1<- function(data1,data2){
>     data2$DataN1<- str_trim(data2$DataN1)
>     res<- merge(data1,data2,by.x=1,by.y=2)
>     res1<- res[,-5]
>     res2<- ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,DataN1=list(DataN1))
>     Mx1<- max(sapply(res2[,5],length))
>     res3<- data.frame(res2[,-5],do.call(rbind,lapply(res2[,5],function(x){
>         indx<- as.numeric(gsub("[[:alpha:]]","",x))
>         x[match(seq(Mx1),indx)]
>     })),stringsAsFactors=FALSE)
>     colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
>     res3
> }
> fun1(dat1,dat2)
> #  Row_ID_CR                Data1        Data2        Data3      DataComment1
> #1         1                   aa           bb           cc This is comment 1
> #2         2                   dd           ee           ff              <NA>
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2 This is comment 2              <NA>
>  fun1(dat1,dat3)
> #  Row_ID_CR                Data1        Data2        Data3      DataComment1
> #1         1                   aa           bb           cc This is comment 1
> #2         2                   dd           ee           ff              <NA>
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2 This is comment 2 This is comment 3
>
>
> A.K.
>
>
> ----- Original Message -----
> From: arun <smartpink...@yahoo.com>
> To: Shreya Rawal <rawal.shr...@gmail.com>
> Cc: R help <r-help@r-project.org>
> Sent: Monday, June 10, 2013 6:41 PM
> Subject: Re: [R] Combining CSV data
>
> Hi,
> Try this:
>
> dat1<-read.table(text="
> Row_ID_CR,  Data1,    Data2,    Data3
> 1,                  aa,          bb,          cc
> 2,                  dd,          ee,          ff
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
>
> dat2<-read.table(text="
> Row_ID_N,  Src_Row_ID,  DataN1
> 1a,              1,                  This is comment 1
> 2a,              1,                  This is comment 2
> 3a,              2,                  This is comment 1
> 4a,              1,                  This is comment 3
> ",sep=",",header=TRUE,stringsAsFactors=FALSE)
> library(stringr)
> dat2$DataN1<-str_trim(dat2$DataN1)
> res<- merge(dat1,dat2,by.x=1,by.y=2)
>  res1<-res[,-5]
> library(plyr)
>  res2<-ddply(res1,.(Row_ID_CR,Data1,Data2,Data3),summarize,
> DataN1=list(DataN1))
>  res2
>  # Row_ID_CR                Data1        Data2        Data3
> #1         1                   aa           bb           cc
> #2         2                   dd           ee           ff
> #                                                   DataN1
> #1 This is comment 1, This is comment 2, This is comment 3
> #2                                       This is comment 1
>
>
>
> res3<-data.frame(res2[,-5],t(apply(do.call(rbind,res2[,5]),1,function(x) {x[duplicated(x)]<-NA;x})))
> colnames(res3)[grep("X",colnames(res3))]<- paste0("DataComment",gsub("[[:alpha:]]","",colnames(res3)[grep("X",colnames(res3))]))
> res3
> #  Row_ID_CR                Data1        Data2        Data3      DataComment1
> #1         1                   aa           bb           cc This is comment 1
> #2         2                   dd           ee           ff This is comment 1
> #       DataComment2      DataComment3
> #1 This is comment 2 This is comment 3
> #2              <NA>              <NA>
>
> A.K.
>
>
> ----- Original Message -----
> From: Shreya Rawal <rawal.shr...@gmail.com>
> To: r-help@r-project.org
> Cc:
> Sent: Monday, June 10, 2013 4:38 PM
> Subject: [R] Combining CSV data
>
> Hello R community,
>
> I am trying to combine two CSV files that look like this:
>
> File A
>
> Row_ID_CR,   Data1,    Data2,    Data3
> 1,                   aa,          bb,          cc
> 2,                   dd,          ee,          ff
>
>
> File B
>
> Row_ID_N,   Src_Row_ID,   DataN1
> 1a,               1,                   This is comment 1
> 2a,               1,                   This is comment 2
> 3a,               2,                   This is comment 1
> 4a,               1,                   This is comment 3
>
> The output I am looking for, matching on the values of Row_ID_CR and
> Src_Row_ID, is:
>
> Output
>
> ROW_ID_CR,    Data1,    Data2,    Data3,    DataComment1,         DataComment2,         DataComment3
> 1,            aa,       bb,       cc,       This is comment 1,    This is comment 2,    This is comment 3
> 2,            dd,       ee,       ff,       This is comment 1
>
>
> I am a novice R user. I am able to replicate a left join, but I need a bit
> more in the final result.
>
>
> Thanks!!
>
>     [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
>

