Thanks again, guys! Arun's method worked. I have over 270,000 rows and it took me 1 min.

Dimitri
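For anyone finding this thread later, here is a minimal sketch of the approach from Arun's quoted reply, wrapped in a small function. The helper name `split_numbers` is made up for illustration and is not from the thread. It assumes every field holds exactly one run of digits, that the only other characters are ASCII letters, and that fields are separated by "_".

```r
# Hypothetical helper (not from the thread): strip the letters, then let
# read.table() parse the remaining "_"-separated digits in one vectorised pass.
split_numbers <- function(strings, sep = "_") {
  # e.g. "aaa1_bbb1_ccc3" becomes "1_1_3"
  digits_only <- gsub("[A-Za-z]", "", strings)
  # read.table() treats each element of `text` as one input line
  read.table(text = digits_only, sep = sep, header = FALSE)
}

# Toy data from the original question
x <- data.frame(x = c("aaa1_bbb1_ccc3", "aaa2_bbb3_ccc2", "aaa3_bbb2_ccc1"),
                stringsAsFactors = FALSE)

# Bind the parsed columns (V1, V2, V3) next to the original strings
res <- data.frame(x, split_numbers(x$x), stringsAsFactors = FALSE)
print(res)
```

The speed-up likely comes from replacing several row-wise string splits with a single `gsub()` call followed by a single parse. If the text parts can contain digits themselves, the regular expression would need to be adapted.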
On Sat, Jun 8, 2013 at 7:47 AM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:

> Thank you so much, Jorge and Arun - I'll give it a try!
> Dimitri
>
> On Fri, Jun 7, 2013 at 11:27 PM, arun <smartpink...@yahoo.com> wrote:
>
>> Hi,
>> Tried it on a 1e5-row dataset:
>>
>> l1<- letters[1:10]
>> s1<-sapply(seq_along(l1),function(i) paste(rep(l1[i],3),collapse=""))
>> set.seed(24)
>>
>> x1<-data.frame(x=paste(
>>   paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),
>>   paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),
>>   paste0(sample(s1,1e5,replace=TRUE),sample(1:15,1e5,replace=TRUE)),
>>   sep="_"),stringsAsFactors=FALSE)
>>
>> system.time(resNew<-data.frame(x=x1,read.table(text=gsub("[A-Za-z]","",x1[,1]),sep="_",header=FALSE),stringsAsFactors=FALSE))
>> #   user  system elapsed
>> #  2.712   0.016   2.732
>>
>> head(resNew)
>> #                  x V1 V2 V3
>> #1  ccc12_ggg2_jjj14 12  2 14
>> #2  ccc7_ddd15_aaa11  7 15 11
>> #3 hhh12_ddd14_fff12 12 14 12
>> #4  fff11_bbb15_aaa6 11 15  6
>> #5   ggg12_ccc9_ggg8 12  9  8
>> #6   jjj8_eee12_eee4  8 12  4
>>
>> A.K.
>>
>> ----- Original Message -----
>> From: arun <smartpink...@yahoo.com>
>> To: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> Cc: R help <r-help@r-project.org>
>> Sent: Friday, June 7, 2013 11:00 PM
>> Subject: Re: [R] splitting a string column into multiple columns faster
>>
>> Hi,
>> Maybe this helps:
>>
>> res<-data.frame(x=x,read.table(text=gsub("[A-Za-z]","",x[,1]),sep="_",header=FALSE),stringsAsFactors=FALSE)
>> res
>> #               x V1 V2 V3
>> #1 aaa1_bbb1_ccc3  1  1  3
>> #2 aaa2_bbb3_ccc2  2  3  2
>> #3 aaa3_bbb2_ccc1  3  2  1
>>
>> A.K.
>>
>> ----- Original Message -----
>> From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> To: r-help <r-help@r-project.org>
>> Cc:
>> Sent: Friday, June 7, 2013 9:24 PM
>> Subject: [R] splitting a string column into multiple columns faster
>>
>> Hello!
>>
>> I have a column in my data frame that I have to split: I have to distill
>> the numbers from the text. Below is my example and my solution.
>>
>> x<-data.frame(x=c("aaa1_bbb1_ccc3","aaa2_bbb3_ccc2","aaa3_bbb2_ccc1"))
>> x
>> library(stringr)
>> out<-as.data.frame(str_split_fixed(x$x,"aaa",2))
>> out2<-as.data.frame(str_split_fixed(out$V2,"_bbb",2))
>> out3<-as.data.frame(str_split_fixed(out2$V2,"_ccc",2))
>> result<-cbind(x,out2[1],out3)
>> result
>>
>> My problem is:
>> str_split_fixed is relatively slow. In my real data frame I have over
>> 80,000 rows, so it takes almost 30 seconds to run just one line (like
>> out<-... above).
>> And it's even slower because I have to do it step by step many times.
>>
>> Is there any way to do it by specifying all 3 delimiters at once
>> ("aaa","_bbb","_ccc") and then split it in one swoop into a data frame
>> with several columns?
>>
>> Thanks a lot for any pointers!
>>
>> --
>> Dimitri Liakhovitski
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.