Thanks again, guys!
Arun's method worked. I have over 270,000 rows and it took me 1 min.
Dimitri
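
For anyone following along, here is a minimal sketch of why the one-liner quoted below works, broken into its separate steps on the toy data from the original question. The intermediate names (`digits_only`, `nums`) are mine and not part of Arun's code, and it assumes the labels consist only of ASCII letters.

```r
# Toy data from the original question (three labelled numbers per string).
x <- data.frame(x = c("aaa1_bbb1_ccc3", "aaa2_bbb3_ccc2", "aaa3_bbb2_ccc1"),
                stringsAsFactors = FALSE)

# Step 1: delete every ASCII letter, leaving only the digits and "_" separators.
digits_only <- gsub("[A-Za-z]", "", x$x)

# Step 2: parse the stripped strings as if they were lines of a "_"-delimited file.
nums <- read.table(text = digits_only, sep = "_", header = FALSE)

# Step 3: attach the parsed columns (V1, V2, V3) to the original strings.
res <- cbind(x, nums)
str(res)
```

Both gsub() and read.table() process the whole column in one call, which is presumably why this beats repeated str_split_fixed() passes. Note that a label which itself contains digits would leak into the parsed numbers, so this only fits data shaped like the example.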


On Sat, Jun 8, 2013 at 7:47 AM, Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com> wrote:

> Thank you so much, Jorge and Arun - I'll give it a try!
> Dimitri
>
>
> On Fri, Jun 7, 2013 at 11:27 PM, arun <smartpink...@yahoo.com> wrote:
>
>> Hi,
>> Tried it on 1e5 row dataset:
>>
>> l1<- letters[1:10]
>> s1<-sapply(seq_along(l1),function(i) paste(rep(l1[i],3),collapse=""))
>> set.seed(24)
>>
>> x1 <- data.frame(
>>   x = paste(
>>     paste0(sample(s1, 1e5, replace = TRUE), sample(1:15, 1e5, replace = TRUE)),
>>     paste0(sample(s1, 1e5, replace = TRUE), sample(1:15, 1e5, replace = TRUE)),
>>     paste0(sample(s1, 1e5, replace = TRUE), sample(1:15, 1e5, replace = TRUE)),
>>     sep = "_"),
>>   stringsAsFactors = FALSE)
>>
>> system.time(
>>   resNew <- data.frame(
>>     x = x1,
>>     read.table(text = gsub("[A-Za-z]", "", x1[, 1]), sep = "_", header = FALSE),
>>     stringsAsFactors = FALSE))
>> #   user  system elapsed
>> #  2.712   0.016   2.732
>>
>> head(resNew)
>> #                  x V1 V2 V3
>> #1  ccc12_ggg2_jjj14 12  2 14
>> #2  ccc7_ddd15_aaa11  7 15 11
>> #3 hhh12_ddd14_fff12 12 14 12
>> #4  fff11_bbb15_aaa6 11 15  6
>> #5   ggg12_ccc9_ggg8 12  9  8
>> #6   jjj8_eee12_eee4  8 12  4
>>
>> A.K.
>>
>>
>> ----- Original Message -----
>> From: arun <smartpink...@yahoo.com>
>> To: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> Cc: R help <r-help@r-project.org>
>> Sent: Friday, June 7, 2013 11:00 PM
>> Subject: Re: [R] splitting a string column into multiple columns faster
>>
>> Hi,
>> Maybe this helps:
>>
>>
>> res<-data.frame(x=x,read.table(text=gsub("[A-Za-z]","",x[,1]),sep="_",header=FALSE),stringsAsFactors=FALSE)
>> res
>> #               x V1 V2 V3
>> #1 aaa1_bbb1_ccc3  1  1  3
>> #2 aaa2_bbb3_ccc2  2  3  2
>> #3 aaa3_bbb2_ccc1  3  2  1
>> A.K.
>>
>> ----- Original Message -----
>> From: Dimitri Liakhovitski <dimitri.liakhovit...@gmail.com>
>> To: r-help <r-help@r-project.org>
>> Cc:
>> Sent: Friday, June 7, 2013 9:24 PM
>> Subject: [R] splitting a string column into multiple columns faster
>>
>> Hello!
>>
>> I have a column in my data frame that I have to split: I have to distill
>> the numbers from the text. Below is my example and my solution.
>>
>> x<-data.frame(x=c("aaa1_bbb1_ccc3","aaa2_bbb3_ccc2","aaa3_bbb2_ccc1"))
>> x
>> library(stringr)
>> out<-as.data.frame(str_split_fixed(x$x,"aaa",2))
>> out2<-as.data.frame(str_split_fixed(out$V2,"_bbb",2))
>> out3<-as.data.frame(str_split_fixed(out2$V2,"_ccc",2))
>> result<-cbind(x,out2[1],out3)
>> result
>> My problem is:
>> str_split_fixed is relatively slow. My real data frame has over
>> 80,000 rows, so it takes almost 30 seconds to run just one line (like
>> out<-... above).
>> And it's even slower overall because I have to do it step by step, many times.
>>
>> Is there any way to do it by specifying all 3 delimiters at once
>> ("aaa","_bbb","_ccc") and then split it in one swoop into a data frame
>> with several columns?
>>
>> Thanks a lot for any pointers!
>>
>> --
>> Dimitri Liakhovitski
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>
>
> --
> Dimitri Liakhovitski
>



-- 
Dimitri Liakhovitski
