Oh, if efficiency is a consideration, then my code is about 15 times as
fast as Rui's:

> F2 <- F1[rep(1:5,1e6),]  ## 5 million rows

## Rui's
> system.time({
+     F2$Y1 <- +grepl("_", F2$text)
+     tmp <- strsplit(as.character(F2$text), "_")
+     tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
+     tmp <- do.call(rbind, tmp)
+     colnames(tmp) <- c("X1", "X2")
+     F2 <- cbind(F2[-3], tmp)    # remove the original column
+ })
   user  system elapsed
 20.072   0.625  20.786
## my version
> system.time({
+     wh <- grep("_",F2$text, fixed = TRUE, invert = TRUE)
+     F2[wh,"text"] <- paste(F2[wh,"text"],".",sep = "_")
+     z <- unlist(strsplit(F1$text,"_"))
+     F2 <- cbind(F2, matrix(z, ncol = 2, byrow = TRUE))
+     F2
+ })
   user  system elapsed
  1.256   0.019   1.281

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )


On Tue, Sep 22, 2020 at 5:04 PM Val <valkr...@gmail.com> wrote:

> Thank you all for the help!
>
> LMH, Yes I would like to see the alternative. I am using this for a
> large data set and if the alternative is more efficient than this
> then I would be happy.
>
> On Tue, Sep 22, 2020 at 6:25 PM Bert Gunter <bgunter.4...@gmail.com>
> wrote:
> >
> > To be clear, I think Rui's solution is perfectly fine and probably
> > better than what I offer below. But just for fun, I wanted to do it
> > without the lapply(). Here is one way. I think my comments suffice to
> > explain.
> >
> > > ## which are the non "_" indices?
> > > wh <- grep("_",F1$text, fixed = TRUE, invert = TRUE)
> > > ## paste "_." to these
> > > F1[wh,"text"] <- paste(F1[wh,"text"],".",sep = "_")
> > > ## Now strsplit() and unlist() them to get a vector
> > > z <- unlist(strsplit(F1$text, "_"))
> > > ## now cbind() to the data frame
> > > F1 <- cbind(F1, matrix(z, ncol = 2, byrow = TRUE))
> > > F1
> >   ID1 ID2   text    1  2
> > 1  A1  B1 NONE_. NONE  .
> > 2  A1  B1  cf_12   cf 12
> > 3  A1  B1 NONE_. NONE  .
> > 4  A2  B2  X2_25   X2 25
> > 5  A2  B3  fd_15   fd 15
> > > ## You can change the names of the 2 columns yourself
> >
> > Cheers,
> > Bert
> >
> > Bert Gunter
> >
> > "The trouble with having an open mind is that people keep coming along
> > and sticking things into it."
> > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
> >
> >
> > On Tue, Sep 22, 2020 at 12:19 PM Rui Barradas <ruipbarra...@sapo.pt>
> > wrote:
> >>
> >> Hello,
> >>
> >> A base R solution with strsplit, like in your code.
> >>
> >> F1$Y1 <- +grepl("_", F1$text)
> >>
> >> tmp <- strsplit(as.character(F1$text), "_")
> >> tmp <- lapply(tmp, function(x) if(length(x) == 1) c(x, ".") else x)
> >> tmp <- do.call(rbind, tmp)
> >> colnames(tmp) <- c("X1", "X2")
> >> F1 <- cbind(F1[-3], tmp)    # remove the original column
> >> rm(tmp)
> >>
> >> F1
> >> #  ID1 ID2 Y1   X1 X2
> >> #1  A1  B1  0 NONE  .
> >> #2  A1  B1  1   cf 12
> >> #3  A1  B1  0 NONE  .
> >> #4  A2  B2  1   X2 25
> >> #5  A2  B3  1   fd 15
> >>
> >>
> >> Note that cbind dispatches on F1, an object of class "data.frame".
> >> Therefore it's the method cbind.data.frame that is called and the
> >> result is also a df, though tmp is a "matrix".
> >>
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas
> >>
> >>
> >> At 20:07 on 22/09/20, Rui Barradas wrote:
> >> > Hello,
> >> >
> >> > Something like this?
> >> >
> >> >
> >> > F1$Y1 <- +grepl("_", F1$text)
> >> > F1 <- F1[c(1, 2, 4, 3)]
> >> > F1 <- tidyr::separate(F1, text, into = c("X1", "X2"), sep = "_",
> >> >                       fill = "right")
> >> > F1
> >> >
> >> >
> >> > Hope this helps,
> >> >
> >> > Rui Barradas
> >> >
> >> > At 19:55 on 22/09/20, Val wrote:
> >> >> Hi all,
> >> >>
> >> >> I am trying to create new columns based on the string content of
> >> >> another column. First I want to identify rows that contain a
> >> >> particular string. If a row contains it, I want to split the string
> >> >> and create two variables.
> >> >>
> >> >> Here is my sample of data.
> >> >> F1<-read.table(text="ID1 ID2 text
> >> >> A1 B1 NONE
> >> >> A1 B1 cf_12
> >> >> A1 B1 NONE
> >> >> A2 B2 X2_25
> >> >> A2 B3 fd_15 ",header=TRUE,stringsAsFactors=F)
> >> >> If the variable "text" contains this "_" I want to create an
> >> >> indicator variable as shown below
> >> >>
> >> >> F1$Y1 <- ifelse(grepl("_", F1$text),1,0)
> >> >>
> >> >>
> >> >> Then I want to split that string into two, before "_" and after "_"
> >> >> and create two variables as shown below
> >> >> x1= strsplit(as.character(F1$text),'_',2)
> >> >>
> >> >> My problem is how to combine this with the original data frame. The
> >> >> desired output is shown below,
> >> >>
> >> >>
> >> >> ID1 ID2 Y1 X1   X2
> >> >> A1  B1  0  NONE .
> >> >> A1  B1  1  cf   12
> >> >> A1  B1  0  NONE .
> >> >> A2  B2  1  X2   25
> >> >> A2  B3  1  fd   15
> >> >>
> >> >> Any help?
> >> >> Thank you.
> >> >>
> >> >> ______________________________________________
> >> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> >> >> https://stat.ethz.ch/mailman/listinfo/r-help
> >> >> PLEASE do read the posting guide
> >> >> http://www.R-project.org/posting-guide.html
> >> >> and provide commented, minimal, self-contained, reproducible code.
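For anyone who wants to rerun the comparison at the top of this thread: the timed block for the second version calls strsplit() on F1$text rather than F2$text, so the two timings do not measure the same amount of work. What follows is a minimal sketch, not either poster's exact code, that applies both approaches to the same data frame and checks that they agree before timing them. The function names split_lapply and split_paste and the row count are illustrative assumptions, and data.frame() is used in place of cbind() so that the new columns should stay character on R versions before 4.0.0 as well.

```r
## Sample data from the original question
F1 <- read.table(text = "ID1 ID2 text
A1 B1 NONE
A1 B1 cf_12
A1 B1 NONE
A2 B2 X2_25
A2 B3 fd_15", header = TRUE, stringsAsFactors = FALSE)

## lapply()-based split (hypothetical name): pad entries without "_" by "."
split_lapply <- function(d) {
  d$Y1 <- +grepl("_", d$text, fixed = TRUE)
  tmp <- strsplit(d$text, "_", fixed = TRUE)
  tmp <- lapply(tmp, function(x) if (length(x) == 1) c(x, ".") else x)
  tmp <- do.call(rbind, tmp)
  colnames(tmp) <- c("X1", "X2")
  data.frame(d[names(d) != "text"], tmp, stringsAsFactors = FALSE)
}

## Vectorised split (hypothetical name): append "_." where "_" is missing,
## then split every string once and fold the pieces into a 2-column matrix
split_paste <- function(d) {
  has_sep <- grepl("_", d$text, fixed = TRUE)
  txt <- d$text
  txt[!has_sep] <- paste(txt[!has_sep], ".", sep = "_")
  z <- matrix(unlist(strsplit(txt, "_", fixed = TRUE)),
              ncol = 2, byrow = TRUE,
              dimnames = list(NULL, c("X1", "X2")))
  data.frame(d[names(d) != "text"], Y1 = +has_sep, z,
             stringsAsFactors = FALSE)
}

## Same input for both versions; the size is an arbitrary choice
F2 <- F1[rep(1:5, 2e4), ]   # 100,000 rows

t_lapply <- system.time(r1 <- split_lapply(F2))
t_paste  <- system.time(r2 <- split_paste(F2))

## Both versions should produce the same data frame
stopifnot(isTRUE(all.equal(r1, r2)))

## Elapsed seconds for each version (values depend on the machine)
print(c(lapply = t_lapply[["elapsed"]], paste = t_paste[["elapsed"]]))
```

Both functions assume that every entry of text contains at most one "_" with something on either side of it, as in the sample data; other inputs would need extra handling.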