Re: [R] Search for common character strings within a column

arun Fri, 12 Apr 2013 13:39:22 -0700

Hi,
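Maybe this helps.
I was not sure how you wanted to select those two letters, so the code below takes
every pair of letters within a sequence (in order of appearance, not only adjacent ones).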



dat1<- read.table(text="
   Seq,Output
 A B B C D A C,Yes
 B C A C B D A C,Yes
C D A A C D,No
",sep=",",header=TRUE,stringsAsFactors=FALSE)
library(stringr)
# For each "Yes" row: split into letters, form all pairs with combn(), paste each pair together
lapply(str_split(str_trim(dat1$Seq)," ")[dat1$Output=="Yes"],function(x) 
{x1<-t(combn(x,2)); apply(x1,1,paste0,collapse="")})
#[[1]]
# [1] "AB" "AB" "AC" "AD" "AA" "AC" "BB" "BC" "BD" "BA" "BC" "BC" "BD" "BA" "BC"
#[16] "CD" "CA" "CC" "DA" "DC" "AC"

#[[2]]
# [1] "BC" "BA" "BC" "BB" "BD" "BA" "BC" "CA" "CC" "CB" "CD" "CA" "CC" "AC" "AB"
#[16] "AD" "AA" "AC" "CB" "CD" "CA" "CC" "BD" "BA" "BC" "DA" "DC" "AC"

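# Tabulate the pairs in each "Yes" row and keep the most frequent one
# (which.max() returns only the first maximum, so ties are not reported)
res<- sapply(str_split(str_trim(dat1$Seq)," ")[dat1$Output=="Yes"],function(x) 
{x1<-t(combn(x,2)); x2<-table(apply(x1,1,paste0,collapse="")); 
x2[which.max(x2)]})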
res
#BC BC 
# 4  4
 
dat1$MaxCombn<-NA
dat1$MaxCombn[dat1$Output=="Yes"]<- names(res)
dat1
#               Seq Output MaxCombn
#1    A B B C D A C    Yes       BC
#2  B C A C B D A C    Yes       BC
#3      C D A A C D     No     <NA>
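
If you meant adjacent letters only, counted over the whole column rather than
row by row, something like the following might be closer. This is only a sketch
under that assumption; adj_pairs is a made-up helper name, and it uses base R
only (trimws() needs R >= 3.2.0).

```r
# Assumption: a "2 letter string" is two letters standing next to each other,
# and the counts are pooled over all rows where Output is "Yes".
dat1 <- read.table(text="Seq,Output
A B B C D A C,Yes
B C A C B D A C,Yes
C D A A C D,No", sep=",", header=TRUE, stringsAsFactors=FALSE)

# Hypothetical helper: paste each letter to its right-hand neighbour
adj_pairs <- function(s) {
  x <- strsplit(trimws(s), " +")[[1]]
  if (length(x) < 2) return(character(0))
  paste0(head(x, -1), tail(x, -1))
}

# Collect the adjacent pairs from every "Yes" row and tabulate them together
yes_pairs <- unlist(lapply(dat1$Seq[dat1$Output == "Yes"], adj_pairs))
tab <- sort(table(yes_pairs), decreasing = TRUE)
tab

# Keep all pairs that share the top count, so ties are not dropped
top <- names(tab)[as.vector(tab) == max(tab)]
top
# For this sample: "AC", which occurs 3 times
```

Changing "Yes" to "No" in the subsetting step gives the same summary for the
other output.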
A.K.


>I have a dataset (data) that consists of two columns: Seq and Output. 
>Each entry in Seq is a combination of As, Bs, Cs and Ds and ranges from 5 to 30 
>characters in length. Each sequence is associated with an output of 
>either Yes or No such that: 
>
>     Seq                 Output 
>(1)  A B B C D A C       Yes 
>(2)  B C A C B D A C     Yes 
>(3)  C D A A C D         No 
>
>etc, etc. 
>
>I want to find which 2-letter strings (A B, A C, A D, etc.) are 
>most associated with each output. Essentially, I want to find which 2-letter 
>combinations occur most frequently in the column Seq when the 
>output is Yes. I’m new to R and can’t figure out a solution to this 
>problem. 
>
>Any help greatly appreciated! 
>
>Cheers, 
>
>AB

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
