>> Dear all,
>>
>> I am doing a project on variant calling using R. I am working on a
>> pileup file. There are 10 columns in my data frame and I want to
>> count the number of A, C, G and T in each row for column 9. An
>> example of column 9 is given below:
>>
>> .a,g,,
>> .t,t,,
>> .,c,c,
>> .,a,,,
>> .,t,t,t
>> .c,,g,^!.
>> .g,ggg.^!,
>> .$,,,,,.,
>> a,g,,t,
>> ,,,,,.,^!.
>> ,$,,,,.,.
>>
>> This is a bit confusing for me, as these characters are all in one
>> column. How can we scan them row by row and print the number of A,
>> C, G and T for each row?
>
> Seems a bit clunky but this does the job (first the data):
>
> > txt <- " .a,g,,
> + .t,t,,
> + .,c,c,
> + .,a,,,
> + .,t,t,t
> + .c,,g,^!.
> + .g,ggg.^!,
> + .$,,,,,.,
> + a,g,,t,
> + ,,,,,.,^!.
> + ,$,,,,.,."
>
> > txtvec <- readLines(textConnection(txt))
>
> Now the clunky solution. Basically, it subtracts 1 from the counts of
> "fragments" that result from splitting on each letter in turn. Could
> be made prettier with a function that did the job.
>
> > data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)),
> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)),
> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)),
> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) )
>            A C G T
> .a,g,,     1 0 1 0
> .t,t,,     0 0 0 2
> .,c,c,     0 2 0 0
> .,a,,,     1 0 0 0
> .,t,t,t    0 0 0 2
> .c,,g,^!.  0 1 1 0
> .g,ggg.^!, 0 0 4 0
> .$,,,,,.,  0 0 0 0
> a,g,,t,    1 0 1 1
> ,,,,,.,^!. 0 0 0 0
> ,$,,,,.,.  0 0 0 0
>
> Has the advantage that the input data ends up as rownames, which was a
> surprise.
>
> If you wanted to count "A" and "a" as equivalent, then the split
> argument should be "a|A".
>
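The "function that did the job" mentioned above could look something like the sketch below. The helper name count_char() and its ignore.case argument are made up for illustration and are not from the original post. It counts match positions with gregexpr() instead of counting fragments from strsplit(). One reason to prefer that: strsplit() drops a trailing empty fragment, so the split-and-subtract approach appears to come out one short whenever a string ends in the letter being counted (the row ".,t,t,t" shows T = 2 in the output above although it holds three t's).

```r
# Data from the thread, one pileup string per element
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t",
            ".c,,g,^!.", ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,",
            ",,,,,.,^!.", ",$,,,,.,.")

# count_char() is a hypothetical helper, not part of the original post.
# It counts literal occurrences of the single character `ch` in each
# element of the character vector `x`.
count_char <- function(x, ch, ignore.case = FALSE) {
  if (ignore.case) {
    # fold both sides to lower case so "A" and "a" count together
    x  <- tolower(x)
    ch <- tolower(ch)
  }
  # gregexpr() returns the match positions per string, or -1 if none;
  # fixed = TRUE treats `ch` as plain text, not as a regular expression
  m <- gregexpr(ch, x, fixed = TRUE)
  vapply(m, function(pos) sum(pos > 0), integer(1))
}

counts <- data.frame(
  bases = txtvec,
  A = count_char(txtvec, "a", ignore.case = TRUE),
  C = count_char(txtvec, "c", ignore.case = TRUE),
  G = count_char(txtvec, "g", ignore.case = TRUE),
  T = count_char(txtvec, "t", ignore.case = TRUE)
)
print(counts)

# The fragment count for comparison: three pieces, hence 3 - 1 = 2
length(strsplit(".,t,t,t", "t")[[1]])
```

Keeping the strings in an ordinary column rather than as row names is a deliberate choice here, since row names must be unique and real pileup columns will usually contain repeated strings.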
>> As you mentioned, if I want to count A and a together I should split
>> like this. But can I also count . and , using
>>
>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split=".|,"), length) , "-", 1)),
>>
>> I tried it but it is not working. It does give output, but in some
>> places it shows too many . and , and in others it does not count
>> them at all and just shows 0.
>>
>>
>> Thanking you,
>> Warm Regards
>> Vikas Bansal
>> Msc Bioinformatics
>> Kings College London
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius, MD
> West Hartford, CT

David Winsemius, MD
West Hartford, CT
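
On the "." and "," question: the split argument of strsplit() is a regular expression by default, and in a regular expression an unescaped "." matches any single character. The pattern ".|," therefore matches every character in the string, which would explain the odd counts. A minimal sketch of one way around it, assuming the same txtvec as earlier in the thread, is to put the characters in a bracket class (where the dot is literal) or to pass fixed = TRUE. The variable names below are only illustrative.

```r
# Data from the thread, one pileup string per element
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t",
            ".c,,g,^!.", ".g,ggg.^!,", ".$,,,,,.,", "a,g,,t,",
            ",,,,,.,^!.", ",$,,,,.,.")

# Inside [ ] the dot is an ordinary character, so "[.,]" matches only a
# literal "." or ","; regmatches() extracts the matches and lengths()
# counts them per string (0 when there is no match)
n_dot_or_comma <- lengths(regmatches(txtvec, gregexpr("[.,]", txtvec)))

# Separate counts: fixed = TRUE treats the pattern as plain text
n_dot   <- lengths(regmatches(txtvec, gregexpr(".", txtvec, fixed = TRUE)))
n_comma <- lengths(regmatches(txtvec, gregexpr(",", txtvec, fixed = TRUE)))

print(data.frame(bases = txtvec, dot = n_dot, comma = n_comma))
```

Escaping the dot as "\\.|," should behave the same way as the bracket class. Counting matches directly also avoids the problem that strsplit() drops a trailing empty fragment, which matters here because many of these strings end in "," or ".".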