Hi, this seems a little bit confusing, but I am using this code as suggested by you:
df <- read.table("Case2.pileup", fill = TRUE, sep = "\t", colClasses = "character")
txt <- df[, 9]
txtvec <- readLines(textConnection(txt))
vik <- data.frame(
  A = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "a|A"), length), "-", 1)),
  C = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "c|C"), length), "-", 1)),
  G = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "g|G"), length), "-", 1)),
  T = unlist(lapply(lapply(sapply(txtvec, strsplit, split = "t|T"), length), "-", 1))
)

The thing is, at some positions it calculates the counts correctly but at others it does not. I have tried to find the solution in books and on the net, but I can't find anything related to this. I think the code is good but I am missing something. For your convenience I have attached my Case2.pileup file. I am very thankful to you and appreciate that you are helping and taking your precious time.

Thanking you,
Warm Regards
Vikas Bansal
MSc Bioinformatics
King's College London

________________________________________
From: Dennis Murphy [djmu...@gmail.com]
Sent: Saturday, July 02, 2011 8:22 PM
To: r-help@r-project.org
Cc: Bansal, Vikas; David Winsemius
Subject: Re: [R] For help in R coding

Hi:

There seems to be a problem if the string ends in , or . : when the split character is the last character of the string, strsplit() does not return a trailing empty piece, so counting the pieces misses that occurrence.
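To make the problem concrete: the strsplit() documentation says a match at the end of a string produces no trailing empty string, so `length(pieces) - 1` is one short whenever the last character is the one being counted (this is why `.,t,t,t` is reported with 2 T's in the first table quoted below). Dropping the `- 1` instead, as is done for commas further down, overcounts strings that do not end in a comma. A minimal sketch of an alternative that avoids strsplit() altogether is to count how many characters gsub() removes. `count_char` is a hypothetical helper name, and the short `txtvec` is just four rows copied from the example data in this thread.

```r
# strsplit() drops the empty piece after a trailing match:
pieces <- strsplit(".,t,t,t", split = "t")[[1]]
length(pieces) - 1   # 2, although the string contains 3 t's

# Hypothetical helper: count matches of a single-character regex pattern in
# each string by measuring how many characters gsub() deletes.
count_char <- function(x, pattern) {
  nchar(x) - nchar(gsub(pattern, "", x))
}

txtvec <- c(".a,g,,", ".,t,t,t", ",$,,,,.,.", ",,,,,.,^!.")
counts <- data.frame(
  A = count_char(txtvec, "[aA]"),
  C = count_char(txtvec, "[cC]"),
  G = count_char(txtvec, "[gG]"),
  T = count_char(txtvec, "[tT]"),
  periods = count_char(txtvec, "[.]"),  # "." must be bracketed or escaped
  commas = count_char(txtvec, ","),     # "," is not special in a regex
  row.names = txtvec
)
counts
```

Because the pattern is a regular expression, a literal period has to be written as `[.]` or `"\\."`; the comma needs no escaping.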
Here is an alternative, splitting on individual characters and using charmatch() instead:

charsum <- function(s, char) {
  # split into single characters; charmatch() returns 1 for a match, NA otherwise
  u <- strsplit(s, "")
  sum(sapply(u, function(x) charmatch(x, char)), na.rm = TRUE)
}

unname(sapply(txtvec, function(x) charsum(x, ',')))
unname(sapply(txtvec, function(x) charsum(x, '.')))

Putting this into a data frame,

dfout <- data.frame(periods = unname(sapply(txtvec, function(x) charsum(x, '.'))),
                    commas  = unname(sapply(txtvec, function(x) charsum(x, ','))))
txtvec

HTH,
Dennis

On Sat, Jul 2, 2011 at 10:19 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
> On Jul 2, 2011, at 12:34 PM, Bansal, Vikas wrote:
>
>>
>>
>>>> Dear all,
>>>>
>>>> I am doing a project on variant calling using R. I am working on a
>>>> pileup file. There are 10 columns in my data frame and I want to
>>>> count the number of A, C, G and T in each row for column 9. An example
>>>> of column 9 is given below:
>>>>
>>>> .a,g,,
>>>> .t,t,,
>>>> .,c,c,
>>>> .,a,,,
>>>> .,t,t,t
>>>> .c,,g,^!.
>>>> .g,ggg.^!,
>>>> .$,,,,,.,
>>>> a,g,,t,
>>>> ,,,,,.,^!.
>>>> ,$,,,,.,.
>>>>
>>>> This is a bit confusing for me, as these characters are in one column,
>>>> and I don't know how to scan each row to print the number of A, C, G
>>>> and T for each row.
>>>
>>> Seems a bit clunky but this does the job (first the data):
>>>>
>>>> txt <- " .a,g,,
>>> + .t,t,,
>>> + .,c,c,
>>> + .,a,,,
>>> + .,t,t,t
>>> + .c,,g,^!.
>>> + .g,ggg.^!,
>>> + .$,,,,,.,
>>> + a,g,,t,
>>> + ,,,,,.,^!.
>>> + ,$,,,,.,."
>>>
>>>> txtvec <- readLines(textConnection(txt))
>>>
>>> Now the clunky solution. It basically subtracts 1 from the counts of
>>> "fragments" that result from splitting on each letter in turn. Could
>>> be made prettier with a function that did the job.
>>>
>>>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)),
>>> + C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)),
>>> + G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)),
>>> + T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) )
>>>            A C G T
>>> .a,g,,     1 0 1 0
>>> .t,t,,     0 0 0 2
>>> .,c,c,     0 2 0 0
>>> .,a,,,     1 0 0 0
>>> .,t,t,t    0 0 0 2
>>> .c,,g,^!.  0 1 1 0
>>> .g,ggg.^!, 0 0 4 0
>>> .$,,,,,.,  0 0 0 0
>>> a,g,,t,    1 0 1 1
>>> ,,,,,.,^!. 0 0 0 0
>>> ,$,,,,.,.  0 0 0 0
>>>
>>> Has the advantage that the input data ends up as rownames, which was a
>>> surprise.
>>>
>>> If you wanted to count "A" and "a" as equivalent, then the split
>>> argument should be "a|A"
>>>
>>>
>>
>>>> As you mentioned, if I want to count A and a I should split like
>>>> this.
>>
>> But can I count . and , as well, using
>> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split=".|,"), length) , "-", 1)),
>>
>> I tried it but it's not working. It gives output, but at some places
>> it shows too many . and , and at others it does not count them at all
>> and just shows 0.
>
> You need to use valid regular expressions for 'split'. Since "." is a
> special character (it matches any single character), it needs to be
> escaped when you want the literal to be recognized; escaping "," is
> harmless but not required.
>
> I haven't figured out why but you need to drop the final operation of
> subtracting 1 from the values when counting commas:
>
> data.frame(periods = unlist(lapply( lapply( sapply(txtvec, strsplit, split="\\."), length) , "-", 1))
>     ,commas = unlist( lapply( sapply(txtvec, strsplit, split="\\,"), length) ) )
>            periods commas
> .a,g,,           1      3
> .t,t,,           1      3
> .,c,c,           1      3
> .,a,,,           1      4
> .,t,t,t          1      4
> .c,,g,^!.        1      4
> .g,ggg.^!,       2      2
> .$,,,,,.,        2      6
> a,g,,t,          0      4
> ,,,,,.,^!.       1      7
> ,$,,,,.,.        1      7
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
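The piece-counting results in the last table are not fully reliable: `,,,,,.,^!.` contains 6 commas but is reported as 7, and its trailing period is missed. Two further pitfalls are worth checking against the real file, stated here as assumptions about the SAMtools pileup format rather than anything confirmed in this thread. First, column 9 holds more than base calls: `^` marks the start of a read and is followed by one character encoding mapping quality (a printable ASCII character, which can be a letter such as `A` or a `.` or `,`), `$` marks the end of a read, and indels are written like `+2AG` or `-1t`, where the digits give how many bases follow. Counting characters directly also counts these. Second, read.table() by default treats `'` and `"` as quote characters and `#` as a comment character, and those can occur in quality strings; `quote = ""` and `comment.char = ""` turn that off. A minimal sketch that strips the markup before counting; `clean_bases` and `count_bases` are hypothetical names, and the last example string is made up to exercise the indel handling.

```r
# Hypothetical helper: remove pileup markup from one read-bases string so that
# only per-read base calls remain. Assumes SAMtools pileup conventions:
#   "^" + one mapping-quality character = start of a read
#   "$"                                  = end of a read
#   "+n" / "-n" followed by n bases      = insertion / deletion
clean_bases <- function(s) {
  s <- gsub("\\^.", "", s)             # read starts first: the quality char may be "$"
  s <- gsub("$", "", s, fixed = TRUE)  # read ends
  repeat {                             # indels: their length is encoded in the digits
    m <- regexpr("[+-][0-9]+", s)
    if (m == -1) break
    len <- attr(m, "match.length")
    n <- as.integer(substring(s, m + 1, m + len - 1))
    s <- paste0(substring(s, 1, m - 1), substring(s, m + len + n))
  }
  s
}

# Count bases (case-insensitive) and reference matches
# ("." = forward strand, "," = reverse strand in pileup).
count_bases <- function(bases) {
  cleaned <- vapply(bases, clean_bases, character(1), USE.NAMES = FALSE)
  count_pat <- function(pattern) nchar(cleaned) - nchar(gsub(pattern, "", cleaned))
  data.frame(A = count_pat("[aA]"), C = count_pat("[cC]"),
             G = count_pat("[gG]"), T = count_pat("[tT]"),
             periods = count_pat("[.]"), commas = count_pat(","))
}

# Reading the real file (not run here):
# df <- read.table("Case2.pileup", sep = "\t", quote = "", comment.char = "",
#                  colClasses = "character", fill = TRUE)
# count_bases(df[, 9])

example <- c(".c,,g,^!.", ".g,ggg.^!,", ".$,,,,,.,", "^Aa,+2AG.,-1t,")
res <- count_bases(example)
res
```

Removing indels in a loop rather than with a single gsub() is a deliberate choice: the number of bases to drop is given by the digits, which a plain regular expression cannot use as a repeat count.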