Dear David,

Thanks for your reply. I tried your code and it runs, but as I mentioned in my mail, I am working on a pileup file. So I used the command

mydf = read.table("Case2.pileup", fill=T, sep="\t")

to read the pileup file into a data frame, i.e. mydf. Now the problem is that it has 10 columns and I have to count the number of A, C, G and T in the 9th column. In your mail we input the data like this:

> txt <- " .a,g,,
+ .t,t,,
+ .,c,c,
+ .,a,,,
+ .,t,t,t
+ .c,,g,^!.
+ .g,ggg.^!,
+ .$,,,,,.,
+ a,g,,t,
+ ,,,,,.,^!.
+ ,$,,,,.,."

but how should I input my data (in column 9) from the data frame mydf using the txt command, given that there are thousands of rows?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London

________________________________________
From: David Winsemius [dwinsem...@comcast.net]
Sent: Friday, July 01, 2011 11:25 PM
To: Bansal, Vikas
Cc: r-help@r-project.org
Subject: Re: [R] For help in R coding

On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:

> Dear all,
>
> I am doing a project on variant calling using R. I am working on a
> pileup file. There are 10 columns in my data frame and I want to
> count the number of A, C, G and T in each row for column 9. An example of
> column 9 is given below:
>
> .a,g,,
> .t,t,,
> .,c,c,
> .,a,,,
> .,t,t,t
> .c,,g,^!.
> .g,ggg.^!,
> .$,,,,,.,
> a,g,,t,
> ,,,,,.,^!.
> ,$,,,,.,.
>
> This is a bit confusing for me, as these characters are in one column.
> How can we scan them for each row to print the number of A, C, G and T
> for each row?

Seems a bit clunky but this does the job (first the data):

> txt <- " .a,g,,
+ .t,t,,
+ .,c,c,
+ .,a,,,
+ .,t,t,t
+ .c,,g,^!.
+ .g,ggg.^!,
+ .$,,,,,.,
+ a,g,,t,
+ ,,,,,.,^!.
+ ,$,,,,.,."
> txtvec <- readLines(textConnection(txt))

Now the clunky solution. Basically it subtracts 1 from the counts of "fragments" that result from splitting on each letter in turn. Could be made prettier with a function that did the job.

> data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)),
+            C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)),
+            G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)),
+            T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) )
            A C G T
 .a,g,,     1 0 1 0
.t,t,,      0 0 0 2
.,c,c,      0 2 0 0
.,a,,,      1 0 0 0
.,t,t,t     0 0 0 2
.c,,g,^!.   0 1 1 0
.g,ggg.^!,  0 0 4 0
.$,,,,,.,   0 0 0 0
a,g,,t,     1 0 1 1
,,,,,.,^!.  0 0 0 0
,$,,,,.,.   0 0 0 0

This has the advantage that the input data ends up as rownames, which was a surprise. If you wanted to count "A" and "a" as equivalent, then the split argument should be "a|A".

> Most of the rows have . and , and other symbols
> but we will ignore them. I just want to run a loop with a counter
> which will count the number of A, C, G and T for each row and will
> give output something like this:
>
> A C G T
> 1 0 1 0
> 0 0 0 2
> 0 2 0 0
> 1 0 0 0
> 0 0 0 3
>
> This output is for the first 5 rows from the example given above.
>
> I am new to R. Can you please help me? I will be very thankful to you.
>
> Thanking you,
> Warm Regards
> Vikas Bansal
> Msc Bioinformatics
> Kings College London
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT
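
One possible way to handle the "thousands of rows" question is to skip the txt step and pass column 9 of the data frame straight to a counting function. The following is only a sketch, not code from this thread: count_bases is a hypothetical helper name, and the small mydf built here is a stand-in for the real pileup data frame. It counts matches with gregexpr() rather than strsplit(), because strsplit() drops a trailing empty fragment, which is why the row .,t,t,t shows 2 rather than 3 in the T column of the output above. Every literal letter is counted, so the character following ^ and any indel sequences in a real pileup are not treated specially.

```r
# Hypothetical helper: count each base letter in every element of x.
# gregexpr() returns the match positions for each string, or -1 when
# there is no match, so the number of positive positions is the count.
count_bases <- function(x, bases = c("A", "C", "G", "T"), ignore.case = TRUE) {
  x <- as.character(x)  # the column may be a factor after read.table()
  counts <- lapply(bases, function(b) {
    hits <- gregexpr(b, x, ignore.case = ignore.case)
    vapply(hits, function(h) sum(h > 0), integer(1))
  })
  names(counts) <- bases
  as.data.frame(counts)
}

# Stand-in for the real data: 10 columns, with the pileup strings in column 9.
# For the actual file, mydf would come from something like
#   read.table("Case2.pileup", sep = "\t", fill = TRUE, stringsAsFactors = FALSE)
pile <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
mydf <- data.frame(matrix("x", nrow = length(pile), ncol = 10),
                   stringsAsFactors = FALSE)
mydf[[9]] <- pile

res <- count_bases(mydf[[9]])
print(res)
```

With the real data frame the call would be count_bases(mydf[[9]]), or count_bases(mydf$V9) if read.table assigned its default column names. It may also be worth adding quote="" and comment.char="" to the read.table call, since the quality strings in a pileup file can contain quote and # characters.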