On Jul 1, 2011, at 9:18 PM, Bansal, Vikas wrote:

Dear David,

It is showing this error:

Looks like a syntax error rather than a semantic error.


data.frame(A = unlist(lapply( lapply( sapply(mydf[,5], strsplit,
+ split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="c|C"),
Error: unexpected ',' in:
"data.frame(A = unlist(lapply( lapply( sapply(, strsplit,

There seems to be a missing object to the first argument of sapply...?

You should supply the output of str(mydf[,5]), or at least check whether the error occurs on mydf[1:20, 5] and supply str() of that if the error persists.

--
David.
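
A minimal sketch of the checks suggested above, using a small made-up data frame in place of the real mydf (the column name bases and its values are assumptions for illustration only). It inspects the column with str(), converts it with as.character() in case read.table() produced a factor, and moves the split-and-count step into a hypothetical helper, count_splits(), so that the nested parentheses are typed only once. Note that the failing command contains sapply((mydf[,5], ... with a doubled opening parenthesis, which on its own is enough to make the parser report an unexpected ','.

```r
# Made-up stand-in for the real data frame (hypothetical column name and values)
mydf <- data.frame(pos = 1:3,
                   bases = c(".a,g,,", ".,c,c,", "A,c,,G,"),
                   stringsAsFactors = TRUE)

# Inspect a small slice first, as suggested in the reply
str(mydf[1:3, "bases"])

# strsplit() needs character input, so convert in case the column is a factor
x <- as.character(mydf[, "bases"])

# Hypothetical helper: number of fragments minus one, the same idea as the
# original command. Caveat: strsplit() drops a trailing empty fragment, so a
# match at the very end of a string is not counted.
count_splits <- function(x, pattern) {
  sapply(strsplit(x, split = pattern), length) - 1
}

res <- data.frame(A = count_splits(x, "a|A"),
                  C = count_splits(x, "c|C"),
                  G = count_splits(x, "g|G"),
                  T = count_splits(x, "t|T"))
print(res)
```

Keeping the repeated expression in one function means a stray parenthesis can only occur in one place, which makes this kind of syntax error easier to spot.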

split="a|A"), length) , "-", 1)),C = unlist(lapply( lapply( sapply((mydf[,5],"
length) , "-", 1)),G = unlist(lapply( lapply( sapply((mydf[,5], strsplit, split="g|G"),
Error: unexpected ')' in "length)"
length) , "-", 1)),T = unlist(lapply( lapply( sapply(mydf[,5], strsplit, split="t|T"),
Error: unexpected ')' in "length)"

What should I do?

Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsem...@comcast.net]
Sent: Saturday, July 02, 2011 2:07 AM
To: Bansal, Vikas
Subject: Re: [R] For help in R coding

On Jul 1, 2011, at 8:01 PM, Bansal, Vikas wrote:

Dear David,

Thanks for your reply. I tried your code and it runs, but as I
mentioned in my mail, I am working on a pileup file. So I used the command
mydf=read.table(
to read the pileup file into a data frame, i.e. mydf. Now the problem is
that it has 10 columns, and I have to count the number of A, C, G and T
in the 9th column.
In your mail the data was input like this:
txt <- " .a,g,,
           .t,t,,
           .,c,c,
           .,a,,,
           .,t,t,t
           .c,,g,^!.
           .g,ggg.^!,
           .$,,,,,.,
           a,g,,t,
           ,,,,,.,^!.
           ,$,,,,.,."

But how should I input my data from the data frame mydf using the txt
command, given that there are thousands of rows?

Just send mydf[ , 9] as the argument in place of txtvec.
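
To illustrate, here is a sketch that writes a tiny made-up 10-column, tab-separated file to a temporary location, reads it back, and pulls out column 9 as a character vector. The file contents and the read.table() arguments are assumptions, not the poster's actual command (which is truncated above). quote = "" and comment.char = "" are set because pileup base and quality strings can contain quote and # characters, which read.table() would otherwise treat specially.

```r
# Write a small made-up 10-column file (hypothetical values, tab-separated)
tmp <- tempfile(fileext = ".pileup")
writeLines(c(
  paste("chr1", 101, "A", "A", 30, 0, 60, 4, ".a,g,,",  "IIII#", sep = "\t"),
  paste("chr1", 102, "T", "T", 30, 0, 60, 4, ".,t,t,t", "II'II", sep = "\t")
), tmp)

# Read it without quote or comment processing, keeping strings as character
mydf <- read.table(tmp, sep = "\t", quote = "", comment.char = "",
                   stringsAsFactors = FALSE)

# Column 9 is now an ordinary character vector, usable wherever txtvec was
bases <- mydf[, 9]
str(bases)

# The expression from the earlier reply, with bases in place of txtvec
a_counts <- unlist(lapply(lapply(sapply(bases, strsplit, split = "a"),
                                 length), "-", 1))
print(a_counts)
```

No txt string or textConnection() is needed for real data; that step only existed to build the small example vector.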


Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: David Winsemius [dwinsem...@comcast.net]
Sent: Friday, July 01, 2011 11:25 PM
To: Bansal, Vikas
Cc: r-help@r-project.org
Subject: Re: [R] For help in R coding

On Jul 1, 2011, at 12:47 PM, Bansal, Vikas wrote:

Dear all,

I am doing a project on variant calling using R. I am working on a
pileup file. There are 10 columns in my data frame and I want to
count the number of A, C, G and T in each row for column 9. An example
of column 9 is given below:

         .a,g,,
         .t,t,,
         .,c,c,
         .,a,,,
         .,t,t,t
         .c,,g,^!.
         .g,ggg.^!,
         .$,,,,,.,
         a,g,,t,
         ,,,,,.,^!.
         ,$,,,,.,.

This is a bit confusing for me, as these characters are all in one
column. How can we scan each row and print the number of A, C, G and T
it contains?

Seems a bit clunky but this does the job (first the data):
txt <- " .a,g,,
           .t,t,,
           .,c,c,
           .,a,,,
           .,t,t,t
           .c,,g,^!.
           .g,ggg.^!,
           .$,,,,,.,
           a,g,,t,
           ,,,,,.,^!.
           ,$,,,,.,."

txtvec <- readLines(textConnection(txt))

Now the clunky solution. It basically subtracts 1 from the counts of
"fragments" that result from splitting on each letter in turn. It could
be made prettier with a function that did the job.

data.frame(A = unlist(lapply( lapply( sapply(txtvec, strsplit, split="a"), length) , "-", 1)),
           C = unlist(lapply( lapply( sapply(txtvec, strsplit, split="c"), length) , "-", 1)),
           G = unlist(lapply( lapply( sapply(txtvec, strsplit, split="g"), length) , "-", 1)),
           T = unlist(lapply( lapply( sapply(txtvec, strsplit, split="t"), length) , "-", 1)) )
                     A C G T
.a,g,,               1 0 1 0
          .t,t,,     0 0 0 2
          .,c,c,     0 2 0 0
          .,a,,,     1 0 0 0
          .,t,t,t    0 0 0 2
          .c,,g,^!.  0 1 1 0
          .g,ggg.^!, 0 0 4 0
          .$,,,,,.,  0 0 0 0
          a,g,,t,    1 0 1 1
          ,,,,,.,^!. 0 0 0 0
          ,$,,,,.,.  0 0 0 0

Has the advantage that the input data ends up as rownames, which was a
surprise.

If you wanted to count "A" and "a" as equivalent, then the split
argument should be "a|A"
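
One caveat with the fragment-counting approach: strsplit() does not return a trailing empty fragment, so a letter at the very end of a string is missed (the row .,t,t,t is reported above with T = 2, although it contains three t characters). A sketch of an alternative, assuming that only the four letters matter and that case should be ignored: count how many characters disappear when each letter is deleted with gsub(). The helper name count_bases is hypothetical.

```r
# Hypothetical helper: per-string counts of A, C, G and T, ignoring case
count_bases <- function(x, bases = c("A", "C", "G", "T")) {
  x <- as.character(x)  # also handles factor columns
  counts <- lapply(bases, function(b) {
    # number of characters removed by deleting every occurrence of b
    nchar(x) - nchar(gsub(b, "", x, ignore.case = TRUE))
  })
  names(counts) <- bases
  as.data.frame(counts)
}

# First five rows of the example data, without the leading whitespace
txtvec <- c(".a,g,,", ".t,t,,", ".,c,c,", ".,a,,,", ".,t,t,t")
res <- count_bases(txtvec)
print(res)
```

This counts every occurrence regardless of position. It does not treat pileup syntax specially, so a letter that follows ^ (a mapping-quality character) or that belongs to an indel sequence would be counted as well; handling those cases is outside this sketch.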


Most of the rows contain . and , and other symbols, but we will ignore
them. I just want to run a loop with a counter that counts the number
of A, C, G and T in each row and gives output something like this:


A   C   G  T
1   0   1  0
0   0   0  2
0   2   0  0
1   0   0  0
0   0   0  3

This output is for the first 5 rows of the example given above.

I am new to R. Can you please help me? I will be very thankful to you.



Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
West Hartford, CT






