On 02/20/2014 04:16 PM, Michael Lawrence wrote:
There's also "?coverage(ga)", so the user can see what happens
specifically for their object, without worrying about typing out the class.
I was never lucky with this:
> library(rtracklayer)
> path_to_gff <- system.file("tests", "v1.gff", package = "rtracklayer")
> ?import(path_to_gff)
Error in .helpForCall(topicExpr, parent.frame()) :
no documentation for function ‘import’ and signature ‘con =
"character", format = "ANY", text = "ANY"’
In addition: Warning message:
In .helpForCall(topicExpr, parent.frame()) :
no method defined for function ‘import’ and signature ‘con =
"character", format = "ANY", text = "ANY"’
H.
At some point it would be neat to generate S4 documentation at run-time.
Just popup a browser and see every method that might be dispatched with
a given object. In theory, the R help server would support this.
On Thu, Feb 20, 2014 at 3:13 PM, Hervé Pagès <hpa...@fhcrc.org
<mailto:hpa...@fhcrc.org>> wrote:
On 02/20/2014 02:55 PM, Martin Morgan wrote:
On 02/20/2014 02:32 PM, Hervé Pagès wrote:
Hi Jesper,
On 02/20/2014 02:13 PM, Jesper Gådin wrote:
Very true that it is quite difficult to find the
documentation when one
is not aware of its existence :P
Yeah, this has been a source of frustration for many people. And
always a source of embarrassment (for us) when teaching our
software
to beginners.
I've started to change this. In the upcoming version of BioC
(2.14,
scheduled for mid-April), when you'll do ?coverage, you'll
get to
choose between the 3 man pages that document coverage
methods (there
is one in IRanges, one in GenomicRanges, and one in
GenomicAlignments).
I want to generalize this to other generics that have
methods spread
across several packages (e.g. findOverlaps, the intra- and
inter-range
methods, etc...).
Having to choose between several man pages every time you do
e.g.
?findOverlaps is a minor annoyance compared to not being able to
find the man page at all. (Note that if you already know
where is
your favorite man page, you'll be able to direct access it with
e.g. ?GenomicRanges::findOverlaps). Nobody will ever need to use
the insane
?`findOverlaps,GenomicRanges,__GenomicRanges-method` to
tab completion helps, so that you don't need to be totally
insane, just
insane enough to know to start with
?"cover<tab>
and also insane enough to know which method you need to pick up amongst
the 30+ methods listed by ?"findOverlaps<tab>
Overwhelming!
H.
Martin
open that man page again. Ever! (it will still work though...)
Cheers,
H.
coverage() is fast and beautiful. Thanks!
/Jesper
On Wed, Feb 19, 2014 at 9:21 PM, Hervé Pagès
<hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>> wrote:
Hi Jesper,
On 02/19/2014 08:44 AM, Michael Lawrence wrote:
On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin
<jesper.ga...@gmail.com
<mailto:jesper.ga...@gmail.com>
<mailto:jesper.ga...@gmail.com
<mailto:jesper.ga...@gmail.com>__>>wrote:
Hi Michael,
Herves suggestion will probably work for my
use case, but if
there are any
smoother ways it would be preferable.
The use case is as follows:
1) calculate strand specific coverage over
a region from
GAlignments object (or file)
At the moment I read a file using
readGAlignmentsFromBam()
with tag="XS",
then filter it on "flag" and "mapq". Then I
subset the
resulting GAlignments in a minus and a plus
-strand object.
Then I calculate coverage by my own
coverage function which
uses the cigar
information in the GAlignments object. This
function is the
one using
cigarToRleList() at the moment. As I
understand the
coverage() function
from the GenomicAlignments/IRanges package
does not take
into account
cigars, or does it?
It does take the coverage into account;
specifically to exclude
the introns
from coverage. I think there's also an option
to exclude
deletions.
Unfortunately the man page is not easy to access
(you need to do
?`coverage,GAlignments-method`____), but it says:
The methods for GAlignments and GAlignmentPairs
objects do:
coverage(grglist(x), ...)
And if you do grglist() on a GAlignments or
GAlignmentPairs
objects, the
ranges you get in the returned GRangesList object
are calculated
based
on the CIGAR.
Trust but verify. Here is how you can actually
verify that
coverage()
does take the CIGAR into account:
library(RNAseqData.HNRNPC.bam.____chr14)
gal <-
readGAlignments(RNAseqData.____HNRNPC.bam.chr14_BAMFILES[1])
cig_op_table <- cigarOpTable(cigar(gal))
First we pick up an alignment with an N in its
CIGAR and do
coverage()
on it:
> gal_with_N <- gal[which(cig_op_table[ , "N"]
!= 0)[1]]
> gal_with_N
GAlignments with 1 alignment and 0 metadata columns:
seqnames strand cigar qwidth
start end
width
<Rle> <Rle> <character> <integer>
<integer> <integer>
<integer>
[1] chr14 + 55M2117N17M 72
19650072 19652260
2189
ngap
<integer>
[1] 1
---
seqlengths:
chr1 chr10 ...
chrY
249250621 135534747 ...
59373566
> coverage(gal_with_N)$chr14
integer-Rle of length 107349540 with 5 runs
Lengths: 19650071 55 2117 17
87697280
Values : 0 1 0 1
0
Same thing with an alignment with an I in its CIGAR:
> gal_with_I <- gal[which(cig_op_table[ , "I"]
!= 0)[1]]
> gal_with_I
GAlignments with 1 alignment and 0 metadata columns:
seqnames strand cigar qwidth
start end
width
<Rle> <Rle> <character> <integer>
<integer> <integer>
<integer>
[1] chr14 - 64M1I7M 72
19411677 19411747
71
ngap
<integer>
[1] 0
---
seqlengths:
chr1 chr10 ...
chrY
249250621 135534747 ...
59373566
> coverage(gal_with_I)$chr14
integer-Rle of length 107349540 with 3 runs
Lengths: 19411676 71 87937793 <tel:71%2087937793>
Values : 0 1 0
Same thing with an alignment with a D in its CIGAR:
> gal_with_D <- gal[which(cig_op_table[ , "D"]
!= 0)[1]]
> gal_with_D
GAlignments with 1 alignment and 0 metadata columns:
seqnames strand cigar qwidth
start end
width
<Rle> <Rle> <character> <integer>
<integer> <integer>
<integer>
[1] chr14 + 38M1D34M 72
19659063 19659135
73
ngap
<integer>
[1] 0
---
seqlengths:
chr1 chr10 ...
chrY
249250621 135534747 ...
59373566
> coverage(gal_with_D)$chr14
integer-Rle of length 107349540 with 3 runs
Lengths: 19659062 73 87690405
Values : 0 1 0
Seeing is believing,
Cheers,
H.
I started to look at the applyPileups() in
Rsamtools which I
can get to
calculate coverage using cigars, but not
using the strand or
flag
information for filtering. That solution
would start from a
bam-file
instead of a GAlignments object, and sure I
can do the
filtering outside R.
But it would be very convenient to do it
all from within R.
If there are nice solutions starting from
both a GAlignments
and a
bam-file it would be great! =)
/Jesper
On Tue, Feb 18, 2014 at 10:52 PM, Michael
Lawrence <
lawrence.mich...@gene.com <mailto:lawrence.mich...@gene.com>
<mailto:lawrence.michael@gene.__com
<mailto:lawrence.mich...@gene.com>>> wrote:
Hi Jesper,
Would you be willing to volunteer your
use case? As
Herve hinted,
cigarToRleList and friends are
low-level helpers. There
may be an easier
way to achieve what you want, or an
opportunity to
improve things.
Michael
On Mon, Feb 17, 2014 at 1:10 AM, Jesper
Gådin
<jesper.ga...@gmail.com
<mailto:jesper.ga...@gmail.com>
<mailto:jesper.ga...@gmail.com
<mailto:jesper.ga...@gmail.com>__>>wrote:
Hi,
Have come across a cigar-vector
that is problematic
to process.
#load package
library(GenomicAlignments)
#load data (see attached file)
load("2014-02-17-cigarExample.____rdata")
#run function cigarToRleList
cigarRle <-
cigarToRleList(cigarExample)
Error in .Call2("Rle_constructor",
values, lengths,
check, 0L, PACKAGE =
"IRanges") :
integer overflow while summing
elements in
'lengths'
sessionInfo()
R Under development (unstable)
(2013-11-13 r64209)
Platform: x86_64-unknown-linux-gnu
(64-bit)
locale:
[1] LC_CTYPE=en_US.UTF-8
LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8
LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8
LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8
LC_NAME=C
[9] LC_ADDRESS=C
LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8
LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats graphics
grDevices utils
datasets methods
[8] base
other attached packages:
[1] GenomicAlignments_0.99.18
Rsamtools_1.15.26
[3] Biostrings_2.31.12
XVector_0.3.6
[5] GenomicRanges_1.15.26
IRanges_1.21.23
[7] BiocGenerics_0.9.3
loaded via a namespace (and not
attached):
[1] BatchJobs_1.1-1135 BBmisc_1.5
BiocParallel_0.5.8
bitops_1.0-6
[5] brew_1.0-6
codetools_0.2-8
DBI_0.2-7
digest_0.6.4
[9] fail_1.2 foreach_1.4.1
iterators_1.0.6 plyr_1.8
[13] RSQLite_0.11.4 sendmailR_1.1-2
stats4_3.1.0 tools_3.1.0
[17] zlibbioc_1.9.0
___________________________________________________
Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org>
<mailto:Bioc-devel@r-project.__org
<mailto:Bioc-devel@r-project.org>> mailing list
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
[[alternative HTML version deleted]]
___________________________________________________
Bioc-devel@r-project.org
<mailto:Bioc-devel@r-project.org>
<mailto:Bioc-devel@r-project.__org
<mailto:Bioc-devel@r-project.org>>
mailing list
https://stat.ethz.ch/mailman/____listinfo/bioc-devel
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel>
<https://stat.ethz.ch/mailman/__listinfo/bioc-devel
<https://stat.ethz.ch/mailman/listinfo/bioc-devel>>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
<mailto:hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
<tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
<tel:%28206%29%20667-1319>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org <mailto:hpa...@fhcrc.org>
Phone: (206) 667-5791 <tel:%28206%29%20667-5791>
Fax: (206) 667-1319 <tel:%28206%29%20667-1319>
--
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024
E-mail: hpa...@fhcrc.org
Phone: (206) 667-5791
Fax: (206) 667-1319
_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel