Hi Jesper,

On 02/19/2014 08:44 AM, Michael Lawrence wrote:
On Wed, Feb 19, 2014 at 8:39 AM, Jesper Gådin <jesper.ga...@gmail.com>wrote:

Hi Michael,

Herves suggestion will probably work for my use case, but if there are any
smoother ways it would be preferable.

The use case is as follows:

1) calculate strand specific coverage over a region from
GAlignments object (or file)

At the moment I read a file using readGAlignmentsFromBam() with tag="XS",
then filter it on "flag" and "mapq". Then I subset the
resulting GAlignments in a minus and a plus -strand object.
Then I calculate coverage by my own coverage function which uses the cigar
information in the GAlignments object. This function is the one using
cigarToRleList() at the moment. As I understand the coverage() function
from the GenomicAlignments/IRanges package does not take into account
cigars, or does it?


It does take the coverage into account; specifically to exclude the introns
from coverage. I think there's also an option to exclude deletions.

Unfortunately the man page is not easy to access (you need to do
?`coverage,GAlignments-method`), but it says:

   The methods for GAlignments and GAlignmentPairs objects do:

     coverage(grglist(x), ...)

And if you do grglist() on a GAlignments or GAlignmentPairs objects, the
ranges you get in the returned GRangesList object are calculated based
on the CIGAR.

Trust but verify. Here is how you can actually verify that coverage()
does take the CIGAR into account:

  library(RNAseqData.HNRNPC.bam.chr14)
  gal <- readGAlignments(RNAseqData.HNRNPC.bam.chr14_BAMFILES[1])
  cig_op_table <- cigarOpTable(cigar(gal))

First we pick up an alignment with an N in its CIGAR and do coverage()
on it:

  > gal_with_N <- gal[which(cig_op_table[ , "N"] != 0)[1]]

  > gal_with_N
  GAlignments with 1 alignment and 0 metadata columns:
        seqnames strand       cigar    qwidth     start       end     width
           <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
    [1]    chr14      + 55M2117N17M        72  19650072  19652260      2189
             ngap
        <integer>
    [1]         1
    ---
    seqlengths:
                    chr1                 chr10 ...                  chrY
               249250621             135534747 ...              59373566

  > coverage(gal_with_N)$chr14
  integer-Rle of length 107349540 with 5 runs
    Lengths: 19650071       55     2117       17 87697280
    Values :        0        1        0        1        0

Same thing with an alignment with an I in its CIGAR:

  > gal_with_I <- gal[which(cig_op_table[ , "I"] != 0)[1]]

  > gal_with_I
  GAlignments with 1 alignment and 0 metadata columns:
        seqnames strand       cigar    qwidth     start       end     width
           <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
    [1]    chr14      -     64M1I7M        72  19411677  19411747        71
             ngap
        <integer>
    [1]         0
    ---
    seqlengths:
                    chr1                 chr10 ...                  chrY
               249250621             135534747 ...              59373566

  > coverage(gal_with_I)$chr14
  integer-Rle of length 107349540 with 3 runs
    Lengths: 19411676       71 87937793
    Values :        0        1        0

Same thing with an alignment with a D in its CIGAR:

  > gal_with_D <- gal[which(cig_op_table[ , "D"] != 0)[1]]

  > gal_with_D
  GAlignments with 1 alignment and 0 metadata columns:
        seqnames strand       cigar    qwidth     start       end     width
           <Rle>  <Rle> <character> <integer> <integer> <integer> <integer>
    [1]    chr14      +    38M1D34M        72  19659063  19659135        73
             ngap
        <integer>
    [1]         0
    ---
    seqlengths:
                    chr1                 chr10 ...                  chrY
               249250621             135534747 ...              59373566

  > coverage(gal_with_D)$chr14
  integer-Rle of length 107349540 with 3 runs
    Lengths: 19659062       73 87690405
    Values :        0        1        0

Seeing is believing,

Cheers,
H.



I started to look at the applyPileups() in Rsamtools which I can get to
calculate coverage using cigars, but not using the strand or flag
information for filtering. That solution would start from a bam-file
instead of a GAlignments object, and sure I can do the filtering outside R.
But it would be very convenient to do it all from within R.

If there are nice solutions starting from both a GAlignments and a
bam-file it would be great! =)

/Jesper



On Tue, Feb 18, 2014 at 10:52 PM, Michael Lawrence <
lawrence.mich...@gene.com> wrote:

Hi Jesper,

Would you be willing to volunteer your use case? As Herve hinted,
cigarToRleList and friends are low-level helpers. There may be an easier
way to achieve what you want, or an opportunity to improve things.

Michael


On Mon, Feb 17, 2014 at 1:10 AM, Jesper Gådin <jesper.ga...@gmail.com>wrote:

Hi,

Have come across a cigar-vector that is problematic to process.

#load package
library(GenomicAlignments)

#load data (see attached file)
load("2014-02-17-cigarExample.rdata")

#run function cigarToRleList
cigarRle <- cigarToRleList(cigarExample)
Error in .Call2("Rle_constructor", values, lengths, check, 0L, PACKAGE =
"IRanges") :
   integer overflow while summing elements in 'lengths'

sessionInfo()
R Under development (unstable) (2013-11-13 r64209)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
  [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods
[8] base

other attached packages:
[1] GenomicAlignments_0.99.18 Rsamtools_1.15.26
[3] Biostrings_2.31.12        XVector_0.3.6
[5] GenomicRanges_1.15.26     IRanges_1.21.23
[7] BiocGenerics_0.9.3

loaded via a namespace (and not attached):
  [1] BatchJobs_1.1-1135 BBmisc_1.5         BiocParallel_0.5.8
bitops_1.0-6

  [5] brew_1.0-6         codetools_0.2-8    DBI_0.2-7
  digest_0.6.4

  [9] fail_1.2           foreach_1.4.1      iterators_1.0.6    plyr_1.8

[13] RSQLite_0.11.4     sendmailR_1.1-2    stats4_3.1.0       tools_3.1.0

[17] zlibbioc_1.9.0

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel





        [[alternative HTML version deleted]]



_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:    (206) 667-1319

_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel

Reply via email to