[Bioc-devel] Merging GInteraction/GenomicInteractions ranges

Luke Klein Tue, 12 Feb 2019 11:35:07 -0800

Hello.  I am planning to develop a new package which extends the 
GenomicInteractions package.  I would like some help/advice on implementing the 
following functionality.


Consider the following GenomicInteractions object:

GenomicInteractions object with 10 interactions and 1 metadata column:
       seqnames1   ranges1     seqnames2   ranges2 |    counts
           <Rle> <IRanges>         <Rle> <IRanges> | <integer>
   [1]      chrA       1-2 ---      chrA      9-10 |         1
   [2]      chrA       1-2 ---      chrA     15-16 |         1
   [3]      chrA       3-4 ---      chrA       3-4 |         1
   [4]      chrA       5-6 ---      chrA       7-8 |         1
   [5]      chrA       5-6 ---      chrA      9-10 |         1
   [6]      chrA       7-8 ---      chrA       7-8 |         1
   [7]      chrA       7-8 ---      chrA     11-12 |         1
   [8]      chrA       7-8 ---      chrA     17-18 |         1
   [9]      chrA      9-10 ---      chrA      9-10 |         1
  [10]      chrA      9-10 ---      chrA     15-16 |         1
  -------
  regions: 8 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


Which is visually represented thusly:

[image not included in the archive]

I would like to do the following:

1) I want to group the regions into bins of WxW (in this case, W will be 3), as 
in a quad-tree structure <https://en.wikipedia.org/wiki/Quadtree> with the 
final group being WxW (instead of 2x2).  This will involve:
        - iteratively dividing the matrix into quadrants {upper-left (0), 
upper-right (1), lower-left (2), lower-right (3)}
        - labeling each subdivision in a new column until the final WxW 
resolution is reached
        - sorting by the columns
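
The labelling in 1) can be sketched in a few lines.  This is only an 
illustration in Python, not part of the GenomicInteractions API.  It assumes 
that positions start at 1, that each bin spans W = 3 regions of width 2 (so 6 
positions), and that the 3 bins per axis are padded to 4 so that two levels of 
subdivision reach the final resolution.  The names quad_labels, bin_width, 
depth and origin are hypothetical.

```python
def quad_labels(start1, start2, bin_width=6, depth=2, origin=1):
    """Return one quadrant digit per level for a single interaction.

    Rows are bins of anchor 1, columns are bins of anchor 2.
    Digit = 2 * row_bit + col_bit, i.e. 0 upper-left, 1 upper-right,
    2 lower-left, 3 lower-right.
    """
    row = (start1 - origin) // bin_width
    col = (start2 - origin) // bin_width
    labels = []
    for level in range(depth - 1, -1, -1):  # coarsest subdivision first
        row_bit = (row >> level) & 1
        col_bit = (col >> level) & 1
        labels.append(2 * row_bit + col_bit)
    return tuple(labels)


# Start positions of (ranges1, ranges2) for the 10 interactions above.
starts = [(1, 9), (1, 15), (3, 3), (5, 7), (5, 9),
          (7, 7), (7, 11), (7, 17), (9, 9), (9, 15)]
labels = [quad_labels(s1, s2) for s1, s2 in starts]
```

Each tuple in labels holds the digits for the first and second subdivision of 
one interaction, in the original row order.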




GenomicInteractions object with 10 interactions and 1 metadata column:
       seqnames1   ranges1     seqnames2   ranges2 |    counts     quad1     quad2
           <Rle> <IRanges>         <Rle> <IRanges> | <integer> <integer> <integer>
   [1]      chrA       1-2 ---      chrA      9-10 |         1         0         1
   [2]      chrA       1-2 ---      chrA     15-16 |         1         1         0
   [3]      chrA       3-4 ---      chrA       3-4 |         1         0         0
   [4]      chrA       5-6 ---      chrA       7-8 |         1         0         1
   [5]      chrA       5-6 ---      chrA      9-10 |         1         0         1
   [6]      chrA       7-8 ---      chrA       7-8 |         1         0         3
   [7]      chrA       7-8 ---      chrA     11-12 |         1         0         3
   [8]      chrA       7-8 ---      chrA     17-18 |         1         1         2
   [9]      chrA      9-10 ---      chrA      9-10 |         1         0         3
  [10]      chrA      9-10 ---      chrA     15-16 |         1         1         2
  -------
  regions: 8 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


Sorting by the two columns yields what I am after.  Of course, I include the 
“quadX” columns for illustration only.  Upon implementation, I would like these 
columns hidden from the user.

GenomicInteractions object with 10 interactions and 1 metadata column:
       seqnames1   ranges1     seqnames2   ranges2 |    counts     quad1     quad2
           <Rle> <IRanges>         <Rle> <IRanges> | <integer> <integer> <integer>
   [1]      chrA       3-4 ---      chrA       3-4 |         1         0         0
   [2]      chrA       1-2 ---      chrA      9-10 |         1         0         1
   [3]      chrA       5-6 ---      chrA       7-8 |         1         0         1
   [4]      chrA       5-6 ---      chrA      9-10 |         1         0         1
   [5]      chrA       7-8 ---      chrA       7-8 |         1         0         3
   [6]      chrA       7-8 ---      chrA     11-12 |         1         0         3
   [7]      chrA      9-10 ---      chrA      9-10 |         1         0         3
   [8]      chrA       1-2 ---      chrA     15-16 |         1         1         0
   [9]      chrA       7-8 ---      chrA     17-18 |         1         1         2
  [10]      chrA      9-10 ---      chrA     15-16 |         1         1         2
  -------
  regions: 8 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

The sorting gives me the quad-tree structure, and each unique quadrant sequence 
defines the group.
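
The sort-then-group step can be sketched as follows.  Again this is an 
illustrative Python sketch rather than anything from GenomicInteractions.  The 
rows are copied from the labelled table above, and the names rows, quad_key, 
ordered and groups are hypothetical.  It relies on the sort being stable, so 
that interactions with the same quadrant sequence keep their original relative 
order.

```python
from itertools import groupby

# (ranges1, ranges2, quad1, quad2), copied from the labelled table above.
rows = [
    ("1-2", "9-10", 0, 1), ("1-2", "15-16", 1, 0), ("3-4", "3-4", 0, 0),
    ("5-6", "7-8", 0, 1), ("5-6", "9-10", 0, 1), ("7-8", "7-8", 0, 3),
    ("7-8", "11-12", 0, 3), ("7-8", "17-18", 1, 2), ("9-10", "9-10", 0, 3),
    ("9-10", "15-16", 1, 2),
]


def quad_key(row):
    """Quadrant sequence (quad1, quad2) of one row."""
    return row[2:]


# sorted() is stable: ties keep their input order.
ordered = sorted(rows, key=quad_key)

# Each unique quadrant sequence defines one group of range pairs.
groups = {key: [r[:2] for r in grp]
          for key, grp in groupby(ordered, key=quad_key)}
```

Here groups has one entry per WxW window, keyed by the quadrant sequence, and 
ordered reproduces the row order of the sorted table.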
  

GenomicInteractions object with 10 interactions and 1 metadata column:
       seqnames1   ranges1     seqnames2   ranges2 |    counts
           <Rle> <IRanges>         <Rle> <IRanges> | <integer>
   [1]      chrA       3-4 ---      chrA       3-4 |         1
   [2]      chrA       1-2 ---      chrA      9-10 |         1
   [3]      chrA       5-6 ---      chrA       7-8 |         1
   [4]      chrA       5-6 ---      chrA      9-10 |         1
   [5]      chrA       7-8 ---      chrA       7-8 |         1
   [6]      chrA       7-8 ---      chrA     11-12 |         1
   [7]      chrA      9-10 ---      chrA      9-10 |         1
   [8]      chrA       1-2 ---      chrA     15-16 |         1
   [9]      chrA       7-8 ---      chrA     17-18 |         1
  [10]      chrA      9-10 ---      chrA     15-16 |         1
  -------
  regions: 8 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


2) Then I would like to merge the WxW window (i.e. bin the regions), expanding 
the ranges accordingly and adding the counts.  This process will
        - ***identify all range-pairs in the same window and merge them into a 
new range pair with appropriately expanded ranges*** (this is my primary goal)
        - sum the counts for each of the aforementioned range-pairs (I have 
already figured out a way to do this)
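
One possible way to express the merge is to map every range to the index of 
the window that contains it, sum counts per pair of indices, and then rebuild 
the expanded ranges from the indices.  The Python sketch below illustrates 
that idea only.  It assumes positions start at 1, a window width of 6, and 
that no input range straddles a window boundary.  The names merge_windows, 
bin_width and origin are hypothetical.

```python
from collections import defaultdict


def merge_windows(interactions, bin_width=6, origin=1):
    """Merge (start1, end1, start2, end2, count) tuples into windows.

    Returns a dict mapping ((lo1, hi1), (lo2, hi2)) to the summed count.
    Assumes each input range lies entirely inside one window.
    """
    totals = defaultdict(int)
    for s1, _e1, s2, _e2, count in interactions:
        b1 = (s1 - origin) // bin_width  # window index of anchor 1
        b2 = (s2 - origin) // bin_width  # window index of anchor 2
        totals[(b1, b2)] += count

    def window(b):
        lo = origin + b * bin_width
        return (lo, lo + bin_width - 1)

    return {(window(b1), window(b2)): n for (b1, b2), n in totals.items()}


# The 10 interactions from the first object, each with counts = 1.
interactions = [
    (1, 2, 9, 10, 1), (1, 2, 15, 16, 1), (3, 4, 3, 4, 1), (5, 6, 7, 8, 1),
    (5, 6, 9, 10, 1), (7, 8, 7, 8, 1), (7, 8, 11, 12, 1), (7, 8, 17, 18, 1),
    (9, 10, 9, 10, 1), (9, 10, 15, 16, 1),
]
merged = merge_windows(interactions)
```

Every window in the result has width 6 even when it contains a single input 
range, and counts only grow where several range pairs fall in the same window. 
Because the window index is plain integer arithmetic on the start positions, 
the same idea should carry over to a vectorised implementation.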



GenomicInteractions object with 5 interactions and 1 metadata column:
       seqnames1   ranges1     seqnames2   ranges2 |    counts
           <Rle> <IRanges>         <Rle> <IRanges> | <integer>
   [1]      chrA       1-6 ---      chrA       1-6 |         1
   [2]      chrA       1-6 ---      chrA      7-12 |         3
   [3]      chrA      7-12 ---      chrA      7-12 |         3
   [4]      chrA       1-6 ---      chrA     13-18 |         1
   [5]      chrA      7-12 ---      chrA     13-18 |         2
  -------
  regions: 3 ranges and 0 metadata columns
  seqinfo: 1 sequence from an unspecified genome; no seqlengths


NOTE that ranges1 and ranges2 MUST expand so that the region width is 6, though 
the counts will only change if there exists another subrange covered by this 
bin/expansion that contains a positive count.

As always, speed is a concern.

Best,

— Luke Klein
    PhD Student
    Department of Statistics
    University of California, Riverside
    lklei...@ucr.edu







_______________________________________________
Bioc-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/bioc-devel
