Hi Luke, If I understand correctly what you want to do, you can use `findOverlaps()` to create a new GInteractions object from a set of expanded ranges and the original interactions.
anchor.one <- GRanges(c("chr1", "chr1", "chr1", "chr1"), IRanges(c(10, 20, 30, 20), width=5)) anchor.two <- GRanges(c("chr1", "chr1", "chr1", "chr2"), IRanges(c(100, 200, 200, 50), width=5)) interaction_counts <- sample(1:10, 4) test <- GInteractions(anchor1 = anchor.one, anchor2 = anchor.two, counts=interaction_counts) wider_ranges <- reduce(resize(regions(test), fix = "center", width = 10)) first_ol <- findOverlaps(test, wider_ranges, use.region = "first") second_ol <- findOverlaps(test, wider_ranges, use.region = "second") new_gi <- GInteractions(anchor1 = subjectHits(first_ol), anchor2 = subjectHits(second_ol), regions = wider_ranges, counts = test$counts) This will give you a GInteractions object that's the same length as your original object. You said you already had a way of summing the counts for each group of interactions, but you could also incorporate that here. I would do it with dplyr, but I'm sure there's alternatives that introduce fewer dependencies... new_gi_info_df <- data.frame(anchor1 = subjectHits(first_ol), anchor2 = subjectHits(second_ol), counts = test$counts) %>% dplyr::group_by(anchor1, anchor2) %>% dplyr::summarise(counts = sum(counts)) best wishes, Liz On Tue, Feb 19, 2019 at 6:15 PM Luke Klein <lklei...@ucr.edu> wrote: > Thank you for the tip, Aaron! I’m working on the sorting step right now. > > One thing I’m not clear on is how to expand the ranges from one step to > the next. The way GenomicInteractions are structured, there is a Granges > object with all possible ranges, and the GInteractions object is populated > by reference to said interactions. > > What I am going to need is a new GRanges object with the new set of > (expanded) ranges, and a way to map the prior ranges to the new, wider > range. > > For your edification, I’ve included the images in this email (which is > addressed directly to you) so that you can see what I’d written in my first > question. > > Best, > > — Luke Klein > PhD Student > Department of Statistics > University of California, Riverside > lklei...@ucr.edu > > > > > > > > > On Feb 13, 2019, at 3:34 AM, Aaron Lun < > infinite.monkeys.with.keyboa...@gmail.com> wrote: > > > > Note that your visual won't show up for many (all?) of us. Nonetheless, > > I think I know what you want to do. > > > > Your task does not lend itself to vectorization, which makes it > > difficult to write efficient R code. It's not impossible, but it would > > be quite hard to read and debug, and your maintenance programmer will > > be cursing you somewhere down the line. > > > > If speed is truly a concern, I would write this code in C++. This would > > probably be several lines' worth of code: > > > > 1. Compute a pair of bin IDs for each interaction by dividing each > > anchor coordinate by the bin width and truncating the result. (You'll > > need to decide if you want to use the midpoint/start/end, etc.) > > 2. Sort the interactions by the paired bin IDs, e.g., with std::sort. > > 3. Identify each "run" of interactions with the same paired IDs. > > 4. Repeat step 1 within each run (you'll need to offset the anchor > > coordinate before dividing this time). Append the current quadrant to > > the quadrant sequence for return to R at the end of recursion. > > > > Clear, concise, and can be slapped together in less than half an hour > > with Rcpp and C++11, if you know what you're doing. > > > > -A > > > > On Tue, 2019-02-12 at 11:34 -0800, Luke Klein wrote: > >> Hello. I am planning to develop a new package which extends the > >> GenomicInteractions package. I would like some help/advice on > >> implementing the following functionality. > >> > >> Consider the follow GenomicInteractions object > >> > >> GenomicInteractions object with 10 interactions and 1 metadata > >> column: > >> seqnames1 ranges1 seqnames2 ranges2 | counts > >> <Rle> <IRanges> <Rle> <IRanges> | <integer> > >> [1] chrA 1-2 --- chrA 9-10 | 1 > >> [2] chrA 1-2 --- chrA 15-16 | 1 > >> [3] chrA 3-4 --- chrA 3-4 | 1 > >> [4] chrA 5-6 --- chrA 7-8 | 1 > >> [5] chrA 5-6 --- chrA 9-10 | 1 > >> [6] chrA 7-8 --- chrA 7-8 | 1 > >> [7] chrA 7-8 --- chrA 11-12 | 1 > >> [8] chrA 7-8 --- chrA 17-18 | 1 > >> [9] chrA 9-10 --- chrA 9-10 | 1 > >> [10] chrA 9-10 --- chrA 15-16 | 1 > >> ------- > >> regions: 8 ranges and 0 metadata columns > >> seqinfo: 1 sequence from an unspecified genome; no seqlengths > >> > >> > >> Which is visually represented thusly > >> > >> > >> > >> I would like to do the following: > >> > >> 1) I want to group the regions into bins of WxW (in this case, W will > >> be 3), as in a quad-tree structure <https://en.wikipedia.org/wiki/Qua < > https://en.wikipedia.org/wiki/Qua> > >> dtree> with the final group being WxW (instead of 2x2). This will > >> involve > >> - iteratively dividing the matrix into quadrants {upper-left > >> (0), upper-right (1), lower-left (2), lower-right(3)} . > >> - labeling each subdivision in a new column until the final WxW > >> resolution is reached. > >> - sorting by the columns > >> > >> > >> > >> > >> GenomicInteractions object with 10 interactions and 1 metadata > >> column: > >> seqnames1 ranges1 seqnames2 ranges2 > >> | counts quad1 quad2 > >> <Rle> <IRanges> <Rle> <IRanges> | <integer> > >> <integer> <integer> > >> [1] chrA 1-2 --- chrA 9-10 | 1 > >> 0 1 > >> [2] chrA 1-2 --- chrA 15-16 | 1 > >> 1 0 > >> [3] chrA 3-4 --- chrA 3-4 | 1 > >> 0 0 > >> [4] chrA 5-6 --- chrA 7-8 | 1 > >> 0 1 > >> [5] chrA 5-6 --- chrA 9-10 | 1 > >> 0 1 > >> [6] chrA 7-8 --- chrA 7-8 | 1 > >> 0 3 > >> [7] chrA 7-8 --- chrA 11-12 | 1 > >> 0 3 > >> [8] chrA 7-8 --- chrA 17-18 | 1 > >> 1 2 > >> [9] chrA 9-10 --- chrA 9-10 | 1 > >> 0 3 > >> [10] chrA 9-10 --- chrA 15-16 | 1 > >> 1 2 > >> ------- > >> regions: 8 ranges and 0 metadata columns > >> seqinfo: 1 sequence from an unspecified genome; no seqlengths > >> > >> > >> Sorting by the two columns yields what I am after. Of course, I > >> include the “quadX” column for illustration only. Upon > >> implementation, I would like these columns hidden from the user. > >> > >> GenomicInteractions object with 10 interactions and 1 metadata > >> column: > >> seqnames1 ranges1 seqnames2 ranges2 > >> | counts quad1 quad2 > >> <Rle> <IRanges> <Rle> <IRanges> | <integer> > >> <integer> <integer> > >> [1] chrA 3-4 --- chrA 3-4 | 1 > >> 0 0 > >> [2] chrA 1-2 --- chrA 9-10 | 1 > >> 0 1 > >> [3] chrA 5-6 --- chrA 7-8 | 1 > >> 0 1 > >> [4] chrA 5-6 --- chrA 9-10 | 1 > >> 0 1 > >> [5] chrA 7-8 --- chrA 7-8 | 1 > >> 0 3 > >> [6] chrA 7-8 --- chrA 11-12 | 1 > >> 0 3 > >> [7] chrA 9-10 --- chrA 9-10 | 1 > >> 0 3 > >> [8] chrA 1-2 --- chrA 15-16 | 1 > >> 1 0 > >> [9] chrA 7-8 --- chrA 17-18 | 1 > >> 1 2 > >> [10] chrA 9-10 --- chrA 15-16 | 1 > >> 1 2 > >> ------- > >> regions: 8 ranges and 0 metadata columns > >> seqinfo: 1 sequence from an unspecified genome; no seqlengths > >> > >> The sorting gives me the quad-tree structure, and each unique > >> quadrant sequence defines the group. > >> > >> > >> GenomicInteractions object with 10 interactions and 1 metadata > >> column: > >> seqnames1 ranges1 seqnames2 ranges2 | counts > >> <Rle> <IRanges> <Rle> <IRanges> | <integer> > >> [1] chrA 3-4 --- chrA 3-4 | 1 > >> [2] chrA 1-2 --- chrA 9-10 | 1 > >> [3] chrA 5-6 --- chrA 7-8 | 1 > >> [4] chrA 5-6 --- chrA 9-10 | 1 > >> [5] chrA 7-8 --- chrA 7-8 | 1 > >> [6] chrA 7-8 --- chrA 11-12 | 1 > >> [7] chrA 9-10 --- chrA 9-10 | 1 > >> [8] chrA 1-2 --- chrA 15-16 | 1 > >> [9] chrA 7-8 --- chrA 17-18 | 1 > >> [10] chrA 9-10 --- chrA 15-16 | 1 > >> ------- > >> regions: 8 ranges and 0 metadata columns > >> seqinfo: 1 sequence from an unspecified genome; no seqlengths > >> > >> > >> 2) Then I would like to merge the WxW window (i.e. bin the regions), > >> expanding the ranges accordingly and adding the counts.. This > >> process will > >> - ***identify all range-pairs in the same window and merge them > >> into a new range pair with appropriately expanded ranges*** (this is > >> my primary goal) > >> - sum the counts for each of the aforementioned range-pairs (i > >> have already figured a way to do this) > >> > >> > >> > >> GenomicInteractions object with 5 interactions and 1 metadata column: > >> seqnames1 ranges1 seqnames2 ranges2 | counts > >> <Rle> <IRanges> <Rle> <IRanges> | <integer> > >> [1] chrA 1-6 --- chrA 1-6 | 1 > >> [2] chrA 1-6 --- chrA 7-12 | 3 > >> [3] chrA 7-12 --- chrA 7-12 | 3 > >> [4] chrA 1-6 --- chrA 13-18 | 1 > >> [5] chrA 7-12 --- chrA 13-18 | 2 > >> ------- > >> regions: 3 ranges and 0 metadata columns > >> seqinfo: 1 sequence from an unspecified genome; no seqlengths > >> > >> > >> NOTE that ranges1 and ranges2 MUST expand so that the region width is > >> 6, though the counts will only change if there exists another > >> subrange covered by this bin/expansion that contains a positive > >> count. > >> > >> As always, speed in a concern. > >> > >> Best, > >> > >> — Luke Klein > >> PhD Student > >> Department of Statistics > >> University of California, Riverside > >> lklei...@ucr.edu > >> > >> > >> > >> > >> > >> > >> > >> _______________________________________________ > >> Bioc-devel@r-project.org <mailto:Bioc-devel@r-project.org> mailing list > >> https://stat.ethz.ch/mailman/listinfo/bioc-devel < > https://stat.ethz.ch/mailman/listinfo/bioc-devel> > > Hello. I am planning to develop a new package which extends the > GenomicInteractions package. I would like some help/advice on implementing > the following functionality. > > Consider the follow GenomicInteractions object > > GenomicInteractions object with 10 interactions and 1 metadata column: > seqnames1 ranges1 seqnames2 ranges2 | counts > <Rle> <IRanges> <Rle> <IRanges> | <integer> > [1] chrA 1-2 --- chrA 9-10 | 1 > [2] chrA 1-2 --- chrA 15-16 | 1 > [3] chrA 3-4 --- chrA 3-4 | 1 > [4] chrA 5-6 --- chrA 7-8 | 1 > [5] chrA 5-6 --- chrA 9-10 | 1 > [6] chrA 7-8 --- chrA 7-8 | 1 > [7] chrA 7-8 --- chrA 11-12 | 1 > [8] chrA 7-8 --- chrA 17-18 | 1 > [9] chrA 9-10 --- chrA 9-10 | 1 > [10] chrA 9-10 --- chrA 15-16 | 1 > ------- > regions: 8 ranges and 0 metadata columns > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > > Which is visually represented thusly > > > > I would like to do the following: > > 1) I want to group the regions into bins of WxW (in this case, W will be > 3), as in a quad-tree structure <https://en.wikipedia.org/wiki/Quadtree> > with the final group being WxW (instead of 2x2). This will involve > - iteratively dividing the matrix into quadrants {upper-left (0), > upper-right (1), lower-left (2), lower-right(3)} . > - labeling each subdivision in a new column until the final WxW > resolution is reached. > - sorting by the columns > > > > > GenomicInteractions object with 10 interactions and 1 metadata column: > seqnames1 ranges1 seqnames2 ranges2 | counts quad1 > quad2 > <Rle> <IRanges> <Rle> <IRanges> | <integer> <integer> > <integer> > [1] chrA 1-2 --- chrA 9-10 | 1 0 > 1 > [2] chrA 1-2 --- chrA 15-16 | 1 1 > 0 > [3] chrA 3-4 --- chrA 3-4 | 1 0 > 0 > [4] chrA 5-6 --- chrA 7-8 | 1 0 > 1 > [5] chrA 5-6 --- chrA 9-10 | 1 0 > 1 > [6] chrA 7-8 --- chrA 7-8 | 1 0 > 3 > [7] chrA 7-8 --- chrA 11-12 | 1 0 > 3 > [8] chrA 7-8 --- chrA 17-18 | 1 1 > 2 > [9] chrA 9-10 --- chrA 9-10 | 1 0 > 3 > [10] chrA 9-10 --- chrA 15-16 | 1 1 > 2 > ------- > regions: 8 ranges and 0 metadata columns > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > > Sorting by the two columns yields what I am after. Of course, I include > the “quadX” column for illustration only. Upon implementation, I would > like these columns hidden from the user. > > GenomicInteractions object with 10 interactions and 1 metadata column: > seqnames1 ranges1 seqnames2 ranges2 | counts quad1 > quad2 > <Rle> <IRanges> <Rle> <IRanges> | <integer> <integer> > <integer> > [1] chrA 3-4 --- chrA 3-4 | 1 0 > 0 > [2] chrA 1-2 --- chrA 9-10 | 1 0 > 1 > [3] chrA 5-6 --- chrA 7-8 | 1 0 > 1 > [4] chrA 5-6 --- chrA 9-10 | 1 0 > 1 > [5] chrA 7-8 --- chrA 7-8 | 1 0 > 3 > [6] chrA 7-8 --- chrA 11-12 | 1 0 > 3 > [7] chrA 9-10 --- chrA 9-10 | 1 0 > 3 > [8] chrA 1-2 --- chrA 15-16 | 1 1 > 0 > [9] chrA 7-8 --- chrA 17-18 | 1 1 > 2 > [10] chrA 9-10 --- chrA 15-16 | 1 1 > 2 > ------- > regions: 8 ranges and 0 metadata columns > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > The sorting gives me the quad-tree structure, and each unique quadrant > sequence defines the group. > > > GenomicInteractions object with 10 interactions and 1 metadata column: > seqnames1 ranges1 seqnames2 ranges2 | counts > <Rle> <IRanges> <Rle> <IRanges> | <integer> > [1] chrA 3-4 --- chrA 3-4 | 1 > [2] chrA 1-2 --- chrA 9-10 | 1 > [3] chrA 5-6 --- chrA 7-8 | 1 > [4] chrA 5-6 --- chrA 9-10 | 1 > [5] chrA 7-8 --- chrA 7-8 | 1 > [6] chrA 7-8 --- chrA 11-12 | 1 > [7] chrA 9-10 --- chrA 9-10 | 1 > [8] chrA 1-2 --- chrA 15-16 | 1 > [9] chrA 7-8 --- chrA 17-18 | 1 > [10] chrA 9-10 --- chrA 15-16 | 1 > ------- > regions: 8 ranges and 0 metadata columns > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > > 2) Then I would like to merge the WxW window (i.e. bin the regions), > expanding the ranges accordingly and adding the counts.. This process will > - ***identify all range-pairs in the same window and merge them > into a new range pair with appropriately expanded ranges*** (this is my > primary goal) > - sum the counts for each of the aforementioned range-pairs (i > have already figured a way to do this) > > > > GenomicInteractions object with 5 interactions and 1 metadata column: > seqnames1 ranges1 seqnames2 ranges2 | counts > <Rle> <IRanges> <Rle> <IRanges> | <integer> > [1] chrA 1-6 --- chrA 1-6 | 1 > [2] chrA 1-6 --- chrA 7-12 | 3 > [3] chrA 7-12 --- chrA 7-12 | 3 > [4] chrA 1-6 --- chrA 13-18 | 1 > [5] chrA 7-12 --- chrA 13-18 | 2 > ------- > regions: 3 ranges and 0 metadata columns > seqinfo: 1 sequence from an unspecified genome; no seqlengths > > > NOTE that ranges1 and ranges2 MUST expand so that the region width is 6, > though the counts will only change if there exists another subrange covered > by this bin/expansion that contains a positive count. > > As always, speed in a concern. > > Best, > > — Luke Klein > PhD Student > Department of Statistics > University of California, Riverside > lklei...@ucr.edu <mailto:lklei...@ucr.edu> > _______________________________________________ > Bioc-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/bioc-devel > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel