On Mon, Nov 2, 2015 at 6:39 PM, Hervé Pagès <hpa...@fredhutch.org> wrote:
> Hi, > > @Pete: > > 2a- I would just compare pairs of adjacent elements, taking > advantage of the fact that <= is vectorized and cheap. So something > like: > > setMethod("is.unsorted", "Vector", > function(x, na.rm=FALSE, strictly=FALSE) > { > if (length(x) <= 1L) > return(FALSE) > x1 <- head(x, n=-1) > x2 <- tail(x, n=-1) > if (strictly) > return(any(x1 >= x2)) > any(x1 > x2) > } > ) > > Since this will work on any Vector derivative for which <= and > subsetting are defined, it's a good candidate for being the default > "is.unsorted" method for Vector objects. I'll add it to S4Vectors. > > 2b- The semantic of is.unsorted() on a GRangesList object or any > List object in general should be sapply(x, is.unsorted), for > consistency with order(), sort(), etc: > > > sort(IntegerList(4:3, 1:-2)) > IntegerList of length 2 > [[1]] 3 4 > [[2]] -2 -1 0 1 > > I'll add this too. > > 2c - That won't be needed. The default method for Vector objects will > work on RangedSummarizedExperiment objects (<= and 1D subsetting are > defined and along the same dimension). > > @Gabe: > > See ?`GenomicRanges-comparison` for how the order of genomic ranges > is defined. > > @Michael: > > Calling base::.gt() in a loop sounds indeed very inefficient. What > about having base::is.unsorted() do the above on "objects" instead? > base::.gt() seems to also require that the object is subsettable so > the requirements are the same. > > Then we wouldn't need the default "is.unsorted" method for Vector > objects, only a default "anyNA" method for Vector objects that always > returns FALSE (plus some specific ones for Rle and other Vector > derivatives that support NAs). > > Yea, I assumed it did what you suggested before looking at it. It would be the right place to fix this. > Thanks, > H. > > > On 11/02/2015 05:35 PM, Michael Lawrence wrote: > >> The notion of sortedness is already formally defined, which is why we have >> an order method, etc. >> >> The base is.unsorted implementation for "objects" ends up calling >> base::.gt() for each adjacent pair of elements, which is likely too slow >> to >> be practical, so we probably should add a custom method. >> >> This does bring up the tangential question of whether GenomicRanges should >> have an anyNA method that returns FALSE (and similarly an is.na() >> method), >> although we have never defined the concept of a "missing range". >> >> Michael >> >> On Mon, Nov 2, 2015 at 4:55 PM, Gabe Becker <becker.g...@gene.com> wrote: >> >> Pete, >>> >>> What does sorted mean for granges? If the starts are sorted but the ends >>> aren't does that count? What if only the ends are but the ranges are on >>> the >>> negative strand? >>> >>> Do we consider seqlevels to be ordinal in the order the levels are >>> returned >>> from seqlevels ()? That usually makes sense, but does it always? >>> >>> In essence I'm asking if sortedness is a well enough defined term for an >>> is.sorted method to make sense. >>> >>> Best, >>> ~G >>> On Nov 2, 2015 4:27 PM, "Peter Hickey" <peter.hic...@gmail.com> wrote: >>> >>> Hi all, >>>> >>>> I sometimes want to test whether a GRanges object (or some object with >>>> a GRanges slot, e.g., a SummarizedExperiment object) is (un)sorted. >>>> There is no is.unsorted,GRanges-method or, rather, it defers to >>>> is.unsorted,ANY-method. I'm unsure that deferring to the >>>> is.unsorted,ANY-method is what is really desired when a user calls >>>> is.unsorted on a GRanges object, and it will certainly return a >>>> (possibly unrelated) warning - "In is.na(x) : is.na() applied to >>>> non-(list or vector) of type 'S4'". >>>> >>>> >>>> For this reason, I tend to use is.unsorted(order(x)) when x is a >>>> GRanges object. This workaround is also used, for example, by minfi >>>> ( >>>> https://github.com/kasperdanielhansen/minfi/blob/master/R/blocks.R#L121 >>>> >>> ). >>> >>>> However, this is slow because it essentially sorts the object to test >>>> whether it is already sorted. >>>> >>>> >>>> So, to my questions: >>>> >>>> 1. Have I overlooked a fast way to test whether a GRanges object is >>>> >>> sorted? >>> >>>> 2a. Could a is.unsorted,GenomicRanges-method be added to the >>>> GenomicRanges package? Side note, I'm unsure at which level to define >>>> this method, e.g., GRanges vs. GenomicRanges. >>>> 2b. Is it possible to have a sensible definition and implementation >>>> for is.unsorted,GRangesList-method? >>>> 2c. Could a is.unsorted,RangedSummarizedExperiment-method be added to >>>> the SummarizedExperiment package? >>>> >>>> I started working on a patch for 2a/2c, but wanted to ensure I hadn't >>>> overlooked something obvious. Also, I'm sure 2a/2b/2c could be written >>>> much more efficiently at the C-level but I'm afraid this might be a >>>> bit beyond my abilities to integrate nicely with the existing code. >>>> >>>> Thanks, >>>> Pete >>>> >>>> _______________________________________________ >>>> Bioc-devel@r-project.org mailing list >>>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>>> >>>> >>> [[alternative HTML version deleted]] >>> >>> _______________________________________________ >>> Bioc-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/bioc-devel >>> >>> >> [[alternative HTML version deleted]] >> >> _______________________________________________ >> Bioc-devel@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/bioc-devel >> >> > -- > Hervé Pagès > > Program in Computational Biology > Division of Public Health Sciences > Fred Hutchinson Cancer Research Center > 1100 Fairview Ave. N, M1-B514 > P.O. Box 19024 > Seattle, WA 98109-1024 > > E-mail: hpa...@fredhutch.org > Phone: (206) 667-5791 > Fax: (206) 667-1319 > [[alternative HTML version deleted]] _______________________________________________ Bioc-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/bioc-devel