On 01/08/2015 01:30 PM, peter dalgaard wrote:
> If you look at the definition of %in%, you'll find that it is implemented using
> match, so if we did as you suggest, I give it about three days before someone
> suggests to inline the function call...
But you wouldn't bet money on that, right? Because
I was thinking something like:
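setequal <- function(x, y) {
    xu <- unique(x)
    yu <- unique(y)
    # Different numbers of distinct values: the sets cannot be equal.
    if (length(xu) != length(yu)) return(FALSE)
    # Same count, so the sets are equal iff every element of xu occurs in yu.
    all(match(xu, yu, 0L) > 0L)
}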
This lets you fail early for cheap (skipping the allocation from the
">0L"s). Whether or not this goes fast depends
Try this out. It looks like a 2X speedup for some cases and a wash in
others. "unique" does two allocations, but skipping the "> 0L" allocation
could make up for it.
library(microbenchmark)
library(RUnit)
x = sample.int(1e4, 1e5, TRUE)
y = sample.int(1e4, 1e5, TRUE)
set_equal <- function(x, y)
Currently unique() calls duplicated() internally and then extracts. One
could make a countUnique that simply counts, rather than allocating the
logical return value of duplicated(). So much of the cost is in the
hash operation, though, that it probably won't help much; that might depend on
the sizes of
How about unique them both and compare the lengths? It's less work,
especially allocation.
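
A quick sketch of what the length comparison can and cannot tell you. The helper name same_unique_count and the example vectors are made up for illustration; this is not code from base R. Unequal counts rule out set equality cheaply, but equal counts still need a match() pass to confirm it:

```r
# Hypothetical helper: only performs the cheap length comparison.
same_unique_count <- function(x, y) length(unique(x)) == length(unique(y))

a <- c(1, 2, 2, 3)
b <- c(3, 1, 2)
d <- c(1, 2, 4)

same_unique_count(a, b)        # equal counts, and the sets are in fact equal
same_unique_count(a, d)        # equal counts, but the sets differ
same_unique_count(a, c(1, 2))  # different counts: the sets cannot be equal

# When the counts agree, a match() (or %in%) pass is still needed.
all(match(unique(a), unique(d), 0L) > 0L)
```

So the length check works as an early exit for the unequal case, not as a replacement for the membership test.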
Pete
Peter M. Haverty, Ph.D.
Genentech, Inc.
phave...@gene.com
On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard wrote:
> If you look at the definition of %in%, you'll find that it i
> why is there no setcontains()?
Several packages define is.subset(), which I am assuming is what you are
proposing, but with its arguments reversed. E.g., package:algstat has
is.subset <- function(x, y) all(x %in% y)
containsQ <- function(y, x) all(x %in% y)
and package:rje has essentially t
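
To make the argument-order point concrete, here is a small sketch using the two definitions exactly as quoted above; the vectors small and big are made up for illustration:

```r
# Definitions as quoted in the message above (attributed there to package:algstat).
is.subset <- function(x, y) all(x %in% y)
containsQ <- function(y, x) all(x %in% y)

small <- c("a", "b")
big <- c("a", "b", "c")

is.subset(small, big)  # is small a subset of big?
containsQ(big, small)  # does big contain small? Same question, arguments swapped.
is.subset(big, small)  # the reverse question
```

The first two calls ask the same thing; only the position of the candidate subset differs.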
If you look at the definition of %in%, you'll find that it is implemented using
match, so if we did as you suggest, I give it about three days before someone
suggests to inline the function call... Readability of source code is not
usually our prime concern.
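
For anyone who wants to check the relationship, a minimal sketch, assuming the base R definition of %in% as match(x, table, nomatch = 0L) > 0L. The variable names are made up for illustration:

```r
# Illustrative vectors; the names x and table_vals are made up for this sketch.
x <- c(3L, 7L, 11L)
table_vals <- c(1L, 3L, 5L, 7L)

via_in <- x %in% table_vals
via_match <- match(x, table_vals, nomatch = 0L) > 0L

# Both give a logical vector marking which elements of x occur in table_vals.
identical(via_in, via_match)

# "Is x contained in table_vals?" written both ways.
all(x %in% table_vals)
all(match(x, table_vals, nomatch = 0L) > 0L)
```

The %in% form is the more readable one; the match() form is what it expands to.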
The && idea does have some merit, th