Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Hervé Pagès
On 01/08/2015 01:30 PM, peter dalgaard wrote: If you look at the definition of %in%, you'll find that it is implemented using match, so if we did as you suggest, I give it about three days before someone suggests to inline the function call... But you wouldn't bet money on that right? Because

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
I was thinking something like: setequal <- function(x,y) { xu = unique(x) yu = unique(y) if (length(xu) != length(yu)) { return FALSE; } return (all( match( xu, yu, 0L ) > 0L ) ) } This lets you fail early for cheap (skipping the allocation from the ">0L"s). Whether or not this goes fast depends

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
Try this out. It looks like a 2X speedup for some cases and a wash in others. "unique" does two allocations, but skipping the "> 0L" allocation could make up for it. library(microbenchmark) library(RUnit) x = sample.int(1e4, 1e5, TRUE) y = sample.int(1e4, 1e5, TRUE) set_equal <- function(x, y)

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Michael Lawrence
Currently unique() does duplicated() internally and then extracts. One could make a countUnique that simply counts, rather than allocate the logical return value of duplicated(). But so much of the cost is in the hash operation that it probably won't help much, but that might depend on the sizes of

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread Peter Haverty
How about unique them both and compare the lengths? It's less work, especially allocation. Pete Peter M. Haverty, Ph.D. Genentech, Inc. phave...@gene.com On Thu, Jan 8, 2015 at 1:30 PM, peter dalgaard wrote: > If you look at the definition of %in%, you'll find that it i

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread William Dunlap
> why is there no setcontains()? Several packages define is.subset(), which I am assuming is what you are proposing, but it its arguments reversed. E.g., package:algstat has is.subset <- function(x, y) all(x %in% y) containsQ <- function(y, x) all(x %in% y) and package:rje has essentially t

Re: [Rd] setequal: better readability, reduced memory footprint, and minor speedup

2015-01-08 Thread peter dalgaard
If you look at the definition of %in%, you'll find that it is implemented using match, so if we did as you suggest, I give it about three days before someone suggests to inline the function call... Readability of source code is not usually our prime concern. The && idea does have some merit, th