Or do all the subsetting in one pass - [ will use a hashmap.

Hadley

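As a rough illustration of the one-pass idea (a minimal sketch on small made-up data, not code posted in this thread): group the values once with split(), as Bill suggested, then fetch many keys with a single call to [ using a character vector, instead of calling [[ inside a loop. The names keys and found are hypothetical and only serve the example.

```r
# Small made-up data standing in for the thread's numbers/values vectors
numbers <- c(3, 1, 3, 2, 1, 3)
values  <- c(10, 20, 30, 40, 50, 60)

# Group once; the names of d are the character form of each number
d <- split(values, numbers)

# Look up several keys in one call to `[` with a character vector,
# rather than looping and calling d[[key]] for each key
keys  <- c("3", "1")
found <- d[keys]

print(found)
```

The result is a list in the order of keys, so it can be processed further with lapply() or similar without any per-key lookups in R-level loops.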
On Thu, Oct 30, 2014 at 12:05 PM, William Dunlap <wdun...@tibco.com> wrote:
> You can try using an environment instead of a list.
>
> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com
>
>
> On Thu, Oct 30, 2014 at 10:02 AM, Thomas Nyberg <tomnyb...@gmail.com> wrote:
>> Thanks to everyone for the help! For the moment I'll stick with Bill's
>> solution, but I'll check out the other recommendations as well.
>>
>> Regarding the issue of slow lookups for lists, are there any hash map
>> implementations in R that are faster? I like using fairly simple logic and
>> data structures when prototyping and then only optimize code when and where
>> it's necessary, which is why I'm curious about these basic objects.
>>
>> On another note, is there a vector style implementation that changes the
>> vectors in place? If I'm not mistaken, the append operation creates and
>> returns a new vector each time, which is in line with the functional nature
>> of R. If there were some way to have it mutable, it could be much faster.
>> This is fairly standard in many languages. Behind the scenes memory is
>> allocated at say 2 times the current size so that you only need log(n)
>> extensions when building up a vector like this. Are there any such
>> equivalents in R? I presume that lists are mutable (am I wrong?), but they
>> seem to have the lookup slowdown problem.
>>
>> Again thanks a lot!
>>
>> Cheers,
>> Thomas
>>
>>
>> On 10/30/2014 12:05 PM, William Dunlap wrote:
>>>
>>> Repeatedly extending vectors takes a lot of time. You can do what you
>>> want with
>>>    d2 <- split(values, factor(numbers, levels=unique(numbers)))
>>> If you would like the labels on d2 to be in numeric order then you can
>>> simplify that to
>>>    d3 <- split(values, numbers)
>>>
>>> Bill Dunlap
>>> TIBCO Software
>>> wdunlap tibco.com
>>>
>>>
>>> On Thu, Oct 30, 2014 at 8:17 AM, Thomas Nyberg <tomnyb...@gmail.com>
>>> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I want to do the following: Given a set of (number, value) pairs, I
>>>> want to create a list l so that l[[toString(number)]] returns the
>>>> vector of values associated to that number. It is hundreds of times
>>>> slower than the equivalent that I would write in Python. I'm pretty
>>>> new to R so I bet I'm using its data structures inefficiently, but
>>>> I've tried more or less everything I can think of and can't really
>>>> speed it up. I have done some profiling which helped me find problem
>>>> areas, but I couldn't speed things up even with that information. I'm
>>>> thinking I'm just fundamentally using R in a silly way.
>>>>
>>>> I've included code for the different versions. I wrote the Python code
>>>> in a style to make it as clear to R programmers as possible. Thanks a
>>>> lot! Any help would be greatly appreciated!
>>>>
>>>> Cheers,
>>>> Thomas
>>>>
>>>>
>>>> R code (with two versions depending on commenting):
>>>>
>>>> -----
>>>>
>>>> numbers <- numeric(0)
>>>> for (i in 1:5) {
>>>>   numbers <- c(numbers, sample(1:30000, 10000))
>>>> }
>>>>
>>>> values <- numeric(0)
>>>> for (i in 1:length(numbers)) {
>>>>   values <- append(values, sample(1:10, 1))
>>>> }
>>>>
>>>> starttime <- Sys.time()
>>>>
>>>> d = list()
>>>> for (i in 1:length(numbers)) {
>>>>   number = toString(numbers[i])
>>>>   value = values[i]
>>>>   if (is.null(d[[number]])) {
>>>>   #if (number %in% names(d)) {
>>>>     d[[number]] <- c(value)
>>>>   } else {
>>>>     d[[number]] <- append(d[[number]], value)
>>>>   }
>>>> }
>>>>
>>>> endtime <- Sys.time()
>>>>
>>>> print(format(endtime - starttime))
>>>>
>>>> -----
>>>>
>>>> uncommented version: "45.64791 secs"
>>>> commented version: "1.423056 mins"
>>>>
>>>>
>>>>
>>>> Another version of R code:
>>>>
>>>> -----
>>>>
>>>> numbers <- numeric(0)
>>>> for (i in 1:5) {
>>>>   numbers <- c(numbers, sample(1:30000, 10000))
>>>> }
>>>>
>>>> values <- numeric(0)
>>>> for (i in 1:length(numbers)) {
>>>>   values <- append(values, sample(1:10, 1))
>>>> }
>>>>
>>>> starttime <- Sys.time()
>>>>
>>>> d = list()
>>>> for (number in unique(numbers)) {
>>>>   d[[toString(number)]] <- numeric(0)
>>>> }
>>>> for (i in 1:length(numbers)) {
>>>>   number = toString(numbers[i])
>>>>   value = values[i]
>>>>   d[[number]] <- append(d[[number]], value)
>>>> }
>>>>
>>>> endtime <- Sys.time()
>>>>
>>>> print(format(endtime - starttime))
>>>>
>>>> -----
>>>>
>>>> "47.15579 secs"
>>>>
>>>>
>>>>
>>>> The Python code:
>>>>
>>>> -----
>>>>
>>>> import random
>>>> import time
>>>>
>>>> numbers = []
>>>> for i in range(5):
>>>>     numbers += random.sample(range(30000), 10000)
>>>>
>>>> values = []
>>>> for i in range(len(numbers)):
>>>>     values.append(random.randint(1, 10))
>>>>
>>>> starttime = time.time()
>>>>
>>>> d = {}
>>>> for i in range(len(numbers)):
>>>>     number = numbers[i]
>>>>     value = values[i]
>>>>     if d.has_key(number):
>>>>         d[number].append(value)
>>>>     else:
>>>>         d[number] = [value]
>>>>
>>>> endtime = time.time()
>>>>
>>>> print endtime - starttime, "seconds"
>>>>
>>>> -----
>>>>
>>>> 0.123021125793 seconds
>>>>
>>>> ______________________________________________
>>>> R-help@r-project.org mailing list
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.

-- 
http://had.co.nz/