Hi,

perhaps pre-generating the list before processing would speed it up significantly, though it may still be slower than Python.
e.g. try something like:

d = as.list(rep(NA,length(numbers)))

rather than:

d = list()

Olivier.

On Thu, 30 Oct 2014 11:17:59 -0400
Thomas Nyberg <tomnyb...@gmail.com> wrote:

> Hello,
>
> I want to do the following: Given a set of (number, value) pairs, I
> want to create a list l so that l[[toString(number)]] returns the
> vector of values associated to that number. It is hundreds of times
> slower than the equivalent that I would write in python. I'm pretty
> new to R so I bet I'm using its data structures inefficiently, but
> I've tried more or less everything I can think of and can't really
> speed it up. I have done some profiling which helped me find problem
> areas, but I couldn't speed things up even with that information. I'm
> thinking I'm just fundamentally using R in a silly way.
>
> I've included code for the different versions. I wrote the python
> code in a style to make it as clear to R programmers as possible.
> Thanks a lot! Any help would be greatly appreciated!
>
> Cheers,
> Thomas
>
>
> R code (with two versions depending on commenting):
>
> -----
>
> numbers <- numeric(0)
> for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
> }
>
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
> }
>
> starttime <- Sys.time()
>
> d = list()
> for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     if (is.null(d[[number]])) {
>     #if (number %in% names(d)) {
>         d[[number]] <- c(value)
>     } else {
>         d[[number]] <- append(d[[number]], value)
>     }
> }
>
> endtime <- Sys.time()
>
> print(format(endtime - starttime))
>
> -----
>
> uncommented version: "45.64791 secs"
> commented version: "1.423056 mins"
>
>
>
> Another version of R code:
>
> -----
>
> numbers <- numeric(0)
> for (i in 1:5) {
>     numbers <- c(numbers, sample(1:30000, 10000))
> }
>
> values <- numeric(0)
> for (i in 1:length(numbers)) {
>     values <- append(values, sample(1:10, 1))
> }
>
> starttime <- Sys.time()
>
> d = list()
> for (number in unique(numbers)) {
>     d[[toString(number)]] <- numeric(0)
> }
> for (i in 1:length(numbers)) {
>     number = toString(numbers[i])
>     value = values[i]
>     d[[number]] <- append(d[[number]], value)
> }
>
> endtime <- Sys.time()
>
> print(format(endtime - starttime))
>
> -----
>
> "47.15579 secs"
>
>
>
> The python code:
>
> -----
>
> import random
> import time
>
> numbers = []
> for i in range(5):
>     numbers += random.sample(range(30000), 10000)
>
> values = []
> for i in range(len(numbers)):
>     values.append(random.randint(1, 10))
>
> starttime = time.time()
>
> d = {}
> for i in range(len(numbers)):
>     number = numbers[i]
>     value = values[i]
>     if d.has_key(number):
>         d[number].append(value)
>     else:
>         d[number] = [value]
>
> endtime = time.time()
>
> print endtime - starttime, "seconds"
>
> -----
>
> 0.123021125793 seconds
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html and provide commented,
> minimal, self-contained, reproducible code.

-- 
Olivier Crouzet, PhD
Laboratoire de Linguistique -- EA3827
Université de Nantes
Chemin de la Censive du Tertre - BP 81227
44312 Nantes cedex 3
France

phone:  (+33) 02 40 14 14 05 (lab.)
        (+33) 02 40 14 14 36 (office)
fax:    (+33) 02 40 14 13 27
e-mail: olivier.crou...@univ-nantes.fr

http://www.lling.univ-nantes.fr/
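
For anyone rerunning the comparison: the quoted Python script targets Python 2 (`has_key` and the `print` statement no longer exist in Python 3). Below is a minimal Python 3 sketch of the same grouping, not the original poster's code. The function name `group_values` is an assumption introduced here for illustration. On the R side, base R's `split(values, numbers)` should give the same number-to-values list in a single vectorised call, which avoids growing `d` one element at a time.

```python
from collections import defaultdict
import random


def group_values(numbers, values):
    """Map each number to the list of values paired with it, in input order."""
    grouped = defaultdict(list)
    for number, value in zip(numbers, values):
        grouped[number].append(value)  # a missing key starts as an empty list
    return dict(grouped)


# Same shape of data as the quoted script: 5 batches of 10000 distinct
# numbers, each paired with a random value between 1 and 10.
numbers = []
for _ in range(5):
    numbers += random.sample(range(30000), 10000)
values = [random.randint(1, 10) for _ in numbers]

d = group_values(numbers, values)
```

Here `d[number]` plays the role of `l[[toString(number)]]` in the R versions; the keys stay integers because a Python dict does not need string names.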