Hi all,

I am working with data.table objects inside nested foreach loops, and I am
having trouble getting the combined results object into the shape I want.

Below is the code, with generated sample data:

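library(iterators)
library(data.table)
library(foreach)
library(doParallel)

#register a parallel backend; without one, %dopar% runs sequentially with a warning
registerDoParallel()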

#generate dummy data
set.seed(1212)
#the 50,000 parentid draws are recycled to match the 100,000 childid draws
sample1 <- data.frame(parentid=round(runif(50000, min=1, max=50000)),
                      childid=round(runif(100000, min=1, max=100000)))
length(unique(sample1$parentid))

#get unique parents
sample1uniq <- as.data.frame(unique(sample1$parentid))
names(sample1uniq) <- "parentid"

#convert original dataset to data.table
sample1 <- data.table(sample1)
setkey(sample1,parentid)

#convert unique ids to data.table
sample1uniq <- data.table(sample1uniq)
setkey(sample1uniq,parentid)

#a random sample of 5K users to scan against
sample2uniq_idx <- sample(1:nrow(sample1uniq), size=5000)
sample2uniq <- sample1uniq[sample2uniq_idx]
sample2uniq <- data.table(sample2uniq)
setkey(sample2uniq,parentid)

#construct iterators
sample1uniq_iter <- iter(sample1uniq)
sample2uniq_iter <- iter(sample2uniq)

outerresults <- foreach(x = sample1uniq_iter, .combine=rbind,
                        .packages=c('foreach', 'doParallel', 'data.table')) %dopar% {
  b <- sample1[J(x)]     #ith parent
  b2 <- b$childid        #ith parent's children

  foreach(y = sample2uniq_iter, .combine=rbind) %dopar% {
    c <- sample1[J(y)]   #jth parent
    c2 <- c$childid      #jth parent's children

    common <- length(intersect(b2, c2))

    if (common > 0) {
      uni <- length(union(b2, c2))
      results <- list(u1=x, u2=y, inter=common, union=uni)
    }
  }
}
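One likely culprit worth checking: iter() from the iterators package walks a data frame by column unless told otherwise (by = "column" is its documented default), so x above may be the entire parentid column rather than a single id. An iterator object is also used up as it is read, which makes sharing sample2uniq_iter across all outer iterations fragile. The sketch below uses a made-up three-row data frame, ids_df (hypothetical), to show both effects and the plain-vector alternative. I would expect the same behaviour for sample1uniq, since a data.table is also a data.frame, but the sketch only uses a plain data.frame.

```r
library(iterators)

# Hypothetical one-column data frame standing in for sample1uniq
ids_df <- data.frame(parentid = c(3, 7, 9))

# iter() on a data frame defaults to by = "column", so the first (and
# here only) element is the whole parentid column, not a single id
col_it <- iter(ids_df)
first_elem <- nextElem(col_it)
print(first_elem)

# The iterator is now used up; reading again signals "StopIteration",
# so one iterator object cannot be re-read by every pass of an inner loop
again <- tryCatch(nextElem(col_it), error = function(e) conditionMessage(e))
print(again)

# Iterating over the plain vector yields one id per element instead
vec_it <- iter(ids_df$parentid)
one_id <- nextElem(vec_it)
print(one_id)
```

In the real code, that would mean looping with x = sample1uniq$parentid and y = sample2uniq$parentid instead of the iterator objects.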

Note that all tasks can be done in parallel with no dependency issues.
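Since the pairs are independent, foreach's nesting operator %:% is one way to express this: it merges the outer and inner loops into one stream of tasks for the backend, instead of calling %dopar% inside a %dopar% body. Below is a minimal sketch on a tiny made-up table; edges, outer_ids and inner_ids are hypothetical stand-ins for sample1, sample1uniq$parentid and sample2uniq$parentid. It loops over plain vectors rather than iterator objects, returns a one-row data.table (or NULL) per pair so that rbind builds a proper table, and skips x == y, which is my assumption since the expected output shows no self-pairs. registerDoSEQ() keeps it runnable without a cluster.

```r
library(foreach)
library(data.table)

# Hypothetical keyed table standing in for sample1: parent -> child rows
edges <- data.table(parentid = c(1, 1, 1, 2, 2, 3, 3),
                    childid  = c(10, 11, 12, 11, 12, 12, 13))
setkey(edges, parentid)

outer_ids <- c(1, 2, 3)  # stands in for sample1uniq$parentid
inner_ids <- c(2, 3)     # stands in for sample2uniq$parentid

# Sequential backend so the sketch runs anywhere; a real run would use
# something like doParallel::registerDoParallel() instead
registerDoSEQ()

res <- foreach(x = outer_ids, .combine = rbind) %:%
  foreach(y = inner_ids, .combine = rbind) %dopar% {
    b2 <- edges[J(x), childid]   # children of parent x
    c2 <- edges[J(y), childid]   # children of parent y
    common <- length(intersect(b2, c2))
    # One typed row per overlapping pair; NULL is dropped by rbind
    if (x != y && common > 0) {
      data.table(u1 = x, u2 = y, inter = common,
                 union = length(union(b2, c2)))
    } else {
      NULL
    }
  }
print(res)
```

With a real parallel backend, the outer foreach would probably also need .packages = "data.table" so the workers can evaluate the keyed joins.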

I was expecting the results to come out like this (values made up):

u1 u2 inter union
 1  2    10    20
 1  3     4    10
 1  4     7    15
 1  5     6    10
 2  3    10    20
 2  4     4    10
 3  5     7    10
 4  5     6    10

But they don't come out that way. Do I need a different .combine function?
Any other ideas or help would be appreciated.
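On the combine question, part of the problem may be the return value rather than the combiner: when common > 0 the inner body returns a named list, and rbind() on named lists gives a matrix of mode "list" rather than a data frame. A sketch, assuming data.table's rbindlist() is acceptable as the combiner: it turns a list of row-lists into a typed data.table and skips NULL entries (the pairs with no overlap). The row values and the "y == 3 has no overlap" rule are made up for illustration, and registerDoSEQ() stands in for a real backend.

```r
library(data.table)
library(foreach)

# What the inner body yields when common > 0: a named list per pair
r1 <- list(u1 = 1, u2 = 2, inter = 10L, union = 20L)
r2 <- list(u1 = 1, u2 = 3, inter = 4L, union = 10L)

# rbind() on named lists builds a matrix whose cells are list elements
as_matrix <- rbind(r1, r2)
print(typeof(as_matrix))   # "list"

# rbindlist() builds a typed data.table and drops NULL entries
as_table <- rbindlist(list(r1, NULL, r2))
print(as_table)

# The same idea as a foreach combiner; .multicombine = TRUE lets foreach
# pass many results per call instead of two at a time
registerDoSEQ()
rows <- foreach(y = c(2, 3, 4),
                .combine = function(...) rbindlist(list(...)),
                .multicombine = TRUE) %dopar% {
  # hypothetical stand-in for the overlap test: pretend y == 3 has none
  if (y != 3) list(u1 = 1, u2 = y, inter = y, union = 2 * y) else NULL
}
print(rows)
```

Returning a one-row data.frame or data.table from the body and keeping .combine=rbind should work as well; rbindlist() mainly avoids building many intermediate copies when there are lots of pairs.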

thx


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
