[R] table() generating NAs when there are no NAs in the underlying data

James Savage Mon, 20 May 2013 01:46:22 -0700

Hi all,
Just a quick question:
I want to generate a column of counts of a particular variable. The easiest way 
seems to be using table(). For reasonably small amounts of data, there seems to 
be no problem.
C <- data.frame(A1 = sample(1:1000, 100000, replace = TRUE),
                B1 = sample(1:1000, 100000, replace = TRUE))
C$countC <- table(C$A1)[C$A1]
summary(C$countC)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     65      94     101     101     108     132


However, if I build the table from a larger set of values (note that I am now 
sampling from 1:10000 rather than 1:1000), the result contains NAs, despite 
there being no NAs in the data I'm building the table from:
C <- data.frame(A1 = sample(1:10000, 100000, replace = TRUE),
                B1 = sample(1:10000, 100000, replace = TRUE))
C$countC <- table(C$A1)[C$A1]
summary(C$A1)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      1    2512    5005    5008    7502   10000

summary(C$countC)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
   1.00    8.00   10.00   10.18   12.00   25.00       7
Note that if you cannot replicate this on your computer, try increasing the 
size of the set to sample from (setting it at 1000000 did the trick for a 
colleague of mine).

The problem appears not to occur if the data are not in a data frame:
A <- sample(1:10000, 1000000, replace = TRUE)
summary(table(as.factor(A))[A])
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
     57      94     101     101     108     144

The NAs seem to be thrown only for the last few categories (when sampling 
from 1:100000, only for 99998, 99999 and 100000). That makes it manageable, 
but definitely not ideal.
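
A guess worth checking, not a confirmed diagnosis: table() only creates a cell 
for values that actually occur, and an integer subscript selects cells by 
position rather than by name. If some value is never drawn, the table is 
shorter than the largest value, which would fit NAs showing up only for the 
largest categories. The toy vector x and the names tab, by_position and by_name 
below are made up for illustration; they are not from the runs above.

```r
# Toy data with a gap: the value 2 never occurs (hypothetical example values)
x <- c(1L, 1L, 3L, 4L, 4L, 4L)
tab <- table(x)

names(tab)    # cells exist only for "1", "3" and "4"
length(tab)   # 3 cells, although max(x) is 4

# Integer subscript: picks the 1st, 3rd and 4th cell, not the cells named 1, 3, 4
by_position <- as.vector(tab[x])
by_position   # 2 2 3 NA NA NA (the count for 3 is wrong; 4 is out of range)

# Character subscript: picks cells by name
by_name <- as.vector(tab[as.character(x)])
by_name       # 2 2 1 3 3 3
```

On the real data, comparing length(table(C$A1)) with max(C$A1) should show 
whether this is what is happening. Note that in the toy case the positional 
lookup also returns a wrong count without any NA, so the absence of NAs alone 
would not prove the counts are right.
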
I also posted this question to Stack Overflow, and users there have contributed 
a work-around. However, I would like to know why table() is exhibiting this 
behaviour.
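
For completeness, here is a minimal sketch of one way to get the per-row counts 
without subscripting a table at all. It is not necessarily the work-around 
suggested on Stack Overflow; the toy data frame below stands in for C and its 
values are made up for illustration.

```r
# Toy data frame standing in for C; the value 2 is deliberately absent
C <- data.frame(A1 = c(1L, 1L, 3L, 4L, 4L, 4L))

# ave() applies FUN within each group of A1 and returns one result per row,
# so each row receives the size of the group its A1 value belongs to
C$countC <- ave(seq_along(C$A1), C$A1, FUN = length)

C$countC      # 2 2 1 3 3 3
anyNA(C$countC)
```

Because the counts are computed per group rather than looked up by position, 
gaps in the observed values should not matter with this approach.
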

Cheers, Jim


