Hello,

I've been using a pre-release version of R 2.8.0 for Windows for the last 
couple of months.  I have repeatedly run into problems subsetting data sets, 
but until now I could either find a work-around or could not confirm that it 
was a bug.  I think I now have a confirmed case, and I would appreciate 
advice on what to do, or a correction if I've made some error.

The data set in question ("c") has 500,000 observations and 44 variables.  The 
problematic variable, "month," takes integer values 1:12, and all are present 
in the data set:

> unique(c$month)
 [1] 11 10  9  8 12  1  7  4  6  2  5  3

However, I can't select observations of c for certain values of month:

> c[c$month==11,]
 [1] STATE        DISTRICT     TALUK        VILLAGE      TYPE         SERIALNO     INTDATE      QH101P      
 [9] QH114        QH115A1      QH115B1      QH115C1      QH115A2      QH115B2      QH115C2      QH115A3     
[17] QH115B3      QH115C3      QH115A4      QH115B4      QH115C4      QH115A5      QH115B5      QH115C5     
[25] QH116        QH117A1      QH117B1      QH117C1      QH117A2      QH117B2      QH117C2      QH117A3     
[33] QH117B3      QH117C3      QH117A4      QH117B4      QH117C4      QH117A5      QH117B5      QH117C5     
[41] phase        year         month        stdistid.rch
<0 rows> (or 0-length row.names)

I get the same result for c[c[,43]==11,], and 

> length(c$month[c$month==11])
[1] 0

This is true for most values of month (1,2,4,5,7,8,10,11), but the multiples of 
3 work, apparently correctly.

Other variables do not have this problem (the columns shift in the email, but 
these three observations have month=11):

> c[c$STATE==11,][1:3,]
      STATE DISTRICT TALUK VILLAGE TYPE SERIALNO INTDATE QH101P QH114 QH115A1 QH115B1 QH115C1 QH115A2 QH115B2 QH115C2 QH115A3 QH115B3
87556    11        2     1       1    1        5    1187      6     0       0       0       0       0       0       0       0       0
87557    11        2     1       1    1       10    1187      3     0       0       0       0       0       0       0       0       0
87558    11        2     1       1    1       14    1187      5     0       0       0       0       0       0       0       0       0
      QH115C3 QH115A4 QH115B4 QH115C4 QH115A5 QH115B5 QH115C5 QH116 QH117A1 QH117B1 QH117C1 QH117A2 QH117B2 QH117C2 QH117A3 QH117B3 QH117C3
87556       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
87557       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
87558       0       0       0       0       0       0       0     0       0       0       0       0       0       0       0       0       0
      QH117A4 QH117B4 QH117C4 QH117A5 QH117B5 QH117C5 phase year month stdistid.rch
87556       0       0       0       0       0       0     1 1998    11         1102
87557       0       0       0       0       0       0     1 1998    11         1102
87558       0       0       0       0       0       0     1 1998    11         1102

The data set is read directly from a CSV file, where all variables should be 
stored in the same way, and using as.numeric(as.character(c$month)) does not 
help.  Nor does restarting R, restarting the computer, or trying the operation 
on smaller subsets of c.  I'd appreciate any help you can provide.
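
One further check I can think of is sketched below on made-up values rather 
than my data.  It assumes (and this is only a guess) that some entries of 
month carry a tiny fractional part that the default 7-digit printing hides.  
The vector "month" here is a hypothetical stand-in, and the offsets of 1e-12 
are invented for illustration.

```r
# Hypothetical stand-in for c$month; these values are invented, not from my data.
month <- c(11 + 1e-12, 3, 12, 11 - 1e-12)

print(month)               # default printing shows 7 significant digits
print(month, digits = 17)  # more digits expose any hidden fractional part

n_exact   <- sum(month == 11)              # exact comparison
n_rounded <- sum(round(month) == 11)       # compare after rounding to whole numbers
n_toler   <- sum(abs(month - 11) < 1e-8)   # compare within a small tolerance

print(c(exact = n_exact, rounded = n_rounded, tolerance = n_toler))
```

If the real data showed the same pattern, c[round(c$month) == 11, ] might 
serve as a work-around, though it would not explain where the fractional 
parts came from.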

Sincerely,
Alan Cohen
