On Aug 2, 2009, at 7:02 PM, Noah Silverman wrote:

Hi,

It seems as if the problem was caused by an odd quirk of the "scale"
function.

Some of my data have NA entries.

So, I substitute 0 for any NA with:
rawdata[is.na(rawdata)] <- 0

Perhaps this would have done what you intended:

rawdata[is.na(rawdata), ] <- 0

# But this is added _only_ as a matter of coding behavior. See below.


I then scale the data.

For some reason that I don't understand, I find some NA back in the data
after the scale command.
But, issuing the same 0 substitution AFTER the scale command makes
everything work again.
rawdata[is.na(rawdata)] <- 0

It "works" because scale() returns a matrix rather than a data frame, and a matrix can be indexed like a vector, here by the logical matrix that is.na() returns.
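
One possible reason for NA values reappearing after scale(), offered as an assumption because the thread does not confirm it for this data set: scale() divides each centered column by its standard deviation, so a constant column gives 0/0, which is NaN, and is.na() reports NaN as missing. A minimal sketch on made-up data (toy, scaled and cleaned are hypothetical names, not the poster's objects):

```r
# Hypothetical toy data: column b is constant, so its standard deviation is 0.
toy <- data.frame(a = c(1, 2, 3), b = c(5, 5, 5))

scaled <- scale(toy)             # scale() returns a matrix, not a data frame
is.matrix(scaled)                # TRUE
is.nan(scaled[, "b"])            # all TRUE: 0/0 for the constant column
any(is.na(scaled))               # TRUE, because is.na() also flags NaN

# The substitution from the thread, applied to the matrix via logical indexing.
cleaned <- scaled
cleaned[is.na(cleaned)] <- 0
any(is.na(cleaned))              # FALSE
```

If this is what happened, the zeroes written after scale() sit in columns that carried no information to begin with, which is a different situation from replacing genuinely missing measurements with 0.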



The notion of substituting zeroes for NA seems "so wrong". And the idea that you might get the same results from doing so before scale() as after scale() seems additionally bizarre.



VERY strange behavior.


Your behavior might be seen as VERY strange by some.

--
D


-N

On 8/2/09 3:57 PM, J Dougherty wrote:
On Sunday 02 August 2009 02:34:43 pm Noah Silverman wrote:

The column names have to be obfuscated, but here are 10 rows of the data.

label   c0      c1      c2      c3      c4      c5      c6      c7      c8      
c9      c10     c11     c12     c13
c14     c15     c16     c17     c18     c19     c20     c21     c22     c23     
c24     c25     c26     c27
c28     c29     c30     c31     c32     c33     c34     c35     c36     c37     
c38     c39     c40     c41
c42     c43     c44     c45     c46     c47     c48     c49     c50     c51     
c52     c53     c54     c55
c56     c57     c58     c59     c60     c61     c62     c63     c64     c65     
c66
sick    2008-12-28_1    95.609  5       3.3     1.35    0       1       35      
9.6666  0       0
0.0833  1       0.0833  1       0.1428  7       3       2.035714286     6.5     
94.8481
53.846  12      -4.69   1.25    0.5062  0.0522  0.1808  3       0.5126  0.0694
0.2061  94.9288         8.3125  0.0247  7.5833  9.3     35      9.6666  0       0
0.0833  1       0.0833  1       0.1428  7       3       2.035714286     6.5     
94.8481
53.846  12      -4.69   1.25    0.5062  0.0522  0.1808  3       0.5126  0.0694
0.2061  94.9288         8.3125  0.0247  7.5833  9.3
well    2008-12-28_1    95.338  1       11      3.2     3       2       11      
7.0277  0.0555  2
0.1666 6 0.1666 5 0.238 18 11 2.541666667 2.022727273 94.7733
38.461  36      6.07    7.5555  0.5928  0.0955  0.2871  0       0.5434  0.0679
0.2283  95.9003         5.1736  0.0847  7.3333  28      11      7.0277  0.0555  
2
0.1666 6 0.1666 5 0.238 18 11 2.541666667 2.022727273 94.7733
38.461  36      6.07    7.5555  0.5928  0.0955  0.2871  0       0.5434  0.0679
0.2283  95.9003         5.1736  0.0847  7.3333  28
well    2008-12-28_1    95.204  2       7.4     2.75    4       1       22      
8.4545  0       0
0       0       0       0       0       6       4       2.791666667     2.5625  
94.8444         61.538  11      2.84
3.0909  0.5693  0.0641  0.2738  0       0.5874  0.1011  0.2803  94.9769
8.1363  0.0467  5.4545  10      22      8.4545  0       0       0       0       
0       0       0       6       4
2.791666667 2.5625 94.8444 61.538 11 2.84 3.0909 0.5693 0.0641 0.2738 0 0.5874 0.1011 0.2803 94.9769 8.1363 0.0467 5.4545 10
sick    2008-12-28_1    95.204  14      48
        0       3       25      8.7045  0.0909  4       0.2045  9       0.2045  
4       0.2666  11      8
4.409090909     0       95.0006         15.384  44      1.76    7.409   0.4475  
0.0285
0.1206  0       0.5094  0.058   0.1931  92.9455         7.2613  0.0532  4.5227
82      25      8.7045  0.0909  4       0.2045  9       0.2045  4       0.2666  
11      8
4.409090909     0       95.0006         15.384  44      1.76    7.409   0.4475  
0.0285
0.1206 0 0.5094 0.058 0.1931 92.9455 7.2613 0.0532 4.5227 82
well    2008-12-28_1    95.07   13      26
        1       1       11      8.1     0.0666  2       0.1666  5       0.1666  
0       0       21      16
2.571428571     1.984375        94.825  30.769  30      -4.69   -0.7999         
0.5166
0.0624  0.2078  0       0.5306  0.0792  0.2398  95.2282         7.575   0.0715
3.4333  44      11      8.1     0.0666  2       0.1666  5       0.1666  0       
0       21      16
2.571428571     1.984375        94.825  30.769  30      -4.69   -0.7999         
0.5166
0.0624  0.2078  0       0.5306  0.0792  0.2398  95.2282         7.575   0.0715
3.4333  44
well    2008-12-28_1    95.07   9       16
        0       4       39      9.4117  0       0       0.0588  1       0.0588  
0       0       3       25      3.916666667
2.96 94.8177 30.769 17 -20.84 -15.8234 0.8205 0.3333 0.6666 0
0.6054  0.1287  0.3292  95.3232         6.9117  0.076   2.647   16      39
9.4117  0       0       0.0588  1       0.0588  0       0       3       25      
3.916666667     2.96
94.8177         30.769  17      -20.84  -15.8234        0.8205  0.3333  0.6666  0
0.6054  0.1287  0.3292  95.3232         6.9117  0.076   2.647   16
sick    2008-12-28_1    94.936  6       11
        4       1       28      7.725   0.075   3       0.125   5       0.125   
0       0       6       2       4       1.75
94.7815         46.153  40      6.07    12.5    0.5014  0.0621  0.1972  6       
0.523
0.0742  0.2035  95.794  6.0625  0.046   7.25    12      28      7.725   0.075   
3
0.125 5 0.125 0 0 6 2 4 1.75 94.7815 46.153 40 6.07 12.5
0.5014  0.0621  0.1972  6       0.523   0.0742  0.2035  95.794  6.0625
0.046   7.25    12
well    2008-12-28_1    94.803  11      13
        0       5       35      7.125   0.0937  3       0.1562  5       0.1562  
5       0.2     18      17
1.555555556     2.794117647     95.0398         38.461  32      10.38   8.4063  
0.5804
0.0871  0.2627  1       0.558   0.0738  0.2324  92.4367         5.289   0.0722
9.125   16      35      7.125   0.0937  3       0.1562  5       0.1562  5       
0.2     18      17
1.555555556     2.794117647     95.0398         38.461  32      10.38   8.4063  
0.5804
0.0871 0.2627 1 0.558 0.0738 0.2324 92.4367 5.289 0.0722 9.125 16
well    2008-12-28_1    94.67   4       38
        5       1       11      8.9642  0.0357  1       0.1428  4       0.1428  
4       0.2105  11      13
3.772727273     4.307692308     94.8451         23.076  28      -5.76   -4      
0.3269  0
0.0833  0       0.5222  0.0616  0.2079  94.9668         8.6696  0.0663  4.6428
14      11      8.9642  0.0357  1       0.1428  4       0.1428  4       0.2105  
11      13
3.772727273     4.307692308     94.8451         23.076  28      -5.76   -4      
0.3269  0
0.0833 0 0.5222 0.0616 0.2079 94.9668 8.6696 0.0663 4.6428 14
well    2008-12-28_1    94.537  12      39
0 1 35 9.4444 0 0 0 0 0 0 0 2 7 2.5 2.892857143 94.878 23.076 9 -12.23 -9.6666 0.4428 0 0.0857 0 0.5411 0.0849 0.25
94.54   8.9166  0.0296  6.1111  67      35      9.4444  0       0       0       
0       0       0       0
2 7 2.5 2.892857143 94.878 23.076 9 -12.23 -9.6666 0.4428 0
0.0857  0       0.5411  0.0849  0.25    94.54   8.9166  0.0296  6.1111  67



Your initial post mentions 70 columns in your data table, yet the header in the example shows 68, counting the initial "label" term. I would suggest adding "row.names = NULL" to force row numbers and see how that behaves, e.g.:

rawdata <- read.table("r_work/train_data.csv", header = TRUE, sep = ",",
                      na.strings = 0, row.names = NULL)

Otherwise, you might want to consult the R Manual where it states:

header: a logical value indicating whether the file contains the names of the variables as its first line. If missing, the value is determined from the file format: header is set to TRUE if and only if the first row contains one fewer field than the number of columns.
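
A small sketch of the rule quoted above, using made-up inline data rather than the poster's file (the text= argument stands in for a file, and auto is a hypothetical name):

```r
# The first line has 2 fields and the data lines have 3, so read.table()
# infers header = TRUE and uses the first column for the row names.
auto <- read.table(text = "a b\nx 1 2\ny 3 4")

names(auto)      # "a" "b"
rownames(auto)   # "x" "y"
```

When the header line is one name short, the first data column quietly becomes the row names, and the table ends up with one column fewer than expected.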

So, you might also want to count up your column names in the header line.
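
One way to do that count without reading by eye is count.fields() from the utils package. The sketch below uses a hypothetical three-line temporary file in place of the real CSV; for the actual data, the path from the read.table() call above would be substituted.

```r
# Hypothetical stand-in for the real file: a header and two data rows,
# the second of which has one field too many.
tmp <- tempfile(fileext = ".csv")
writeLines(c("label,c0,c1", "sick,1,2", "well,3,4,5"), tmp)

n_fields <- count.fields(tmp, sep = ",")
n_fields          # one entry per line; the first entry is the header line
table(n_fields)   # more than one distinct value means the rows are ragged
```

Comparing the first entry with the rest shows at once whether the header line and the data rows agree on the number of columns.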

JWDougherty


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
