I am new to R, so I am sure I am making a simple mistake. I am including
complete information in the hope that someone can help me.

Basically, my data looks good in R, but when I write it to a file, every value
is off by 1.

Here is my flow:

> str(prediction)
 Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
 - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
> print(prediction)
    1     2     3     4     5     6     7     8     9    10    11    12    13   
 14    15    16    17    18    19    20    21    22    23 
    2     0     9     9     3     7     0     3     0     3     5     7     4   
  0     4     3     3     1     9     0     9     1     1 

OK, so it shows my values are 2, 0, 9, 9, 3, etc.

# I write my file out
write(prediction, file="prediction.csv")

# look at the first 10 values
$ head -10 prediction.csv 
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8
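
The symptom above can be reproduced on a small scale. The following is a
minimal sketch, assuming the cause is that write() passes its argument to
cat(), which prints a factor's internal integer codes (1 to 10 here) rather
than its labels ("0" to "9"). The names `pred` and `out` are hypothetical
stand-ins and are not part of the original session.

```r
# Hypothetical stand-in for `prediction`: the first five predicted labels,
# stored as a factor with the same ten levels "0".."9".
pred <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

out <- tempfile(fileext = ".csv")  # temporary file, not prediction.csv

# write() hands the factor to cat(), which emits the integer codes.
write(pred, file = out)
codes_line <- readLines(out)
print(codes_line)    # "3 1 10 10 4"

# Converting to character first writes the labels themselves, one per line.
write(as.character(pred), file = out)
label_lines <- readLines(out)
print(label_lines)   # "2" "0" "9" "9" "3"
```

If this assumption holds for the real data, wrapping the factor in
as.character() before writing should give the values that print() shows.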

The complete work of what I did was as follows:

# First I load in a dataset, label the first column as a factor
> dataset <- read.csv('train.csv', header=TRUE)
> dataset$label <- as.factor(dataset$label)

# it has 42000 obs. 785 variables
> str(dataset)
'data.frame':   42000 obs. of  785 variables:
 $ label   : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
 $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]

# I make a sampling testset and trainset
> index <- 1:nrow(dataset)
> testindex <- sample(index, trunc(length(index)*30/100))
> testset <- dataset[testindex,]
> trainset <- dataset[-testindex,]

# build model, predict, view
> library(e1071)  # provides svm()
> model  <- svm(label~., data = trainset, type="C-classification",
+               kernel="radial", gamma=0.0000001, cost=16)
> prediction <- predict(model, testset)
> tab <- table(pred = prediction, true = testset[,1])
> tab
    true
pred    0    1    2    3    4    5    6    7    8    9
   0 1210    0    3    1    0    5    7    2    5    8
   1    0 1415    2    0    2    1    0    7    5    0
   2    0    2 1127   12    3    0    2    7    2    0
   3    0    0    7 1296    0   10    0    2   15    6
   4    1    1    8    2 1201    2    4    3    5   16
   5    3    1    0   13    0 1100    3    1    2    3
   6    3    0    3    0    5    9 1263    0    1    0
   7    0    2    9    6    6    1    0 1296    1   13
   8    3    5    7   11    1    2    0    2 1190    4
   9    1    1    2    3   17    2    0    4    4 1190


OK, everything looks great up to this point. Next I try to apply my model to a
"real" test set, which has the same format as my previous dataset except that
it lacks the label/factor column, so it has 28000 obs. of 784 variables:

> testset <- read.csv('test.csv', header=TRUE)
> str(testset)
'data.frame':   28000 obs. of  784 variables:
 $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]

> prediction <- predict(model, testset)
> summary(prediction)
   0    1    2    3    4    5    6    7    8    9 
2780 3204 2824 2767 2771 2516 2744 2898 2736 2760 
> print(prediction)
    1     2     3     4     5     6     7     8     9    10    11    12    13   
 14    15    16    17    18    19    20    21    22    23 
    2     0     9     9     3     7     0     3     0     3     5     7     4   
  0     4     3     3     1     9     0     9     1     1 
   24    25    26    27    28    29    30    31    32    33    34    35    36   
 37    38    39    40    41    42    43    44    45    46 
    5     7     4     2     7     4     7     7     5     4     2     6     2   
  5     5     1     6     7     7     4     9     8     7 
  [list output truncated]

> write(prediction, file="prediction.csv")
$ head -10 prediction.csv 
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8


I am obviously making a mistake: every value in the file is exactly 1 higher
than the value R prints (2 becomes 3, 0 becomes 1, 9 becomes 10).


Can someone tell me what I am doing wrong?

Brian




______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
