I am new to R, so I am sure I am making a simple mistake. I am including complete information in hopes someone can help me.
Basically, my data looks right in R, but when I write it to a file every value is off by 1. Here is my flow:

> str(prediction)
 Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
 - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
> print(prediction)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 2  0  9  9  3  7  0  3  0  3  5  7  4  0  4  3  3  1  9  0  9  1  1

OK, so it shows my values are 2, 0, 9, 9, 3, etc.

# I write my file out
> write(prediction, file="prediction.csv")
# look at the first 10 lines of the file
$ head -10 prediction.csv
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8

The complete work of what I did was as follows:

# First I load in a dataset and convert the first column (label) to a factor
> dataset <- read.csv('train.csv',head=TRUE)
> dataset$label <- as.factor(dataset$label)
# it has 42000 obs. of 785 variables
> str(dataset)
'data.frame': 42000 obs. of  785 variables:
 $ label   : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
 $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]

# I split the data into a random test set (30%) and training set (70%)
> index <- 1:nrow(dataset)
> testindex <- sample(index, trunc(length(index)*30/100))
> testset <- dataset[testindex,]
> trainset <- dataset[-testindex,]
# build model, predict, view
> model <- svm(label~., data = trainset, type="C-classification",
+              kernel="radial", gamma=0.0000001, cost=16)
> prediction <- predict(model, testset)
> tab <- table(pred = prediction, true = testset[,1])
    true
pred    0    1    2    3    4    5    6    7    8    9
   0 1210    0    3    1    0    5    7    2    5    8
   1    0 1415    2    0    2    1    0    7    5    0
   2    0    2 1127   12    3    0    2    7    2    0
   3    0    0    7 1296    0   10    0    2   15    6
   4    1    1    8    2 1201    2    4    3    5   16
   5    3    1    0   13    0 1100    3    1    2    3
   6    3    0    3    0    5    9 1263    0    1    0
   7    0    2    9    6    6    1    0 1296    1   13
   8    3    5    7   11    1    2    0    2 1190    4
   9    1    1    2    3   17    2    0    4    4 1190

OK, everything looks great up to this point. So I try to apply my model to a "real" test set, which has the same format as my previous dataset except that it does not have the label/factor column, so it's 28000 obs. of 784 variables:

> testset <- read.csv('test.csv',head=TRUE)
> str(testset)
'data.frame': 28000 obs. of  784 variables:
 $ pixel0  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel1  : int  0 0 0 0 0 0 0 0 0 0 ...
 $ pixel2  : int  0 0 0 0 0 0 0 0 0 0 ...
  [list output truncated]
> prediction <- predict(model, testset)
> summary(prediction)
   0    1    2    3    4    5    6    7    8    9
2780 3204 2824 2767 2771 2516 2744 2898 2736 2760
> print(prediction)
 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
 2  0  9  9  3  7  0  3  0  3  5  7  4  0  4  3  3  1  9  0  9  1  1
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
 5  7  4  2  7  4  7  7  5  4  2  6  2  5  5  1  6  7  7  4  9  8  7
  [list output truncated]
> write(prediction, file="prediction.csv")
$ head -10 prediction.csv
3 1 10 10 4
8 1 4 1 4
6 8 5 1 5
4 4 2 10 1
10 2 2 6 8
5 3 8 5 8
8 6 5 3 7
3 6 6 2 7
8 8 5 10 9
8 9 3 7 8

I am obviously making a mistake: every value in the file is exactly 1 higher than the printed prediction. Can someone tell me what I am doing wrong?
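A minimal sketch of what appears to be happening, assuming the file receives the factor's internal integer codes rather than its labels. The str(prediction) output already hints at this: it lists 3 1 10 10 4 ..., while print() shows the labels 2 0 9 9 3 .... The five-element factor below is hypothetical stand-in data (the first five printed predictions), not the real svm output, and the temporary file names are illustrative only.

```r
# Hypothetical stand-in for the svm predictions: the first five labels
# from print(prediction), stored as a factor with levels "0".."9".
prediction <- factor(c(2, 0, 9, 9, 3), levels = 0:9)

# A factor stores 1-based integer codes that index into levels(prediction),
# so label "0" is code 1, label "2" is code 3, and label "9" is code 10.
codes  <- as.integer(prediction)    # 3 1 10 10 4
labels <- as.character(prediction)  # "2" "0" "9" "9" "3"

# write() hands the factor to cat(), which outputs the integer codes;
# for non-character input write() also defaults to 5 values per line.
bad <- tempfile(fileext = ".csv")
write(prediction, file = bad)
readLines(bad)  # "3 1 10 10 4"

# Converting to character first writes the labels, one per line.
out <- tempfile(fileext = ".csv")
write(labels, file = out, ncolumns = 1)
readLines(out)  # "2" "0" "9" "9" "3"

# Alternative: a real CSV with a header row. The column names "id" and
# "label" are illustrative choices, not a required format.
out2 <- tempfile(fileext = ".csv")
write.csv(data.frame(id = seq_along(prediction), label = labels),
          file = out2, row.names = FALSE)
```

If the digits are needed as numbers inside R, as.integer(as.character(prediction)) converts the labels; as.integer(prediction) on its own returns the codes and reproduces the off-by-one.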
Brian

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.