A follow-up to my own post: I believe I have figured this out, but if I should be doing something differently, please correct me:

> prediction.out <- levels(prediction)[prediction]
> write(prediction.out, file="prediction.csv")

This gives me my correctly adjusted values

Brian

On Nov 20, 2012, at 2:30 PM, Brian Feeny wrote:

> I am new to R, so I am sure I am making a simple mistake. I am including
> complete information in hopes someone can help me.
>
> Basically my data in R looks good, I write it to a file, and every value is
> off by 1.
>
> Here is my flow:
>
>> str(prediction)
>  Factor w/ 10 levels "0","1","2","3",..: 3 1 10 10 4 8 1 4 1 4 ...
>  - attr(*, "names")= chr [1:28000] "1" "2" "3" "4" ...
>> print(prediction)
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>  2  0  9  9  3  7  0  3  0  3  5  7  4  0  4  3  3  1  9  0  9  1  1
>
> ok, so it shows my values are 2, 0, 9, 9, 3 etc
>
> # I write my file out
> write(prediction, file="prediction.csv")
>
> # look at the first 10 values
> $ head -10 prediction.csv
> 3 1 10 10 4
> 8 1 4 1 4
> 6 8 5 1 5
> 4 4 2 10 1
> 10 2 2 6 8
> 5 3 8 5 8
> 8 6 5 3 7
> 3 6 6 2 7
> 8 8 5 10 9
> 8 9 3 7 8
>
> The complete work of what I did was as follows:
>
> # First I load in a dataset, label the first column as a factor
>> dataset <- read.csv('train.csv',head=TRUE)
>> dataset$label <- as.factor(dataset$label)
>
> # it has 42000 obs. 785 variables
>> str(dataset)
> 'data.frame': 42000 obs. of 785 variables:
>  $ label : Factor w/ 10 levels "0","1","2","3",..: 2 1 2 5 1 1 8 4 6 4 ...
>  $ pixel0 : int 0 0 0 0 0 0 0 0 0 0 ...
>  $ pixel1 : int 0 0 0 0 0 0 0 0 0 0 ...
>  $ pixel2 : int 0 0 0 0 0 0 0 0 0 0 ...
>  [list output truncated]
>
> # I make a sampling testset and trainset
>> index <- 1:nrow(dataset)
>> testindex <- sample(index, trunc(length(index)*30/100))
>> testset <- dataset[testindex,]
>> trainset <- dataset[-testindex,]
>
> # build model, predict, view
>> model <- svm(label~., data = trainset, type="C-classification",
>>              kernel="radial", gamma=0.0000001, cost=16)
>> prediction <- predict(model, testset)
>> tab <- table(pred = prediction, true = testset[,1])
>     true
> pred    0    1    2    3    4    5    6    7    8    9
>    0 1210    0    3    1    0    5    7    2    5    8
>    1    0 1415    2    0    2    1    0    7    5    0
>    2    0    2 1127   12    3    0    2    7    2    0
>    3    0    0    7 1296    0   10    0    2   15    6
>    4    1    1    8    2 1201    2    4    3    5   16
>    5    3    1    0   13    0 1100    3    1    2    3
>    6    3    0    3    0    5    9 1263    0    1    0
>    7    0    2    9    6    6    1    0 1296    1   13
>    8    3    5    7   11    1    2    0    2 1190    4
>    9    1    1    2    3   17    2    0    4    4 1190
>
> OK, everything looks great up to this point, so I try to apply my model to
> a "real" testset, which is the same format as my previous dataset, except
> it does not have the label/factor column, so it has 28000 obs. of 784
> variables:
>
>> testset <- read.csv('test.csv',head=TRUE)
>> str(testset)
> 'data.frame': 28000 obs. of 784 variables:
>  $ pixel0 : int 0 0 0 0 0 0 0 0 0 0 ...
>  $ pixel1 : int 0 0 0 0 0 0 0 0 0 0 ...
>  $ pixel2 : int 0 0 0 0 0 0 0 0 0 0 ...
>  [list output truncated]
>
>> prediction <- predict(model, testset)
>> summary(prediction)
>    0    1    2    3    4    5    6    7    8    9
> 2780 3204 2824 2767 2771 2516 2744 2898 2736 2760
>> print(prediction)
>  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
>  2  0  9  9  3  7  0  3  0  3  5  7  4  0  4  3  3  1  9  0  9  1  1
> 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
>  5  7  4  2  7  4  7  7  5  4  2  6  2  5  5  1  6  7  7  4  9  8  7
>  [list output truncated]
>
>> write(prediction, file="prediction.csv")
> $ head -10 prediction.csv
> 3 1 10 10 4
> 8 1 4 1 4
> 6 8 5 1 5
> 4 4 2 10 1
> 10 2 2 6 8
> 5 3 8 5 8
> 8 6 5 3 7
> 3 6 6 2 7
> 8 8 5 10 9
> 8 9 3 7 8
>
> I am obviously making a mistake. Everything is off by a value of 1.
>
> Can someone tell me what I am doing wrong?
>
> Brian
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
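
For anyone finding this thread later, here is a minimal sketch of why the file came out off by one. It uses a toy factor in place of the real svm predictions; the five values and the temporary file path are made up for illustration. As far as I understand it, a factor stores integer codes that index into levels(), and write() hands its argument to cat(), which outputs those codes rather than the labels. With levels "0" through "9" the codes run from 1 to 10, so every digit appears shifted up by one.

```r
# Toy stand-in for the svm predictions (assumption: not the real data).
# A factor whose labels are the digits "0".."9", like the one in this thread.
prediction <- factor(c("2", "0", "9", "9", "3"), levels = as.character(0:9))

# The internal integer codes are positions in levels(), so label "0" has code 1.
codes <- as.integer(prediction)

# Recover the labels by indexing levels() with the factor (the fix above) ...
labels_chr <- levels(prediction)[prediction]

# ... and convert to numbers only after the labels have been recovered.
labels_num <- as.numeric(as.character(prediction))

# Write the labels, one per line, to a temporary file (hypothetical path).
out_file <- tempfile(fileext = ".csv")
write(labels_chr, file = out_file, ncolumns = 1)
written <- readLines(out_file)
```

as.character(prediction) should give the same labels as levels(prediction)[prediction]. Calling as.numeric() directly on the factor would return the integer codes again, which is the same off-by-one problem.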