I have a data set with about 30,000 training cases and 103 variables.
I've trained an SVM (using the e1071 package) for a binary classifier {0,1}. The accuracy isn't great.
I used a grid search over the C (cost) and G (gamma) parameters with an RBF kernel to find the best settings.
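A grid search of this kind can be sketched with `tune.svm` from e1071. This is only a minimal illustration: the simulated data, the grid values, and the 5-fold cross-validation are assumptions, not the settings actually used.

```r
library(e1071)

set.seed(1)

# Hypothetical stand-in data: 200 cases, 10 variables, binary label in {0,1}
n <- 200
p <- 10
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(n) > 0, 1, 0))

# Cross-validated grid search over cost (C) and gamma (G) with an RBF kernel
grid <- tune.svm(x, y,
                 kernel = "radial",
                 cost   = 10^(-1:2),
                 gamma  = 10^(-3:0),
                 tunecontrol = tune.control(cross = 5))

grid$best.parameters   # the (gamma, cost) pair with the lowest CV error
grid$best.performance  # the corresponding cross-validated error rate
```

Each grid point costs one cross-validated fit, so the grid size multiplies directly into the run time.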
I remember that for least squares, R has a nice stepwise function that will try combining subsets of variables to find the optimal result. As far as I can tell, nothing like this exists for SVMs as a built-in function.
As an experiment, I simply grabbed the first 50 variables and repeated the training/grid search procedure. The results were significantly better. Since the data is VERY noisy, my guess is that dropping some of the variables removed some of the noise, which led to the better results.
With a grid of 100 parameter settings (10 for C, 10 for G) and 103 variables, trying every combination of variable subset and parameter setting would be prohibitively time-consuming.
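To put a rough number on that, here is a back-of-the-envelope count in R. It assumes p = 103 variables (a larger count only makes things worse) and one fit per grid point per subset. The greedy count is a hypothetical comparison with a forward stepwise-style search, not something I have implemented.

```r
p           <- 103       # number of candidate variables (assumed)
grid_points <- 10 * 10   # 10 values of C times 10 values of G

# Exhaustive search: every non-empty subset of variables, each with a full grid
n_subsets  <- 2^p - 1
total_fits <- n_subsets * grid_points
format(total_fits, scientific = TRUE, digits = 3)

# Greedy forward selection for contrast: at most p + (p - 1) + ... + 1 subsets
greedy_subsets <- p * (p + 1) / 2
greedy_fits    <- greedy_subsets * grid_points
greedy_fits
```

Even the greedy count is large when each fit uses 30,000 cases, but it is at least finite in practice, unlike the exhaustive one.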
Can anyone suggest an approach to seek the ideal subset of variables for my SVM classifier?
Thanks!