I'll give it a try,
Thanks!
-N
On 1/6/11 11:34 PM, Steve Lianoglou wrote:
Hi,
On Fri, Jan 7, 2011 at 2:10 AM, Noah Silverman <n...@smartmediacorp.com> wrote:
I have a data set with about 30,000 training cases and 103 variables.
I've trained an SVM (using the e1071 package) for a binary classifier {0,1}.
The accuracy isn't great.
I used a grid search over the cost (C) and gamma parameters with an RBF kernel
to find the best settings.
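
For concreteness, a search like that can be run with e1071's tune.svm();
x and y below stand in for the real data, and the grids are just illustrative:

library(e1071)

## Assumed objects: x = 30,000 x 103 feature matrix, y = factor of {0,1} labels.
## The 10 x 10 grid below is illustrative, not the actual values used.
set.seed(1)
tuned <- tune.svm(x, y,
                  kernel = "radial",
                  gamma  = 10^(-5:4),   # 10 candidate gamma values
                  cost   = 10^(-4:5))   # 10 candidate cost (C) values
summary(tuned)         # cross-validated error for each (gamma, cost) pair
best <- tuned$best.model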
I remember that for least squares, R has a nice stepwise function (step()) that
will try subsets of variables to find the optimal model. Clearly, nothing like
this exists for SVMs as a built-in function.
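
A toy example of that stepwise search for comparison (the data frame dat and
response resp are hypothetical):

## Hypothetical illustration of R's stepwise search for least squares
fit  <- lm(resp ~ ., data = dat)        # full model on all predictors
best <- step(fit, direction = "both")   # AIC-guided add/drop search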
As an experiment, I simply grabbed the first 50 variables and repeated the
training/grid search procedure. The results were significantly better.
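
That experiment amounts to something like this (same assumed x, y, and grids
as above):

## Re-run the same tuning on the first 50 predictors only
x50     <- x[, 1:50]
tuned50 <- tune.svm(x50, y, kernel = "radial",
                    gamma = 10^(-5:4), cost = 10^(-4:5))
summary(tuned50)      # compare to summary(tuned) on all 103 variables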
Since the data are VERY noisy, my guess is that dropping some of the variables
removed noise and so improved the results.
With a grid of 100 parameter settings (10 for C, 10 for gamma) and 103
variables, trying every combination of variables would be prohibitively
time-consuming.
Can anyone suggest an approach to seek the ideal subset of variables for my
SVM classifier?
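
For reference, the kind of wrapper search I could hand-roll is a greedy
backward elimination scored by cross-validated accuracy at fixed C and gamma.
A rough, untested sketch (all names hypothetical):

library(e1071)

## k-fold cross-validated accuracy of an RBF SVM on a given feature set
cv_accuracy <- function(x, y, cost, gamma, k = 5) {
  folds <- sample(rep(1:k, length.out = nrow(x)))
  acc <- numeric(k)
  for (i in 1:k) {
    fit  <- svm(x[folds != i, , drop = FALSE], y[folds != i],
                kernel = "radial", cost = cost, gamma = gamma)
    pred <- predict(fit, x[folds == i, , drop = FALSE])
    acc[i] <- mean(pred == y[folds == i])
  }
  mean(acc)
}

## Greedy backward elimination: drop whichever variable most improves
## CV accuracy; stop when no single drop helps.
backward_select <- function(x, y, cost, gamma) {
  keep <- colnames(x)                 # assumes x has column names
  best <- cv_accuracy(x, y, cost, gamma)
  repeat {
    scores <- sapply(keep, function(v)
      cv_accuracy(x[, setdiff(keep, v), drop = FALSE], y, cost, gamma))
    if (length(keep) == 1 || max(scores) <= best) break
    best <- max(scores)
    keep <- setdiff(keep, names(which.max(scores)))
  }
  list(variables = keep, cv.accuracy = best)
}

With 30,000 cases each CV evaluation is expensive, so I'd probably run this
on a subsample first.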
Sounds like a job for the types of approaches found in the penalizedSVM package:
http://cran.r-project.org/web/packages/penalizedSVM/index.html
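
Something along these lines, as an untested sketch (x, y as in your setup;
argument names may differ by package version, so see ?svmfs for the real
interface):

library(penalizedSVM)

## svmfs() fits an SVM with a sparsity penalty (SCAD here), so variable
## selection happens inside the fit. I believe it wants -1/+1 labels;
## both that coding and the arguments below are assumptions from the docs.
yy  <- ifelse(y == levels(y)[2], 1, -1)
fit <- svmfs(as.matrix(x), y = yy, fs.method = "scad", seed = 123)
str(fit)   # inspect the fit for the selected variables and their weights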
-steve