Predicting whilst confused is unlikely to produce sound predictions... my vote is for finding out why before believing anything.
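One cheap first step towards finding out why is to reproduce the bug outside the pipeline. The sketch below is a minimal Python illustration. It assumes the two uncleared "iterators" in the quoted message were a running sum and a running count. That is a hypothetical reading, because the Perl script was not posted. All function names are made up for illustration.

If the accumulators are never reset, each record's "average" becomes the mean of every value seen so far in file order. The feature then depends on which rows came before it. If the file happens to be sorted (for example by date, by source, or by outcome), that ordering can leak into the feature.

```python
"""Hypothetical reproduction of an 'average' feature whose accumulators are never reset.

Assumption: the two uncleared "iterators" in the Perl script were a running
sum and a running count. The real script was not posted.
"""


def per_record_average(records):
    """Intended feature: the mean of each record's own values."""
    out = []
    for values in records:
        total, count = 0.0, 0  # accumulators reset for every record
        for v in values:
            total += v
            count += 1
        out.append(total / count)
    return out


def buggy_average(records):
    """Buggy feature: accumulators are initialised once and keep growing."""
    out = []
    total, count = 0.0, 0  # BUG: never cleared between records
    for values in records:
        for v in values:
            total += v
            count += 1
        # Effectively the mean of every value seen so far, in file order.
        out.append(total / count)
    return out


def order_dependent(feature_fn, records):
    """True if any record's feature value changes when the input order is reversed."""
    forward = feature_fn(records)
    # Compute on the reversed input, then flip the result back to the original order.
    backward = feature_fn(list(reversed(records)))[::-1]
    return any(abs(a - b) > 1e-12 for a, b in zip(forward, backward))


if __name__ == "__main__":
    records = [[1, 3], [10, 20], [5, 5]]  # toy data, not from the original post
    print(per_record_average(records))  # [2.0, 15.0, 5.0]
    print(buggy_average(records))
    print(order_dependent(per_record_average, records))  # False
    print(order_dependent(buggy_average, records))  # True
```

The same test could be run on the real data. Regenerate the feature from a shuffled copy of the input and see whether a given record's value moves. If it does, the column carries information about row order. It would then be worth checking whether that order, rather than anything about the record itself, explains the better predictions.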
>>> Noah Silverman <n...@smartmediacorp.com> 09/07/09 8:33 PM >>>
Hi,

I have a strange one for the group.

We have a system that predicts probabilities using a fairly standard SVM (e1071). We are looking at probabilities of a binary outcome.

The input data is generated by a Perl script that calculates a bunch of things, fetches data from a database, etc. We train the system on 30,000 examples and then test it on an unseen set of 5,000 records.

The "real world" results on the test set looked VERY good. We were really happy with our model.

Then we noticed that there was a big error in our data generation script: one of the values (an average of sorts) was being calculated incorrectly. (The Perl script failed to clear two iterators, so they both grew with every record.)

As a quick experiment, we removed that item from our data set and re-ran the process. The results were not very good: perhaps 75% as good as training with the "wrong" factor included.

So, this is really a philosophical question. Do we:

1) Shrug and say, "Who cares?", because the SVM figured it out and likes that bad data item for some inexplicable reason, or
2) Tear into the math and try to figure out WHY the SVM is predicting more accurately?

Any opinions?

Thanks!

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.