You both make good points.

Ideally, it would be nice to know WHY it works.

Without digging into too much verbiage, the system is designed to predict the outcome of certain events. The "broken" model predicts outcomes correctly much more frequently than one with the broken data withheld. So, to answer Mark's question, we say it's "better" because we see much better results with our "broken" model when applied to real-world data used for testing.

I have one theory.

The data is listed in our CSV file from newest to oldest. We are supposed to calculate a value that is an "average" of some items. We loop through some queries to our database and increment two variables, $total_found and $total_score. The final value is simply $total_score / $total_found.

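Our programmer forgot to reset both $total_score and $total_found back to zero for each record we process. So both keep growing from one record to the next, and each record's "average" is really a running average over every record processed so far.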
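To make the difference concrete, here is a minimal sketch of the two behaviours. It is written in Python rather than the language of our actual script, and the function names and sample numbers are made up for illustration; only the names total_score and total_found come from the real code.

```python
def per_record_average(records):
    """Intended behaviour: both accumulators are reset for every record."""
    averages = []
    for items in records:
        total_score = 0.0
        total_found = 0
        for score in items:
            total_score += score
            total_found += 1
        averages.append(total_score / total_found if total_found else None)
    return averages


def running_average_bug(records):
    """Behaviour with the bug: the accumulators are never reset."""
    total_score = 0.0
    total_found = 0
    averages = []
    for items in records:
        for score in items:
            total_score += score
            total_found += 1
        averages.append(total_score / total_found if total_found else None)
    return averages


# Hypothetical item scores for three records, listed newest first.
records = [[4.0, 6.0], [1.0], [10.0, 20.0, 30.0]]
print(per_record_average(records))
print(running_average_bug(records))
```

Only the first record gets the same value from both versions; every later record inherits the totals of everything processed before it.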

I think that this may, in a way, be some warped form of a recency-weighted score. The newer records, which are processed first, will have a score more affected by their own "contribution" to the wrongly growing totals. A record closer to the end of the data set starts with HUGE values for $total_score and $total_found, so adding its values has very little effect.
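A rough way to quantify that idea, again as a Python sketch with hypothetical names and made-up counts: if the totals are never reset, the share of a record's reported value that comes from its own items is its item count divided by the cumulative item count so far.

```python
def own_weights(item_counts):
    """Fraction of each record's running average that is due to its own items.

    item_counts[k] is the number of items found for record k, in processing
    order (newest first). This assumes the accumulators are never reset.
    """
    weights = []
    cumulative = 0
    for count in item_counts:
        cumulative += count
        weights.append(count / cumulative if cumulative else None)
    return weights


# Hypothetical case: four records with five items each.
print(own_weights([5, 5, 5, 5]))
```

With equal item counts the weight falls off as 1/k for the k-th record processed, so the newest record's value is entirely its own while later records are dominated by the history before them.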

We've done the following so far today. (Note: scores are relative and only indicate performance. Higher is better.)
1) Run with "bad" data = 6.9
2) Run with "bad" data missing = 5.5
3) Run with "correct" data = ?? (We're running now, will take a few hours to compute.)


I might also try to plot the bad data. It would be interesting to see what shape it has...

On 9/7/09 1:05 PM, Mark Knecht wrote:
On Mon, Sep 7, 2009 at 12:33 PM, Noah Silverman<n...@smartmediacorp.com>  wrote:
<SNIP>
So, this is really a philosophical question.  Do we:
    1) Shrug and say, "who cares", the SVM figured it out and likes that bad
data item for some inexplicable reason
    2) Tear into the math and try to figure out WHY the SVM is predicting
more accurately

Any opinions??

Thanks!

Boy, I'd sure think you'd want to know why it worked with the 'wrong'
calculations. It's not that the math is wrong, really, but rather that
it wasn't what you thought it was. I cannot see why you wouldn't want
to know why this mistake helped. Won't future projects benefit?

Just my 2 cents,
Mark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
