Re: [R] Confused - better empirical results with error in data

Noah Silverman Mon, 07 Sep 2009 13:22:41 -0700

You both make good points.

Ideally, it would be nice to know WHY it works.

Without digging into too much verbiage, the system is designed to predict the outcome of certain events. The "broken" model predicts outcomes correctly much more frequently than one with the broken data withheld. So, to answer Mark's question, we say it's "better" because we see much better results with our "broken" model when applied to real-world data used for testing.


I have one theory.

The data is listed in our CSV file from newest to oldest. We are supposed to calculate a value that is an "average" of some items. We loop through some queries to our database and increment two variables - $total_found and $total_score. The final value is simply $total_score / $total_found.

Our programmer forgot to reset both $total_score and $total_found back to zero for each record we process. So both keep growing from one record to the next.
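
To make the bug concrete, here is a minimal sketch in Python. It is not the original script (which is not shown in this thread); the function names and the sample numbers are made up for illustration, and it assumes every record has at least one item.

```python
def buggy_averages(records):
    """Totals are initialised once and never reset, as in the bug described."""
    total_score = 0.0
    total_found = 0
    values = []
    for items in records:
        for score in items:
            total_score += score
            total_found += 1
        # Divides the totals accumulated over ALL records seen so far.
        values.append(total_score / total_found)
    return values


def intended_averages(records):
    """Totals are reset for each record, giving a plain per-record mean."""
    values = []
    for items in records:
        total_score = 0.0
        total_found = 0
        for score in items:
            total_score += score
            total_found += 1
        values.append(total_score / total_found)
    return values


# Hypothetical data: three records, in file order.
records = [[4.0, 6.0], [10.0], [1.0, 1.0, 1.0]]
print(buggy_averages(records))
print(intended_averages(records))
```

Only the first record gets the same value from both versions. After that, the buggy value is a cumulative mean over everything processed so far.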

I think that this may, in a way, be some warped form of a recency-weighted score. The newer records will have a score more affected by their "contribution" to the wrongly growing totals. A record that is closer to the end of the data set will be starting with HUGE values for $total_score and $total_found, so addition of its values will have very little effect.
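
A rough way to check this intuition: with totals that are never reset, the value for a record is a count-weighted mean of everything seen so far, so the record's own items carry a weight of (its item count) / (running $total_found). The Python sketch below just computes that weight. The function name and the item counts are hypothetical, and each record is assumed to have at least one item.

```python
def own_share(items_per_record):
    """Weight of each record's own items in its never-reset 'average'."""
    total_found = 0
    shares = []
    for n in items_per_record:
        total_found += n  # running count, never reset (the bug)
        shares.append(n / total_found)
    return shares


# Four records with two items each, in file order (newest first).
print(own_share([2, 2, 2, 2]))
```

With equal item counts the weights come out as 1, 1/2, 1/3, 1/4: the first (newest) record is scored entirely on its own items, and later (older) records are increasingly dominated by what came before them.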

We've done the following so far today. (Note: scores are just relative, to indicate performance. Higher is better.)

1) Run with "bad" data = 6.9
2) Run with "bad" data missing = 5.5

3) Run with "correct" data = ?? (We're running now, will take a few hours to compute.)

I might also try to plot the bad data. It would be interesting to see what shape it has...

On 9/7/09 1:05 PM, Mark Knecht wrote:

On Mon, Sep 7, 2009 at 12:33 PM, Noah Silverman<n...@smartmediacorp.com>  wrote:
<SNIP

So, this is really a philosophical question.  Do we:
    1) Shrug and say, "who cares", the SVM figured it out and likes that bad
data item for some inexplicable reason
    2) Tear into the math and try to figure out WHY the SVM is predicting
more accurately

Any opinions??

Thanks!

Boy, I'd sure think you'd want to know why it worked with the 'wrong'
calculations. It's not that the math is wrong, really, but rather that
it wasn't what you thought it was. I cannot see why you wouldn't want
to know why this mistake helped. Won't future projects benefit?

Just my 2 cents,
Mark


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
