On Fri, Aug 28, 2009 at 3:44 PM, Lars Aronsson <l...@aronsson.se> wrote:
> We can try to find out which edits are reverts, assuming that the
> previous edit was an act of vandalism.

But that's a bad assumption. It gives both false positives and false negatives, and a significant number of each. I gave examples of both above. My samples were tiny, but 38% of reverts were not reverts of vandalism, and 40% of vandalism was not reverted by any means this strategy detects. And there is no reason to believe the error rate is consistent over time, so these numbers are useless for determining whether the problem is increasing.

> That way we can conclude which articles were vandalized and how long
> it took to revert them.

Your simplistic version, which assumes the previous edit was an act of vandalism, makes the conclusion about "how long it took to revert" pretty obviously flawed, doesn't it? Under that assumption (which is even worse than the one Robert used), you are simply measuring the average time between edits. Any act of vandalism that takes more than one edit to find and fix is excluded (see the sketch at the end of this message).

Robert's methodology wasn't quite that bad: it allowed for reverts separated by one or more other edits. But it had no way to detect an act of vandalism that lasted for hundreds of edits, was discovered by someone reading the text, and was removed without reference to the original edit, with an edit summary such as "Barack Obama was born in Hawaii". And these acts of vandalism are the worst: they last the longest, they do the most harm when they are read, and they get the most views. Any methodology that excludes them is systematically biased.
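Since neither message in this thread spells out the exact matching rule, here is only a minimal sketch of the general idea both variants seem to share: treat an edit as a revert when it restores the byte-identical text of an earlier revision, and treat the intervening edits as presumed vandalism. The (timestamp, text) revision format, the `window` parameter, and all function names below are illustrative assumptions, not Robert's actual code.

    import hashlib

    def text_hash(text):
        """Hash a revision's full text so identical page states compare cheaply."""
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def find_reverts(revisions, window=1):
        """Flag edits that restore an earlier page state.

        revisions -- chronological list of (timestamp, text) pairs.
        window=1 is the simplistic rule: an edit counts only if it exactly
        undoes the immediately preceding edit. A larger window approximates
        the variant that tolerates intervening edits.
        Returns (reverting_index, restored_index) pairs; the edits between
        the two indexes are the ones the method presumes were vandalism.
        """
        hashes = [text_hash(text) for _, text in revisions]
        reverts = []
        for i in range(2, len(hashes)):
            oldest = max(0, i - 1 - window)  # how far back this variant looks
            # j = i - 1 is skipped: matching the immediately previous
            # revision would just mean a null edit, not a revert.
            for j in range(i - 2, oldest - 1, -1):
                if hashes[i] == hashes[j]:
                    reverts.append((i, j))
                    break
        return reverts

Two limits of the approach follow directly from the sketch. With window=1, the lifetime of any detectable act of vandalism is timestamps[i] - timestamps[i - 1], exactly the gap between consecutive edits, which is why the "time to revert" figure collapses into the average time between edits. And a hand-written fix that removes long-lived vandalism never reproduces any earlier revision byte for byte, so no hash ever matches and that kind of vandalism stays invisible at any window size.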