On Tue, Nov 16, 2010 at 04:27:55PM -0800, John Hardin wrote:
> On Tue, 16 Nov 2010, dar...@chaosreigns.com wrote:
>
> >On 11/16, John Hardin wrote:
> >>I don't think you'd ever see good results in the mass checks. The
> >>masscheck corpora retain spam for an extended period (several
> >>months) and DoB-style rules would only hit for a few days after the
> >>spam run initially sent the message.
> >
> >Are you telling me the mass checks use test results from when the mass
> >checks are run, not when the email is received?
>
> Correct.

Unless, of course, --reuse is used, which is the recommended way. But that
isn't always possible, depending on the corpus.

> >>Which is not to say such rules wouldn't have value, just that they
> >>can't be meaningfully evaluated by masschecks.
> >
> >Yeah, that would be some pretty broken behavior for the mass checks.
>
> How so? If masscheck only considered the rules that hit when the
> message was first received (or more properly, was first added to the
> corpora), how would you ever test new rules against the existing
> corpora?
>
> >DCC, Xen... so many things would get scored... not usefully.
>
> Only if and when the data expires. How long are DCC checksums kept?
> And hosts stay on Zen until the ISP removes them.

Network mass checks are only done once a week. That's a huge gap for
getting reliable stats. I wonder if anyone has done a quick comparison
with and without --reuse; it would be nice to see the difference.