Anton Bryl writes:
> I have a question about the SpamAssassin data corpus.
>
> In one article published in 2003 it is written: "...corpus we adopted is
> available at www.spamassassin.org. This archive contains 2100 spam and
> 2107 non-spam messages." This description does not fit the present corpus.
> Was there an old version with the described size and is it possible to
> get it now? Thank you.
Sounds like a typo in the article. The README page at http://spamassassin.apache.org/publiccorpus/readme.html contains full details of the changes made to the corpus over time, and the message counts have not changed greatly since Oct 2002. It's worth noting that the corpus has never contained more than about 1900 spam messages.

--j.