On Mon, Nov 2, 2009 at 1:01 AM, Christian Perrier <bubu...@debian.org> wrote:
> Quoting Lee Winter (lee.j.i.win...@gmail.com):
>
>> I did the most recent three months of 2009, but the density was pretty low.
>
> I haven't checked the wiki and I'm not online right now, but please
> take care to register this in the page.
I am a little hesitant to edit the page because I don't understand the
process and found no doc or howto.

>> > Old archives are also missing reviews, particularly a few from 2005
>> > and nearly all from 2004, not to mention older archives.
>>
>> So I started at the beginning (part of 1998) and went to the end of
>> 2002. If I have time this week I will look at 2003-2005.
>
> Ditto.
>
>> > Please take some time to do this work. This is not that time
>> > consuming: one month can be reviewed in about 10-15 minutes....even
>> > less when you're used to methods for spotting spams.
>>
>> The work is pretty tedious and reviewing non-spam emails five times is
>> extremely inefficient. Consider a solution that would allow one
>> person to scan the archive to generate a list of spam targets. If the
>> other four reviewers only had to review the listed spam candidates
>> they would not have to waste their time reviewing non-spam.
>
> I'm sure the listmasters would welcome such improvements but, well, we
> already have a very good tool.
>
> Also, restricting the list to what the first person has identified
> would increase the risk of missing some spams.
>
> When I worked on the entire archive, I finally dropped the web
> interface and used an alternative method:
>
> - download the list archives as mailboxes
> - pass them through my CRM114 spam filter
> - open them in my MUA (mutt)
> - tag spam messages (being processed by CRM114, most spams are already
>   identified by CRM114 markers)
> - bounce them to the spam report mail address
>   (report-lists...@lists.debian.org) with the following key macro:
>
> macro index \eL "breport-lists...@lists.debian.org\no\nq" "report as spam to Debian lists"
>
> I found this much more efficient.

Sounds like the beginning/foundation of an automation script. If the
candidates can be found mechanically, then there is a potential tradeoff
available. We have 11 years = 132 months; times 5 reviewers = 660
reviewer-months.
At 10-15 min each that is 110-165 man-hours. That's a lot of manual
effort. Just how important are the last few messages that would make it
through a (purposefully loose) mechanical filter? If the whole mess could
be 98% cleaned up with, say, 5 man-hours, then it would be a tremendous
efficiency improvement.

> Downloading list archives as mailboxes is only accessible to Debian
> developers but I can provide them to people who might need them.

In the '80s I spent a lot of time doing natural language processing
software, so I may be more tuned up than the typical reviewer. But I
find it more efficient to review the author/subject/thread indices and
inspect message content only to confirm the presence of spam in a
suspect message. So offline access to the archive would not help me.

-- Lee
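
To make the tradeoff above concrete, here is a rough sketch; it is not part of the listmasters' tooling or of the CRM114/mutt workflow. The first function reproduces the effort arithmetic. The second lists spam candidates from a downloaded mailbox by looking only at the author/subject headers. The function names, the mbox path, and the keyword pattern are all assumptions made up for illustration; a real filter such as CRM114 would take the place of the deliberately loose regex.

```python
import mailbox
import re

# Hypothetical, purposely loose pattern (an assumption, not from the thread).
SUSPECT_SUBJECT = re.compile(r"viagra|casino|lottery|replica", re.IGNORECASE)


def review_hours(years, reviewers, minutes_per_month):
    """Manual effort in hours if every reviewer reads every archive month."""
    reviewer_months = years * 12 * reviewers
    return reviewer_months * minutes_per_month / 60


def spam_candidates(mbox_path):
    """Return (message-id, from, subject) for messages with a suspect subject.

    Only headers are inspected; message bodies are left for a human to
    confirm, as with reviewing the author/subject indices by hand.
    """
    candidates = []
    for msg in mailbox.mbox(mbox_path):
        subject = str(msg.get("Subject", ""))
        if SUSPECT_SUBJECT.search(subject):
            candidates.append(
                (str(msg.get("Message-ID", "")), str(msg.get("From", "")), subject)
            )
    return candidates
```

With the numbers from this thread, `review_hours(11, 5, 10)` and `review_hours(11, 5, 15)` give the 110 and 165 hour bounds. The candidate list from `spam_candidates` is what the other four reviewers would check instead of rereading the whole month; anything the loose pattern misses is the residue whose importance is the open question.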