On Thu, 2017-11-16 at 09:15 +0000, Sebastian Arcus wrote:
> On 15/11/17 18:11, Martin Gregorie wrote:
> > On Wed, 2017-11-15 at 14:44 +0000, Sebastian Arcus wrote:
> > </snip>
> >
> > I initially decided that an archive was A Good Thing to have,
> > simply because retrieving mail from it should be a lot faster than
> > searching through huge mail folders. This turned out to be true in
> > practice: the archive currently holds 183,000 emails and a worst
> > case search takes around 30 seconds to return a list of hits
> > (running on a 3 GHz dual Athlon system with 4GB RAM and Fedora 25
> > as its OS).
>
> Thank you for the details. How do you search the archive? With grep
> directly on the server?
>
Using SQL queries.
The two main tables in the database hold e-mail addresses and messages respectively. The many-to-many links between them are implemented with a third table that holds the link type ('To' or 'From'), and a fourth table holds subject text; it has a one-to-many relationship with the messages.

The SA plugin just looks at the From header in the message being checked and, if it finds that address in the database, checks whether any 'To' links are associated with it. If there are, the message gets negative points. As I said, this SQL query is actually run against a database view that combines the address and link tables. Since the rows in these tables are small and the tables are indexed on address and link type, the query is very fast.

If you want to know more about the archive, look here:
http://www.libelle-systems.c3487738.myzen.co.uk/mailarchive/

Ignore the licensing stuff: I initially thought I might be onto a revenue source, but remarkably few people use mail archives. I should remove the license management code and open source the archive, but so far I haven't got round to doing that.

Martin
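For readers who want to picture the schema and the sender lookup described above, here is a minimal sketch in Python using the standard-library sqlite3 module. It is not the archive's actual code or the real SA plugin. The table names (addresses, messages, links, subjects), the view name address_links, the lowercase address normalisation and the score value KNOWN_CORRESPONDENT_SCORE are all hypothetical, chosen only to mirror the description: a link table carrying 'To'/'From', a view joining addresses to links, and a check for any 'To' link on the incoming From address.

```python
import sqlite3

# Hypothetical schema mirroring the description; all names are illustrative.
SCHEMA = """
CREATE TABLE subjects  (id INTEGER PRIMARY KEY, text TEXT NOT NULL);
-- One subject row can be shared by many messages (one-to-many).
CREATE TABLE messages  (id INTEGER PRIMARY KEY,
                        subject_id INTEGER REFERENCES subjects(id),
                        body TEXT);
CREATE TABLE addresses (id INTEGER PRIMARY KEY, address TEXT UNIQUE NOT NULL);
-- Many-to-many link between addresses and messages, tagged with its type.
CREATE TABLE links     (address_id INTEGER NOT NULL REFERENCES addresses(id),
                        message_id INTEGER NOT NULL REFERENCES messages(id),
                        link_type  TEXT NOT NULL CHECK (link_type IN ('To', 'From')),
                        PRIMARY KEY (address_id, message_id, link_type));
CREATE INDEX links_by_type ON links (address_id, link_type);
-- View combining the address and link tables, queried by the sender check.
CREATE VIEW address_links AS
  SELECT a.address, l.link_type, l.message_id
  FROM addresses a JOIN links l ON l.address_id = a.id;
"""

KNOWN_CORRESPONDENT_SCORE = -2.0  # hypothetical negative score


def has_written_to(conn, sender):
    """True if the sender's address appears in any archived 'To' header."""
    row = conn.execute(
        "SELECT 1 FROM address_links "
        "WHERE address = ? AND link_type = 'To' LIMIT 1",
        (sender.lower(),),  # assumes addresses are stored lowercased
    ).fetchone()
    return row is not None


def score_sender(conn, sender):
    """Return a negative score for known correspondents, else 0.0."""
    return KNOWN_CORRESPONDENT_SCORE if has_written_to(conn, sender) else 0.0


def build_demo_db():
    """Build a tiny in-memory archive with one message from martin to sebastian."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO subjects (id, text) VALUES (1, 'Re: mail archive')")
    conn.execute("INSERT INTO messages (id, subject_id, body) VALUES (1, 1, '...')")
    conn.executemany(
        "INSERT INTO addresses (id, address) VALUES (?, ?)",
        [(1, "martin@example.com"), (2, "sebastian@example.com")],
    )
    conn.executemany(
        "INSERT INTO links (address_id, message_id, link_type) VALUES (?, ?, ?)",
        [(1, 1, "From"), (2, 1, "To")],
    )
    return conn


if __name__ == "__main__":
    db = build_demo_db()
    print(score_sender(db, "Sebastian@example.com"))  # a 'To' link exists
    print(score_sender(db, "martin@example.com"))     # only a 'From' link
```

The composite index on (address_id, link_type), together with the unique index on address, is what lets this kind of existence check stay cheap even with a large archive, which is consistent with the speed described above.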