Gary,

I'm trying to introduce the idea of a MySQL backend to Timo over at Dovecot. He has done a little work in that direction already. For now, though, I'm just throwing this idea out there to get people thinking. I'm hoping that over the next year, as people think this through, some serious development will occur. I think that once people have their "aha" moment, development will progress.

Gary W. Smith wrote:

Marc,

 

We have had to approach this in a similar fashion.  We have large-volume email accounts under Cyrus as well as a custom spam filtering system (behind SA).  Here is the approach we took.

 

We have Cyrus set up on multiple partitions based upon the directories.  This allows us to upgrade individual sets of directories based on load.  Though this approach isn’t the best, it works well.  We have over 500 GB on a single server.

 

We have had a problem with spam, just like everyone else.  The spam no longer hits many of our user accounts.  Instead it is inserted into a database and they are sent a daily digest (or they can look it up).  We started with a simple set of tables which in testing grew very large (5gb) with our test set.  In production this would have been 100gb.  We only retain 15 days…

 

To accomplish this we looked into splitting up the data just like we did for Cyrus.  We broke that single table down into x tables (x being a tweakable number – for prod we use 200).  We use random allocation to put an email into one of the tables.  This becomes important because the message data is separated from some basic information, which allows us to keep these files on x number of spindles or network devices and manage them in a much simpler fashion.
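
The split described above might look roughly like the following. This is only a sketch under assumptions: it uses Python's built-in sqlite3 module as a stand-in for the real SQL servers, and the table names (`spam_index`, `spam_body_N`), the column names, and the helper functions are all hypothetical, not the actual schema.

```python
import random
import sqlite3

NUM_SHARDS = 4  # the post uses 200 in production; kept small for illustration


def create_schema(conn, num_shards=NUM_SHARDS):
    """Create one small index table plus num_shards body tables (hypothetical layout)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS spam_index ("
        " id INTEGER PRIMARY KEY,"
        " recipient TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " shard INTEGER NOT NULL)"
    )
    for n in range(num_shards):
        # Table names are built from integers only, so the f-string is safe here.
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS spam_body_{n} ("
            " msg_id INTEGER PRIMARY KEY,"
            " raw BLOB NOT NULL)"
        )


def store_message(conn, recipient, subject, raw, num_shards=NUM_SHARDS):
    """Pick a body table at random and record the choice in the index."""
    shard = random.randrange(num_shards)
    with conn:  # commit both inserts together
        cur = conn.execute(
            "INSERT INTO spam_index (recipient, subject, shard) VALUES (?, ?, ?)",
            (recipient, subject, shard),
        )
        msg_id = cur.lastrowid
        conn.execute(
            f"INSERT INTO spam_body_{shard} (msg_id, raw) VALUES (?, ?)",
            (msg_id, raw),
        )
    return msg_id


def fetch_message(conn, msg_id):
    """Look up the shard in the index, then read the body from that table."""
    row = conn.execute(
        "SELECT shard FROM spam_index WHERE id = ?", (msg_id,)
    ).fetchone()
    if row is None:
        return None
    body = conn.execute(
        f"SELECT raw FROM spam_body_{row[0]} WHERE msg_id = ?", (msg_id,)
    ).fetchone()
    return body[0] if body else None
```

Because the index row stores the shard number, random allocation costs only one extra lookup on reads, and each body table can live on its own spindle or network device.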

 

We have been looking at IMAP servers with databases as their backends and are still up in the air on them, as they don’t meet all of our requirements right now (in their stable form), but going forward I think that SQL emails might become our designed transport.  Our SQL servers for handling this are clustered machines, each with about 600 GB of disk space, under Linux-HA and DRBD.  This is also then replicated to a matching offsite database cluster.

 

I believe that there is a use for a technology focused more around databases (actually there are some right now just very specific to themselves and not really configurable) that will replace existing named systems (such as uw-imap and cyrus).  I would guess that these tools themselves might start that implementation within themselves (hint hint) so we don’t have to turn to the alternative imap systems.

 

Anyway, this stuff exists and some of us use certain concepts already applied.  Implementation is simple in many cases.

 

 


From: Marc Perkel [mailto:[EMAIL PROTECTED]]
Sent: Friday, June 09, 2006 1:19 PM
To: users@spamassassin.apache.org
Subject: The Future of Email is SQL

 

After considerable experimenting and thinking things through I thought 
I'd start a thread on the future of email to start planting the seeds of 
where MTA development needs to go. I'm convinced that someday soon we 
will all realize that MBOX and MAILDIR are obsolete technologies and 
that the future is going to be SQL based storage.
 
First - before everyone starts screaming about speed comparisons, I'm 
not going to go there. Every storage technology has its advantages and 
disadvantages but I'm just going to say that SQL based mail storage is 
fast enough. The advantages of SQL have to do with power and not with 
speed. Those who would choose it would do so because they want to do new 
things that you can do with a database and can't do without one.
 
SQL has several advantages. You don't have to deal with the quirks of the 
underlying file system or OS. It takes care of all the locking issues 
and indexing and makes it so that multiple applications can seamlessly 
access the data. With an SQL backend email can be stored from the MTA, 
read from an IMAP client that accesses the same database, and the spam 
filtering engine will have access to the stored email as well.
 
To give you some examples of what could be done .....
 
Suppose a spammer sends 1000 phishing spams to your users and then you 
figure out that the 1000 messages already delivered are spam. With a 
database you can do a query to retroactively delete spam that was 
already delivered to the mailboxes. This could also be used to 
retroactively delete viruses already delivered.
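
A retroactive cleanup like this could come down to a single DELETE. The sketch below is an illustration only: it uses Python's built-in sqlite3 module in place of a real mail store, and the `messages` table, its columns, and the `purge_campaign` helper are hypothetical names invented for the example.

```python
import sqlite3


def create_store(conn):
    """Create a minimal, hypothetical message table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " mailbox TEXT NOT NULL,"
        " sender_ip TEXT NOT NULL,"
        " subject TEXT NOT NULL,"
        " raw BLOB NOT NULL)"
    )


def purge_campaign(conn, sender_ip, subject):
    """Delete every delivered copy of one campaign, across all mailboxes.

    Returns the number of messages removed.
    """
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM messages WHERE sender_ip = ? AND subject = ?",
            (sender_ip, subject),
        )
    return cur.rowcount
```

Matching on sending IP plus subject is just one possible key. A real system would more likely match on a body hash or on whatever signature the filter used to identify the campaign.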
 
Spam filtering programs can look up existing email in existing folders 
and compare it with new email already delivered to help determine more 
accurately if a message is spam or not. For example, if the host server 
has a reputation for 100% ham then it can deliver new email without 
running it through SpamAssassin. If programs like SpamAssassin can 
access existing email in existing folders they can evaluate new email 
using tricks no one has yet considered.
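
The host reputation check mentioned above might be sketched as a query over mail already in the store. Everything here is an assumption made for illustration: Python's sqlite3 module stands in for the real database, and the `messages` table, the `is_spam` flag, the `min_messages` threshold, and `host_is_trusted` are hypothetical. This is not a SpamAssassin API.

```python
import sqlite3


def create_store(conn):
    """Create a minimal, hypothetical table of already-delivered mail."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " id INTEGER PRIMARY KEY,"
        " sender_ip TEXT NOT NULL,"
        " is_spam INTEGER NOT NULL)"  # 1 = filed as spam, 0 = ham
    )


def host_is_trusted(conn, sender_ip, min_messages=50):
    """Return True if this host has sent enough mail and none of it was spam."""
    total, spam = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(is_spam), 0)"
        " FROM messages WHERE sender_ip = ?",
        (sender_ip,),
    ).fetchone()
    # Require a minimum history so an unknown host is never trusted by default.
    return total >= min_messages and spam == 0
```

A delivery path could call `host_is_trusted` first and only hand the message to the full filter when it returns False.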
 
SQL databases allow for multiple masters and slaves and replication that 
lets you create a cluster that never fails under any conditions. It 
would be far easier to create a system that is always on and always 
backed up.
 
An SQL backend lets you use a wide variety of tools, programming 
languages, and operating systems, so you can integrate more easily 
than with non-database systems.
 
And - this is important - once you have a database, new things that 
no one has yet thought of become possible, and tools we've never 
heard of will be developed, because the new power lends itself to 
more tricks than you can do without a database.
 
My point here is - think outside the box. I'm going to be lobbying IMAP 
server developers to include SQL backends. Exim could pipe data into a 
local delivery agent, or it can have features written to write directly 
to the SQL backend.
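
The local delivery agent option could be as small as the sketch below, which takes one raw message and writes it to a table. It is an illustration under assumptions: Python's sqlite3 module stands in for the SQL backend, and the `messages` table layout and the `deliver` function are hypothetical, not an existing Exim or Dovecot feature.

```python
import sqlite3
from email import message_from_bytes, policy


def deliver(conn, mailbox, raw):
    """Store one raw RFC 5322 message for a mailbox and return its row id."""
    msg = message_from_bytes(raw, policy=policy.default)
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY,"
            " mailbox TEXT NOT NULL,"
            " sender TEXT NOT NULL,"
            " subject TEXT NOT NULL,"
            " raw BLOB NOT NULL)"
        )
        cur = conn.execute(
            "INSERT INTO messages (mailbox, sender, subject, raw)"
            " VALUES (?, ?, ?, ?)",
            # Keep a few headers as indexed columns and the full message as a blob.
            (mailbox, str(msg.get("From", "")), str(msg.get("Subject", "")), raw),
        )
    return cur.lastrowid
```

An MTA that pipes each message to a script would pass the mailbox name as an argument and the message on standard input, so a wrapper only needs to call `deliver(conn, mailbox, sys.stdin.buffer.read())`.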
 
Thoughts ..... ?
 
 
 

