On Tue, 21 Jun 2011 20:03:57 +0100 Dominic Benson <domi...@lenny.cus.org> wrote:
> To be fair to MySQL, these days it is pretty solid. There are
> potentially dangerous configuration options, but there are in
> Postgres too, and you can turn them off. Have you had a bad
> experience with a recent version?

No, not really, but MySQL is broken in so many ways I try to stay away
from it. Many of the design flaws in
http://sql-info.de/mysql/gotchas.html remain unfixed.

For example, even in MySQL 5.0.5, 'select 1/0;' returns NULL.
PostgreSQL more sensibly raises an exception. And while 5.0.5 no
longer lets you insert '2003-02-31' into a DATE field, the INSERT
command does not fail. A SELECT gives you back 0000-00-00.

Hence: I do not trust MySQL with my data. (If an INSERT followed by a
SELECT does not give me back exactly what I inserted, then the INSERT
command *MUST FAIL* for me to trust the DB.)

[...]

> In the absence of writes, even MyISAM won't cause locking problems;
> that said I can easily see that CDB would be faster. My question is
> why does the speed matter, rather than the overall capacity?

When you are scanning 5-10 million messages/day as some of our
installations do, speed matters.

> Surely the extra fraction of a millisecond is insignificant in the
> passage of the message.

Well, it's not "fractions of a millisecond". For an email with a
couple of hundred tokens, it can be a couple of milliseconds. When
1000 processes doing Bayes lookups are hitting the database all at the
same time, it can be more than just a couple of milliseconds. And that
can be enough to require more concurrent scanning processes, more
memory, etc, etc. It really does matter on busy systems.

[...]

> Very true. But with tiny datasets like these, it's all in memory
> anyway - and given the read-almost-entirely workload, SQL replication
> works rather well. Indeed, given how small and how well, it is
> reasonable to have a server-local replica just as you do with CDB.

True. However, CDB is more suitable for simple key/value lookups than
a SQL database.
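
As a rough illustration of what a "simple key/value lookup" looks like
for Bayes tokens, here is a minimal sketch. It is not the actual
scanner code: Python's standard dbm.dumb module stands in for CDB, and
the function names, token names and the "spam ham" count layout are
all made up for the example.

```python
import dbm.dumb
import os
import tempfile


def build_token_db(path, counts):
    """Write token -> "spam ham" count pairs to a key/value file.

    dbm.dumb is only a stand-in for CDB here (an assumption for the sketch).
    """
    with dbm.dumb.open(path, "n") as db:
        for token, (spam, ham) in counts.items():
            db[token.encode()] = f"{spam} {ham}".encode()


def lookup_tokens(path, tokens):
    """Return {token: (spam, ham)} for known tokens; unknown ones are skipped."""
    found = {}
    with dbm.dumb.open(path, "r") as db:
        for token in tokens:
            raw = db.get(token.encode())
            if raw is not None:
                spam, ham = raw.split()
                found[token] = (int(spam), int(ham))
    return found


def demo():
    """Build a tiny hypothetical token file and look up three tokens."""
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "bayes_tokens")
    build_token_db(path, {"viagra": (120, 1), "meeting": (2, 90)})
    return lookup_tokens(path, ["viagra", "unknown", "meeting"])
```

Each lookup is one keyed read from a local file: no query parsing, no
connection, no round trip to a server process.
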
For this particular data set and workload, SQL makes no sense. Even
TCP or UNIX-domain socket connections to a DB server are overkill.

Regards,

David.