On Tue, 2007-05-22 at 09:58 -0500, Troy Benjegerdes wrote:
> Best case, when all the nodes, and the network is up, locking latency
> shouldn't be much longer than say twice the RTT. But what really
> matters, and causes all the nasty bugs that even single-master
> replication systems have to deal with is the *worst case* latency. So
> everything is going along fine, and then due to a surge in incoming
> spam, one of your switches starts dropping 2% of the packets, and the
> server holding a lock starts taking 50ms instead of 1ms to respond to an
> incoming packet.
>
> Now your previous lock latency of 1ms could easily extend into seconds if
> a couple of responses to lock requests don't get through. And your 16
> node imap cluster is now 8 times slower than a single server, instead of
> 8 times faster ;)

If you're so worried about that, you could create another internal network just for replication :)

> The nasty part about this for imap is that we can't ever have a UID be
> handed out without *confirming* that it's been replicated to another
> server before sending out the packet. Otherwise you can get in the
> situation where node A sends out a new UID to a client out it's public
> NIC card, while in the meantime, it's internal NIC melted so the update
> never got propagated, so node B,C, and D decides "ooops, node A is
> dead, we are stealing his lock", and B takes over the lock and allocates
> the same UID to a different message, and now the CEO didn't get that
> notice from the SEC to save all his emails.

When the servers sync up again, they'll notice the duplicated UID, and each of the two emails will be assigned a new UID to fix the situation. This conflict handling will have to be done in any case.
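To make the resync step concrete, here is a minimal Python sketch of that conflict handling. It is not Dovecot's replication code: the data model (each replica's view of a mailbox as a map from UID to a globally unique message ID) and the function name `merge_replicas` are hypothetical. A UID that both sides handed out for *different* messages is dropped, and both messages get fresh UIDs above every UID either side has used, so UIDs keep increasing.

```python
from typing import Dict, List, Tuple

# Hypothetical model: one replica's view of a mailbox, mapping
# IMAP UID -> a globally unique message ID (e.g. a GUID).
Mailbox = Dict[int, str]


def merge_replicas(a: Mailbox, b: Mailbox, next_uid: int) -> Tuple[Mailbox, int]:
    """Merge two replicas' UID maps after they were split apart.

    If both replicas assigned the same UID to different messages, that UID
    is dropped and both messages receive new UIDs higher than any UID
    either replica has used. Returns the merged map and the new next UID.
    """
    merged: Mailbox = {}
    conflicts: List[Tuple[str, str]] = []

    for uid in sorted(set(a) | set(b)):
        msg_a, msg_b = a.get(uid), b.get(uid)
        if msg_a is None or msg_b is None or msg_a == msg_b:
            # UID exists on only one side, or both sides agree: keep it.
            merged[uid] = msg_a if msg_a is not None else msg_b
        else:
            # Same UID allocated to two different messages: a conflict.
            conflicts.append((msg_a, msg_b))

    # Never reuse a UID: start above the highest UID seen on either side.
    highest = max(set(a) | set(b), default=0)
    next_uid = max(next_uid, highest + 1)

    # Give both conflicting messages brand-new UIDs.
    for msg_a, msg_b in conflicts:
        for msg in (msg_a, msg_b):
            merged[next_uid] = msg
            next_uid += 1

    return merged, next_uid


if __name__ == "__main__":
    node_a = {1: "m1", 2: "m2", 3: "sec-notice"}
    node_b = {1: "m1", 2: "m2", 3: "other-msg"}
    merged, nxt = merge_replicas(node_a, node_b, next_uid=4)
    print(merged, nxt)
```

In the example, both nodes allocated UID 3, so neither message keeps it: they come back as UIDs 4 and 5. The sketch deliberately ignores other cases a real implementation would have to handle, such as the same message appearing under different UIDs on the two sides.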