LinkedIn uses the first method for cross-DC mirroring. The second method has 2 main issues. (1) Kafka depends on the ZK service being always available, and a ZK cluster is available only while a majority of its servers are up. If you set up a ZK cluster spanning only 2 data centers, a single DC failure may make the ZK cluster unavailable. You can set up a ZK cluster spanning 3 or more DCs, which allows you to tolerate at least 1 DC failure. (2) Long network latency across DCs. For a follower to keep up with a leader in a different DC, you need to tune parameters like replica.lag.max.messages, replica.lag.time.max.ms, and replica.socket.receive.buffer.bytes to amortize the long network latency.
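To make the majority argument in (1) concrete, here is a minimal sketch of the quorum arithmetic. The DC names and per-DC server counts are made-up examples, and the helper functions are hypothetical; nothing here calls ZooKeeper itself.

```python
def has_quorum(servers_per_dc, failed_dcs):
    """Return True if the surviving ZK servers still form a strict majority."""
    total = sum(servers_per_dc.values())
    alive = sum(n for dc, n in servers_per_dc.items() if dc not in failed_dcs)
    return alive > total // 2


def survives_any_single_dc_failure(servers_per_dc):
    """Check every possible single-DC outage, one DC at a time."""
    return all(has_quorum(servers_per_dc, {dc}) for dc in servers_per_dc)


# Hypothetical layouts: 5 ZK servers spread over 2 DCs versus 3 DCs.
two_dc = {"dc1": 3, "dc2": 2}
three_dc = {"dc1": 2, "dc2": 2, "dc3": 1}

print(survives_any_single_dc_failure(two_dc))    # False: losing dc1 leaves 2 of 5
print(survives_any_single_dc_failure(three_dc))  # True: any one loss leaves >= 3 of 5
```

With only 2 DCs, one of them always holds at least half of the servers, so losing that DC breaks the quorum no matter how the servers are split.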
Thanks,

Jun

On Sat, Jun 29, 2013 at 10:50 AM, Yu, Libo <libo...@citi.com> wrote:

> The first method may lose message if cluster A is permanently down or
> cannot restart right away as B always lags behind A. Even with mirroring,
> B has to wait to get missing msg until A is back. So it is not ideal.
> What type of solution did you use at linkedin?
>
> Regards,
>
> Libo
>
>
> -----Original Message-----
> From: Joel Koshy [mailto:jjkosh...@gmail.com]
> Sent: Friday, June 28, 2013 8:59 PM
> To: users@kafka.apache.org
> Subject: Re: failover strategy
>
> The second method (replication across DCs) is not recommended.
> The first set up would work provided the set of topics you are mirroring
> from A->B is disjoint from the set of topics you are mirroring from B->A
> (i.e., to avoid a mirroring loop).
>
> Joel
>
> On Fri, Jun 28, 2013 at 5:29 PM, Yu, Libo <libo...@citi.com> wrote:
> > Hi,
> >
> > I can think of two failover strategies. I am not sure which one is the
> > right way to go.
> >
> > First method. set up kafka server A on cluster 1 and set up another
> > server B on cluster 2. The two clusters are in different data centers.
> > Use customized mirrormaker to sync between the two servers. Use one
> > server in production and use the other one as contingency. If server A
> > is down, server B will be used (this can be transparent to
> > publishers/consumers). There may be a lag between the two servers
> > before server A is down . But after A is back, the customized
> > mirrormaker can sync the two. And eventually B will have all the data
> > A had before the failure.
> >
> > Second method. Set up one kafka server using cluster 1 and cluster 2.
> > When creating a topic , always use two replications. For each
> > partition, assign one replication to a broker in cluster 1 and assign
> > the other replication to a broker in cluster 2. So kafka will handle
> > the syncing and failover for the two clusters. Is that a right
> > (expected) way to use kafka?
> >
> >
> > Regards,
> >
> > Libo
> >
>