One option would be to use Cassandra for the multi-data-center replication and have Kafka consumers update the Cassandra ring in each data center.
This allows you to have active / active / active applications. It also lets you pick a ring and map/reduce the data out of it (e.g. with Pig); doing so will saturate that ring, so you want to do it in an availability zone your application is not serving out of. Every data center has its own "commit log" thanks to the Kafka broker cluster, with cross-data-center replication of the database handled by the Cassandra ring. This is the setup I have used before and continue to work on now. I have been meaning to start open sourcing some of this (it uses Thrift as well).

On Thu, Oct 31, 2013 at 8:37 AM, Muhzin <rmuh...@gmail.com> wrote:

> Hi,
>
> We are architecting an application in AWS with a multi-region deployment
> and were evaluating Kafka as the messaging system for data replication
> and activity tracking.
>
> The initial idea is to have the producer and broker cluster reside in
> Ireland. We wanted to know the feasibility of having a remote consumer in
> Singapore.
> Found a link to improve throughput for a remote consumer -
> http://bit.ly/1cp7T6Z
> #) Has anybody done something similar - if so, could you please shed some
> light on how latency will play out and some of the best practices that
> could be applied?
> #) What would be a good way to secure the communication if it is happening
> over the internet? We were thinking about an SSL tunnel (using OpenVPN) as
> described in http://aws.amazon.com/articles/0639686206802544. Thoughts?
>
> --
> BR
> Muhsin
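To make the pattern concrete, here is a minimal sketch of the flow I described above: each data center gets its own commit log (standing in for the local Kafka broker cluster) and a local key/value store (standing in for the Cassandra ring), and a per-DC consumer applies the log to its local ring so every DC converges to the same state. Everything here (`DataCenter`, `produce`, `consume`) is an in-memory stand-in for illustration only, not a real Kafka or Cassandra client API:

```python
# In-memory sketch of the multi-DC active/active/active pattern.
# Not real Kafka/Cassandra APIs -- just stand-ins to show the data flow.

class DataCenter:
    def __init__(self, name):
        self.name = name
        self.commit_log = []   # stand-in for the DC-local Kafka topic
        self.ring = {}         # stand-in for the DC-local Cassandra ring
        self.offset = 0        # consumer position in the commit log

    def append(self, key, value):
        # A broker append: the write is durable in this DC's log.
        self.commit_log.append((key, value))

    def consume(self):
        # Stand-in for a Kafka consumer updating the Cassandra ring:
        # replay the log from the last committed offset.
        while self.offset < len(self.commit_log):
            key, value = self.commit_log[self.offset]
            self.ring[key] = value
            self.offset += 1

def produce(dcs, key, value):
    # The producer replicates each write to every data center's log,
    # so each DC can serve reads and writes locally (active/active/active).
    for dc in dcs:
        dc.append(key, value)

dcs = [DataCenter("ireland"), DataCenter("singapore"), DataCenter("virginia")]
produce(dcs, "user:1", "login")
produce(dcs, "user:2", "click")
for dc in dcs:
    dc.consume()

# Every ring converges to the same state.
assert all(dc.ring == dcs[0].ring for dc in dcs)
```

In the real deployment, the analytics (Pig map/reduce) job would read from one of these rings in an availability zone that is not serving application traffic, since a full scan will saturate that ring.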