If you have a Kafka partition replicated to 3 nodes, the leader for that partition can move between those nodes over time, which makes the co-location pointless: you can only produce to and consume from the current leader, not from an arbitrary replica.
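To make that concrete, here is a minimal sketch (not from the original thread; broker address and topic name "txn-log" are placeholders) using the Kafka Java producer to ask where each partition's leader currently lives. Run it again after a leader election and the answer can change:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.PartitionInfo;
    import org.apache.kafka.common.serialization.ByteArraySerializer;

    public class LeaderLookup {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092"); // placeholder broker
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

            try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
                // partitionsFor() returns the current metadata; the leader of each
                // partition is whichever replica the controller has elected right now.
                for (PartitionInfo p : producer.partitionsFor("txn-log")) { // placeholder topic
                    System.out.printf("partition %d: leader=%s replicas=%d%n",
                            p.partition(), p.leader(), p.replicas().length);
                }
            }
        }
    }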
/svante

2015-11-12 9:00 GMT+01:00 Young, Ben <ben.yo...@sungard.com>:

> Hi,
>
> Any thoughts on this? Perhaps Kafka is not the best way to go for this,
> but the docs do mention transaction/replication logs as a use case, and
> I'd have thought locality would have been important for that?
>
> Thanks,
> Ben
>
> -----Original Message-----
> From: Young, Ben [mailto:ben.yo...@sungard.com]
> Sent: 06 November 2015 08:20
> To: users@kafka.apache.org
> Subject: Locality question
>
> Hi,
>
> I've had a look over the website and searched the archives, but I can't
> find any obvious answers to this, so apologies if it's been asked before.
>
> I'm investigating potentially using Kafka as the transaction log for our
> in-memory database technology. The idea is that Kafka's partitioning and
> replication will "automatically" give us sharding and hot-standby
> capabilities in the db (obviously with a fair amount of work).
>
> The database can ingest hundreds of gigabytes of data extremely quickly,
> easily enough to saturate any reasonable network connection, so I've
> thought about co-locating the db on the same nodes of the Kafka cluster
> that actually store the data, to cut the network out of the loading
> process entirely. We'd also probably want the db topology to be defined
> first, and the Kafka partitioning to follow. I can see how to use the
> partitioner class to assign a specific partition to a key, but I can't
> currently see how to assign partitions to known machines upfront. Is
> this possible?
>
> Does the plan sound reasonable in general? I've also considered a log
> shipping approach like Flume, but Kafka seems simplest all round, and I
> really like the idea of just being able to set the log offset to zero to
> reload on startup.
>
> Thanks,
> Ben Young
>
> Ben Young . Principal Software Engineer . Adaptiv .
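For reference, the "partitioner class" mentioned above is the producer's Partitioner interface. A minimal sketch (the idea that the record key carries a db shard id is an assumption made up for illustration) routing every record for one shard to a fixed partition:

    import java.util.Map;
    import org.apache.kafka.clients.producer.Partitioner;
    import org.apache.kafka.common.Cluster;

    // Hypothetical partitioner: assumes the record key is the database shard id
    // encoded as a String, so all records for one shard land in one partition.
    public class ShardPartitioner implements Partitioner {

        @Override
        public int partition(String topic, Object key, byte[] keyBytes,
                             Object value, byte[] valueBytes, Cluster cluster) {
            int numPartitions = cluster.partitionsForTopic(topic).size();
            int shardId = Integer.parseInt((String) key); // assumption: key == shard id
            return shardId % numPartitions;
        }

        @Override
        public void configure(Map<String, ?> configs) { }

        @Override
        public void close() { }
    }

Note that this only controls which partition a key goes to, not which broker hosts that partition. Pinning replicas to known machines is done at topic-creation time with a manual replica assignment (e.g. kafka-topics.sh --replica-assignment), and rewinding to the start on reload is a consumer-side call such as seek(partition, 0) in the new consumer API.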