Hi, I've had a look over the website and searched the archives, but I can't find any obvious answers to this, so apologies if it's been asked before.
I'm investigating potentially using Kafka as the transaction log for our in-memory database technology. The idea is that Kafka's partitioning and replication will "automatically" give us sharding and hot-standby capabilities in the db (obviously with a fair amount of work on our side). The database can ingest hundreds of gigabytes of data extremely quickly, easily enough to saturate any reasonable network connection, so I've thought about co-locating the db on the same nodes of the Kafka cluster that actually store the data, to cut the network out of the loading process entirely. We'd also probably want the db topology to be defined first, and the Kafka partitioning to follow.

I can see how to use the partitioner class to assign a specific partition to a key, but I can't currently see how to assign partitions to known machines up front. Is this possible? And does the plan sound reasonable in general?

I've also considered a log-shipping approach like Flume, but Kafka seems simplest all round, and I really like the idea of just being able to set the log offset to zero to reload on startup.

Thanks,
Ben Young

Ben Young . Principal Software Engineer . Adaptiv .
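P.S. To make the "db topology first, Kafka partitioning follows" idea concrete, here's a rough sketch of the kind of key-to-partition mapping I have in mind. This is illustrative Python only, not Kafka's actual partitioner API; the shard names and routing rule are invented:

```python
# Hypothetical sketch: pin each database shard to a fixed Kafka
# partition, so the shard layout is decided by the db up front and
# the partitioning follows it. All names here are illustrative.

# Shard topology defined first by the database, not by Kafka.
SHARD_TO_PARTITION = {
    "shard-a": 0,
    "shard-b": 1,
    "shard-c": 2,
}

def shard_for_key(key: str) -> str:
    """Toy routing rule: deterministically hash a key onto a shard."""
    shards = sorted(SHARD_TO_PARTITION)
    return shards[sum(key.encode()) % len(shards)]

def partition_for_key(key: str, num_partitions: int) -> int:
    """Map a record key to a partition via its owning shard."""
    partition = SHARD_TO_PARTITION[shard_for_key(key)]
    if partition >= num_partitions:
        raise ValueError(f"partition {partition} out of range")
    return partition
```

The missing piece, as far as I can see, is the reverse direction: guaranteeing which physical broker machine ends up holding each of those partitions, which is what we'd need for the co-location.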