Hi,

I've had a look over the website and searched the archives, but I can't find 
any obvious answers to this, so apologies if it's been asked before.

I'm investigating potentially using Kafka for the transaction log for our 
in-memory database technology. The idea is the Kafka partitioning and 
replication will "automatically" give us sharding and hot-standby capabilities 
in the db (obviously with a fair amount of work). 

The database can ingest hundreds of gigabytes of data extremely quickly, easily 
enough to saturate any reasonable network connection, so I've thought about 
co-locating the db on the same nodes of the Kafka cluster that actually store 
the data, to cut the network out of the loading process entirely. We'd also 
probably want the db topology to be defined first, with the Kafka partitioning 
following from it. I can see how to use the partitioner class to assign a 
specific partition to a key, but I can't currently see how to assign partitions 
to known machines upfront. Is this possible? 

Does the plan sound reasonable in general? I've also considered a log-shipping 
approach like Flume, but Kafka seems simplest all round, and I really like the 
idea of just being able to set the log offset back to zero to reload on startup.

Thanks,
Ben Young


Ben Young . Principal Software Engineer . Adaptiv . 