I hadn't noticed the UDP requirement before. That does complicate things: unless you're in absolute control of the network path, some data loss is virtually guaranteed. Are you allowed to have more than one "collector/producer" machine, so that if one fails you won't be stuck? If you can have multiple collector/producer machines, is UDP multicasting (with later deduplication) an option?
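To make the "later deduplication" idea concrete, here is a rough Python sketch, not anything we actually run. It assumes each collector hands over the raw datagram payloads it received, and that two byte-identical payloads are copies of the same datagram (so each datagram would need something unique in it, such as a sender sequence number). The names `Deduplicator` and `merge_streams` are made up for illustration.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Remembers digests of recently seen payloads and rejects repeats.

    Assumption: identical bytes mean "same datagram". Only the most recent
    `capacity` digests are kept, so memory use stays bounded.
    """

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        self._seen = OrderedDict()  # digest -> None, oldest entry first

    def is_new(self, payload: bytes) -> bool:
        digest = hashlib.sha1(payload).digest()
        if digest in self._seen:
            # Seen recently: refresh its position and report a duplicate.
            self._seen.move_to_end(digest)
            return False
        self._seen[digest] = None
        if len(self._seen) > self.capacity:
            # Forget the oldest digest once the window is full.
            self._seen.popitem(last=False)
        return True


def merge_streams(*streams, capacity=100_000):
    """Yield each distinct payload once, across all collector streams."""
    dedup = Deduplicator(capacity)
    for stream in streams:
        for payload in stream:
            if dedup.is_new(payload):
                yield payload


if __name__ == "__main__":
    # Two hypothetical collectors that both heard the same multicast group,
    # each missing a different datagram.
    collector_a = [b"seq=1 foo", b"seq=2 bar", b"seq=3 baz"]
    collector_b = [b"seq=2 bar", b"seq=3 baz", b"seq=4 qux"]
    for record in merge_streams(collector_a, collector_b):
        print(record)
```

The window is bounded, so a duplicate that shows up after more than `capacity` other records would slip through; the size would have to be chosen based on how far apart the collectors can drift.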
Absolutely no data loss and no duplication is a pretty high standard. It's doable, but are there some aspects of your high-level design that can be changed to accommodate it more easily?

Here, we use the stock Java producer client, but we are transitioning to a custom one that offers better guarantees under asynchronous operation. Our use case is logging data, so losing some records or having them delivered late won't stop anyone's progress, and we have retry logic built into each step in the chain. We probably lose a few records here and there, but not enough to drastically alter any outcomes for a user.

--Tom

On Thu, Jan 30, 2014 at 7:35 AM, Thibaud Chardonnens <thibaud...@gmail.com> wrote:

> Thanks for your reply, but I am missing something, how do you push the
> data to a specific topic in your example? Through which client?
>
> On 30 Jan 2014 at 15:16, Tom Brown <tombrow...@gmail.com> wrote:
>
> > Why go with a fancy multithreaded producer architecture? Why not rely on a
> > simple python/perl/whatever implementation and let a scalable web server
> > handle the threading issues?