Thibaud,

Sounds like one of your issues will be upstream of Kafka. "Robust" and UDP aren't things I usually think of together unless you have additional bookkeeping to detect and re-request lost messages. 8MB/s shouldn't be much of a problem unless the messages are very small and you're looking for individual commits. You also have the challenge of the server process/machine/network going away after a UDP message is received but before it can be pushed to Kafka.
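To make the "additional bookkeeping" idea concrete, here is a minimal sketch. It assumes the upstream sender stamps each datagram with a monotonically increasing sequence number -- an assumption, since nothing in this thread says the stream provides one -- and the class and method names are illustrative, not from any real library:

```java
// Hypothetical sketch of UDP loss detection via sequence numbers.
// Assumes each datagram carries a monotonically increasing long;
// treat this purely as an illustration of the bookkeeping idea.
public class SeqGapDetector {
    private long expected = -1;   // next sequence number we expect
    private long missed = 0;      // datagrams presumed lost so far

    // Feed each received sequence number; returns how many datagrams
    // were skipped since the previous one (0 if delivery was in order).
    public long onReceive(long seq) {
        long gap = 0;
        if (expected >= 0 && seq > expected) {
            gap = seq - expected;  // lost (or badly reordered) datagrams
            missed += gap;
        }
        expected = seq + 1;
        return gap;
    }

    public long missed() { return missed; }
}
```

A real implementation would also have to handle reordering and sender restarts, and would use the gap count to request retransmission from upstream -- which is exactly the extra machinery UDP makes you build yourself.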
Beyond that, there are a lot of server frameworks that work fine. I use Dropwizard mostly, since I like Java, though it doesn't support UDP resources. There are plenty of options there, but that's probably not a Kafka issue.

On Thu, Jan 30, 2014 at 6:38 AM, Philip O'Toole <phi...@loggly.com> wrote:

> Well, you could start by looking at the Kafka Producer source code for
> some ideas. We have built plenty of solid software on that.
>
> As to your goal of building something solid, robust, and critical: all I
> can say is that you need to keep your Producer as simple as possible --
> the simpler it is, the less likely it is to crash or have bugs -- and you
> must test it very well. Get the data to Kafka as fast as possible, so the
> chance of losing any due to a crash is very small. Take a long time to
> test it. The Producers I have written (in C++) run for weeks without
> going down (and then we usually bring them down on purpose for upgrades).
> However, they were in test for months too.
>
> http://www.youtube.com/watch?v=LpNbjXFPyZ0
>
>
> On Thu, Jan 30, 2014 at 6:31 AM, Thibaud Chardonnens
> <thibaud...@gmail.com> wrote:
>
> > Thanks for your quick answer.
> > Yes, sorry, it's probably too broad, but my main question was whether
> > there are any best practices for building a robust, fault-tolerant
> > producer that guarantees no data will be dropped while listening on the
> > port. From my point of view, the producer will be the most critical
> > part of the system; if something goes wrong with it, the workflow will
> > stop and data will be lost.
> >
> > Do you by any chance have a pointer to an existing implementation of
> > such a producer?
> >
> > Thanks
> >
> >
> > On 30 Jan 2014, at 15:13, Philip O'Toole <phi...@loggly.com> wrote:
> >
> > > What exactly are you struggling with? Your question is too broad.
> > > What you want to do is eminently possible, having done it myself
> > > from scratch.
> > > Philip
> > >
> > >> On Jan 30, 2014, at 6:00 AM, Thibaud Chardonnens
> > >> <thibaud...@gmail.com> wrote:
> > >>
> > >> Hello -- I am struggling with how to design a robust implementation
> > >> of a producer.
> > >>
> > >> My use case is quite simple:
> > >> I want to process a relatively big stream (~8MB/s) with Storm. Kafka
> > >> will be used as an intermediary between the stream and Storm. The
> > >> stream is sent to a specific server on a specific port (over UDP).
> > >> So Storm will be the consumer, and I need to write a producer
> > >> (basically in Java) that will listen on that port and send messages
> > >> to a Kafka topic.
> > >>
> > >> Kafka and Storm are well designed and fault-tolerant: if a node goes
> > >> down, the whole environment continues to work properly, etc.
> > >> Therefore my producer will be a single point of failure in the
> > >> workflow. Moreover, writing such a producer is not so easy -- I'll
> > >> need to write a multithreaded server to keep up with the throughput
> > >> of the stream, without any guarantee that no data will be dropped...
> > >>
> > >> So I would like to know whether there are any best practices for
> > >> writing such a producer, or whether there is another (maybe simpler)
> > >> way to do this?
> > >>
> > >> Thanks,
> > >> Thibaud
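As an illustration of the listen-and-forward design discussed above, here is a minimal sketch in plain JDK Java. The class name and the bounded-queue hand-off are my assumptions, not anything from the thread, and the actual Kafka send is deliberately left as a comment rather than an invented API call: the point is only the decoupling of the UDP read loop from the (slower, possibly blocking) publish path.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical sketch: a UDP receive loop hands payloads to a bounded
// in-memory queue, so the kernel socket buffer is drained quickly even
// under ~8MB/s. A separate thread would drain the queue and publish each
// payload to Kafka (e.g. via the Kafka producer client -- omitted here
// so the sketch stays self-contained).
public class UdpQueueSketch {
    final BlockingQueue<byte[]> queue;
    final DatagramSocket socket;

    public UdpQueueSketch(int capacity) {
        try {
            this.queue = new ArrayBlockingQueue<>(capacity);
            this.socket = new DatagramSocket(0); // ephemeral port for the demo
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Receive one datagram and enqueue its payload. If the queue is full,
    // drop rather than block the read loop -- in a real producer that
    // drop should be counted and alarmed on.
    public boolean receiveOne() {
        try {
            byte[] buf = new byte[65535];
            DatagramPacket pkt = new DatagramPacket(buf, buf.length);
            socket.receive(pkt);
            return queue.offer(Arrays.copyOf(pkt.getData(), pkt.getLength()));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Demo helper: send one message to ourselves over loopback UDP and
    // return the payload that came off the queue.
    public static String loopbackRoundTrip(String msg) {
        UdpQueueSketch s = new UdpQueueSketch(16);
        try (DatagramSocket client = new DatagramSocket()) {
            byte[] bytes = msg.getBytes(StandardCharsets.UTF_8);
            client.send(new DatagramPacket(bytes, bytes.length,
                    InetAddress.getLoopbackAddress(), s.socket.getLocalPort()));
            s.receiveOne();
            return new String(s.queue.take(), StandardCharsets.UTF_8);
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note the trade-off this makes explicit: the bounded queue protects the read loop, but anything still in the queue when the process dies is lost -- which is exactly the "received but not yet pushed to Kafka" failure window mentioned at the top of this thread.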