Re: Help is processing huge data through Kafka-storm cluster

2014-06-19 Thread hsy...@gmail.com
To clarify my last email: by 10 nodes, I mean 10 Kafka partitions distributed across 10 different brokers. In my test, DataTorrent scaled up linearly with the number of Kafka partitions without any problem. Whatever you produce to Kafka, it can easily pull into your application. And I'm quite sure it can hand...
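One factor that ties consumer-side scaling to partition count: within a Kafka consumer group, each partition is read by at most one consumer instance, so the number of partitions caps how many consumers can pull in parallel. The sketch below uses a simplified round-robin assignment purely for illustration; it is not the assignment logic used by Kafka, Storm's Kafka spout, or DataTorrent, and the partition and consumer counts are hypothetical.

```python
def assign_round_robin(num_partitions: int, num_consumers: int) -> dict[int, list[int]]:
    """Simplified (hypothetical) assignment: consumer index -> partition ids it reads."""
    assignment: dict[int, list[int]] = {c: [] for c in range(num_consumers)}
    for p in range(num_partitions):
        # Each partition goes to exactly one consumer in the group.
        assignment[p % num_consumers].append(p)
    return assignment


def active_consumers(assignment: dict[int, list[int]]) -> int:
    """Count consumers that received at least one partition."""
    return sum(1 for parts in assignment.values() if parts)


print(active_consumers(assign_round_robin(10, 10)))  # 10
print(active_consumers(assign_round_robin(10, 12)))  # 10: two consumers sit idle
```

Adding consumers beyond the partition count leaves the extras idle, which is why parallelism is usually raised by adding partitions.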

Re: Help is processing huge data through Kafka-storm cluster

2014-06-19 Thread Shaikh Ahmed
Hi All, Thanks for your valuable comments. Sure, I will give Samza and DataTorrent a try. Meanwhile, I am sharing a screenshot of the Storm UI. Please have a look at it. The Kafka producer is able to push 35 million messages to the broker in two hours, at a rate of approx. 4k messages per second. On other s...
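As a quick check of the figures above, 35 million messages in two hours works out to a little under 4.9k messages per second on average, in line with the "approx. 4k" quoted. A minimal sketch of the arithmetic, assuming a steady rate over the full two hours:

```python
MESSAGES = 35_000_000          # messages pushed, from the post
DURATION_S = 2 * 60 * 60       # two hours, in seconds


def average_rate(messages: int, seconds: int) -> float:
    """Average throughput in messages per second."""
    return messages / seconds


rate = average_rate(MESSAGES, DURATION_S)
print(f"{rate:.0f} msg/s")  # prints "4861 msg/s"
```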

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread Neha Narkhede
Samza is an open-source stream processing framework built on top of Kafka and YARN. It offers high throughput, is scalable, and has built-in state management and fault-tolerance support. Though I may be biased, it is worth taking a look :-) Thanks, Neha On Tue, Jun 17, 2014 at 10:55 AM, Robert Rodgers...

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread Robert Rodgers
We have been experimenting with Samza, which is also worth a look. It's basically a topic-to-topic node on YARN. On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote: > Hi Shaikh, > I heard some throughput bottleneck of storm. It cannot really scale up with kafka. > I recommend you to try...

Re: Help is processing huge data through Kafka-storm cluster

2014-06-17 Thread hsy...@gmail.com
Hi Shaikh, I have heard of some throughput bottlenecks in Storm; it cannot really scale up with Kafka. I recommend you try the DataTorrent platform (https://www.datatorrent.com/). The platform itself is not open source, but it has an open-source library (https://github.com/DataTorrent/Malhar) which contains a...

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread Robert Hodges
+1 for a detailed examination of metrics. You can see the main metrics here: https://kafka.apache.org/documentation.html#monitoring. JConsole is very helpful for quickly looking at what is going on. Cheers, Robert On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi < priyadarshi.push...@gmail.c

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread Robert Hodges
Hi Riyaz, There are a number of reasons why you may be getting low performance. Here are some questions to get started: 1. How big are your messages? To meet your throughput requirement, you need a minimum of 10K messages per second, continuously. You specified a replication factor of 3, so at a...
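Why message size matters can be made concrete with a little arithmetic. The 10K messages per second and the replication factor of 3 come from the message above; the 1 KiB message size is purely hypothetical, since that is exactly what the question asks. Each message is stored on `replication_factor` brokers, so cluster-wide write volume is the producer ingress multiplied by that factor.

```python
def required_bandwidth(msgs_per_s: int, msg_bytes: int, replication_factor: int) -> tuple[int, int]:
    """Return (producer ingress, total bytes written across all replicas), in bytes/s."""
    ingress = msgs_per_s * msg_bytes
    # Every message is persisted once per replica (leader plus followers).
    total_writes = ingress * replication_factor
    return ingress, total_writes


# Hypothetical 1 KiB messages; the real size was not stated in the thread.
ingress, total = required_bandwidth(10_000, 1_024, 3)
print(f"ingress: {ingress / 1e6:.2f} MB/s, replicated writes: {total / 1e6:.2f} MB/s")
# prints "ingress: 10.24 MB/s, replicated writes: 30.72 MB/s"
```

Swapping in the real message size shows quickly whether broker disks and network, rather than Storm, are the limiting factor.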

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread pushkar priyadarshi
And one more thing: using Kafka metrics, you can easily monitor the rate at which you are able to publish to Kafka and the speed at which your consumer (in this case, your spout) is able to drain messages out of Kafka. It's possible that, due to slow draining, even the publishing rate might, in the worst case, get eff...
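Consumer lag is the usual way to see whether the spout drains messages as fast as they are published: per partition, it is the log-end offset minus the consumer's offset, and it grows whenever the publish rate exceeds the drain rate. The offsets and rates below are hypothetical placeholders; in practice they would come from Kafka's metrics or tooling, which this sketch does not call.

```python
def total_lag(log_end_offsets: dict[int, int], consumer_offsets: dict[int, int]) -> int:
    """Sum over partitions of (log-end offset - consumer offset)."""
    return sum(end - consumer_offsets[p] for p, end in log_end_offsets.items())


def lag_growth_per_s(publish_rate: float, drain_rate: float) -> float:
    """Positive: the consumer is falling behind. Negative: the backlog is shrinking."""
    return publish_rate - drain_rate


# Hypothetical snapshot of three partitions (partition id -> offset).
log_end = {0: 1_000, 1: 1_200, 2: 900}
consumed = {0: 800, 1: 1_200, 2: 600}
print(total_lag(log_end, consumed))      # 500 (200 + 0 + 300)
print(lag_growth_per_s(4_861, 3_000))    # 1861
```

A lag that keeps rising over time points at the consumer side; a flat, near-zero lag with low throughput points back at the producer or brokers.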

Re: Help is processing huge data through Kafka-storm cluster

2014-06-15 Thread pushkar priyadarshi
What throughput are you getting from your Kafka cluster alone? Storm throughput can depend on what processing you are actually doing inside it, so you must look at each component, starting with Kafka. Regards, Pushkar On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed wrote: > Hi, > Dai...
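The advice to measure each component separately follows from a simple rule: in a serial pipeline, end-to-end throughput is bounded by the slowest stage. A minimal sketch; the stage names and per-stage rates are hypothetical, not measurements from this thread.

```python
def end_to_end_throughput(stage_rates: dict[str, int]) -> tuple[str, int]:
    """Return the bottleneck stage and its rate; a serial pipeline cannot beat it."""
    name = min(stage_rates, key=stage_rates.get)
    return name, stage_rates[name]


# Hypothetical msg/s figures for each stage, each measured in isolation.
rates = {"producer -> kafka": 4_861, "kafka -> spout": 6_000, "bolt processing": 2_500}
print(end_to_end_throughput(rates))  # ('bolt processing', 2500)
```

Measuring Kafka alone first, as suggested above, fixes one entry in this table before tuning Storm.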