To clarify my last email: by 10 nodes, I mean 10 Kafka partitions
distributed across 10 different brokers. In my test, DataTorrent can scale up
linearly with Kafka partitions without any problem. Whatever you produce to
Kafka, it can easily be taken into your application. And I'm quite sure it can
hand
Hi All,
Thanks for your valuable comments.
Sure, I will give a try with Samza and Data Torrent.
Meanwhile, I am sharing a screenshot of the Storm UI. Please have a look at it.
The Kafka producer is able to push 35 million messages to the broker in two
hours, at a rate of approx. 4.9k messages per second. On other s
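As a quick sanity check on those figures (a back-of-the-envelope calculation, not from the original message):

```python
# Back-of-the-envelope check of the producer rate quoted above:
# 35 million messages pushed to the broker over two hours.
messages = 35_000_000
seconds = 2 * 60 * 60  # two hours

rate = messages / seconds
print(round(rate))  # 4861 -- roughly 4.9k messages/second
```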
Samza is an open source stream processing framework built on top of Kafka
and YARN. It offers high throughput, is scalable, and has built-in state
management and fault-tolerance support. Though I may be biased, it is worth taking a
look :-)
Thanks,
Neha
On Tue, Jun 17, 2014 at 10:55 AM, Robert Rodgers
We have been experimenting with Samza, which is also worth a look. It's
basically a topic-to-topic node on YARN.
On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote:
> Hi Shaikh,
>
> I have heard about throughput bottlenecks in Storm; it cannot really scale
> up with Kafka.
> I recommend you try
Hi Shaikh,
I have heard about throughput bottlenecks in Storm; it cannot really scale
up with Kafka.
I recommend you try the DataTorrent platform (https://www.datatorrent.com/).
The platform itself is not open source, but it has an open-source library (
https://github.com/DataTorrent/Malhar) which contains a
+1 for detailed examination of metrics. You can see the main metrics here:
https://kafka.apache.org/documentation.html#monitoring
JConsole is very helpful for quickly looking at what is going on.
Cheers, Robert
On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi <
priyadarshi.push...@gmail.c
Hi Riyaz,
There are a number of reasons that you may be getting low performance.
Here are some questions to get started:
1. How big are your messages? To meet your throughput requirement, you need
a minimum of 10K messages per second continuously. You specified a
replication factor of 3, so at a
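To make the replication arithmetic concrete, here is a small sketch; the 10K msg/s rate and replication factor of 3 come from the question above, while the 1 KB message size is a hypothetical assumption for illustration:

```python
# Cluster write load implied by the stated requirement. The message size
# is a hypothetical assumption for illustration only.
required_rate = 10_000      # messages/second the application must sustain
replication_factor = 3      # each message is persisted on 3 brokers
msg_size_bytes = 1024       # assumed average message size (1 KB)

cluster_writes_per_sec = required_rate * replication_factor
cluster_write_mb_per_sec = cluster_writes_per_sec * msg_size_bytes / 1024**2

print(cluster_writes_per_sec)              # 30000
print(round(cluster_write_mb_per_sec, 1))  # 29.3
```

So the brokers collectively absorb three times the application's nominal rate, which is why the replication factor matters when sizing the cluster.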
And one more thing: using Kafka metrics you can easily monitor at what rate
you are able to publish to Kafka and at what speed your consumer (in this
case your spout) is able to drain messages out of Kafka. It's possible that,
due to slow draining, even the publishing rate in the worst case might get
eff
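The publish-versus-drain comparison described above boils down to watching consumer lag: the gap, per partition, between the broker's latest (log-end) offset and the offset the consumer has committed. A minimal sketch with made-up offset numbers:

```python
# Consumer lag = broker log-end offset minus the consumer's committed
# offset, summed over partitions. If the publish rate exceeds the drain
# rate, this number grows over time. All offsets below are hypothetical.
log_end_offsets = {0: 120_000, 1: 118_500, 2: 121_300}   # per partition
committed_offsets = {0: 110_000, 1: 118_400, 2: 95_000}

lag = {p: log_end_offsets[p] - committed_offsets[p] for p in log_end_offsets}
total_lag = sum(lag.values())

print(lag)        # {0: 10000, 1: 100, 2: 26300}
print(total_lag)  # 36400
```

A lag that grows steadily (as partition 2 suggests here) means the spout is the bottleneck rather than the producer.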
What throughput are you getting from your Kafka cluster alone? Storm
throughput can depend on what processing you are actually doing inside it,
so you must look at each component, starting with Kafka first.
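One way to measure the Kafka cluster alone, with Storm out of the loop, is the producer perf-test tool that ships with Kafka. A sketch of such a benchmark; the topic name and broker address are placeholders, and exact flag names vary between Kafka versions:

```shell
# Benchmark raw producer throughput against the cluster, bypassing Storm.
# 'perf-test' and localhost:9092 are placeholders for your own setup.
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=localhost:9092
```

If this baseline already falls short of the target rate, tuning Storm will not help; the cluster itself needs attention first.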
Regards,
Pushkar
On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed wrote:
> Hi,
>
> Dai