Hi all,
What are the use cases where technologies like Kafka, Storm, Flink, Hive,
Hadoop and Spark differ?
Is there good material online, or a book, to refer to for this?
Thanks,
Shibha
Hi Emmanuel,
You can first run a Kafka producer perf test (bin/kafka-producer-perf-test.sh)
with your Storm consumers, and a Kafka consumer perf test
(bin/kafka-consumer-perf-test.sh) with your own producers, to see whether
the bottleneck is really in Kafka.
Thanks,
Manu Zhang
On Mon, Mar
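A rough Java sketch of the same baseline measurement with the 0.8-era
producer API (the broker list, topic name, and message count below are
placeholders, and config details may vary by version), useful for seeing
what the cluster does without Storm in the picture:

import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;

public class ProducerThroughputCheck {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder brokers
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        int n = 100000; // number of test messages (placeholder)
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            producer.send(new KeyedMessage<String, String>(
                    "test-topic", Integer.toString(i), "payload-" + i));
        }
        long ms = System.currentTimeMillis() - start;
        System.out.printf("%d msgs in %d ms (~%.0f msgs/sec)%n", n, ms, n * 1000.0 / ms);
        producer.close();
    }
}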
Hi Emmanuel,
Can you post your Kafka server.properties? Also, in your producer, are you
distributing your messages across all of the topic's Kafka partitions?
--
Harsha
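One thing worth checking on the producer side (a known quirk of the 0.8-era
default partitioner, assuming that is the version in play): messages sent
with a null key stick to one randomly chosen partition until the next
metadata refresh (topic.metadata.refresh.interval.ms, default 10 minutes),
which can look like a single-partition bottleneck. Supplying a key spreads
the load, for example (producer, payload and messageId as in a sketch like
the one above):

// Null key: may pin everything to one partition between metadata refreshes.
producer.send(new KeyedMessage<String, String>("test-topic", null, payload));
// Non-null key: hashed by the default partitioner across all partitions.
producer.send(new KeyedMessage<String, String>("test-topic", messageId, payload));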
On March 20, 2015 at 12:33:02 PM, Emmanuel (ele...@msn.com) wrote:
Kafka on test cluster: 2 Kafka nodes, 2GB, 2 CPUs; 3 Zookeeper nodes, 2GB, 2 CPUs.
Storm: 3 nodes, 3 CPUs each, on the same Zookeeper cluster as Kafka.
1 topic, 5 partitions, replication x2
Whether I use 1 slot for the Kafka Spout or 5 slots (=#partitions), the
throughput seems about the same.
I can't se
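For reference, a minimal storm-kafka sketch of tying spout parallelism to
the partition count (0.9-era API from memory; the ZooKeeper hosts, topic
and ids are placeholders):

import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.BrokerHosts;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class SpoutParallelismSketch {
    public static TopologyBuilder build() {
        BrokerHosts hosts = new ZkHosts("zk1:2181,zk2:2181,zk3:2181"); // placeholder quorum
        SpoutConfig cfg = new SpoutConfig(hosts, "mytopic", "/kafkaspout", "spout-id");
        cfg.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        // Parallelism hint = 5 (= #partitions): each spout task owns one partition;
        // tasks beyond the partition count would just sit idle.
        builder.setSpout("kafka-spout", new KafkaSpout(cfg), 5);
        return builder;
    }
}

If 1 slot and 5 slots give the same throughput, the limit is usually either
upstream (the producer) or downstream (the bolts), not the spout itself.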
Hi,
I am running a Kafka-Storm topology.
The topology works exactly as expected in LocalCluster mode. However, in
distributed mode the workers are not starting and the supervisor has not
started; only Nimbus starts.
I am trying to run the topology via Eclipse.
//Providing the details for zkHosts
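If the topology is being submitted straight from Eclipse, note that
StormSubmitter expects a packaged jar (normally provided by running
"storm jar ..."), while LocalCluster needs none; that difference alone can
make a topology work locally and fail against a real cluster. A rough
sketch of the two paths (0.9-era API, all names are placeholders):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

public class TopologyRunner {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // ... set spouts/bolts here, e.g. the KafkaSpout configured via ZkHosts ...
        Config conf = new Config();

        if (args.length > 0 && "local".equals(args[0])) {
            // In-process cluster: works from the IDE, no jar needed.
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("my-topology", conf, builder.createTopology());
        } else {
            // Real cluster: expects to be launched via "storm jar my-topology.jar ..."
            // so the jar can be uploaded to Nimbus.
            StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
        }
    }
}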
To clarify my last email: by 10 nodes, I mean 10 Kafka partitions
distributed across 10 different brokers. In my test, DataTorrent scaled up
linearly with Kafka partitions without any problem. Whatever you produce to
Kafka, it can easily take into your application. And I'm quite sure it can
hand
Hi All,
Thanks for your valuable comments.
Sure, I will give Samza and DataTorrent a try.
Meanwhile, I am sharing a screenshot of the Storm UI. Please have a look at it.
The Kafka producer is able to push 35 million messages to the broker in two
hours, at a rate of approx. 4k messages per second. On other s
Samza is an open source stream processing framework built on top of Kafka
and YARN. It is high-throughput and scalable, and has built-in state
management and fault-tolerance support. Though I may be biased, it is worth
taking a look :-)
Thanks,
Neha
On Tue, Jun 17, 2014 at 10:55 AM, Robert Rodgers
We have been experimenting with Samza, which is also worth a look. It's
basically a topic-to-topic node on YARN.
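To make the topic-to-topic idea concrete, a minimal Samza task sketch
(based on the hello-samza style API; the system and topic names are
placeholders, and the input topic is wired up in the job config rather than
in code):

import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;

public class PassThroughTask implements StreamTask {
    // Output stream: the "kafka" system and an example topic name (placeholders).
    private static final SystemStream OUTPUT = new SystemStream("kafka", "output-topic");

    @Override
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator) {
        // Consume from the configured input topic, transform if desired,
        // and re-emit to the output topic.
        collector.send(new OutgoingMessageEnvelope(OUTPUT, envelope.getMessage()));
    }
}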
Hi Shaikh,
I have heard of throughput bottlenecks with Storm; it cannot really scale
up with Kafka.
I recommend you try the DataTorrent platform (https://www.datatorrent.com/).
The platform itself is not open source, but it has an open-source library
(https://github.com/DataTorrent/Malhar) which contains a
+1 for a detailed examination of metrics. You can see the main metrics here:
https://kafka.apache.org/documentation.html#monitoring
JConsole is very helpful for looking quickly at what is going on.
Cheers, Robert
On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi <
priyadarshi.push...@gmail.c
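If you would rather script the check than click through JConsole, a rough
Java sketch of reading one broker metric over JMX (assumes the broker was
started with JMX_PORT set; the host, port and exact MBean name vary by
Kafka version, so treat them as assumptions):

import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class BrokerMetricPeek {
    public static void main(String[] args) throws Exception {
        // Placeholder host/port; set JMX_PORT=9999 when starting the broker.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://broker1:9999/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            // MBean name as in recent Kafka docs; 0.8-era brokers used slightly
            // different names, so check the monitoring page for your version.
            ObjectName messagesIn = new ObjectName(
                    "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec");
            Object rate = mbsc.getAttribute(messagesIn, "OneMinuteRate");
            System.out.println("MessagesInPerSec (1-min rate): " + rate);
        } finally {
            connector.close();
        }
    }
}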
Hi Riyaz,
There are a number of reasons why you may be getting low performance.
Here are some questions to get started:
1. How big are your messages? To meet your throughput requirement you need
a minimum of 10K messages per second continuously. You specified a
replication factor of 3, so at a
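(For context, the 10K/s figure appears to come from the volumes quoted
below: processing 800,000,000 messages in a single day works out to
800,000,000 / 86,400 seconds, which is roughly 9,300 messages per second,
i.e. about 10K/s sustained.)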
And one more thing: using Kafka metrics you can easily monitor at what rate
you are able to publish to Kafka, and at what speed your consumer (in this
case your spout) is able to drain messages out of Kafka. It's possible
that, due to slow draining, even the publishing rate might in the worst
case get affected.
What throughput are you getting from your Kafka cluster alone? Storm
throughput can depend on what processing you are actually doing inside it,
so you must look at each component, starting with Kafka first.
Regards,
Pushkar
On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed wrote:
Hi,
Daily we download 28 million messages, and monthly the volume goes up to
800+ million.
We want to process this amount of data through our Kafka and Storm cluster,
and would like to store it in an HBase cluster.
We are targeting to process one month of data in one day. Is that possible?
We have setup
rough the zookeeper 3.3.4 dependency included with kafka_2.9.2-0.8.1, which
has more than one dependency that uses scala 2.8.x.
If this is where it's coming from, a solution is to just exclude jline from
the kafka/storm-kafka dependency in your pom.xml, e.g.:

<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.9.2</artifactId>
  <version>0.8.1</version>
  <exclusions>
    <exclusion>
      <groupId>jline</groupId>
      <artifactId>jline</artifactId>
    </exclusion>
  </exclusions>
</dependency>

You can also check your dependency tree (for example with
"mvn dependency:tree") and look for other references to other scala versions.
I am using Kafka with Storm. I am using Maven to build my topology, and I
am using Scala 2.9.2, the same as kafka_2.9.2-0.8.1.
The topology builds perfectly using Maven, but when I submit the topology
to Storm I get the following exception:
java.lang.NoSuchMethodError: scala.Predef$.int2Integer(I)
Hi everyone,
To sweeten the upcoming long weekend I have released code examples that
show how to integrate Kafka 0.8+ with Storm 0.9+, while using Apache Avro
as the data serialization format.
https://github.com/miguno/kafka-storm-starter
Since the integration of the latest Kafka and Storm versions has been a
popular topic on the mailing lists (read: many questions/threads) I ho
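For anyone who has not used Avro as a wire format before, the producer-side
gist is just record-to-bytes; a generic-record sketch (the schema and field
names here are invented for illustration, not taken from kafka-storm-starter):

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

public class AvroEncodeSketch {
    public static byte[] encode() throws Exception {
        // Toy schema for illustration only.
        Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"LogLine\",\"fields\":["
              + "{\"name\":\"message\",\"type\":\"string\"}]}");

        GenericRecord record = new GenericData.Record(schema);
        record.put("message", "hello avro");

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        writer.write(record, encoder);
        encoder.flush();
        return out.toByteArray(); // these bytes are what you'd hand to the Kafka producer
    }
}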
ion, Storm seems to have all these workers, but the way it seems to me, the
order in which these items are processed off the queue is very random,
correct?
In my use case order is very important, so using something like Storm would
not be suitable, right?
I first learned of Kafka + Storm based on a post by someone from Loggly,
but Loggly can process items randomly, I would imagine, because at the end
of the day each log item is timestamped, so after it is processed and
indexed things would be fine.
But if your use case is such that processing
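One nuance worth adding here: Kafka does guarantee order within a
partition, so if ordering only matters per key (per user, per device, and
so on), keying messages to a partition and pairing that with a fields
grouping in Storm keeps each key's events in order at a single bolt
instance. A fragment (MyBolt and the "key" field are hypothetical, and
builder is a TopologyBuilder as usual):

// All tuples with the same "key" value go to the same MyBolt task,
// so per-key ordering from the Kafka partition is preserved downstream.
builder.setBolt("process", new MyBolt(), 4)
       .fieldsGrouping("kafka-spout", new backtype.storm.tuple.Fields("key"));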
... all the data for 2:00:00 PM does not arrive at a time, and the
application has to wait for all the data to arrive to perform certain
analytics.
For example, say at 2:00:00 PM I get 990 points, and another 10 points (say
I know beforehand that there would be 1000 points of data per millisecond)
arrive at 2:00:40 PM. Now I have to wait for all the data to arrive to
perform analytics.
Where should I place my application logic: (1) in Kafka, (2) in Storm, or
should ... when I get all the points for a particular time, then only do I
give it to Kafka/Storm.
I am confused :) Any help would be appreciated. Sorry for any grammatical
errors as I just was thinking aloud and jotting down my question.
Regards,
Yavar
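If it helps frame the options: one common pattern is to send everything to
Kafka immediately (it is just a durable log) and do the waiting in Storm,
buffering points per timestamp until the expected count arrives. A toy bolt
sketch along those lines (the tuple field names and the fixed 1000-point
threshold are assumptions taken from the example above):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class WindowAssemblyBolt extends BaseBasicBolt {
    private static final int EXPECTED_POINTS = 1000; // from the example: 1000 points per tick
    private final Map<Long, List<Double>> buffer = new HashMap<Long, List<Double>>();

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        long ts = input.getLongByField("timestamp"); // assumed tuple fields
        double value = input.getDoubleByField("value");

        List<Double> points = buffer.get(ts);
        if (points == null) {
            points = new ArrayList<Double>();
            buffer.put(ts, points);
        }
        points.add(value);

        // Only emit once every point for this timestamp has arrived.
        if (points.size() == EXPECTED_POINTS) {
            collector.emit(new Values(ts, new ArrayList<Double>(points)));
            buffer.remove(ts);
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("timestamp", "points"));
    }
}

In practice you would also want a timeout for late or missing points, so a
window cannot sit in the buffer forever.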