Re: Help in processing huge data through Kafka-Storm cluster

Robert Hodges Sun, 15 Jun 2014 08:44:02 -0700

+1 for detailed examination of metrics.  You can see the main metrics here:


https://kafka.apache.org/documentation.html#monitoring

JConsole is very helpful for taking a quick look at what is going on.
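JConsole browses the JMX MBeans that a JVM registers. The same tree can be queried from Java with the standard `javax.management` API. Below is a minimal sketch that runs against its own JVM's platform MBean server. To read a broker's metrics you would, as an assumption, start the broker with remote JMX enabled and pass a remote `MBeanServerConnection` instead. Kafka's MBeans live in domains whose names contain "kafka", and the exact names vary by Kafka version. `MBeanLister` is a hypothetical helper name.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class MBeanLister {
    /** Returns the names of all MBeans matching an ObjectName pattern, sorted. */
    static Set<String> list(MBeanServerConnection conn, String pattern) {
        try {
            Set<String> names = new TreeSet<>();
            for (ObjectName name : conn.queryNames(new ObjectName(pattern), null)) {
                names.add(name.toString());
            }
            return names;
        } catch (Exception e) { // MalformedObjectNameException or IOException
            throw new IllegalStateException("MBean query failed for " + pattern, e);
        }
    }

    public static void main(String[] args) {
        // The local platform server always has "java.lang" MBeans. For a remote
        // broker, a connection could come from JMXConnectorFactory.connect(...)
        // and a pattern such as "*kafka*:*" (assumption) could be used.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        for (String name : list(local, "java.lang:*")) {
            System.out.println(name);
        }
    }
}
```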

Cheers, Robert


On Sun, Jun 15, 2014 at 7:49 AM, pushkar priyadarshi <
priyadarshi.push...@gmail.com> wrote:

> And one more thing: using Kafka metrics, you can easily monitor the rate at
> which you are able to publish to Kafka and how fast your consumer (in this
> case, your spout) is able to drain messages out of Kafka. If draining is
> slow, even the publishing rate may be affected in the worst case: if the
> consumer lags too far behind, consuming the older messages will result in
> disk seeks.
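Consumer lag is the gap between a partition's latest (log-end) offset and the offset the consumer has reached; if it keeps growing between samples, the spout is not keeping up. A minimal Java sketch of that arithmetic, assuming the offsets have already been read from Kafka's metrics or offset tools. `ConsumerLag` is a hypothetical helper and the offsets in `main` are made up.

```java
import java.util.Map;
import java.util.TreeMap;

final class ConsumerLag {
    /**
     * Lag per partition = log-end offset minus the consumer's offset.
     * Only partitions present in both maps are reported.
     */
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> consumedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            Long consumed = consumedOffsets.get(e.getKey());
            if (consumed != null) {
                lag.put(e.getKey(), Math.max(0L, e.getValue() - consumed));
            }
        }
        return lag;
    }

    /** Change in lag per second between two samples; positive means falling behind. */
    static double lagGrowthPerSecond(long lagBefore, long lagAfter, double seconds) {
        return (lagAfter - lagBefore) / seconds;
    }

    public static void main(String[] args) {
        // Made-up offsets for two partitions, for illustration only.
        Map<Integer, Long> logEnd = Map.of(0, 1_000_000L, 1, 990_000L);
        Map<Integer, Long> consumed = Map.of(0, 400_000L, 1, 985_000L);
        System.out.println(lagPerPartition(logEnd, consumed));            // {0=600000, 1=5000}
        System.out.println(lagGrowthPerSecond(600_000L, 660_000L, 60.0)); // 1000.0
    }
}
```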
>
>
> On Sun, Jun 15, 2014 at 8:16 PM, pushkar priyadarshi <
> priyadarshi.push...@gmail.com> wrote:
>
> > What throughput are you getting from your Kafka cluster alone? Storm
> > throughput can depend on what processing you are actually doing inside
> > it, so you must look at each component, starting with Kafka.
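Measuring each component in isolation (for example, a standalone producer or consumer run against Kafka before Storm is involved) only needs a message counter and a clock. A minimal Java sketch, assuming you call `mark` once per message sent or received. `ThroughputMeter` is a hypothetical helper, not part of Kafka or Storm.

```java
final class ThroughputMeter {
    private final long startNanos;
    private long count;

    ThroughputMeter(long startNanos) {
        this.startNanos = startNanos;
    }

    /** Records n processed messages. */
    void mark(long n) {
        count += n;
    }

    /** Average messages per second between construction and nowNanos. */
    double rate(long nowNanos) {
        double seconds = (nowNanos - startNanos) / 1e9;
        return seconds > 0 ? count / seconds : 0.0;
    }

    public static void main(String[] args) {
        ThroughputMeter meter = new ThroughputMeter(System.nanoTime());
        for (int i = 0; i < 1_000_000; i++) {
            meter.mark(1); // stand-in for one send or receive per iteration
        }
        System.out.printf("%.0f msg/s%n", meter.rate(System.nanoTime()));
    }
}
```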
> >
> > Regards,
> > Pushkar
> >
> >
> > On Sat, Jun 14, 2014 at 8:44 PM, Shaikh Ahmed <rnsr.sha...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Every day we download 28 million messages, and monthly it goes up to
> >> 800+ million.
> >>
> >> We want to process this amount of data through our Kafka and Storm
> >> cluster and store it in our HBase cluster.
> >>
> >> We are aiming to process one month of data in one day. Is that
> >> possible?
> >>
> >> We set up our cluster thinking that we could process millions of
> >> messages per second, as mentioned on the web. Unfortunately, we have
> >> ended up processing only 1200-1700 messages per second. If we continue
> >> at this speed, it will take at least 10 days to process 30 days of data,
> >> which is not workable in our case.
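A back-of-the-envelope check using only the figures in the post (800 million messages per month, a one-day target, an observed 1,200-1,700 msg/s). This is plain arithmetic in Java; `CapacityMath` is a hypothetical name.

```java
final class CapacityMath {
    static final long SECONDS_PER_DAY = 86_400L;

    /** Sustained rate (msg/s) needed to process `messages` within `days`. */
    static double requiredRate(long messages, double days) {
        return messages / (days * SECONDS_PER_DAY);
    }

    /** Days needed to process `messages` at a sustained `ratePerSecond`. */
    static double daysNeeded(long messages, double ratePerSecond) {
        return messages / ratePerSecond / SECONDS_PER_DAY;
    }

    public static void main(String[] args) {
        long monthly = 800_000_000L;
        System.out.printf("required: %.0f msg/s%n", requiredRate(monthly, 1.0));     // 9259
        System.out.printf("at 1200 msg/s: %.1f days%n", daysNeeded(monthly, 1200)); // 7.7
        System.out.printf("at 1700 msg/s: %.1f days%n", daysNeeded(monthly, 1700)); // 5.4
    }
}
```

So the one-day target needs roughly 9,300 msg/s sustained end to end, about 5.4x to 7.7x the observed rate. That is far below "millions per second", but every stage (Kafka, each bolt, HBase writes) has to sustain it.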
> >>
> >> I suspect that we have to change some configuration to achieve this
> >> goal. I am looking for help from experts to get there.
> >>
> >> *Kafka Cluster:*
> >> Kafka is running on two dedicated machines with 48 GB of RAM and 2 TB of
> >> storage. We have an 11-node Kafka cluster spread across these two
> >> servers.
> >>
> >> *Kafka Configuration:*
> >> producer.type=async
> >> compression.codec=none
> >> request.required.acks=-1
> >> serializer.class=kafka.serializer.StringEncoder
> >> queue.buffering.max.ms=100000
> >> batch.num.messages=10000
> >> queue.buffering.max.messages=100000
> >> default.replication.factor=3
> >> controlled.shutdown.enable=true
> >> auto.leader.rebalance.enable=true
> >> num.network.threads=2
> >> num.io.threads=8
> >> num.partitions=4
> >> log.retention.hours=12
> >> log.segment.bytes=536870912
> >> log.retention.check.interval.ms=60000
> >> log.cleaner.enable=false
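In Kafka 0.8, the first seven properties above (producer.type through queue.buffering.max.messages) are producer settings; the rest are broker settings. In async mode the producer sends a batch once batch.num.messages messages are queued or queue.buffering.max.ms has elapsed, whichever comes first. A small Java sketch estimating which limit fires first. The publish rates in `main` are assumptions, since the post gives only the end-to-end processing rate. `ProducerBatching` is a hypothetical name.

```java
final class ProducerBatching {
    /** Milliseconds until a batch is flushed: batch fill time, capped by queue.buffering.max.ms. */
    static double flushDelayMs(long batchNumMessages, long queueBufferingMaxMs, double msgsPerSecond) {
        double fillMs = batchNumMessages / msgsPerSecond * 1000.0;
        return Math.min(fillMs, queueBufferingMaxMs);
    }

    public static void main(String[] args) {
        // Posted values: batch.num.messages=10000, queue.buffering.max.ms=100000.
        for (double rate : new double[] {50, 1_500, 10_000}) {
            System.out.printf("%.0f msg/s -> flush after %.0f ms%n",
                    rate, flushDelayMs(10_000, 100_000, rate));
        }
    }
}
```

At 1,500 msg/s a 10,000-message batch fills in about 6.7 s, so the count limit triggers first. At very low rates a message can wait up to 100 s in the producer before it is sent.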
> >>
> >> *Storm Cluster:*
> >> Storm is running with 5 supervisors and 1 nimbus on IBM servers with 48
> >> GB of RAM and 8 TB of storage. These servers are shared with the HBase
> >> cluster.
> >>
> >> *Kafka spout configuration*
> >> kafkaConfig.bufferSizeBytes = 1024*1024*8;
> >> kafkaConfig.fetchSizeBytes = 1024*1024*4;
> >> kafkaConfig.forceFromStart = true;
> >>
> >> *Topology: StormTopology*
> >> Spout        - Partitions: 4
> >> First Bolt   - parallelism hint: 6 and Num tasks: 5
> >> Second Bolt  - parallelism hint: 5
> >> Third Bolt   - parallelism hint: 3
> >> Fourth Bolt  - parallelism hint: 3 and Num tasks: 4
> >> Fifth Bolt   - parallelism hint: 3
> >> Sixth Bolt   - parallelism hint: 3
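Two limits are worth checking against this topology. This is a Java sketch under stated assumptions. First, with the storm-kafka spout each Kafka partition is read by a single spout task, so with num.partitions=4 at most 4 spout tasks receive data. The round-robin assignment below is modeled on that behavior and is not storm-kafka's actual code. Second, Storm's parallelism documentation states that the number of executors is at most the number of tasks, so the First Bolt (hint 6, 5 tasks) has at most 5 executors doing work. `ParallelismCheck` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

final class ParallelismCheck {
    /** Partitions read by one spout task under round-robin assignment (p % totalTasks == taskIndex). */
    static List<Integer> partitionsForTask(int numPartitions, int totalTasks, int taskIndex) {
        List<Integer> mine = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            mine.add(p);
        }
        return mine;
    }

    /** Executors that can do work: executors never outnumber tasks. */
    static int effectiveExecutors(int parallelismHint, int numTasks) {
        return Math.min(parallelismHint, numTasks);
    }

    public static void main(String[] args) {
        // 4 partitions read by a hypothetical 6-task spout: tasks 4 and 5 stay idle.
        for (int t = 0; t < 6; t++) {
            System.out.println("spout task " + t + " -> partitions " + partitionsForTask(4, 6, t));
        }
        System.out.println("First Bolt executors: " + effectiveExecutors(6, 5)); // 5
    }
}
```

Raising spout parallelism beyond the partition count therefore adds idle tasks; more partitions are needed first.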
> >>
> >> *Supervisor configuration:*
> >>
> >> storm.local.dir: "/app/storm"
> >> storm.zookeeper.port: 2181
> >> storm.cluster.mode: "distributed"
> >> storm.local.mode.zmq: false
> >> supervisor.slots.ports:
> >>     - 6700
> >>     - 6701
> >>     - 6702
> >>     - 6703
> >> supervisor.worker.start.timeout.secs: 180
> >> supervisor.worker.timeout.secs: 30
> >> supervisor.monitor.frequency.secs: 3
> >> supervisor.heartbeat.frequency.secs: 5
> >> supervisor.enable: true
> >>
> >> storm.messaging.netty.server_worker_threads: 2
> >> storm.messaging.netty.client_worker_threads: 2
> >> storm.messaging.netty.buffer_size: 52428800 #50MB buffer
> >> storm.messaging.netty.max_retries: 25
> >> storm.messaging.netty.max_wait_ms: 1000
> >> storm.messaging.netty.min_wait_ms: 100
> >>
> >>
> >> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
> >> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
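The supervisor settings above imply a fixed worker capacity and heap budget. A small Java sketch of that arithmetic, using the posted values: 5 supervisors, 4 slots each (ports 6700-6703), -Xmx2048m per worker, -Xmx1024m per supervisor. It counts heap only, not JVM overhead, Netty buffers, or the memory HBase needs on the same hosts. `WorkerBudget` is a hypothetical name.

```java
final class WorkerBudget {
    /** Total worker slots across the cluster. */
    static int totalSlots(int supervisors, int slotsPerSupervisor) {
        return supervisors * slotsPerSupervisor;
    }

    /** Maximum Storm heap per host when every slot runs a worker. */
    static long maxStormHeapMbPerHost(int slotsPerHost, long workerHeapMb, long supervisorHeapMb) {
        return slotsPerHost * workerHeapMb + supervisorHeapMb;
    }

    public static void main(String[] args) {
        System.out.println("cluster worker slots: " + totalSlots(5, 4));                          // 20
        System.out.println("Storm heap per host (MB): " + maxStormHeapMbPerHost(4, 2048, 1024)); // 9216
    }
}
```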
> >>
> >>
> >> Please let me know if more information is needed.
> >>
> >> Thanks in advance.
> >>
> >> Regards,
> >> Riyaz
> >>
> >
> >
>
