We have been experimenting with Samza, which is also worth a look. It's
basically a topic-to-topic processing node on YARN.
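For anyone curious what that looks like in practice, here is a rough sketch
of a Samza task that just copies messages from its input stream to an output
topic. The class name and output topic below are made up, and the input
stream is wired up in the job's config file, not in code:

    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.system.OutgoingMessageEnvelope;
    import org.apache.samza.system.SystemStream;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskCoordinator;

    // Samza calls process() once per message on the input stream;
    // this task simply forwards each message to a Kafka output topic.
    public class PassthroughTask implements StreamTask {
      private static final SystemStream OUTPUT =
          new SystemStream("kafka", "output-topic");  // placeholder topic

      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        collector.send(new OutgoingMessageEnvelope(OUTPUT, envelope.getMessage()));
      }
    }

Samza runs one instance of the task per input partition on YARN, which is
where the "topic-to-topic node" framing comes from.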
On Jun 17, 2014, at 10:44 AM, hsy...@gmail.com wrote:

> Hi Shaikh,
>
> I have heard about throughput bottlenecks with Storm; it cannot really
> scale up with Kafka.
> I recommend you try the DataTorrent platform (https://www.datatorrent.com/).
>
> The platform itself is not open source, but it has an open-source library
> (https://github.com/DataTorrent/Malhar) which contains Kafka ingestion
> functions.
> The library is pretty cool: it can scale up dynamically with Kafka
> partitions and is fully HA.
>
> And in your case you might be able to use the platform for free. (It's
> free if your application doesn't require a large amount of memory.)
>
> With the DataTorrent platform and the open-source library I can scale my
> application up to 300k msg/s (10 nodes, 3 replicas, 1 KB messages, 0.8.0
> client).
> I heard the performance of the Kafka client has been improved in the
> 0.8.1 release :)
>
> Best,
> Siyuan
>
>
> On Sat, Jun 14, 2014 at 8:14 AM, Shaikh Ahmed <rnsr.sha...@gmail.com> wrote:
>
>> Hi,
>>
>> We download 28 million messages daily, and monthly that grows to 800+
>> million.
>>
>> We want to push this amount of data through our Kafka and Storm clusters
>> and store the results in an HBase cluster.
>>
>> We are aiming to process one month of data in one day. Is that possible?
>>
>> We set up our cluster expecting to process a million messages per second,
>> as mentioned on the web. Unfortunately, we have ended up processing only
>> 1,200-1,700 messages per second. If we continue at this speed, it will
>> take at least 10 days to process 30 days of data, which is not a viable
>> solution in our case.
>>
>> I suspect we have to change some configuration to achieve this goal, and
>> I am looking for help from experts to support me in this task.
>>
>> *Kafka Cluster:*
>> Kafka is running on two dedicated machines with 48 GB of RAM and 2 TB of
>> storage each. We have an 11-broker Kafka cluster spread across these two
>> servers.
>>
>> *Kafka Configuration:*
>> producer.type=async
>> compression.codec=none
>> request.required.acks=-1
>> serializer.class=kafka.serializer.StringEncoder
>> queue.buffering.max.ms=100000
>> batch.num.messages=10000
>> queue.buffering.max.messages=100000
>> default.replication.factor=3
>> controlled.shutdown.enable=true
>> auto.leader.rebalance.enable=true
>> num.network.threads=2
>> num.io.threads=8
>> num.partitions=4
>> log.retention.hours=12
>> log.segment.bytes=536870912
>> log.retention.check.interval.ms=60000
>> log.cleaner.enable=false
>>
>> *Storm Cluster:*
>> Storm is running with 5 supervisors and 1 Nimbus on IBM servers with
>> 48 GB of RAM and 8 TB of storage. These servers are shared with the
>> HBase cluster.
>>
>> *Kafka spout configuration:*
>> kafkaConfig.bufferSizeBytes = 1024*1024*8;
>> kafkaConfig.fetchSizeBytes = 1024*1024*4;
>> kafkaConfig.forceFromStart = true;
>>
>> *Topology: StormTopology*
>> Spout - partitions: 4
>> First Bolt - parallelism hint: 6, num tasks: 5
>> Second Bolt - parallelism hint: 5
>> Third Bolt - parallelism hint: 3
>> Fourth Bolt - parallelism hint: 3, num tasks: 4
>> Fifth Bolt - parallelism hint: 3
>> Sixth Bolt - parallelism hint: 3
>>
>> *Supervisor configuration:*
>>
>> storm.local.dir: "/app/storm"
>> storm.zookeeper.port: 2181
>> storm.cluster.mode: "distributed"
>> storm.local.mode.zmq: false
>> supervisor.slots.ports:
>>     - 6700
>>     - 6701
>>     - 6702
>>     - 6703
>> supervisor.worker.start.timeout.secs: 180
>> supervisor.worker.timeout.secs: 30
>> supervisor.monitor.frequency.secs: 3
>> supervisor.heartbeat.frequency.secs: 5
>> supervisor.enable: true
>>
>> storm.messaging.netty.server_worker_threads: 2
>> storm.messaging.netty.client_worker_threads: 2
>> storm.messaging.netty.buffer_size: 52428800 # 50 MB buffer
>> storm.messaging.netty.max_retries: 25
>> storm.messaging.netty.max_wait_ms: 1000
>> storm.messaging.netty.min_wait_ms: 100
>>
>> supervisor.childopts: "-Xmx1024m -Djava.net.preferIPv4Stack=true"
>> worker.childopts: "-Xmx2048m -Djava.net.preferIPv4Stack=true"
>>
>> Please let me know if more information is needed.
>>
>> Thanks in advance.
>>
>> Regards,
>> Riyaz
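A few observations on the numbers quoted above. The stated goal of 800+
million messages in a day works out to roughly 9,000-10,000 msg/s sustained
(800,000,000 / 86,400 ≈ 9,260), so the gap from 1,200-1,700 msg/s is about
6x. For reference, here is a minimal sketch of what the quoted producer
settings look like against the 0.8 producer API (the broker list and topic
are placeholders):

    import java.util.Properties;
    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    public class AsyncProducerExample {
      public static void main(String[] args) {
        Properties props = new Properties();
        props.put("metadata.broker.list", "broker1:9092,broker2:9092"); // placeholder hosts
        props.put("producer.type", "async");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "-1");  // wait for all in-sync replicas
        props.put("compression.codec", "none");
        props.put("batch.num.messages", "10000");
        props.put("queue.buffering.max.ms", "100000");
        props.put("queue.buffering.max.messages", "100000");

        Producer<String, String> producer =
            new Producer<String, String>(new ProducerConfig(props));
        producer.send(new KeyedMessage<String, String>("test-topic", "hello"));
        producer.close();
      }
    }

Note that request.required.acks=-1 makes every request wait for all in-sync
replicas, and with default.replication.factor=3 that is often the first
throughput ceiling people hit; acks=1 (leader only) is a common middle
ground if you can tolerate a small window of risk on leader failover.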
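On the Storm side, the quoted spout settings are fields on storm-kafka's
SpoutConfig. A minimal sketch of how they are typically wired up (the
ZooKeeper connect string, topic, zkRoot, and consumer id are placeholders):

    import backtype.storm.spout.SchemeAsMultiScheme;
    import storm.kafka.KafkaSpout;
    import storm.kafka.SpoutConfig;
    import storm.kafka.StringScheme;
    import storm.kafka.ZkHosts;

    ZkHosts hosts = new ZkHosts("zk1:2181,zk2:2181");  // placeholder ZK quorum
    SpoutConfig kafkaConfig =
        new SpoutConfig(hosts, "test-topic", "/kafka-spout", "spout-id");
    kafkaConfig.bufferSizeBytes = 1024 * 1024 * 8;   // 8 MB consumer buffer
    kafkaConfig.fetchSizeBytes  = 1024 * 1024 * 4;   // 4 MB per fetch request
    kafkaConfig.forceFromStart  = true;              // always rewind to earliest offset
    kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());
    KafkaSpout spout = new KafkaSpout(kafkaConfig);

One thing to double-check: forceFromStart = true makes the spout ignore its
committed offsets and re-read the topic from the beginning every time the
topology is resubmitted, so you end up reprocessing old data on every
redeploy.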
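And, continuing from the spout above, a sketch of how the quoted parallelism
numbers translate into a topology definition (FirstBolt and SecondBolt stand
in for whatever your six bolts actually are):

    import backtype.storm.topology.TopologyBuilder;

    TopologyBuilder builder = new TopologyBuilder();
    // Spout parallelism matches the 4 Kafka partitions; extra spout
    // executors beyond the partition count would just sit idle.
    builder.setSpout("kafka-spout", spout, 4);
    builder.setBolt("first-bolt", new FirstBolt(), 6).setNumTasks(5)
           .shuffleGrouping("kafka-spout");
    builder.setBolt("second-bolt", new SecondBolt(), 5)
           .shuffleGrouping("first-bolt");
    // ...and so on for the remaining bolts.

Two things stand out: a parallelism hint of 6 with only 5 tasks still gives
you at most 5 executors, since Storm never runs more executors than tasks;
and with the spout capped at 4 partitions, adding Kafka partitions (and
spout executors to match) is usually the first lever for raising ingest
throughput.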