We use Storm in our product and currently have several variations of spouts 
and bolts. I am profiling each bolt with the YourKit profiler, and I see that 
the bolt threads sleep most of the time. To test, I give the spout more tasks 
so that it generates more data, and then check how fast the bolt can process 
it. I am tweaking the parallelism hint for both the spout and the bolt. Each 
topology has only two components, one spout and one bolt, so that I can focus 
on the methods of the bolt and the spout.

The attached screenshots show that the consumeBatchWhenAvailable method in the 
Disruptor queue takes more time than the other methods. I tried increasing the 
buffer size of the executor receive queue and send queue to 16384.
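
For reference, here is a minimal sketch of the overrides described above, 
written against a plain Map so that it runs without Storm on the classpath. 
The key strings are the ones I understand Storm's Config to use for the 
executor queues and the sleep wait strategy. The class and method names 
(ExecutorQueueSettings, buildOverrides) are hypothetical, and the 10 ms value 
is the sleep wait strategy setting mentioned later in this message.

```java
import java.util.HashMap;
import java.util.Map;

public class ExecutorQueueSettings {

    // Disruptor ring buffers need a power-of-two size, so check it up front.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // Hypothetical helper: collects the overrides in a plain map.
    // The key strings are assumed to match the names in Storm's Config.
    static Map<String, Object> buildOverrides(int queueSize, int spoutSleepMs) {
        if (!isPowerOfTwo(queueSize)) {
            throw new IllegalArgumentException(
                "queue size must be a power of two: " + queueSize);
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.executor.receive.buffer.size", queueSize);
        conf.put("topology.executor.send.buffer.size", queueSize);
        conf.put("topology.sleep.spout.wait.strategy.time.ms", spoutSleepMs);
        return conf;
    }

    public static void main(String[] args) {
        // 16384 and 10 ms are the values used in the tests described here.
        Map<String, Object> conf = buildOverrides(16384, 10);
        System.out.println(conf);
    }
}
```

In a real topology these entries would go into the Config object passed at 
submission, since Config is itself a Map<String, Object>.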

Questions:

1.       Why does the spout thread show as sleeping most of the time?

2.       Why do the Storm methods take so much time?

Any suggestions on how we can increase the utilization? The CPU utilization 
seems low. Also, the spout is not ingesting data from an external source. It 
generates the data (primitive variables) that it emits as a tuple, and the 
generation code takes 6% of the time. I do see a thread stack, shown below, 
where the CPU time for the wait strategy is 8s even though I configured the 
sleep wait strategy with 10 ms.


Thread-16-SourceInbound [SLEEPING] CPU time: 8s
java.lang.Thread.sleep(long)
backtype.storm.spout.SleepSpoutWaitStrategy.emptyEmit(long)
backtype.storm.daemon.executor$fn__3373$fn__3388$fn__3417.invoke()
backtype.storm.util$async_loop$fn__464.invoke()
clojure.lang.AFn.run()
java.lang.Thread.run()

Any suggestions would be a big help.

Thanks,
Srividhya
