We have been testing Storm from 0.9.0.1 through 0.9.4 (I have not tried 0.9.5
yet, but I don't see any significant differences there), and unfortunately
we have not managed a clean run of more than 30 minutes on a cluster of 5
high-end nodes. ZooKeeper is also set up on these nodes, but on different
disks.

I am having a lot of trouble getting my data analytics topology to run
stably. So I tried the simplest topology I can think of: just an empty bolt,
with no I/O except for reading from a Kafka queue.

Here is my latest test on 0.9.4 with this empty bolt (Kafka topic
partition=1, spout task #=1, bolt #=20 with fields grouping, msg size=1k).
After 26 minutes, nimbus orders the topology to be killed because it believes
the topology is dead; then after another 2 minutes, another kill; then another
after another 4 minutes, and on and on.
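
For readers less familiar with the grouping by field mentioned in the setup
above: it means that tuples carrying the same value in the chosen field are
always routed to the same bolt task. Below is a minimal sketch of that routing
idea in plain Java. It is not Storm's actual code; the class and method names
are hypothetical, and the hash-then-modulo scheme is an assumption used only
for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a stand-in for the routing idea, not a Storm class.
class FieldsGroupingSketch {
    // Number of bolt tasks in the test described above.
    static final int NUM_BOLT_TASKS = 20;

    // Pick a target task index from the grouping field values:
    // hash the selected values, then reduce modulo the task count.
    // floorMod keeps the result non-negative even for negative hashes.
    static int chooseTask(List<Object> groupingValues, int numTasks) {
        return Math.floorMod(groupingValues.hashCode(), numTasks);
    }

    public static void main(String[] args) {
        // Count how many of 100,000 distinct keys land on each task.
        int[] counts = new int[NUM_BOLT_TASKS];
        for (int i = 0; i < 100_000; i++) {
            List<Object> key = Arrays.<Object>asList("key-" + i);
            counts[chooseTask(key, NUM_BOLT_TASKS)]++;
        }
        System.out.println(Arrays.toString(counts));
    }
}
```

Under this scheme the same key always maps to the same task index, so how
evenly the 20 bolt tasks are loaded depends on how the key values are
distributed.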

I can understand there might be issues in the coordination among nimbus,
workers and executors (e.g., heartbeats). But are there any practical
workarounds? I hope there are, since so many of you are using it in
production :-)

I would deeply appreciate any suggestions that could get even my toy
topology working!

Fang
