Spouts tuple generation rate

Wilson Akio Higashino Tue, 24 Feb 2015 12:39:42 -0800

Dear all,

I have a simple topology composed of a spout followed by three bolts, and I
want to measure the processing latency as a function of the tuple incoming
rate.


To run this test, I created a Spout that periodically creates a new tuple
and emits it to the topology. To control the generation rate, I simply
sleep for a configurable period. The code follows the general idea present
in some of the "storm-starter" topologies:

   public void nextTuple() {
       Utils.sleep(SLEEP_TIME);

       // Create test tuple and emit
   }


For "slow" rates the spout can generate tuples with good accuracy. For
example, if I sleep for 10 milliseconds, the rate should be around 100
tuples/second - and I get around 92 tuples/second.
However, if I increase the rate, the error becomes very large (for example,
for 1 millisecond sleep, I get only 650 tuples/second instead of the
theoretical 1000 tuples/second).

In addition:

- Everything is running on a single Worker.

- Generally, there are no tuples waiting on any of the receiving / sending
queues.

- The code generating the tuple is not a bottleneck: when I remove the
Utils.sleep line I get a generation rate of over 10,000 tuples/second.
This result also shows that the topology can handle higher rates without
problems.


I understand that the way I am programming the "nextTuple" method only
guarantees an upper bound on the generation rate, but I would like to have
better control over it.
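
To make the upper-bound point concrete: at 650 tuples/second each call
takes about 1.54 ms, so roughly 0.54 ms per call is spent outside the
1 ms sleep, and a fixed sleep never recovers that time. One possible
alternative, given here only as a minimal sketch and not verified against
Storm, is to pace against the clock instead of sleeping a fixed amount.
The RatePacer class and its due() method below are hypothetical names
introduced for illustration; they are not part of Storm.

```java
/** Hypothetical helper (not a Storm API): counts tuples due at a target rate. */
final class RatePacer {
    private final long periodNanos;   // ideal spacing between two tuples
    private long nextDeadlineNanos;   // time at which the next tuple is due

    RatePacer(long tuplesPerSecond, long startNanos) {
        if (tuplesPerSecond <= 0 || tuplesPerSecond > 1_000_000_000L) {
            throw new IllegalArgumentException("rate must be in 1..1e9 tuples/second");
        }
        this.periodNanos = 1_000_000_000L / tuplesPerSecond;
        this.nextDeadlineNanos = startNanos + periodNanos;
    }

    /**
     * Returns how many tuples are due at nowNanos and advances the schedule
     * by exactly that many periods, so time lost in one call is made up
     * in a later call instead of accumulating as error.
     */
    int due(long nowNanos) {
        long behind = nowNanos - nextDeadlineNanos; // difference is wrap-safe
        if (behind < 0) {
            return 0; // next tuple is not due yet
        }
        long count = behind / periodNanos + 1;
        nextDeadlineNanos += count * periodNanos;
        return (int) Math.min(count, Integer.MAX_VALUE);
    }
}
```

Under this assumption, nextTuple() would call due(System.nanoTime()),
emit that many tuples, and sleep only when the result is 0. The average
rate then follows the clock, at the cost of emitting small bursts whenever
a call arrives late.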

My questions are:

- Is there anything in Storm's internals that explains this behaviour? I
thought it could be related to the "SpoutWaitStrategy" associated with the
Spout, but switching to other strategies had no effect.

- Any ideas / thoughts on how I could better control the tuple generation
rate other than using this sleep / awake pattern?


I appreciate your help.

Regards,

Wilson
