I can't think of anything specific, but here is what I would do: add an
input queue inside the bolt (like ShellBolt does). This doesn't help directly,
other than that you could monitor the input queue (size/latency) since
you own it. It should also let you emulate some scenarios in tests by
artificially filling up that queue, independent of Storm.
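A minimal sketch of that pattern, using only java.util.concurrent. The Storm-specific wiring (execute() feeding the queue, a worker thread draining and acking) is assumed, and the class and method names here are made up for illustration:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: a bolt-owned input queue, in the spirit of ShellBolt.
// In a real bolt, execute() would enqueue and a dedicated thread would drain.
public class QueueingBoltSketch {
    // Bounded, so back-pressure is visible when the consumer falls behind.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>(10_000);

    // Would be called from Storm's executor thread inside execute().
    public boolean offerTuple(String tuple) {
        return pending.offer(tuple); // false == queue full: worth logging/metering
    }

    // What a monitoring thread (or a metrics tick) could report.
    public int queueDepth() {
        return pending.size();
    }

    // Consumer side: a worker thread would loop on this and do the real work.
    public String takeNext(long timeoutMs) throws InterruptedException {
        return pending.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        QueueingBoltSketch bolt = new QueueingBoltSketch();
        // A test can fill the queue artificially, with no Storm cluster involved:
        for (int i = 0; i < 100; i++) bolt.offerTuple("tuple-" + i);
        System.out.println("depth=" + bolt.queueDepth()); // depth=100
        System.out.println("next=" + bolt.takeNext(10));  // next=tuple-0
    }
}
```

Because the queue is owned by the bolt, its depth over time is a direct signal of whether the downstream (network) side is the bottleneck, separate from anything Storm reports.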


________________________________
From: Luke Rohde <[email protected]>
Sent: Monday, March 9, 2015 4:22 PM
To: [email protected]
Subject: Re: throughput problem

Any thoughts?

On Thu, Mar 5, 2015 at 1:39 PM Luke Rohde 
<[email protected]> wrote:
I've been trying to figure this one out and can't seem to come up with an
answer; hopefully someone can enlighten me.
The simple picture is that I have a spout, a CPU-bound task, and an IO-bound
task that sends data out over the network. The IO-bound task builds
batches and issues async calls with them, so it's not blocking (and
indeed its capacity is very low). However, when actually sending this data
(instead of a no-op on that task) I only get about half the throughput I
would otherwise.
What I've looked into:
 - the thread pool used to execute the async call has plenty of capacity
 - network IO is not saturated
 - max spout pending is not in play
 - execute latency is very low on the network bound task
Does anyone have any ideas? Is there a config that could possibly be
rate-limiting incoming data to that task, since it's juggling many unacked
tuples?
