I can't think of anything specific, but this is what I would do. I would add an input queue inside the bolt (like the ShellBolt does). This doesn't directly fix anything, but since you own the queue you can monitor its size and latency. It should also let you emulate some scenarios in tests by artificially filling up that queue, independent of Storm.
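A minimal sketch of that pattern, using only the JDK (class and method names here are hypothetical, not Storm's actual API): the bolt-like class owns a LinkedBlockingQueue, a worker thread drains it, and the queue depth is exposed for monitoring or for pre-filling in tests.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical bolt wrapper that owns its input queue, roughly the way
// ShellBolt buffers tuples internally. Not Storm's real API.
public class QueueingBolt {
    private final LinkedBlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    public QueueingBolt() {
        worker = new Thread(() -> {
            while (running) {
                try {
                    String tuple = pending.take(); // blocks until work arrives
                    process(tuple);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Called from the framework's execute(): just enqueue and return,
    // so the executor thread is never blocked by the real work.
    public void execute(String tuple) {
        pending.offer(tuple);
    }

    // Monitoring hook: current depth of the owned queue. In a test you can
    // also pre-fill via execute() to simulate a backlog.
    public int queueDepth() {
        return pending.size();
    }

    protected void process(String tuple) {
        // real work goes here (e.g. batching + async network call)
    }

    public void shutdown() {
        running = false;
        worker.interrupt();
    }
}
```

Timing each tuple from offer() to take() would give you the queue latency as well; that would tell you whether tuples are actually arriving slowly at the bolt or sitting in front of it.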
________________________________
From: Luke Rohde <[email protected]>
Sent: Monday, March 9, 2015 4:22 PM
To: [email protected]
Subject: Re: throughput problem

Any thoughts?

On Thu, Mar 5, 2015 at 1:39 PM Luke Rohde <[email protected]> wrote:

I've been trying to figure this one out and can't seem to come up with an answer; hopefully someone can enlighten me. The simple picture is that I have a spout, a CPU-bound task, and an IO-bound task that's sending data out over the network. The IO-bound task is making batches and doing async calls with the batches, so it's not blocking (and indeed its capacity is very low). However, when sending this data (instead of a no-op on that task) I only get about half the throughput I would otherwise.

What I've looked into:
- the thread pool used to execute the async call has plenty of capacity
- network IO is not saturated
- max spout pending is not in play
- execute latency is very low on the network-bound task

Does anyone have any ideas? Is there a config that could be rate-limiting incoming data to that task since it's juggling many unacked tuples?
