Thanks a lot for the explanation. If I understand it correctly, it's basically back pressure from C*: it's telling me that it's overloaded and that I need to back off.
I'd better start a few more nodes, I guess.

T#

On Thu, May 30, 2013 at 10:47 PM, Robert Coli <rc...@eventbrite.com> wrote:
> On Thu, May 30, 2013 at 8:24 AM, Theo Hultberg <t...@iconara.net> wrote:
> > I'm using Cassandra 1.2.4 on EC2 (3 x m1.large, this is a test cluster), and
> > my application is talking to it over the binary protocol (I'm using JRuby
> > and the cql-rb driver). I get this error quite frequently: "Too many in
> > flight hints: 2411" (the exact number varies)
> >
> > Has anyone any idea of what's causing it? I'm pushing the cluster quite hard
> > with writes (but no reads at all).
>
> The code that produces this message (below) sets the bound based on
> the number of available processors. It is a bound of number of in
> progress hints. An in progress hint (for some reason redundantly
> referred to as "in flight") is a hint which has been submitted to the
> executor which will ultimately write it to local disk. If you get
> OverloadedException, this means that you were trying to write hints to
> this executor so fast that you risked OOM, so Cassandra refused to
> submit your hint to the hint executor and therefore (partially) failed
> your write.
>
> "
> private static volatile int maxHintsInProgress = 1024 * FBUtilities.getAvailableProcessors();
> [... snip ...]
> for (InetAddress destination : targets)
> {
>     // avoid OOMing due to excess hints. we need to do this check even for "live" nodes, since we can
>     // still generate hints for those if it's overloaded or simply dead but not yet known-to-be-dead.
>     // The idea is that if we have over maxHintsInProgress hints in flight, this is probably due to
>     // a small number of nodes causing problems, so we should avoid shutting down writes completely to
>     // healthy nodes. Any node with no hintsInProgress is considered healthy.
>     if (totalHintsInProgress.get() > maxHintsInProgress
>             && (hintsInProgress.get(destination).get() > 0 && shouldHint(destination)))
>     {
>         throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
>     }
> "
>
> If Cassandra didn't return this exception, it might OOM while
> enqueueing your hints to be stored. Giving up on trying to enqueue a
> hint for the failed write is chosen instead. The solution is to reduce
> your write rate, ideally by enough that you don't even queue hints in
> the first place.
>
> =Rob
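To make the quoted check concrete, here is a minimal Java sketch of the same admission logic, under stated assumptions. Only the 1024-per-processor bound and the shape of the condition come from the quoted code; the class HintAdmissionSketch, its methods submitHint and hintWritten, and the local OverloadedException are hypothetical stand-ins, not Cassandra's real classes, and shouldHint is stubbed to always return true.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the quoted admission check.
// Not Cassandra's actual implementation.
class HintAdmissionSketch {

    // Hypothetical stand-in for Cassandra's OverloadedException.
    static class OverloadedException extends RuntimeException {
        OverloadedException(String message) { super(message); }
    }

    private final int maxHintsInProgress;
    private final AtomicInteger totalHintsInProgress = new AtomicInteger();
    private final ConcurrentHashMap<String, AtomicInteger> hintsInProgress = new ConcurrentHashMap<>();

    HintAdmissionSketch(int availableProcessors) {
        // Same formula as the quoted code: 1024 in-progress hints per available processor.
        this.maxHintsInProgress = 1024 * availableProcessors;
    }

    int maxHintsInProgress() { return maxHintsInProgress; }

    // Assumption: always hint. The real shouldHint() applies its own rules.
    private boolean shouldHint(String destination) { return true; }

    private AtomicInteger counterFor(String destination) {
        return hintsInProgress.computeIfAbsent(destination, d -> new AtomicInteger());
    }

    // Refuse only when the global bound is exceeded AND this destination already
    // has hints in progress; a destination with none is treated as healthy.
    // The check and the increments are separate steps, so the bound is soft.
    void submitHint(String destination) {
        if (totalHintsInProgress.get() > maxHintsInProgress
                && counterFor(destination).get() > 0
                && shouldHint(destination)) {
            throw new OverloadedException("Too many in flight hints: " + totalHintsInProgress.get());
        }
        totalHintsInProgress.incrementAndGet();
        counterFor(destination).incrementAndGet();
    }

    // Hypothetical callback: the hint executor has written the hint to local disk.
    void hintWritten(String destination) {
        totalHintsInProgress.decrementAndGet();
        counterFor(destination).decrementAndGet();
    }
}
```

With one processor the bound is 1024. Hints for a struggling node are accepted until the total exceeds that bound, and then further hints for that node are refused. A destination with no hints in progress is still admitted, which is the "avoid shutting down writes completely to healthy nodes" behaviour described in the comment. On the client side, the only remedy in this model is the one Rob gives: write more slowly so hints drain via hintWritten faster than they are submitted.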