Hey

On 02/02/2013 01:40 AM, Chris Neal wrote:
Thanks for the help Juhani :) I'll take a look with Ganglia and see what things look like.

Any thoughts on keeping the ExecSource.batchSize, MemoryChannel.transactionCapacity, AvroSink.batch-size, and HDFSSink.batchSize the same?

It's not really important, so long as the Avro sink's batch size is less than or equal to the channel's transaction capacity. The HDFS sink's batch size is independent of them both.
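
A minimal sketch of that relationship for the source-side agent, in Flume's properties format. The agent name `a1`, the component names, the log path, the collector hostname, and the port are placeholders rather than values from this thread; only the three sizes come from Chris's setup.

```properties
# Hypothetical names: a1, tail1, mem1, avro1, the log path, and the host/port.
a1.sources = tail1
a1.channels = mem1
a1.sinks = avro1

a1.sources.tail1.type = exec
a1.sources.tail1.command = tail -F /var/log/app/app.log
a1.sources.tail1.channels = mem1
# Events handed to the channel per put transaction.
a1.sources.tail1.batchSize = 1000

a1.channels.mem1.type = memory
a1.channels.mem1.capacity = 1000000
# Upper bound on the events in any single put or take transaction.
a1.channels.mem1.transactionCapacity = 1000

a1.sinks.avro1.type = avro
a1.sinks.avro1.channel = mem1
a1.sinks.avro1.hostname = collector1.example.com
a1.sinks.avro1.port = 4545
# Must be less than or equal to mem1.transactionCapacity.
a1.sinks.avro1.batch-size = 1000
```

The HDFS sink lives on the destination agent and takes from that agent's file channel, so its batch size does not have to match any of the values above.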

I looked at the MemoryChannel code, and noticed that there is a timeout parameter passed to doCommit(), where the exception is being thrown. Just for fun, I increased it from the default to 10 seconds, and now things are running smoothly with the same config as before. It's been running for about 24 hours now. A step in the right direction anyway! :)


If that fixed it, it sounds like your data is just very bursty and sometimes gets fed in faster than it's drained out. The solution to that would be either to enlarge your temporary buffer (the mem channel), to throttle the incoming data (probably not possible), or to increase drain speed (more sinks running in parallel).
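
As a rough illustration of the first and third options, the fragment below gives a source-side agent a larger channel, a longer put/take timeout, and a second Avro sink draining the same channel. It assumes the timeout Chris raised corresponds to the memory channel's `keep-alive` property (in seconds); the agent and component names, hostnames, port, and the capacity value are hypothetical.

```properties
# Hypothetical names and values throughout; adjust to the real agent.
a1.channels = mem1
a1.sinks = avro1 avro2

a1.channels.mem1.type = memory
# Larger buffer to absorb bursts (bounded by the available heap).
a1.channels.mem1.capacity = 2000000
a1.channels.mem1.transactionCapacity = 1000
# Assumed to be the timeout mentioned above: seconds a put or take waits.
a1.channels.mem1.keep-alive = 10

# Two sinks on one channel; each sink is driven by its own thread.
a1.sinks.avro1.type = avro
a1.sinks.avro1.channel = mem1
a1.sinks.avro1.hostname = collector1.example.com
a1.sinks.avro1.port = 4545
a1.sinks.avro1.batch-size = 1000

a1.sinks.avro2.type = avro
a1.sinks.avro2.channel = mem1
a1.sinks.avro2.hostname = collector2.example.com
a1.sinks.avro2.port = 4545
a1.sinks.avro2.batch-size = 1000
```

Each event is taken by only one of the two sinks, so this splits the load rather than duplicating it.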

Thanks again.
Chris

On Thu, Jan 31, 2013 at 8:12 PM, Juhani Connolly wrote:

    Hi Chris,

    The most likely cause of that error is that the sinks are draining
    requests slower than your sources are feeding fresh data. Over
    time it will fill up the capacity of your memory channel, which
    will then start refusing additional put requests.

    You can confirm this by connecting with JMX or Ganglia.

    If the write is extremely bursty, it's possible that it's just
    temporarily going over the sink consumption rate, and increasing
    the channel capacity could work. Otherwise, increasing the Avro
    batch size or adding additional Avro sinks (more threads) may also
    help. Setting up Ganglia monitoring and looking at the incoming
    and outgoing event counts and channel fill states helps a lot in
    diagnosing these bottlenecks, so you should look into doing that.


    On 02/01/2013 02:01 AM, Chris Neal wrote:
    Hi all.

    I need some thoughts on sizing/tuning of the above (common) route
    in FlumeNG to maximize throughput.  Here is my setup:

    *Source JVM (ExecSource/MemoryChannel/AvroSink):*
    -Xmx4g
    -Xms4g
    -XX:MaxDirectMemorySize=256m

    Number of ExecSources in config:  124 (yes, it's a ton.  Can't do
    anything about it :)  The write rate to the source files is
    fairly fast and bursty.

    ExecSource.batchSize = 1000
    (so, when all 124 tail -F instances get 1000 events, they all
    dump to the memory channel)

    MemoryChannel.capacity = 1000000
    MemoryChannel.transactionCapacity = 1000
    (somewhat unclear on what this is.  Docs say "The number of
    events stored in the channel per transaction", but what is a
    "transaction" to a MemoryChannel?)

    AvroSink.batchSize = 1000

    *Destination JVM (AvroSource/FileChannel/HDFSSink)*
    (Cluster of two JVMs on two servers, each configured the same as
    per below)
    -Xms2g
    -Xmx2g
    -XX:MaxDirectMemorySize is not defined, so whatever the default is

    AvroSource.threads = 64
    FileChannel.transactionCapacity = 1000
    FileChannel.capacity = 32000000
    HDFSSink.batchSize = 1000
    HDFSSink.threadPoolSize = 64

    With this configuration, in about 5 minutes, I get the common
    Exception:

    "Space for commit to queue couldn't be acquired Sinks are likely
    not keeping up with sources, or the buffer size is too tight"

    on the Source JVM.  It is nowhere near the 4g max, rather only
    at about 2.5g.

    I'm wondering about the logic of having all the batch
    sizes/transaction sizes 1000.  My thought was that would keep
    from fragmenting the transfer of data, but maybe that's flawed?
     Should the sizes be different?

    Also curious about increasing the MaxDirectMemorySize to
    something larger than 256MB?  I tried removing it altogether in
    my Source JVM (which makes the size unbounded), but that didn't
    seem to make a difference.

    I'm having some trouble figuring out where the backup is
    happening, and how to open up the gates. :)

    Thanks in advance for any suggestions.
    Chris


