"You made a statement that sounds like "the JVM can only use half its memory, because the other half has to be kept free for GCing", which doesn't match my experience at all. I've observed G1GC to successfully GC when the heap was nearly 100% full, I'm certain it's not a problem for CMS because CMS is a non-compacting Old Gen GC strategy - that's why it's subject to fragmentation - and I believe that ParallelGC does in-place compaction so it wouldn't require additional memory though I haven't directly observed it during a GC."
In general, because ActiveMQ makes use of so many tiny objects in memory, I recommend people set aside around twice the necessary heap to allow compaction to occur efficiently, even with G1GC. The compaction algorithm works by copying fragmented data into a free contiguous area of heap, so the more free heap you have, the better the odds that lots of objects can be compacted without requiring a GC.

-Justin

On 4/20/15, 9:24 AM, "Tim Bain" <tb...@alumni.duke.edu> wrote:

>I'm confused about what would drive the need for this.
>
>Is it the ability to hold more messages than your JVM size allows? If so,
>we already have both KahaDB and LevelDB; what does Chronicle offer that
>those other two don't?
>
>Is it because you see some kind of inefficiency in how ActiveMQ uses
>memory or how the JVM's GC strategies work? If so, can you elaborate on
>what you're concerned about? (You made a statement that sounds like "the
>JVM can only use half its memory, because the other half has to be kept
>free for GCing", which doesn't match my experience at all. I've observed
>G1GC successfully GC when the heap was nearly 100% full; I'm certain it's
>not a problem for CMS, because CMS is a non-compacting Old Gen GC
>strategy - that's why it's subject to fragmentation - and I believe that
>ParallelGC does in-place compaction, so it wouldn't require additional
>memory, though I haven't directly observed it during a GC. Please either
>correct my interpretation of your statement or provide the data you're
>basing it on.)
>
>One difference in GC behavior with what you're proposing is that under
>your algorithm you'd GC each message at least twice (once when it's
>received and put into Chronicle, and once when it's pulled from Chronicle
>and sent onward, plus any additional reads needed to operate on the
>message, such as if a new subscriber with a non-matching selector
>connected to the broker) instead of just once under the current algorithm.
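[As a rough illustration of the heap-sizing advice above: it amounts to JVM options along these lines. The 8 GB figure is purely illustrative, and `-Xlog:gc*` is the JDK 9+ unified-logging form; on JDK 8 you'd use `-XX:+PrintGCDetails` instead.]

```
# Size the heap to roughly 2x the expected live message data
-Xms8g -Xmx8g
# Use G1, which compacts incrementally by evacuating regions into free ones
-XX:+UseG1GC
# Log collections so compaction/evacuation behavior can be observed (JDK 9+)
-Xlog:gc*
```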
>On the other hand, your GCs should all be from Young Gen (and cheap),
>whereas the current algorithm would likely push many of its messages to
>Old Gen. Old Gen GCs are more expensive under ParallelGC, though they're
>no worse under G1GC and CMS. So it's a trade-off under ParallelGC (maybe
>better, maybe worse) and a loss under the other two.
>
>One other thing: this would give compression at rest, but not in motion,
>and it comes at the expense of two serialization/deserialization and
>compression/decompression operations per broker traversed. Maybe being
>able to store more messages in a given amount of memory is worth it to you
>(your volumes seem a lot higher than ours, and than most installations'),
>but latency and throughput matter more to us than memory usage, so we'd
>live with using more memory to avoid the extra operations.
>
>The question about why to use message bodies at all is an interesting one,
>though the ability to compress the body once and have it stay compressed
>through multiple network writes is a compelling reason in the near term.
>
>Tim
>On Apr 19, 2015 6:06 PM, "Kevin Burton" <bur...@spinn3r.com> wrote:
>
>> I've been thinking about how messages are stored in the broker and ways
>> to improve the storage in memory.
>>
>> First, right now, messages are stored in the same heap, and if you're
>> using the memory store, that's going to add up. This will increase GC
>> latency, and you actually need 2x more memory because you have to have
>> temp memory set aside for GCs.
>>
>> I was thinking about using Chronicle to store the messages off heap
>> using direct buffers. The downside to this is that the messages need to
>> be serialized/deserialized with each access. But realistically that's
>> probably acceptable, because you can do something like 1M message
>> deserializations per second, which is normally more than the throughput
>> of the broker.
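[The off-heap pattern described here can be sketched without Chronicle's actual API, which is not shown, using a plain direct ByteBuffer. The class and message below are hypothetical, but they illustrate the serialize-on-write / deserialize-on-read cost being discussed: the payload bytes live outside the GC-managed heap, and every access pays a copy-and-decode step.]

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class OffHeapSketch {
    // Serialize a string, park the bytes off-heap, then read them back.
    static String roundTrip(String body) {
        byte[] serialized = body.getBytes(StandardCharsets.UTF_8);

        // Direct buffers are allocated outside the Java heap, so the
        // payload no longer contributes to GC marking/copying costs.
        ByteBuffer offHeap = ByteBuffer.allocateDirect(serialized.length);
        offHeap.put(serialized);
        offHeap.flip();

        // Every access pays a deserialization cost: copy on-heap and decode.
        byte[] copy = new byte[offHeap.remaining()];
        offHeap.get(copy);
        return new String(copy, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello, broker")); // prints "hello, broker"
    }
}
```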
>>
>> Additionally, Chronicle supports zlib or snappy compression on the
>> message bodies. So, while the broker supports message compression now,
>> it doesn't support this feature on headers.
>>
>> This would give us header compression!
>>
>> The broker would transparently decompress the headers when reading the
>> message.
>>
>> This raises the question: why use message bodies at all? Why not just
>> store an entire message as a set of headers?
>>
>> If you need hierarchy, you can use foo.bar.cat.dog-style header names.
>>
>>
>> --
>>
>> Founder/CEO Spinn3r.com
>> Location: *San Francisco, CA*
>> blog: http://burtonator.wordpress.com
>> … or check out my Google+ profile
>> <https://plus.google.com/102718274791889610666/posts>
>> <http://spinn3r.com>
>>
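[For reference, the zlib header compression floated above can be sketched with the JDK's built-in Deflater/Inflater. The key=value header layout below is an assumption made for illustration only, not ActiveMQ's or Chronicle's actual wire format.]

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class HeaderCompression {
    // Compress a header blob with zlib (DEFLATE with zlib framing).
    static byte[] compress(String headers) {
        Deflater deflater = new Deflater();
        deflater.setInput(headers.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!deflater.finished()) {
            out.write(buf, 0, deflater.deflate(buf));
        }
        deflater.end();
        return out.toByteArray();
    }

    // The broker-side "transparent decompress on read" step.
    static String decompress(byte[] compressed) throws Exception {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        while (!inflater.finished()) {
            out.write(buf, 0, inflater.inflate(buf));
        }
        inflater.end();
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        // Hierarchical header names as suggested: foo.bar.cat.dog
        String headers = "foo.bar.cat.dog=1\nJMSCorrelationID=abc-123\n";
        byte[] packed = compress(headers);
        System.out.println(decompress(packed).equals(headers)); // prints "true"
    }
}
```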