Ok, thank you! Looking forward to the next release.

On 6 August 2013 17:32, Evan Vigil-McClanahan <emcclana...@basho.com> wrote:
> 11 + 4 + 16, so 31.
>
> 18 bytes there are the actual data, so that can't go away. Since the
> allocation sizes are going to be word aligned, the least overhead
> there is going to be word aligning the entire structure, i.e. where
> (key_len + bucket_len + 2 + 18) % 8 == 0, but that sort of
> optimization only works with fixed-length keys.
>
> Khash overheads are expected to be small-ish, but are
> under-researched, at least by me. I suspect most of the overhead is
> coming from the allocator. So moving to tcmalloc is a possible win
> there, because it does a better job than libc's malloc of keeping
> amortized per-allocation overheads low for small allocations, but of
> course with the caveats mentioned in my last email (tl;dr: test
> *exhaustively*, because we don't and likely won't).
>
> Another possible improvement would be to move to a fixed-length
> structure (one that points to an allocated location for oversized
> key-bucket binaries), but that has a very bad pathological case: if
> someone selects all keys larger than your fixed size, you have the
> fixed length minus 8 as an additional overhead.
>
> On Tue, Aug 6, 2013 at 2:56 AM, Alexander Ilyin <alexan...@rutarget.ru> wrote:
> > So if you succeed with all your patches, the memory overhead will
> > decrease by 22 (= 16 + 4 + 2) bytes, am I right?
> >
> > On 5 August 2013 16:38, Evan Vigil-McClanahan <emcclana...@basho.com> wrote:
> >> Before I'd done the research, I too thought that the overheads were
> >> much lower: near what the calculator said, but not too far off.
> >>
> >> There are a few things that I plan on addressing this release cycle:
> >> - 16b per-allocation overhead from using enif_alloc. This allows us
> >>   a lot of flexibility about which allocator to use, but I suspect
> >>   that since allocation speed isn't a big bitcask bottleneck, this
> >>   overhead simply isn't worth it.
> >> - 13b per-value overhead from naive serialization of the bucket/key
> >>   value. I have a branch that reduces this by 11 bytes.
> >> - 4b per-value overhead from a single bit flag that is stored in an
> >>   int. No patch for this thus far.
> >>
> >> Additionally, I've found that running with tcmalloc using
> >> LD_PRELOAD reduces the cost of bitcask's many allocations, but a)
> >> I've never done so in production and b) they say that it never
> >> releases memory, which is worrying, although the paging system
> >> theoretically should take care of it fairly easily as long as its
> >> page usage isn't insane.
> >>
> >> My original notes looked like this:
> >>
> >> 1) ~32 bytes for the OS/malloc + khash overhead @ 50M keys
> >>    (amortized, so bigger for fewer keys, smaller for more keys)
> >> 2) + 16 bytes of erlang allocator overhead
> >> 3) + 22 bytes for the NIF C structure
> >> 4) + 8 bytes for the entry pointer stored in the khash
> >> 5) + 13 bytes of kv overhead
> >>
> >> tcmalloc does what it can for line 1.
> >> My patches do what I can for lines 2, 3, and 5.
> >>
> >> 4 isn't amenable to anything other than a change in the way the
> >> keydir is stored, which could also potentially help with 1 (fewer
> >> allocations, etc.). That, unfortunately, is not very likely to
> >> happen soon.
> >>
> >> So things will get better relatively soon, but there are some
> >> architectural limits that will be harder to address.
> >>
> >> On Mon, Aug 5, 2013 at 1:49 AM, Alexander Ilyin <alexan...@rutarget.ru> wrote:
> >> > Evan,
> >> >
> >> > The news about a per-key overhead of 91 bytes is quite
> >> > frustrating. When we were choosing a key-value storage, per-key
> >> > metadata size was a crucial point for us. We have a simple use
> >> > case but a lot of data (hundreds of millions of items), so we
> >> > were looking for ways to reduce memory consumption. Here and here
> >> > a value of 40 bytes is stated. The 22 bytes in the RAM calculator
> >> > seemed like a mistake, because the following example obviously
> >> > uses a value of 40.
> >> >
> >> > Anyway, thanks for your response.
> >> >
> >> > On 4 August 2013 04:39, Evan Vigil-McClanahan <emcclana...@basho.com> wrote:
> >> >> Some responses inline.
> >> >>
> >> >> On Fri, Aug 2, 2013 at 3:11 AM, Alexander Ilyin <alexan...@rutarget.ru> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I have a few questions about Riak memory usage.
> >> >> > We're using Riak 1.3.1 on a 3-node cluster. According to the
> >> >> > bitcask capacity calculator
> >> >> > (http://docs.basho.com/riak/1.3.1/references/appendices/Bitcask-Capacity-Planning/)
> >> >> > Riak should use about 30Gb of RAM for our data. Actually, it
> >> >> > uses about 45Gb and I can't figure out why. I'm looking at the
> >> >> > %MEM column in top on each node for the beam.smp process.
> >> >>
> >> >> I've recently done some research on this and have filed bugs
> >> >> against the calculator; it's a bit wrong and has been that way
> >> >> for a while:
> >> >>
> >> >> https://github.com/basho/basho_docs/issues/467
> >> >>
> >> >> The numbers there look a bit closer to what you're seeing.
> >> >>
> >> >> The good news is that I am looking into reducing memory
> >> >> consumption this development cycle, and our next release should
> >> >> see some improvements on that front. The bad news is that it may
> >> >> be a while. If you want to watch the bitcask repo on github to
> >> >> see when these changes go in, it's usually pretty easy to build
> >> >> a new bitcask and replace the one that you're running.
> >> >>
> >> >> > Disk usage is also about 1.5 times more than I had expected
> >> >> > (270Gb instead of 180Gb). I rechecked that I have n_val=2 (not
> >> >> > 3); it seems alright. Why could this happen?
> >> >>
> >> >> There is definitely some overhead on the stored values,
> >> >> especially when you're using bitcask. How big are your values?
> >> >> Overheads, if I recall correctly, run to a few hundred bytes,
> >> >> but I'll have to ask some people to refresh my memory.
> >> >>
> >> >> > The second question is about performance degradation when Riak
> >> >> > uses almost all available memory on the node. We see that the
> >> >> > 95th/99th put percentiles are twice as large for nodes which
> >> >> > don't have much free RAM. How much free memory should I have
> >> >> > to keep performance high?
> >> >>
> >> >> I don't have a good answer for this; when I was working as a CSE
> >> >> we generally urged people to start adding nodes when their most
> >> >> limited resource (memory, disk, cpu, etc.) was 70-80% utilized
> >> >> (as a grossly oversimplified rule of thumb).
> >> >>
> >> >> > And the last question is about the memory_total metric.
> >> >> > riak-admin status returns a value which is less than the
> >> >> > actual memory consumption as seen in top. According to the
> >> >> > memory_total description
> >> >> > (http://docs.basho.com/riak/1.3.1/references/appendices/Inspecting-a-Node/)
> >> >> > they should be equal. Why are they not?
> >> >>
> >> >> Top factors in OS/libc overheads that memory_total cannot see.
> >> >> I'll check out the docs and get them amended if they're wrong.
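[Editor's note] The word-alignment point in the thread above can be sketched as follows. This is an illustrative model built only from the figures quoted in the thread (18 bytes of fixed per-entry data, 2 bytes of length prefixes, 8-byte word alignment), not Bitcask's actual keydir code; the helper names are mine.

```python
WORD = 8           # allocations are rounded up to word alignment
FIXED_FIELDS = 18  # "18 bytes there are the actual data, so that can't go away"
LEN_PREFIXES = 2

def entry_alloc_size(key_len, bucket_len):
    # Raw entry size: variable-length bucket/key data plus fixed fields,
    # rounded up to the next 8-byte boundary by the allocator.
    raw = key_len + bucket_len + LEN_PREFIXES + FIXED_FIELDS
    return (raw + WORD - 1) // WORD * WORD

def padding(key_len, bucket_len):
    # Bytes lost to alignment; zero exactly when
    # (key_len + bucket_len + 2 + 18) % 8 == 0, as in the thread.
    raw = key_len + bucket_len + LEN_PREFIXES + FIXED_FIELDS
    return entry_alloc_size(key_len, bucket_len) - raw

assert padding(10, 2) == 0  # 10 + 2 + 2 + 18 = 32, already word aligned
assert padding(11, 2) == 7  # 33 rounds up to 40
```

This is why the zero-padding optimization only pays off for fixed-length keys: with variable-length keys the padding varies between 0 and 7 bytes per entry and cannot be designed away.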
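[Editor's note] Evan's five-line notes are the source of the 91-byte per-key figure Alexander mentions, and the "11 + 4 + 16, so 31" reply is the total expected saving. A quick back-of-the-envelope check (the dict labels are mine, paraphrasing the notes):

```python
# Per-key overheads from the notes above, in bytes; the khash/malloc
# figure is amortized at ~50M keys, so it varies with key count.
overheads = {
    "os/malloc + khash (amortized)": 32,
    "erlang allocator (enif_alloc)": 16,
    "NIF C structure":               22,
    "entry pointer in khash":         8,
    "naive kv serialization":        13,
}
per_key = sum(overheads.values())
print(per_key)  # 91

# The planned patches save 16b (dropping enif_alloc), 11 of the 13b
# (serialization branch), and 4b (the bit flag stored in an int):
savings = 16 + 11 + 4
print(savings)            # 31, not the 22 Alexander guessed
print(per_key - savings)  # 60 bytes per key remaining
```

The remaining ~60 bytes are the architectural part: the khash entry pointer and the amortized allocator/khash overhead only shrink if the keydir storage itself changes.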
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com