On Wed, 25 Apr 2018 17:02:48 +0100 "Burakov, Anatoly" <anatoly.bura...@intel.com> wrote:
> On 14-Feb-18 10:07 AM, Burakov, Anatoly wrote:
> > On 14-Feb-18 8:04 AM, Thomas Monjalon wrote:
> >> Hi Anatoly,
> >>
> >> 19/12/2017 12:14, Anatoly Burakov:
> >>> * Memory tagging. This is related to the previous item. Right
> >>> now, we can only ask malloc to allocate memory by page size,
> >>> but one could potentially have different memory regions backed
> >>> by pages of similar sizes (for example, locked 1G pages, to
> >>> completely avoid TLB misses, alongside regular 1G pages), and
> >>> it would be good to have that kind of mechanism to distinguish
> >>> between different memory types available to a DPDK application.
> >>> One could, for example, tag memory by "purpose" (i.e. "fast",
> >>> "slow"), or in other ways.
> >>
> >> How do you imagine memory tagging?
> >> Should it be a parameter when requesting some memory from
> >> rte_malloc or rte_mempool?
> >
> > We can't make it a parameter for mempool without making it a
> > parameter for rte_malloc, as every memory allocation in DPDK works
> > through rte_malloc. So at the very least, rte_malloc will have it.
> > And as long as rte_malloc has it, there's no reason why memzones
> > and mempools couldn't - not much code to add.
> >
> >> Could it be a bit-field allowing to combine some properties?
> >> Does it make sense to have "DMA" as one of the purposes?
> >
> > Something like a bitfield would be my preference, yes. That way we
> > could classify memory in certain ways and allocate based on that.
> > Which "certain ways" these are, i'm not sure. For example, in
> > addition to tagging memory as "DMA-capable" (which i think is a
> > given), one might tag certain memory as "non-default", as in,
> > never allocate from this chunk of memory unless explicitly asked
> > to do so - this could be useful for types of memory that are a
> > precious resource.
> >
> > Then again, it is likely that we won't have many types of memory
> > in DPDK, and any other type would be implementation-specific, so
> > maybe just stringly-typing it is OK (maybe we can finally make use
> > of the "type" parameter in rte_malloc!).
> >
> >> How to transparently allocate the best memory for the NIC?
> >> You take care of the NUMA socket property, but there can be more
> >> requirements, like getting memory from the NIC itself.
> >
> > I would think that we can't make it generic enough to cover all
> > cases, so it's best to expose some APIs and let PMDs handle this
> > themselves.
> >
> >> +Cc more people (6WIND, Cavium, Chelsio, Mellanox, Netronome,
> >> NXP, Solarflare) in order to trigger a discussion about the ideal
> >> requirements.
>
> Hi all,
>
> I would like to restart this discussion, again :) I would like to
> hear some feedback on my thoughts below.
>
> I've had some more thinking about it, and while i have lots of
> use-cases in mind, i suspect covering them all while keeping a sane
> API is unrealistic.
>
> So, first things first.
>
> The main issue we have is the 1:1 correspondence between a malloc
> heap and a socket ID. This has led to various attempts to hijack
> socket IDs to do something else - i've seen this approach a few
> times before, most recently in a patch by Srinath/Broadcom [1]. We
> need to break this dependency somehow, and have a unique heap
> identifier.
>
> Also, since memory allocators are expected to behave roughly like
> drivers (e.g. have a driver API and provide hooks for
> init/alloc/free functions, etc.), a request to allocate memory may
> not just go to the heap itself (which is handled internally by
> rte_malloc), but also go to its respective allocator. This is
> roughly similar to what happens currently, except that which
> allocator functions to call will then depend on which driver
> allocated that heap.
>
> So, we arrive at a dependency - heap => allocator. Each heap must
> know which allocator it belongs to - so, we also need some kind of
> way to identify not just the heap, but the allocator as well.
>
> In the above quotes from previous mails i suggested categorizing
> memory by "types", but now that i think of it, the API would have
> been too complex, as we would ideally have had to cover use cases
> such as "allocate memory of this type, no matter which allocator it
> comes from", "allocate memory from this particular heap", and
> "allocate memory from this particular allocator"... It gets
> complicated pretty fast.
>
> What i propose instead is this. 99% of the time, the user wants our
> hugepage allocator, so by default, all allocations will come through
> that. In the event that the user needs memory from a specific heap,
> we need to provide a new set of APIs to request memory from a
> specific heap.
>
> Do we expect situations where the user might *not* want the default
> allocator, but also *not* know which exact heap he wants? If the
> answer is no (which i'm counting on :) ), then allocating from a
> specific malloc driver becomes as simple as something like this:
>
> mem = rte_malloc_from_heap("my_very_special_heap");
>
> (the stringly-typed heap ID is just an example)
>
> So, old APIs remain intact, and are always passed through to the
> default allocator, while new APIs will grant access to other
> allocators.
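To make that a bit more concrete, here is a rough sketch of how the
two sides of such a proposal could look. This is illustration only:
every name below except rte_malloc itself is invented for the sketch,
and nothing like it exists in DPDK today.

/* Hypothetical sketch only -- illustrating the proposal, not real API. */
#include <stddef.h>

/* Allocator ("malloc driver") side: register a named heap together
 * with the hooks that will service allocations against it. */
struct rte_malloc_heap_ops {
        void *(*alloc)(size_t size, size_t align, void *priv);
        void (*free)(void *ptr, void *priv);
};

int rte_malloc_heap_register(const char *heap_name,
                             const struct rte_malloc_heap_ops *ops,
                             void *priv);

/* Application side: plain rte_malloc() keeps going to the default
 * hugepage allocator; only this new call targets one named heap. */
void *rte_malloc_from_heap(const char *heap_name, size_t size,
                           unsigned int align);

static void *get_special_buf(size_t len)
{
        return rte_malloc_from_heap("my_very_special_heap", len, 0);
}

How much allocator-specific behaviour can hide behind such a
registration is essentially the flexibility question raised next.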
> Heap ID alone, however, may not provide enough flexibility. For
> example, if a malloc driver allocates a specific kind of memory that
> is NUMA-aware, it would perhaps be awkward to call different heap
> IDs when the memory being allocated is arguably the same, just
> subdivided into several blocks. Moreover, figuring out situations
> like this would likely require some cooperation from the allocator
> itself (possibly some allocator-specific APIs), but should we add
> malloc heap arguments, those would have to be generic. I'm not sure
> if we want to go that far, though.
>
> Does that sound reasonable?
>
> Another tangentially related issue, raised by Olivier [2], is that
> of allocating memory in blocks, rather than through rte_malloc. The
> current implementation has rte_malloc storing its metadata right in
> the memory - this leads to unnecessary memory fragmentation in
> certain cases, such as allocating memory page-by-page, and in
> general pollutes memory we might not want to pollute with malloc
> metadata.
>
> To fix this, the memory allocator would have to store malloc data
> externally, which comes with a few caveats (reverse mapping of
> pointers to malloc elements; storing, looking up and accounting for
> said elements; etc.). There are no current plans to work on it, but
> it's certainly something to think about :)
>
> [1] http://dpdk.org/dev/patchwork/patch/36596/
> [2] http://dpdk.org/ml/archives/dev/2018-March/093212.html

Maybe the existing rte_malloc, which tries to always work like
malloc, is not the best API for applications? I always thought the
Samba talloc API was less error prone, since it supports reference
counting and hierarchical allocation.
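For anyone unfamiliar with it, the appeal is the ownership hierarchy.
A minimal sketch follows; the struct and function are made up for the
example, and only the talloc_* calls are the actual library API
(link with -ltalloc):

/* Illustrative only: hierarchical ownership with Samba's talloc. */
#include <stdint.h>
#include <talloc.h>

struct session {
        char *name;
        uint8_t *rx_buf;
};

static struct session *session_new(TALLOC_CTX *parent, const char *name)
{
        /* every child allocation is parented to the session object */
        struct session *s = talloc_zero(parent, struct session);
        if (s == NULL)
                return NULL;
        s->name = talloc_strdup(s, name);
        s->rx_buf = talloc_array(s, uint8_t, 2048);
        return s;
}

/* A single talloc_free(session) then releases the whole subtree,
 * while talloc_reference()/talloc_unlink() provide the reference
 * counting when part of the tree has to outlive its parent. */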