On 5/9/2018 12:09 PM, Yongseok Koh wrote:
> This is the new design of Memory Region (MR) for mlx PMD, in order to:
> - Accommodate the new memory hotplug model.
> - Support non-contiguous Mempool.
>
> There are multiple layers for MR search.
>
> L0 looks up the last-hit entry, which is pointed to by mr_ctrl->mru
> (Most Recently Used). If L0 misses, L1 looks up the address in a
> fixed-size array by linear search. L0/L1 is in an inline function -
> mlx4_mr_lookup_cache().
>
> If L1 misses, the bottom-half function is called to look up the address
> in the bigger local cache of the queue. This is L2 - mlx4_mr_addr2mr_bh()
> and it is not an inline function. The data structure for L2 is a B-tree.
>
> If L2 misses, the search falls into the slowest path, which takes locks
> in order to access the global device cache (priv->mr.cache). This cache
> is also a B-tree and caches the original MR list (priv->mr.mr_list) of
> the device. Unless the global cache overflows, it is all-inclusive of
> the MR list. This is L3 - mlx4_mr_lookup_dev(). The size of the L3 cache
> table is limited and can't be expanded on the fly due to deadlock. Refer
> to the comments in the code for the details - mr_lookup_dev(). If L3
> overflows, the list has to be searched directly, bypassing the cache,
> although this is slower.
>
> If L3 misses, a new MR for the address has to be created -
> mlx4_mr_create(). When it creates a new MR, it tries to register as many
> adjacent memsegs as possible, i.e. those which are virtually contiguous
> around the address. This must take two locks - memory_hotplug_lock and
> priv->mr.rwlock. Due to memory_hotplug_lock, there can't be any
> allocation/free of memory inside.
>
> In the free callback of the memory hotplug event, freed space is looked
> up in the MR list and the corresponding bits are cleared from the bitmap
> of MRs. This can fragment an MR, and the MR will then have multiple
> search entries in the caches. Once the event changes anything, the
> global cache must be rebuilt and all the per-queue caches are flushed as
> well. If memory is frequently freed at run time, that may, in the worst
> case, cause jitter in data-plane processing by incurring MR cache
> flushes and rebuilds, but this is the least probable scenario.
>
> To guarantee optimal performance, it is highly recommended to use the
> EAL option '--socket-mem'. The reserved memory will then be pinned and
> won't be freed dynamically. It is also recommended to configure a
> per-lcore cache for the Mempool. Even if there are many MRs for a
> device, or the MRs are highly fragmented, the Mempool cache will help
> reduce misses on the per-queue caches.
>
> '--legacy-mem' is also supported.
>
> Signed-off-by: Yongseok Koh <ys...@mellanox.com>
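The layered lookup is the heart of the design, so for other reviewers here
is a minimal sketch of the L0/L1 hot path described above. The struct
layout and names (mr_ctrl, mr_cache_entry, MR_CACHE_N) are illustrative
placeholders, not the exact definitions from the patch:

#include <stdint.h>

#define MR_CACHE_N 8 /* placeholder for the fixed L1 array size */

struct mr_cache_entry {
	uintptr_t start; /* start of the cached VA range */
	uintptr_t end;   /* end of the range (exclusive) */
	uint32_t lkey;   /* memory key to put in the WQE */
};

struct mr_ctrl {
	uint16_t mru;                            /* L0: last-hit index */
	struct mr_cache_entry cache[MR_CACHE_N]; /* L1: fixed array */
};

/* Return the lkey covering addr, or UINT32_MAX to signal a miss. */
static inline uint32_t
mr_lookup_cache(struct mr_ctrl *ctrl, uintptr_t addr)
{
	uint16_t i = ctrl->mru;

	/* L0: check the most-recently-used entry first. */
	if (addr >= ctrl->cache[i].start && addr < ctrl->cache[i].end)
		return ctrl->cache[i].lkey;
	/* L1: linear search over the small fixed-size array. */
	for (i = 0; i < MR_CACHE_N; i++) {
		if (addr >= ctrl->cache[i].start &&
		    addr < ctrl->cache[i].end) {
			ctrl->mru = i; /* remember the hit for L0 */
			return ctrl->cache[i].lkey;
		}
	}
	/* Miss: the caller falls back to the L2 bottom-half
	 * (the per-queue B-tree), and from there to L3/creation. */
	return UINT32_MAX;
}

Keeping L0/L1 this small and lock-free is what keeps the datapath cost
near zero when the working set of MRs fits in the per-queue array.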
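On the tuning advice in the last paragraphs: with '--socket-mem' (e.g.
--socket-mem=1024,1024) the memory is reserved at startup and never handed
back, so the free callback never fires. The per-lcore Mempool cache is the
cache_size argument at pool creation; a hypothetical example with
placeholder sizes:

#include <rte_mbuf.h>
#include <rte_lcore.h>

static struct rte_mempool *
create_pool_with_cache(void)
{
	/* 8192 mbufs with a 512-entry per-lcore cache; both values
	 * are placeholders to be tuned for the actual deployment. */
	return rte_pktmbuf_pool_create("mbuf_pool", 8192, 512, 0,
				       RTE_MBUF_DEFAULT_BUF_SIZE,
				       rte_socket_id());
}

With a large per-lcore cache each queue keeps recycling the same mbufs,
so the per-queue MR cache keeps hitting even when the device has many or
fragmented MRs.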
<...>
> +/**
> + * Insert an entry to B-tree lookup table.
> + *
> + * @param bt
> + *   Pointer to B-tree structure.
> + * @param entry
> + *   Pointer to new entry to insert.
> + *
> + * @return
> + *   0 on success, -1 on failure.
> + */
> +static int
> +mr_btree_insert(struct mlx4_mr_btree *bt, struct mlx4_mr_cache *entry)
> +{
> +	struct mlx4_mr_cache *lkp_tbl;
> +	uint16_t idx = 0;
> +	size_t shift;
> +
> +	assert(bt != NULL);
> +	assert(bt->len <= bt->size);
> +	assert(bt->len > 0);
> +	lkp_tbl = *bt->table;
> +	/* Find out the slot for insertion. */
> +	if (mr_btree_lookup(bt, &idx, entry->start) != UINT32_MAX) {
> +		DEBUG("abort insertion to B-tree(%p):"
> +		      " already exist at idx=%u [0x%lx, 0x%lx) lkey=0x%x",
> +		      (void *)bt, idx, entry->start, entry->end, entry->lkey);

This and various other logs cause a 32-bit build error because of the %lx
usage. Can you please check them?

I feel bad complaining about a patch like this just because of a log
format issue; as a community we should find a solution to it, either
checkpatch checks or automated 32-bit builds, I don't know.
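For what it's worth, the usual portable fix is the <inttypes.h> width
macros. A minimal sketch of the same log line, assuming entry->start/end
are (or can be cast to) uint64_t:

#include <inttypes.h>

		DEBUG("abort insertion to B-tree(%p):"
		      " already exist at idx=%u [0x%" PRIx64 ", 0x%" PRIx64
		      ") lkey=0x%x",
		      (void *)bt, idx, (uint64_t)entry->start,
		      (uint64_t)entry->end, entry->lkey);

On a 32-bit target %lx expects a 32-bit unsigned long, so passing a 64-bit
value there is what -Werror=format turns into a build error; PRIx64
expands to the correct conversion specifier on both targets.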