Hi Jonathon,

On Sat, 2019-08-24 at 20:10 -0500, Jonathon Anderson wrote:
> On Sat, Aug 24, 2019 at 6:24 PM, Mark Wielaard <m...@klomp.org> wrote:
> > But what if realloc moves the block?
> > Then all dbg->mem_tails[thread_id] pointers become invalid.
> > And this function drops the lock before returning a libdw_memblock*.
>
> Not quite, dbg->mem_tails is an array of pointers (note the ** in its
> definition, and the use of the plural "tails"). So while the
> dbg->mem_tails pointer becomes invalid, the dbg->mem_tails[thread_id]
> pointers don't.
>
> It would be a different story if dbg->mem_tails was an array of
> libdw_memblocks, but its not.

Aha. Yes, they are pointers to the mem_blocks, not the mem_blocks
themselves. When the array is realloc'd the pointer values are moved,
but never changed. So this is indeed fine. I was confused.
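In other words, the picture is roughly the sketch below. The struct and
function names are made up for illustration (this is not the actual
libdw code), but it shows why the realloc is harmless for the
per-thread tail pointers:

#include <stdlib.h>
#include <string.h>

/* Stand-in for the real libdw_memblock; all that matters here is that
   each block is a separately malloc'd object that never moves.  */
struct libdw_memblock
{
  size_t size;
  size_t remaining;
  char mem[];
};

struct dwarf_sketch
{
  struct libdw_memblock **mem_tails;  /* one tail pointer per thread */
  size_t mem_stacks;                  /* current length of mem_tails */
};

/* Grow mem_tails to at least nthreads entries (caller holds the write
   lock).  realloc may move the array, so the old dbg->mem_tails value
   becomes invalid, but every surviving mem_tails[i] still points at
   the same, unmoved libdw_memblock.  */
static int
grow_mem_tails (struct dwarf_sketch *dbg, size_t nthreads)
{
  if (nthreads <= dbg->mem_stacks)
    return 0;

  struct libdw_memblock **newt
    = realloc (dbg->mem_tails, nthreads * sizeof newt[0]);
  if (newt == NULL)
    return -1;

  /* Initializing the new entries reduces to a memset, assuming a NULL
     pointer is all-bits-zero (the "compiler is smart enough" case
     mentioned above).  */
  memset (newt + dbg->mem_stacks, 0,
          (nthreads - dbg->mem_stacks) * sizeof newt[0]);

  dbg->mem_tails = newt;
  dbg->mem_stacks = nthreads;
  return 0;
}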
> That would increase the "memory leak" issue mentioned previously (to
> ~4K per dead thread) and require more work on the part of the
> reallocing thread to initialize the new entries (which at the moment
> should reduce to a memset, assuming the compiler is smart enough and
> NULL == 0).
>
> > So I think the lock needs to extend beyond this function somehow. Or
> > we need to prevent another thread reallocing while we are dealing
> > with the assigned memblock.
>
> No need, in fact we want to drop the lock as soon as possible to let
> new threads in for realloc's. Under the assumption that we don't need
> to allocate new blocks (extremely) often, the extra cost to relock
> when we do should be relatively small. Also see __libdw_allocate.

Yes, now that I have a better (correct) mental picture of the data
structure, this makes total sense.

> As mentioned in other mails, this management scheme isn't (always)
> optimally memory efficient, but its speed is good under parallel
> stress. Far better than wrapping the whole thing with a single
> pthread_mutex_t.

I wouldn't tweak it too much if you know it is working correctly now.
We do have to make sure it doesn't slow down the single-threaded use
case (since most programs are still single-threaded). There is the new
locking, which slows things down. But the memory use should be about
the same, since you don't duplicate the mem_blocks until there is
access from multiple threads.

If there isn't much parallel stress or allocation in practice, that is,
if even in multi-threaded programs there is still just one thread that
does most or all of the allocations, then you could maybe optimize a
bit by having one "owner" thread per Dwarf. That would be the first
thread that hits __libdw_alloc_tail. For that owner thread you set up
an owner thread_id field and have one special mem_block allocated. You
only expand the mem_blocks once another thread does an allocation (a
very rough sketch of what I mean is below). Then the memory use in
multi-threaded programs where only one thread handles a Dwarf object
(or happens to allocate) would be the same as for single-threaded
programs.

But this depends on the usage pattern, which might vary a lot between
programs. So don't try it until you know it actually helps for a
real-world program. And the extra memory use might not really matter
that much in practice anyway.

Cheers,

Mark
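P.S. To make the "owner" thread idea a bit more concrete, a very rough
sketch is below. The names (dwarf_owner_sketch, mem_owner,
mem_owner_block) and the thread_id parameter are made up for
illustration, and thread id assignment and proper OOM handling are left
out; this is just the shape of the idea, not a proposal for the actual
code.

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define NO_OWNER ((size_t) -1)

struct libdw_memblock;                     /* as in the sketch above */

struct dwarf_owner_sketch
{
  pthread_rwlock_t mem_rwl;
  size_t mem_owner;                        /* thread_id of the owner, NO_OWNER at start */
  struct libdw_memblock *mem_owner_block;  /* the owner's single tail block */
  struct libdw_memblock **mem_tails;       /* only grown once a second thread allocates */
  size_t mem_stacks;                       /* length of mem_tails */
};

/* Return the current tail block for this thread (NULL if it has not
   allocated anything yet).  The first thread to get here becomes the
   owner and keeps using mem_owner_block, so a Dwarf that is only ever
   touched by one thread costs the same as in the single-threaded
   case.  */
static struct libdw_memblock *
alloc_tail_sketch (struct dwarf_owner_sketch *dbg, size_t thread_id)
{
  pthread_rwlock_rdlock (&dbg->mem_rwl);
  if (dbg->mem_owner == thread_id)
    {
      /* Fast path: the owner's tail lives in a plain struct field, so
         no array (and no realloc) is ever involved for it.  */
      struct libdw_memblock *result = dbg->mem_owner_block;
      pthread_rwlock_unlock (&dbg->mem_rwl);
      return result;
    }
  pthread_rwlock_unlock (&dbg->mem_rwl);

  pthread_rwlock_wrlock (&dbg->mem_rwl);
  if (dbg->mem_owner == NO_OWNER)
    {
      /* First allocator: claim ownership instead of growing the array.  */
      dbg->mem_owner = thread_id;
      pthread_rwlock_unlock (&dbg->mem_rwl);
      return NULL;  /* no block allocated for the owner yet */
    }

  /* Another thread already owns this Dwarf: only now pay for the
     per-thread mem_tails array.  */
  if (thread_id >= dbg->mem_stacks)
    {
      size_t n = thread_id + 1;
      struct libdw_memblock **newt
        = realloc (dbg->mem_tails, n * sizeof newt[0]);
      if (newt == NULL)
        {
          pthread_rwlock_unlock (&dbg->mem_rwl);
          return NULL;  /* the real code would report OOM here */
        }
      memset (newt + dbg->mem_stacks, 0,
              (n - dbg->mem_stacks) * sizeof newt[0]);
      dbg->mem_tails = newt;
      dbg->mem_stacks = n;
    }

  struct libdw_memblock *result = dbg->mem_tails[thread_id];
  pthread_rwlock_unlock (&dbg->mem_rwl);
  return result;
}

The allocation side would need the same distinction, of course, storing
a newly filled block back into either mem_owner_block or
mem_tails[thread_id].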