On Fri, Feb 16, 2024 at 10:05 AM Masahiko Sawada <sawada.m...@gmail.com> wrote: > > v61-0007: Runtime-embeddable tids -- Optional for v17, but should > > reduce memory regressions, so should be considered. Up to 3 tids can > > be stored in the last level child pointer. It's not polished, but I'll > > only proceed with that if we think we need this. "flags" iis called > > that because it could hold tidbitmap.c booleans (recheck, lossy) in > > the future, in addition to reserving space for the pointer tag. Note: > > I hacked the tests to only have 2 offsets per block to demo, but of > > course both paths should be tested. > > Interesting. I've run the same benchmark tests we did[1][2] (the > median of 3 runs): > > monotonically ordered int column index: > > master: system usage: CPU: user: 14.91 s, system: 0.80 s, elapsed: 15.73 s > v-59: system usage: CPU: user: 9.67 s, system: 0.81 s, elapsed: 10.50 s > v-62: system usage: CPU: user: 1.94 s, system: 0.69 s, elapsed: 2.64 s
Hmm, that's strange -- this test is intended to delete all records from the last 20% of the blocks, so I wouldn't expect any improvement here, only in the sparse case. Maybe something is wrong. All the more reason to put it off... > I'm happy to see a huge improvement. While it's really fascinating to > me, I'm concerned about the time left until the feature freeze. We > need to polish both tidstore and vacuum integration patches in 5 > weeks. Personally I'd like to have it as a separate patch for now, and > focus on completing the main three patches since we might face some > issues after pushing these patches. I think with 0007 patch it's a big > win but it's still a win even without 0007 patch. Agreed to not consider it for initial commit. I'll hold on to it for some future time. > > 2. Management of memory contexts. It's pretty verbose and messy. I > > think the abstraction could be better: > > A: tidstore currently passes CurrentMemoryContext to RT_CREATE, so we > > can't destroy or reset it. That means we have to do a lot of manual > > work. > > B: Passing "max_bytes" to the radix tree was my idea, I believe, but > > it seems the wrong responsibility. Not all uses will have a > > work_mem-type limit, I'm guessing. We only use it for limiting the max > > block size, and aset's default 8MB is already plenty small for > > vacuum's large limit anyway. tidbitmap.c's limit is work_mem, so > > smaller, and there it makes sense to limit the max blocksize this way. > > C: The context for values has complex #ifdefs based on the value > > length/varlen, but it's both too much and not enough. If we get a bump > > context, how would we shoehorn that in for values for vacuum but not > > for tidbitmap? > > > > Here's an idea: Have vacuum (or tidbitmap etc.) pass a context to > > TidStoreCreate(), and then to RT_CREATE. That context will contain the > > values (for local mem), and the node slabs will be children of the > > value context. That way, measuring memory usage and free-ing can just > > call with this parent context, and let recursion handle the rest. > > Perhaps the passed context can also hold the radix-tree struct, but > > I'm not sure since I haven't tried it. What do you think? > > If I understand your idea correctly, RT_CREATE() creates the context > for values as a child of the passed context and the node slabs as > children of the value context. That way, measuring memory usage can > just call with the value context. It sounds like a good idea. But it > was not clear to me how to address point B and C. For B & C, vacuum would create a context to pass to TidStoreCreate, and it wouldn't need to bother changing max block size. RT_CREATE would use that directly for leaves (if any), and would only create child slab contexts under it. It would not need to know about max_bytes. Modifyng your diagram a bit, something like: - caller-supplied radix tree memory context (the 3 structs -- and leaves, if any) (aset (or future bump?)) - node slab contexts This might only be workable with aset, if we need to individually free the structs. (I haven't studied this, it was a recent idea) It's simpler, because with small fixed length values, we don't need to detect that and avoid creating a leaf context. All leaves would live in the same context as the structs. > Another variant of this idea would be that RT_CREATE() creates the > parent context of the value context to store radix-tree struct. That > is, the hierarchy would be like: > > A MemoryContext (passed by vacuum through tidstore) > - radix tree memory context (store radx-tree struct, control > struct, and iterator) > - value context (aset, slab, or bump) > - node slab contexts The template handling the value context here is complex, and is what I meant by 'C' above. Most fixed length allocations in all of the backend are aset, so it seems fine to use it always. > Freeing can just call with the radix tree memory context. And perhaps > it works even if tidstore passes CurrentMemoryContex to RT_CREATE()? Seems like it would, but would keep some complexity, as I mentioned.