On Thu, Apr 5, 2018 at 5:02 PM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> On 03/04/18 17:20, Claudio Freire wrote:
>>
>> Ok, rebased patches attached
>
> Thanks! I took a look at this.
>
> First, now that the data structure is more complicated, I think it's time
> to abstract it, and move it out of vacuumlazy.c. The Tid Map needs to
> support the following operations:
>
> * Add TIDs, in order (in 1st phase of vacuum)
> * Random lookup, by TID (when scanning indexes)
> * Iterate through all TIDs, in order (2nd pass over heap)
>
> Let's add a new source file to hold the code for the tid map data
> structure, with functions corresponding those operations.
>
> I took a stab at doing that, and I think it makes vacuumlazy.c nicer.
About the refactoring to split this into its own set of files and abstract
away the underlying structure, I can totally get behind that. The iteration
interface, however, seems quite specific to the vacuumlazy use case, so it's
not really a good abstraction, and it copies data around a lot, which makes
it quite heavyweight. I'd suggest going for a lighter-weight interface with
less overhead that is more general at the same time. If this were C++, I'd
say build an iterator class. In C it would probably be done with macros: one
macro to get the current element, another to advance to the next element,
and another to check whether you've reached the end (a rough sketch of what
I mean is at the end of this mail). I can do that if we agree on the points
below:

> Secondly, I'm not a big fan of the chosen data structure. I think the only
> reason that the segmented "multi-array" was chosen is that each "segment"
> works is similar to the simple array that we used to have. After putting it
> behind the abstraction, it seems rather ad hoc. There are many standard
> textbook data structures that we could use instead, and would be easier to
> understand and reason about, I think.
>
> So, I came up with the attached patch. I used a B-tree as the data
> structure. Not sure it's the best one, I'm all ears for suggestions and
> bikeshedding on alternatives, but I'm pretty happy with that. I would
> expect it to be pretty close to the simple array with binary search in
> performance characteristics. It'd be pretty straightforward to optimize
> further, and e.g. use a bitmap of OffsetNumbers or some other more dense
> data structure in the B-tree leaf pages, but I resisted doing that as part
> of this patch.

About the B-tree, however: I don't think it's a good idea. The main benefit
of trees is that they support efficient insertion. When all the data is
loaded sequentially, in order, entirely in memory, and is immutable, a tree
is pointless: more costly to build and, in terms of code complexity, harder
to maintain.

In this use case, the only benefit of B-trees would be that they're
optimized for disk access. If we planned to store this structure on disk,
perhaps I'd grant you that. But we don't plan to, and it's not even clear
that doing so would be efficient enough for the intended use.

On the other hand, using a B-tree incurs memory overhead for the internal
nodes, can fragment memory because internal nodes aren't the same size as
leaf nodes, and is easier to get wrong and introduce bugs with. I don't see
a gain. If you propose its use, at least benchmark it to show some gain.

So I don't think a B-tree is a good idea. The sorted array is already good
enough, and if not, it's at least close to the earlier implementation and
less likely to introduce bugs.

Furthermore, among the 200-ish messages this thread has accumulated, better
ideas have been proposed, better because they use less memory and are
faster (like using bitmaps when possible). But if we can't push a simple
refactoring first, there's no chance a bigger rewrite will fare better.

Remember, in this use case, using less memory far outweighs any other
consideration. Less memory directly translates into fewer iterations over
the indexes, because more dead TIDs can be crammed into m_w_m, and that is
a huge time saving, far more than any micro-optimization.
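To put rough numbers on that (back-of-the-envelope only, assuming the stock
6-byte ItemPointerData and 8 kB heap pages, where MaxHeapTuplesPerPage works
out to 291; none of this is code from the patch):

/*
 * Back-of-the-envelope only, not code from the patch: assumes a 6-byte
 * ItemPointerData and 8 kB heap pages (MaxHeapTuplesPerPage = 291 there),
 * to show why density matters more than lookup micro-optimizations.
 */
#include <stdio.h>

int
main(void)
{
    const double mwm_bytes = 1024.0 * 1024.0 * 1024.0; /* m_w_m = 1GB */
    const double tid_bytes = 6.0;               /* sizeof(ItemPointerData) */
    const double bitmap_bytes = 291.0 / 8.0;    /* per-page offset bitmap */

    printf("dead TIDs per 1GB of m_w_m: ~%.0f million\n",
           mwm_bytes / tid_bytes / 1e6);
    printf("per-page bitmap: ~%.0f bytes, cheaper than the plain array\n"
           "once a page has more than %.0f dead tuples\n",
           bitmap_bytes, bitmap_bytes / tid_bytes);
    return 0;
}

That's roughly 179 million TIDs per gigabyte with the plain representation,
and a per-page bitmap starts winning as soon as a page has more than about
six dead tuples, which is why density beats micro-optimizations here.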
About two years ago, I chose to push this simple algorithm first and then
try to improve on it with better data structures. Nobody complained at the
time (IIRC), and I don't think it's fair to go and revisit that now. It just
delays getting a solution for this issue in pursuit of "the perfect
implementation" that might never arrive. And even if it does arrive, there's
nothing stopping us from pushing another patch in the future with that
better implementation if we wish. Let's get something simple and proven in
first.

> I haven't done any performance testing of this (and not much testing in
> general), but at least the abstraction seems better this way. Performance
> testing would be good, too. In particular, I'd like to know how this might
> affect the performance of lazy_tid_reaped(). That's a hot spot when
> vacuuming indexes, so we don't want to add any cycles there. Was there any
> ready-made test kits on that in this thread? I didn't see any at a quick
> glance, but it's a long thread..

If you dig through the old messages in this thread, you'll find the scripts
I used for benchmarking this. I'm attaching one version of them again (I've
been modifying it to suit my purposes at each review round); you'll probably
want to tweak it to build test cases suited to your purpose here.
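Going back to the iterator interface I suggested above, here's a minimal
sketch of the kind of macro-based iteration I have in mind. All the names
and the flat backing array are made up for illustration; the real tid map is
segmented, and the macros would hide that from vacuumlazy.c without copying
anything:

#include <stdio.h>

/* Stand-in for ItemPointerData, just for this sketch. */
typedef struct DeadTid
{
    unsigned int   blkno;      /* block number */
    unsigned short offnum;     /* offset number within the block */
} DeadTid;

/* Iteration state over a (here: flat) run of dead TIDs. */
typedef struct TidMapIter
{
    const DeadTid *cur;        /* current element */
    const DeadTid *end;        /* one past the last element */
} TidMapIter;

/* Position the iterator at the first TID. */
#define tidmap_iter_init(iter, tids, ntids) \
    ((iter)->cur = (tids), (iter)->end = (tids) + (ntids))

/* Is the iterator still on a valid TID? */
#define tidmap_iter_valid(iter)  ((iter)->cur < (iter)->end)

/* Current TID, by pointer, with no copying. */
#define tidmap_iter_get(iter)    ((iter)->cur)

/* Advance to the next TID in order. */
#define tidmap_iter_next(iter)   ((iter)->cur++)

int
main(void)
{
    DeadTid     dead_tuples[] = {{1, 3}, {1, 7}, {4, 2}};
    TidMapIter  iter;

    /* Second-pass style loop: visit every dead TID in order. */
    for (tidmap_iter_init(&iter, dead_tuples, 3);
         tidmap_iter_valid(&iter);
         tidmap_iter_next(&iter))
        printf("(%u,%hu)\n",
               tidmap_iter_get(&iter)->blkno,
               tidmap_iter_get(&iter)->offnum);

    return 0;
}

With a segmented structure behind it, tidmap_iter_next() would simply step
into the next segment when it runs off the end of the current one, and the
random-lookup path could keep doing a binary search on the side; the
interface itself never forces a copy.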
vacuumbench.sh
Description: application/shellscript