Hello hackers, David Rowley posted some interesting stuff about an ArrayList[1] that would replace List in many places. (I probably would have replied to that thread if I hadn't changed email account, sorry.) I had similar thoughts when I needed to maintain a sorted vector for my refactoring proposal for the fsync queue, as part of my undo log work, except I came at the problem with some ideas from C++ in mind. However, now this stuff seems to be redundant for now, given some decisions we've arrived at in the thread about how to track the fsync queue, and I'm more focused on getting the undo work done.
I wanted to share what I had anyway, since it seems to be related to David's proposal but make some different trade-offs. Perhaps it will be useful for ideas, code, or examples of what not to do :-) Or perhaps I'll come back to it later. Specialising the comparator and element size of functions like qsort(), binary_search(), unique() is a well known technique for going faster, and the resulting programming environment has better type safety. It's easy to measure a speedup for specialised qsort of small objects (compared to pg_qsort() with function pointer and object size as arguments). Doing that in a way that works well with automatically managed vectors (and also any array, including in shmem etc) seemed like a good plan to me, hence this prototyping work. The basic idea is to use simplehash-style generic programming, like a kind of poor man's C++ standard library. The vector type is instantiated for a given type like Oid etc, and then you can instantiate specialised qsort() etc. The vector has a 'small vector optimisation' where you can hold very small lists without allocating any extra memory until you get past (say) 3 members. I was planning to extend the qsort implementation a bit further to that it could replace both pg_qsort() and the Perl-based tuple qsort generator, and then we'd have only one copy of the algorithm in the tree (the dream of generic programmers). I wanted to do the same with unique(), since I'd already noticed that we have many open coded examples of that algorithm scattered throughout the tree. Some of the gratuitous C++isms should be removed including some nonsense about const qualifiers, use of begin and end rather than data and size (more common in C code), some some other details, and I was planning to fix some of that before I reposted the patch set as part of the larger undo patch set, but now that I'm not going to do that... here is a snapshot of the patch set as-is, with toy examples showing several examples of List-of-Oid relationOids being replaced with a specialised vector (with a missed opportunity for it to be sorted and use binary_search() instead of find()), and the List-of-Node p_joinexprs being replaced with a specialised vector. I think those last things are the sort of thing that David was targeting. [1] https://www.postgresql.org/message-id/flat/CAKJS1f_2SnXhPVa6eWjzy2O9A%3Docwgd0Cj-LQeWpGtrWqbUSDA%40mail.gmail.com -- Thomas Munro https://enterprisedb.com
0001-Add-parameterized-vectors-and-sorting-searching-supp.patch
Description: Binary data
0002-A-few-extra-tricks-for-simplevector.h.patch
Description: Binary data
0003-Convert-type-of-various-relationOids-members-List-oi.patch
Description: Binary data
0004-More-simplevector-improvements.patch
Description: Binary data
0005-Convert-p_joinexprs-from-a-List-to-a-nodep_vector.patch
Description: Binary data