Hello, here some macros for list_make, now we can using list_make(...), not list_make1/2/3 ...
#define MACRO_ARGS(...) __VA_ARGS__ #define LIST_MAKE_1_(narg_, postfix_, ...) list_make ## narg_ ## postfix_(__VA_ARGS__) #define LIST_MAKE_2_(...) LIST_MAKE_1_(__VA_ARGS__) #define LIST_MAKE_3_(...) LIST_MAKE_2_(__VA_ARGS__) #define list_make(...) LIST_MAKE_3_(MACRO_ARGS VA_ARGS_NARGS(__VA_ARGS__), /*empty*/, __VA_ARGS__) #define list_make_int(...) LIST_MAKE_3_(MACRO_ARGS VA_ARGS_NARGS(__VA_ARGS__), _int, __VA_ARGS__) #define list_make_oid(...) LIST_MAKE_3_(MACRO_ARGS VA_ARGS_NARGS(__VA_ARGS__), _oid, __VA_ARGS__) macro VA_ARGS_NARGS defined in c.h How to work: for list_make_int(4,5,6) step 1: LIST_MAKE_3_(MACRO_ARGS VA_ARGS_NARGS(4,5,6), _int, 4,5,6) setp 2: LIST_MAKE_2_(MACRO_ARGS (3), _int, 4,5,6) step 3: LIST_MAKE_1_(3, _int, 4,5,6) step 4: list_make3_int(4,5,6) step 5: list_make3_impl(T_IntList, ((ListCell) {.int_value = (4)}), ((ListCell) {.int_value = (5)}), ((ListCell) {.int_value = (6)})) Or we can define some public macros, like this: #define MACRO_ARGS(...) __VA_ARGS__ #define MACRO_COMBIN_1(prefix_, center_, postfix_, ...) prefix_ ## center_ ## postfix_(__VA_ARGS__) #define MACRO_COMBIN_2(...) MACRO_COMBIN_1(__VA_ARGS__) #define MACRO_COMBIN_3(...) MACRO_COMBIN_2(__VA_ARGS__) #define list_make(...) MACRO_COMBIN_3(list_make, MACRO_ARGS VA_ARGS_NARGS(__VA_ARGS__), /*empty*/, __VA_ARGS__) #define list_make_int(...) MACRO_COMBIN_3(list_make, MACRO_ARGS VA_ARGS_NARGS(__VA_ARGS__), _int, __VA_ARGS__) #define list_make_oid(...) MACRO_COMBIN_3(list_make, MACRO_ARGS VA_ARGS_NARGS(__VA_ARGS__), _oid, __VA_ARGS__) bu...@sohu.com From: Tom Lane Date: 2019-02-24 10:24 To: pgsql-hackers Subject: POC: converting Lists into arrays For quite some years now there's been dissatisfaction with our List data structure implementation. Because it separately palloc's each list cell, it chews up lots of memory, and it's none too cache-friendly because the cells aren't necessarily adjacent. Moreover, our typical usage is to just build a list by repeated lappend's and never modify it, so that the flexibility of having separately insertable/removable list cells is usually wasted. Every time this has come up, I've opined that the right fix is to jack up the List API and drive a new implementation underneath, as we did once before (cf commit d0b4399d81). I thought maybe it was about time to provide some evidence for that position, so attached is a POC patch that changes Lists into expansible arrays, while preserving most of their existing API. The big-picture performance change is that this makes list_nth() a cheap O(1) operation, while lappend() is still pretty cheap; on the downside, lcons() becomes O(N), as does insertion or deletion in the middle of a list. But we don't use lcons() very much (and maybe a lot of the existing usages aren't really necessary?), while insertion/deletion in long lists is a vanishingly infrequent operation. Meanwhile, making list_nth() cheap is a *huge* win. The most critical thing that we lose by doing this is that when a List is modified, all of its cells may need to move, which breaks a lot of code that assumes it can insert or delete a cell while hanging onto a pointer to a nearby cell. In almost every case, this takes the form of doing list insertions or deletions inside a foreach() loop, and the pointer that's invalidated is the loop's current-cell pointer. Fortunately, with list_nth() now being so cheap, we can replace these uses of foreach() with loops using an integer index variable and fetching the next list element directly with list_nth(). Most of these places were loops around list_delete_cell calls, which I replaced with a new function list_delete_nth_cell to go along with the emphasis on the integer index. I don't claim to have found every case where that could happen, although I added debug support in list.c to force list contents to move on every list modification, and this patch does pass check-world with that support turned on. I fear that some such bugs remain, though. There is one big way in which I failed to preserve the old API syntactically: lnext() now requires a pointer to the List as well as the current ListCell, so that it can figure out where the end of the cell array is. That requires touching something like 150 places that otherwise wouldn't have had to be touched, which is annoying, even though most of those changes are trivial. I thought about avoiding that by requiring Lists to keep a "sentinel" value in the cell after the end of the active array, so that lnext() could look for the sentinel to detect list end. However, that idea doesn't really work, because if the list array has been moved, the spot where the sentinel had been could have been reallocated and filled with something else. So this'd offer no defense against the possibility of a stale ListCell pointer, which is something that we absolutely need defenses for. As the patch stands we can have quite a strong defense, because we can check whether the presented ListCell pointer actually points into the list's current data array. Another annoying consequence of lnext() needing a List pointer is that the List arguments of foreach() and related macros are now evaluated each time through the loop. I could only find half a dozen places where that was actually unsafe (all fixed in the draft patch), but it's still bothersome. I experimented with ways to avoid that, but the only way I could get it to work was to define foreach like this: #define foreach(cell, l) for (const List *cell##__foreach = foreach_setup(l, &cell); cell != NULL; cell = lnext(cell##__foreach, cell)) static inline const List * foreach_setup(const List *l, ListCell **cell) { *cell = list_head(l); return l; } That works, but there are two problems. The lesser one is that a not-very-bright compiler might think that the "cell" variable has to be forced into memory, because its address is taken. The bigger one is that this coding forces the "cell" variable to be exactly "ListCell *"; you can't add const or volatile qualifiers to it without getting compiler warnings. There are actually more places that'd be affected by that than by the need to avoid multiple evaluations. I don't think the const markings would be a big deal to lose, and the two places in do_autovacuum that need "volatile" (because of a nearby PG_TRY) could be rewritten to not use foreach. So I'm tempted to do that, but it's not very pretty. Maybe somebody can think of a better solution? There's a lot of potential follow-on work that I've not touched yet: 1. This still involves at least two palloc's for every nonempty List, because I kept the header and the data array separate. Perhaps it's worth allocating those in one palloc. However, right now there's an assumption that the header of a nonempty List doesn't move when you change its contents; that's embedded in the API of the lappend_cell functions, and more than likely there's code that depends on that silently because it doesn't bother to store the results of other List functions back into the appropriate pointer. So if we do that at all I think it'd be better tackled in a separate patch; and I'm not really convinced it's worth the trouble and extra risk of bugs. 2. list_qsort() is now absolutely stupidly defined. It should just qsort the list's data array in-place. But that requires an API break for the caller-supplied comparator, since there'd be one less level of indirection. I think we should change this, but again it'd be better done as an independent patch to make it more visible in the git history. 3. There are a number of places where we've built flat arrays paralleling Lists, such as the planner's simple_rte_array. That's pointless now and could be undone, buying some code simplicity. Various other micro-optimizations could doubtless be done too; I've not looked hard. I haven't attempted any performance measurements on this, but at least in principle it should speed things up a bit, especially for complex test cases involving longer Lists. I don't have any very suitable test cases at hand, anyway. I think this is too late for v12, both because of the size of the patch and because of the likelihood that it introduces a few bugs. I'd like to consider pushing it early in the v13 cycle, though. regards, tom lane