On Thu, Jan 26, 2023 at 7:15 PM David Rowley <dgrowle...@gmail.com> wrote:
>
> On Thu, 26 Jan 2023 at 23:29, John Naylor <john.nay...@enterprisedb.com> wrote:
> > Coming back to this, I wanted to sketch out this idea in a bit more detail.
> >
> > Have two memtuple arrays, one for first sortkey null and one for first sortkey non-null:
> > - Qsort the non-null array, including whatever specialization is available. Existing specialized comparators could ignore nulls (and their ordering), taking less space in the binary.
> > - Only if there is more than one sort key, qsort the null array. Ideally at some point we would have a method of ignoring the first sortkey (this is an existing opportunity that applies elsewhere as well).
> > - To handle two arrays, grow_memtuples() would need some adjustment, as would any callers that read the final result of an in-memory sort -- they would need to retrieve the tuples starting with the appropriate array depending on NULLS FIRST/LAST behavior.
>
> Thanks for coming back to this. I've been thinking about this again
> recently due to what was discovered in [1].
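To make the quoted two-array idea concrete, here is a minimal, hedged sketch: partition the tuples into a null-first-sortkey array and a non-null array, sort each with a comparator that needs no per-call null branch, and read the result back honoring NULLS FIRST/LAST. All names here (DemoTuple, two_array_sort, etc.) are invented for illustration; they are not the actual tuplesort.c definitions.

```c
#include <stdlib.h>

typedef struct DemoTuple
{
	int		isnull1;	/* is the first sort key NULL? */
	int		datum1;		/* first sort key (valid only if !isnull1) */
	int		key2;		/* second sort key, never NULL in this demo */
} DemoTuple;

/* Comparator for the non-null array: no null branch needed. */
static int
cmp_nonnull(const void *a, const void *b)
{
	const DemoTuple *ta = a;
	const DemoTuple *tb = b;

	if (ta->datum1 != tb->datum1)
		return (ta->datum1 > tb->datum1) - (ta->datum1 < tb->datum1);
	return (ta->key2 > tb->key2) - (ta->key2 < tb->key2);
}

/* Comparator for the all-null array: the first key ties by construction. */
static int
cmp_null_tail(const void *a, const void *b)
{
	const DemoTuple *ta = a;
	const DemoTuple *tb = b;

	return (ta->key2 > tb->key2) - (ta->key2 < tb->key2);
}

/* Partition into two arrays, sort both, copy back per NULLS FIRST/LAST. */
static void
two_array_sort(DemoTuple *tuples, size_t n, int nulls_first)
{
	DemoTuple  *nonnulls = malloc(n * sizeof(DemoTuple));
	DemoTuple  *nulls = malloc(n * sizeof(DemoTuple));
	size_t		nnon = 0,
				nnull = 0,
				i;

	for (i = 0; i < n; i++)
	{
		if (tuples[i].isnull1)
			nulls[nnull++] = tuples[i];
		else
			nonnulls[nnon++] = tuples[i];
	}

	qsort(nonnulls, nnon, sizeof(DemoTuple), cmp_nonnull);
	qsort(nulls, nnull, sizeof(DemoTuple), cmp_null_tail);

	if (nulls_first)
	{
		for (i = 0; i < nnull; i++)
			tuples[i] = nulls[i];
		for (i = 0; i < nnon; i++)
			tuples[nnull + i] = nonnulls[i];
	}
	else
	{
		for (i = 0; i < nnon; i++)
			tuples[i] = nonnulls[i];
		for (i = 0; i < nnull; i++)
			tuples[nnon + i] = nulls[i];
	}
	free(nonnulls);
	free(nulls);
}
```

Note that cmp_null_tail is exactly the "method of ignoring the first sortkey" mentioned above: since every tuple in that array has a NULL first key, the comparison starts at key 2.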
That was indeed part of the motivation for bringing this up.

> Why I'm bringing this up here is that I wondered, in addition to what
> you're mentioning above, if we're making some changes to allow
> multiple in-memory arrays, would it be worth going a little further
> and allowing it so we could have N arrays which we'd try to keep under
> L3 cache size. Maybe some new GUC could be used to know what a good
> value is. With fast enough disks, it's often faster to use smaller
> values of work_mem which don't exceed L3 to keep these batches
> smaller. Keeping them in memory would remove the need to write out
> tapes.

That's interesting. I don't know enough to guess how complex it would be to make "external" merges agnostic about whether the tapes are on disk or in memory. If in-memory sorts were designed analogously to external ones, grow_memtuples would never have to repalloc; it could just allocate a new tape, which could further shave some cycles.

My hunch in my last email was that having separate groups of tapes for each null/non-null first sortkey would be complex, because it increases the number of places that have to know about nulls and their ordering. If we wanted to go to N arrays instead of 2, and additionally wanted separate null/non-null treatment, it seems we would still need 2 sets of arrays: one with N non-null-first-sortkey tapes and one with M null-first-sortkey tapes. So using 2 arrays seems like the logical first step. I'd be curious to hear other possible development paths.

> I think the slower sorts I found in [2] could also be partially caused
> by the current sort specialisation comparators re-comparing the
> leading column during a tie-break. I've not gotten around to disabling
> the sort specialisations to see if and how much this is a factor for
> that test.

Right, that's worth addressing independently of the window function consideration.
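For what it's worth, the merge step of the N-array idea above could look something like the following sketch: several already-sorted, cache-sized in-memory runs get merged directly, with no disk tape involved. The naive linear-scan k-way merge here is purely for illustration (the real external merge in tuplesort uses a binary heap), and all names are made up:

```c
#include <stdlib.h>

/*
 * Merge nruns sorted integer runs (lengths in runlen[]) into out[],
 * which must have room for the total number of elements.
 */
static void
merge_runs(int **runs, const size_t *runlen, size_t nruns, int *out)
{
	size_t	   *pos = calloc(nruns, sizeof(size_t));
	size_t		filled = 0;

	for (;;)
	{
		int			best = -1;
		size_t		r;

		/* Pick the run whose next unconsumed element is smallest. */
		for (r = 0; r < nruns; r++)
		{
			if (pos[r] < runlen[r] &&
				(best < 0 || runs[r][pos[r]] < runs[best][pos[best]]))
				best = (int) r;
		}
		if (best < 0)
			break;				/* all runs exhausted */
		out[filled++] = runs[best][pos[best]++];
	}
	free(pos);
}
```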
I'm still swapping this area back into my head, but I believe one issue is that state->base.onlyKey signals two things: "one sortkey, not abbreviated". We could add a separate branch for "first key unabbreviated, nkeys > 1" -- I don't think we'd need to specialize, just branch -- and instead of state->base.comparetup, call a set of analogous functions that only handle keys 2 and above (comparetup_tail_*? or possibly just add a boolean parameter compare_first). That would not pose a huge challenge, I think, since they're already written like this:

	/* Compare the leading sort key */
	compare = ApplySortComparator(...);
	if (compare != 0)
		return compare;

	/* Compare additional sort keys */
	...

The null/non-null separation would eliminate a bunch of branches in inlined comparators, so we could afford to add another branch for the number of keys. I haven't thought through either of these ideas in gory detail, but I don't yet see any big obstacles.

--
John Naylor
EDB: http://www.enterprisedb.com
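P.S. A toy sketch of the compare_first variant, in case it helps make the shape of the change clearer. The names (DemoTuple, demo_comparetup, apply_cmp) are invented stand-ins for SortTuple, comparetup_* and ApplySortComparator, not the real definitions:

```c
typedef struct DemoTuple
{
	int			key1;
	int			key2;
} DemoTuple;

/* Stand-in for ApplySortComparator on an int key. */
static int
apply_cmp(int a, int b)
{
	return (a > b) - (a < b);
}

/*
 * If compare_first is false, the caller (e.g. a specialized qsort on
 * the leading key) has already established that key1 ties, so we jump
 * straight to the additional sort keys.
 */
static int
demo_comparetup(const DemoTuple *a, const DemoTuple *b, int compare_first)
{
	int			compare;

	if (compare_first)
	{
		/* Compare the leading sort key */
		compare = apply_cmp(a->key1, b->key1);
		if (compare != 0)
			return compare;
	}

	/* Compare additional sort keys */
	return apply_cmp(a->key2, b->key2);
}
```

A comparetup_tail_* function would simply be the compare_first == false path split out into its own function, avoiding even that one branch.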