On Sat, 17 Jul 2021 at 01:14, James Coleman <jtc...@gmail.com> wrote: > The only remaining question I have is whether or not costing needs to > change, given the very significant speedup for datum sort.
I'm looking at cost_tuplesort and the only thing that I think might make sense would be to adjust how the input_bytes value is calculated. For now, that's done with the following function that's used in quite a number of places. static double relation_byte_size(double tuples, int width) { return tuples * (MAXALIGN(width) + MAXALIGN(SizeofHeapTupleHeader)); } It seems, at least in the case of Sort, that using SizeofHeapTupleHead is just always wrong as it should be SizeofMinimalTupleHeader. I know that's also the case for Memoize too. I've not checked the other locations. The only thing I can really see that we might do would be not add the MAXALIGN(SizeofHeapTupleHeader) when there's just a single column. We'd need to pass down the number of attributes from create_sort_path() so we'd know when and when not to add that. I'm not saying that we should do this. I'm just saying that I don't really see what else we might do. I can imagine another patch might just want to do a complete overhaul of all locations that use relation_byte_size(). There are various things that function just does not account for. e.g, the fact that we allocate chunks in powers of 2 and that there's a chunk header added on. Of course, "width" is just an estimate, so maybe trying to calculate something too precisely wouldn't be too wise. However, there's a bit of a chicken and the egg problem there as there'd be little incentive to improve "width" unless we started making more accurate use of the value. Anyway, none of the above take into account that the Datum sort is just a little faster, The only thing that exists in the existing cost modal that we could use to adjust the cost of an in memory sort is the comparison_cost. The problem there is that the comparison is exactly the same in both Datum and Tuple sorts. The only thing that really changes between Datum and Tuple sort is the fact that we don't make a MinimalTuple when doing a Datum sort. The cost modal, unfortunately, does not account for that. That kinda makes me think that we should do nothing as if we start to account for making MemoryTuples then we'll just penalise Tuple sorts and that might cause someone to be upset. David