On Thu, Jul 15, 2021 at 10:19 AM David Rowley <dgrowle...@gmail.com> wrote: > > On Fri, 16 Jul 2021 at 01:44, James Coleman <jtc...@gmail.com> wrote: > > > > On Wed, Jul 14, 2021 at 9:22 PM David Rowley <dgrowle...@gmail.com> wrote: > > > > > > On Thu, 15 Jul 2021 at 12:30, Ranier Vilela <ranier...@gmail.com> wrote: > > > > > > > > Em qua., 14 de jul. de 2021 às 21:21, David Rowley > > > > <dgrowle...@gmail.com> escreveu: > > > >> But, in v8 there is no additional branch, so no branch to mispredict. > > > >> I don't really see how your explanation fits. > > > > > > > > In v8 the branch occurs at : > > > > + if (ExecGetResultType(outerPlanState(sortstate))->natts == 1) > > > > > > You do know that branch is in a function that's only executed once > > > during executor initialization, right? > > > > This is why I have a hard time believing there's a "real" change here > > and not the result of either noise or something not really > > controllable like executable layout changing. > > Yeah, I think we likely are at the level where layout changes in the > compiled code are going to make things hard to measure. I just want > to make sure we're not going to end up with some regression that's > actual and not random depending on layout changes of unrelated code. > I think a branch that's taken consistently *should* be predicted > correctly each time. > > Anyway, I think all the comparisons with v7b can safely be ignored. As > Ronan pointed out, v7b has some issues due to it not recording the > sort method in the executor state that leads to it forgetting which > method it used once we start pulling tuples from it. The reproductions > of that are it calling tuplesort_gettupleslot() from the 2nd tuple > onwards regardless of if we've done a datum or tuple sort. > > Ronan's latest results plus John's make me think there's no need to > separate out the node function as I did in v8. However, I do think v6 > could learn a little from v8. I think I'd rather see the sort method > determined in ExecInitSort() rather than ExecSort(). I think > minimising those few extra instructions in ExecSort() might help the > L1 instruction cache.
I ran master/v6/v8 tests for 90s each with David's test script on an AWS c5n.metal instance (so should be immune to noise neighbor issues). Here are comparative results: Test1 Test2 Test3 Test4 Test5 Test6 Test7 Test8 v6 68.66% 0.05% 32.21% -0.83% 12.58% 10.42% -1.48% 50.98% v8 69.78% -0.44% 32.45% -1.11% 12.01% 10.58% -1.40% 49.30% So I see a consistent change in the data, but I don't really see a good explanation for it not being noise. Can't prove that yet though. James