On Tue, Jul 20, 2021 at 05:35, David Rowley <dgrowle...@gmail.com> wrote:
> On Tue, 20 Jul 2021 at 01:10, James Coleman <jtc...@gmail.com> wrote:
>
> > To be clear up front: I'm in favor of the patch, and I don't want to put unnecessary stumbling blocks up for it getting committed. So if we decide to proceed as is, that's fine with me.
>
> Thanks for making that clear.
>
> > But I'm not sure that the "cost model, unfortunately, does not account for that" is entirely accurate. The end of cost_tuplesort contains a cost "per extracted tuple". It does, however, note that it doesn't charge cpu_tuple_cost, which maybe is what you'd want in order to fully incorporate this into the model. But given this run_cost isn't about accounting for comparison cost (that's been done earlier), which is the part that'd be the same between tuple and datum sort, it seems to me that we could lower the cpu_operator_cost here by something like 10% if it's byref and 30% if it's byval?
>
> I failed to notice that last part that adds the additional cpu_operator_cost.
>
> The default cpu_operator_cost is 0.0025, so with the 10k tuple benchmark that adds a charge of 25 to the total cost.
>
> If we take test 1 from my results on v5 as an example:
>
> > Test1 446.1 657.3 147.32%
>
> Looking at explain for that query:
>
> regression=# explain select two from tenk1 order by two offset 1000000;
>                               QUERY PLAN
> ----------------------------------------------------------------------
>  Limit  (cost=1133.95..1133.95 rows=1 width=4)
>    ->  Sort  (cost=1108.97..1133.95 rows=9995 width=4)
>          Sort Key: two
>          ->  Seq Scan on tenk1  (cost=0.00..444.95 rows=9995 width=4)
> (4 rows)
>
> If we want the costs to reflect reality again here then we'd have to reduce 1133.95 by something like 147.32% (the performance difference). That would bring the cost down to 769.72, which is far more of a reduction than the 25 that the cpu_operator_cost * tuples surcharge gives us to play with.
>
> If we reduced the 25 by 30% in this case, we'd get 17.5 and the total cost would become 1126.45. That's not great considering the actual performance indicates that 769.72 would be a better number.
>
> If we look at John's result for test 1: he saw 588 tps on master and 998 tps on v8. 1133.95 / 998.0 * 588.0 = 668.09, so we'd need an even bigger reduction to get close to reality on that machine.
>
> My thoughts are that the small surcharge added at the end of cost_tuplesort() is just not enough for us to play with. I think getting us closer to fixing this correctly would require a redesign of the tuplesort costing entirely. I think that would be about an order of magnitude more effort than this patch was, so I really feel like I don't want to do that here.

I understand that a redesign would require a lot of work, but why not do it step by step?

> I kinda feel that since the comparison_cost is always just 2.0 * cpu_operator_cost regardless of the number of columns in the sort, then if we add too many new smarts to try and properly adjust for this new optimization, then unless we do a completely new cost model for this we might as well be putting lipstick on a pig.

I think one first step is naming this 2.0. Doesn't this magic number deserve a proper name?

> It sounds like James mostly mentioned the sort costing just to ensure it was properly considered and does not really feel strongly that it needs to be adjusted. Does anyone else feel that we should be adjusting it?

I took a look at cost_tuplesort and I think that some small adjustments could be made as part of that improvement process.
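Just to illustrate the kind of step I mean, here is a minimal self-contained sketch that names the constant and also folds in James's 10%/30% discount idea from above. This is not the attached patch, and DEFAULT_COMPARISON_COST_FACTOR, datum_sort and byval are names invented for this sketch:

#include <stdbool.h>

typedef double Cost;

/* the default value of the cpu_operator_cost GUC */
static const Cost cpu_operator_cost = 0.0025;

/* a name for the bare 2.0 multiplier used in cost_tuplesort() */
#define DEFAULT_COMPARISON_COST_FACTOR 2.0

/*
 * Per-comparison cost as cost_tuplesort() computes it today, with
 * the magic number named instead of written as a bare 2.0.
 */
static Cost
per_comparison_cost(Cost extra_comparison_cost)
{
    return extra_comparison_cost +
           DEFAULT_COMPARISON_COST_FACTOR * cpu_operator_cost;
}

/*
 * The per-extracted-tuple surcharge at the end of cost_tuplesort(),
 * discounted as James suggested: 30% off when the datum sort applies
 * to a byval type, 10% off when it is byref.
 */
static Cost
per_tuple_run_cost(double tuples, bool datum_sort, bool byval)
{
    Cost per_tuple = cpu_operator_cost;

    if (datum_sort)
        per_tuple *= byval ? 0.70 : 0.90;

    return per_tuple * tuples;
}

Of course, as David's numbers above show, even the full 30% discount only moves the surcharge from 25 down to 17.5, nowhere near the roughly 364 that test 1 would need.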
The patch is attached. The adjustments are:

1. long is a very problematic type; wouldn't int64 be better?
2. Couldn't 1024 be an int rather than a long?
3. Shouldn't all the bare 2 literals be written as 2.0 (double)?
4. If the disk-based path is not needed, IMO can we avoid calling relation_byte_size?

Finally, it would be nice to at least document those conclusions with comments, wouldn't it?
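Concretely, points 1 to 3 amount to changes of roughly this shape. This is an illustrative sketch only, not the attached patch, and it only paraphrases the affected spots in cost_tuplesort(); PostgreSQL itself spells the type int64 rather than int64_t:

#include <stdint.h>

/*
 * Sketch of the local-variable changes: int64 instead of long (long
 * is only 32 bits on 64-bit Windows), an int literal instead of
 * 1024L, and 2.0 instead of a bare 2 in double arithmetic.
 */
static void
sort_mem_sketch(int sort_mem, double tuples, double output_tuples)
{
    /* before: long sort_mem_bytes = sort_mem * 1024L; */
    int64_t sort_mem_bytes = (int64_t) sort_mem * 1024;

    /* before: if (tuples > 2 * output_tuples) ... */
    if (tuples > 2.0 * output_tuples)
    {
        /* the bounded (top-N) sort costing would go here */
    }

    (void) sort_mem_bytes;   /* sketch only; silence unused warning */
}

regards,
Ranier Vilela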
costsize.patch
Description: Binary data