Re: Consider the number of columns in the sort cost model

2024-12-02 Thread Dmitry Dolgov
Hi folks, just wanted to mention that it looks like some CFBot tests are failing, something around level_tracking in pgss.

Re: Consider the number of columns in the sort cost model

2024-10-31 Thread Andrei Lepikhov
On 10/31/24 21:00, Alena Rybakina wrote: On 29.10.2024 05:47, Andrei Lepikhov wrote: I played around with the examples a bit and couldn't figure out something. When I added the same values to different columns, first in a, then in b, the order of the columns for the sort operation doesn't ch

Re: Consider the number of columns in the sort cost model

2024-10-31 Thread Alena Rybakina
I played around with the examples a bit and couldn't figure out something. When I added the same values to different columns, first in a, then in b, the order of the columns for the sort operation doesn't change. Isn't this a mistake? create table a (x1 int, y1 int); create table b (x2 int,

Re: Consider the number of columns in the sort cost model

2024-10-31 Thread Alena Rybakina
On 29.10.2024 05:47, Andrei Lepikhov wrote: On 10/28/24 16:48, Alena Rybakina wrote: On 23.10.2024 04:39, Andrei Lepikhov wrote: On 15/10/2024 12:15, David Rowley wrote: And the last patch is a demo of how I'm going to use the previous three patches and add one more strategy to improve the ord

Re: Consider the number of columns in the sort cost model

2024-10-28 Thread Andrei Lepikhov
8b5a102d6d33eb4 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Mon, 21 Oct 2024 15:21:34 +0400 Subject: [PATCH v2 2/4] Consider the number of columns in the cost-sort model. The sort node calls the comparison operator for each attribute of every pair of tuples compared.

Re: Consider the number of columns in the sort cost model

2024-10-28 Thread Alena Rybakina
Hi! Thank you for your work on this subject! On 23.10.2024 04:39, Andrei Lepikhov wrote: On 15/10/2024 12:15, David Rowley wrote: On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov wrote: I think maybe what is worth working on is seeing if you can better estimate the number of tiebreak comparisons

Re: Consider the number of columns in the sort cost model

2024-10-22 Thread Andrei Lepikhov
the plan must be the same as above +SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2 +WHERE t2.x=t1.x +ORDER BY t1.x,t1.y; + +RESET enable_hashjoin; +DROP TABLE sort_ndist_t1, sort_ndist_t2; -- 2.47.0 From eaae44d6e9dda4a295433af4ec6ae18e181046f7 Mon Sep 17 00:00:00 2001 From: "An

Re: Consider the number of columns in the sort cost model

2024-10-14 Thread Andrei Lepikhov
On 10/15/24 12:15, David Rowley wrote: As for your patch, I'm suspicious that the general idea you're proposing is an actual improvement. I didn't intend to treat it as an 'improvement' but as an intermediate patch. The main purpose here is to debate the approach & introduce consideration of the number of

Re: Consider the number of columns in the sort cost model

2024-10-14 Thread David Rowley
On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov wrote: > I am suspicious of that but open to hearing other opinions. The coefficient incorporates knowledge about how many comparisons will be made with this sorting operator. The caller can have some additional knowledge about that. For example,
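
David's point about estimating tiebreak comparisons can be made concrete with a small back-of-the-envelope model. The following Python sketch is purely illustrative (it is not planner code) and assumes independent, uniformly distributed columns, which real data rarely satisfies:

```python
def expected_cols_per_comparison(ndistinct_per_col):
    """Expected number of key columns examined in one tuple comparison.
    Column i+1 is examined only when all earlier columns tie; for a
    uniform column with n distinct values, a tie occurs with chance 1/n."""
    expected = 0.0
    p_all_prior_tied = 1.0
    for nd in ndistinct_per_col:
        expected += p_all_prior_tied          # this column gets compared
        p_all_prior_tied *= 1.0 / max(nd, 1)  # chance it ties as well
    return expected
```

Under this toy model a highly distinct leading key makes trailing columns almost free, while a low-cardinality leading key pushes the expected count toward the full key width, which is exactly the knowledge a caller might feed into the coefficient.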

Re: Consider the number of columns in the sort cost model

2024-10-14 Thread Andrei Lepikhov
ject: [PATCH v3] Introduce the number of columns in the cost-sort model. The sort node calls the comparison operator for each attribute of every pair of tuples compared. However, the current cost model uses a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging. This

Re: Consider the number of columns in the sort cost model

2024-10-14 Thread Alena Rybakina
e a question about estimating the cost of an Append node with your sort cost model. I see that if pathkeys is not a subset of subpath pathkeys, then we calculate the cost taking pathkeys into account. However, I didn't notice a similar estimation for the opposite situation. I see that

Re: Consider the number of columns in the sort cost model

2024-10-14 Thread Kirill Reshke
Hi! On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov wrote: > Hi, I would like to propose a slight elaboration of the sort cost model. In practice, we frequently see the choice of suboptimal sortings, which slump performance by 10-50%. The key reason here is t

Re: Consider the number of columns in the sort cost model

2024-10-07 Thread Andrei Lepikhov
Introduce the number of columns in the cost-sort model. The sort node calls the comparison operator for each attribute of every pair of tuples compared. However, the current cost model uses a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging. This technique c

Consider the number of columns in the sort cost model

2024-08-22 Thread Andrei Lepikhov
Hi, I would like to propose a slight elaboration of the sort cost model. In practice, we frequently see the choice of suboptimal sortings, which slump performance by 10-50%. The key reason here is the optimiser's blindness to the fact that sorting calls a comparison operator for each pa
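
The per-column idea behind this proposal can be illustrated with a toy model. This is hedged, illustrative Python, not PostgreSQL's actual cost_sort code; the function names are invented and only the default cpu_operator_cost value is taken from PostgreSQL:

```python
import math

CPU_OPERATOR_COST = 0.0025  # PostgreSQL's default cpu_operator_cost

def ncomparisons(ntuples):
    # Comparison sorts perform roughly N * log2(N) tuple comparisons.
    return ntuples * math.log2(max(ntuples, 2))

def sort_cost_current(ntuples):
    # Current model: a flat 2.0 * cpu_operator_cost per tuple comparison,
    # blind to how many sort columns each comparison may touch.
    return 2.0 * CPU_OPERATOR_COST * ncomparisons(ntuples)

def sort_cost_by_columns(ntuples, ncols):
    # Sketched refinement: charge per sort column, so a 4-column sort
    # costs more than a 1-column sort over the same input.
    return ncols * CPU_OPERATOR_COST * ncomparisons(ntuples)
```

Under this toy model a two-column sort matches today's estimate exactly, while wider sort keys become visibly more expensive, which is the optimizer blindness the proposal targets.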

Cost model improvement for run-time partition prune

2021-04-07 Thread Andy Fan
Currently the cost model ignores the initial partition prune and run-time partition prune totally. The impacts include: 1) the cost of a Nest Loop path is highly overrated; 2) the row estimates can sometimes be very wrong as well. We can use the following cases to demonstrate. CREATE TABLE
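
The overestimation Andy describes can be sketched numerically. This is an illustrative Python toy, not planner code; the names and numbers are invented:

```python
def inner_scan_cost(cost_per_partition, npartitions, surviving_fraction):
    """Cost of one inner-side scan of a partitioned table.
    Ignoring run-time pruning, the planner charges for all partitions;
    with pruning, only the surviving fraction is actually scanned."""
    charged = cost_per_partition * npartitions
    actual = cost_per_partition * npartitions * surviving_fraction
    return charged, actual

# A nested loop that prunes to a single partition per outer row:
charged, actual = inner_scan_cost(cost_per_partition=100.0,
                                  npartitions=100,
                                  surviving_fraction=0.01)
```

Here the planner's charge (10000.0) is 100x the pruned reality (100.0), which is how a Nest Loop path over a partitioned inner side ends up overrated.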

Re: Disk-based hash aggregate's cost model

2020-09-08 Thread Tomas Vondra
On Mon, Sep 07, 2020 at 01:55:28PM -0700, Jeff Davis wrote: On Sun, 2020-09-06 at 23:21 +0200, Tomas Vondra wrote: I've tested the costing changes on the simplified TPC-H query, on two different machines, and it seems like a clear improvement. Thank you. Committed. So yeah, the patched costi

Re: Disk-based hash aggregate's cost model

2020-09-07 Thread Jeff Davis
On Sun, 2020-09-06 at 23:21 +0200, Tomas Vondra wrote: > I've tested the costing changes on the simplified TPC-H query, on two different machines, and it seems like a clear improvement. Thank you. Committed. > So yeah, the patched costing is much closer to sort (from the point of this cost/

Re: Disk-based hash aggregate's cost model

2020-09-06 Thread Tomas Vondra
Hi, I've tested the costing changes on the simplified TPC-H query, on two different machines, and it seems like a clear improvement. This is using the same cost/duration measure, which I think is pretty neat way to look at this. Sure, it's imperfect (depends on which cost and durations you actua

Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Jeff Davis
On Fri, 2020-09-04 at 21:01 +0200, Tomas Vondra wrote: > Wouldn't it be enough to just use a slot with smaller tuple descriptor? All we'd need to do is creating the descriptor in ExecInitAgg after calling find_hash_columns, and using it for rslot/wslot, and then "mapping" the attributes in

Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Tomas Vondra
On Fri, Sep 04, 2020 at 11:31:36AM -0700, Jeff Davis wrote: On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote: Those charts show that the CP_SMALL_TLIST resulted in smaller temp files (per EXPLAIN ANALYZE the difference is ~25%) and also lower query durations (also in the ~25% range). I wa

Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Jeff Davis
On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote: > Those charts show that the CP_SMALL_TLIST resulted in smaller temp files (per EXPLAIN ANALYZE the difference is ~25%) and also lower query durations (also in the ~25% range). I was able to reproduce the problem, thank you. Only two at

Re: Disk-based hash aggregate's cost model

2020-09-04 Thread Tomas Vondra
On Thu, Sep 03, 2020 at 05:53:43PM -0700, Jeff Davis wrote: On Tue, 2020-09-01 at 23:19 +0200, Tomas Vondra wrote: FWIW any thoughts about the difference in temp size compared to CP_SMALL_TLIST? Are you referring to results from a while ago? In this thread I don't see what you're referring to.

Re: Disk-based hash aggregate's cost model

2020-09-03 Thread Jeff Davis
On Tue, 2020-09-01 at 23:19 +0200, Tomas Vondra wrote: > FWIW any thoughts about the difference in temp size compared to CP_SMALL_TLIST? Are you referring to results from a while ago? In this thread I don't see what you're referring to. I tried in a simple case on REL_13_STABLE, with and without

Re: Disk-based hash aggregate's cost model

2020-09-03 Thread Jeff Davis
On Wed, 2020-09-02 at 17:35 -0700, Peter Geoghegan wrote: > On Wed, Sep 2, 2020 at 5:18 PM Jeff Davis wrote: >> create table text10m(t text collate "C.UTF-8", i int, n numeric); insert into text10m select s.g::text, s.g, s.g::numeric from (select (random()*10)::int as g from g

Re: Disk-based hash aggregate's cost model

2020-09-02 Thread Peter Geoghegan
On Wed, Sep 2, 2020 at 5:18 PM Jeff Davis wrote: > create table text10m(t text collate "C.UTF-8", i int, n numeric); insert into text10m select s.g::text, s.g, s.g::numeric from (select (random()*10)::int as g from generate_series(1,1000)) s; explain analyze select distinct t fro

Re: Disk-based hash aggregate's cost model

2020-09-02 Thread Jeff Davis
On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote: > So I'm wondering if the hashagg is not ignoring similar non-I/O costs for the spilling case. In particular, the initial section computing startup_cost seems to ignore that we may need to do some of the stuff repeatedly - for example we'

Re: Disk-based hash aggregate's cost model

2020-09-01 Thread Peter Geoghegan
On Tue, Sep 1, 2020 at 2:19 AM Tomas Vondra wrote: > FWIW I suspect some of this difference may be due to logical vs. physical I/O. iosnoop only tracks physical I/O sent to the device, but maybe we do much more logical I/O and it simply does not expire from page cache for the sort. It might

Re: Disk-based hash aggregate's cost model

2020-09-01 Thread Tomas Vondra
On Tue, Sep 01, 2020 at 12:58:59PM -0700, Jeff Davis wrote: On Tue, 2020-09-01 at 11:19 +0200, Tomas Vondra wrote: Why? I don't think we need to change costing of in-memory HashAgg. My assumption was we'd only tweak startup_cost for cases with spilling by adding something like (cpu_operator_cost

Re: Disk-based hash aggregate's cost model

2020-09-01 Thread Jeff Davis
On Tue, 2020-09-01 at 11:19 +0200, Tomas Vondra wrote: > Why? I don't think we need to change costing of in-memory HashAgg. My assumption was we'd only tweak startup_cost for cases with spilling by adding something like (cpu_operator_cost * npartitions * ntuples). The code above (the in-memo
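
The tweak Tomas floats in this message, (cpu_operator_cost * npartitions * ntuples), amounts to a one-line startup penalty. A minimal Python sketch, purely illustrative and using only PostgreSQL's default cpu_operator_cost value:

```python
CPU_OPERATOR_COST = 0.0025  # PostgreSQL default

def spill_startup_penalty(ntuples, npartitions):
    # When HashAgg spills, charge roughly one operator cost per tuple
    # per spill partition, so spilling plans no longer look
    # artificially cheap next to an equivalent Sort + GroupAgg.
    return CPU_OPERATOR_COST * npartitions * ntuples
```

An in-memory HashAgg (zero spill partitions) pays nothing extra under this sketch, while a 32-partition spill over a million tuples adds a substantial startup charge.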

Re: Disk-based hash aggregate's cost model

2020-09-01 Thread Tomas Vondra
onable is it to be making these kinds of changes to the cost model right now? I think your analysis is solid, but I'm worried about making more intrusive changes very late in the cycle. I had originally tried to limit the cost model changes to the new plans I am introducing -- that is, Has

Re: Disk-based hash aggregate's cost model

2020-08-31 Thread Jeff Davis
he two cases, so the input_bytes for sort seems to be a bit overestimated. Hmm, interesting. How reasonable is it to be making these kinds of changes to the cost model right now? I think your analysis is solid, but I'm worried about making more intrusive changes very late in the cycle. I

Re: Disk-based hash aggregate's cost model

2020-08-30 Thread Tomas Vondra
general concern about how often we get disk-based hash aggregation when work_mem is particularly low, and recursion seems unavoidable. This is generally thought to be a problem in the costing. We discussed two approaches to tweaking the cost model: 1. Penalize HashAgg disk costs by a constant

Re: Disk-based hash aggregate's cost model

2020-08-29 Thread Tomas Vondra
aggregation when work_mem is particularly low, and recursion seems unavoidable. This is generally thought to be a problem in the costing. We discussed two approaches to tweaking the cost model: 1. Penalize HashAgg disk costs by a constant amount. It seems to be chosen a little too often, and we can

Re: Disk-based hash aggregate's cost model

2020-08-28 Thread Jeff Davis
nd recursion seems unavoidable. This is generally thought to be a problem in the costing. We discussed two approaches to tweaking the cost model: 1. Penalize HashAgg disk costs by a constant amount. It seems to be chosen a little too often, and we can reduce the number of plan changes. 2.

Disk-based hash aggregate's cost model

2020-08-27 Thread Peter Geoghegan
We have a Postgres 13 open item for Disk-based hash aggregate, which is the only non-trivial open item. There is a general concern about how often we get disk-based hash aggregation when work_mem is particularly low, and recursion seems unavoidable. This is generally thought to be a problem in the

Re: Bloom index cost model seems to be wrong

2019-09-26 Thread Michael Paquier
On Wed, Sep 25, 2019 at 05:12:26PM -0300, Alvaro Herrera wrote: > It's not clear to me what the next action should be on this patch. I think Jeff got some feedback from Tom, but was that enough to expect a new version to be posted? That was in February; should we now (in late September) clo

Re: Bloom index cost model seems to be wrong

2019-09-25 Thread Alvaro Herrera
It's not clear to me what the next action should be on this patch. I think Jeff got some feedback from Tom, but was that enough to expect a new version to be posted? That was in February; should we now (in late September) close this as Returned with Feedback? -- Álvaro Herrera ht

Re: Bloom index cost model seems to be wrong

2019-07-07 Thread Thomas Munro
On Fri, Mar 1, 2019 at 7:11 AM Jeff Janes wrote: > I'm adding it to the commitfest targeting v13. I'm more interested in feedback on the conceptual issues rather than stylistic ones, as I would probably merge the two functions together before proposing something to actually be committed.

Re: Bloom index cost model seems to be wrong

2019-02-28 Thread Tom Lane
Jeff Janes writes: > Should we be trying to estimate the false positive rate and charging cpu_tuple_cost and cpu_operator_cost the IO costs for visiting the table to recheck and reject those? I don't think other index types do that, and I'm inclined to think the burden should be on the user
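
Jeff's question, estimating the false-positive rate and charging for rejected heap visits, can be sketched with the textbook Bloom-filter formula. This is illustrative Python; the charge function is an invented placeholder, not the bloom extension's actual costing code, and only the default planner cost constants are taken from PostgreSQL:

```python
import math

def bloom_fp_rate(nbits, nitems, nhashes):
    # Classic Bloom-filter estimate: p = (1 - e^(-k*n/m))^k,
    # with m bits, n stored items, and k hash functions.
    return (1.0 - math.exp(-nhashes * nitems / nbits)) ** nhashes

def recheck_charge(heap_tuples, fp_rate,
                   cpu_tuple_cost=0.01, cpu_operator_cost=0.0025):
    # Charge CPU for heap tuples fetched only to be rejected by the
    # recheck (a fuller model would also add I/O for the extra heap pages).
    return heap_tuples * fp_rate * (cpu_tuple_cost + cpu_operator_cost)
```

For example, with about 9.6 bits per item and 7 hash functions the estimated false-positive rate lands near 1%, so a scan touching many heap tuples accrues a noticeable recheck charge under this sketch.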

Re: Bloom index cost model seems to be wrong

2019-02-28 Thread Jeff Janes
On Sun, Feb 24, 2019 at 11:09 AM Jeff Janes wrote: > I've moved this to the hackers list, and added Teodor and Alexander of the bloom extension, as I would like to hear their opinions on the costing. My previous patch had accidentally included a couple lines of a different thing I was workin

Re: Bloom index cost model seems to be wrong

2019-02-24 Thread Jeff Janes
I've moved this to the hackers list, and added Teodor and Alexander of the bloom extension, as I would like to hear their opinions on the costing. On Tue, Feb 12, 2019 at 4:17 PM Tom Lane wrote: > It's possible that a good cost model for bloom is so far outside genericc

Re: Cost Model

2017-12-21 Thread Ashutosh Bapat
On Thu, Dec 21, 2017 at 3:09 PM, Neto BR wrote: > 2017-12-20 17:32 GMT-02:00 David G. Johnston: >> On Wed, Dec 20, 2017 at 12:26 PM, neto brpr wrote: >>> About what you said, that some settings can be configured by Tablespace? I have already seen this in IBM DB2, but in Postgre

Re: Cost Model

2017-12-21 Thread Neto BR
2017-12-20 17:32 GMT-02:00 David G. Johnston: > On Wed, Dec 20, 2017 at 12:26 PM, neto brpr wrote: >> About what you said, that some settings can be configured by Tablespace? I have already seen this in IBM DB2, but in Postgresql as far as I know, for example the Random_page_cost and

Re: Cost Model

2017-12-20 Thread neto brpr
2017-12-20 17:34 GMT-02:00 Andres Freund: > On 2017-12-20 17:13:31 -0200, neto brpr wrote: >> Just to explain it better. The idea of differentiating read and write parameters (sequential and random) is exactly so that the access plans can be better chosen by the optimizer. But for th

Re: Cost Model

2017-12-20 Thread Andres Freund
On 2017-12-20 17:13:31 -0200, neto brpr wrote: > Just to explain it better. The idea of differentiating read and write parameters (sequential and random) is exactly so that the access plans can be better chosen by the optimizer. But for this, the Hash join, merge join, sorting and other alg

Re: Cost Model

2017-12-20 Thread David G. Johnston
On Wed, Dec 20, 2017 at 12:26 PM, neto brpr wrote: > About what you said, that some settings can be configured by Tablespace? I have already seen this in IBM DB2, but in Postgresql as far as I know, for example the Random_page_cost and Seq_page_cost parameters are configured for the Integ

Re: Cost Model

2017-12-20 Thread neto brpr
2017-12-20 16:37 GMT-02:00 David G. Johnston: > On Wed, Dec 20, 2017 at 11:26 AM, neto brpr wrote: >> Dear David, I have read the documentation that you sent, but it has only sequential page cost and random page cost parameters. What I need would be a cost model that differentiates Read

Re: Cost Model

2017-12-20 Thread Alvaro Herrera
neto brpr wrote: > Anyway, it seems that there has not yet been any initiative related to this in the postgresql community, am I right? Yes. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Cost Model

2017-12-20 Thread neto brpr
optimizer, I believe that differentiating these costs can have a positive impact on the process of choosing access methods... This is just an opinion, I'm not sure. If you said "writes of type X are 100 times as fast as writes of type Y", then some useful cost model could perh

Re: Cost Model

2017-12-20 Thread David G. Johnston
On Wed, Dec 20, 2017 at 11:26 AM, neto brpr wrote: > Dear David, I have read the documentation that you sent, but it has only sequential page cost and random page cost parameters. What I need would be a cost model that differentiates Read/Write (sequential and random), because on SSDs the r

Re: Cost Model

2017-12-20 Thread Alvaro Herrera
No plan is going to change usefully because of that, because you can't turn one write into 360 reads or even 1 read. If you said "writes of type X are 100 times as fast as writes of type Y", then some useful cost model could perhaps be developed. But that's not

Re: Cost Model

2017-12-20 Thread neto brpr
Dear David, I have read the documentation that you sent, but it has only sequential page cost and random page cost parameters. What I need would be a cost model that differentiates Read/Write (sequential and random), because on SSDs reads and writes have different costs. If you or someone knows a

Re: Cost Model

2017-12-20 Thread David G. Johnston
On Wed, Dec 20, 2017 at 10:29 AM, neto brpr wrote: > Any comment, hint about it or something, please inform me. The docs contain this - it seems to cover what you describe: https://www.postgresql.org/docs/10/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS David J.

Cost Model

2017-12-20 Thread neto brpr
Hello All, Anyone know if there is a patch that changes the PostgreSQL cost model in some way? I'm working with a tuning technique based on hybrid storage environments (HDDs and SSDs). I need to know if there exist any patches that allow postgresql to differentiate, for example, I/O costs of rea