Hi folks,
Just wanted to mention that it looks like some CFBot tests are failing,
something around level_tracking in pgss.
On 10/31/24 21:00, Alena Rybakina wrote:
On 29.10.2024 05:47, Andrei Lepikhov wrote:
I played around with the examples a bit and couldn't figure out
something. When I added the same values to different columns (first
in a, later in b), the order of the columns for the sort operation
doesn't change. Isn't this a mistake?
create table a (x1 int, y1 int);
create table b (x2 int, y2 int);
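For concreteness, a self-contained sketch of the scenario being described (the data distribution and the query are assumptions, since the original test is truncated here):

-- Load duplicate-heavy values first into a, then into b, and check
-- whether the planner reorders the sort keys accordingly.
CREATE TABLE a (x1 int, y1 int);
CREATE TABLE b (x2 int, y2 int);
INSERT INTO a SELECT g % 10, g FROM generate_series(1, 100000) g;
INSERT INTO b SELECT g, g % 10 FROM generate_series(1, 100000) g;
ANALYZE a, b;
EXPLAIN (COSTS OFF)
SELECT a.x1, a.y1 FROM a JOIN b ON a.x1 = b.x2
ORDER BY a.x1, a.y1;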
On 29.10.2024 05:47, Andrei Lepikhov wrote:
On 10/28/24 16:48, Alena Rybakina wrote:
On 23.10.2024 04:39, Andrei Lepikhov wrote:
On 15/10/2024 12:15, David Rowley wrote:
And the last patch is a demo of how I'm going to use the previous
three patches and add one more strategy to improve the ord…
From 8b5a102d6d33eb4 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov"
Date: Mon, 21 Oct 2024 15:21:34 +0400
Subject: [PATCH v2 2/4] Consider the number of columns in the cost-sort model.
The sort node calls the comparison operator for each pair of attributes for
each pair of tuples.
Hi! Thank you for your work on this subject!
On 23.10.2024 04:39, Andrei Lepikhov wrote:
On 15/10/2024 12:15, David Rowley wrote:
On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov wrote:
I think maybe what is worth working on is seeing if you can better
estimate the number of tiebreak comparisons…
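One way to frame that estimate (a sketch of the reasoning, not anything taken from the patch): with a multi-column sort key, the second column is only compared when the first one ties, so per pair of tuples the expected number of comparisons is roughly

cmp_per_pair ~= 1 + P(tie on col1) + P(tie on col1 and col2) + ...

and for roughly uniform data P(tie on col1) can be approximated by 1/ndistinct(col1), which is where per-column statistics would feed into the model.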
the plan must be the same as above
+SELECT t1.x, t1.y FROM sort_ndist_t1 t1, sort_ndist_t2 t2
+WHERE t2.x=t1.x
+ORDER BY t1.x,t1.y;
+
+RESET enable_hashjoin;
+DROP TABLE sort_ndist_t1, sort_ndist_t2;
--
2.47.0
From eaae44d6e9dda4a295433af4ec6ae18e181046f7 Mon Sep 17 00:00:00 2001
From: "Andrei V. Lepikhov"…
On 10/15/24 12:15, David Rowley wrote:
As for your patch, I'm suspicious that the general idea you're
proposing is an actual improvement.
I didn't intend to treat it as an 'improvement' but as an intermediate
patch. The main purpose here is to debate the approach and to introduce
consideration of the number of…
On Tue, 15 Oct 2024 at 17:48, Andrei Lepikhov wrote:
> I am suspicious of that but open to hearing other opinions. The
> coefficient incorporates knowledge about how many comparisons will be
> made with this sorting operator. The caller can have some additional
> knowledge about that. For example,…
Subject: [PATCH v3] Introduce the number of columns in the cost-sort model.
The sort node calls the comparison operator for each pair of attributes for
each pair of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This…
I have a question about estimating the cost of an Append node with your
sort cost model.
I see that if pathkeys is not a subset of the subpath pathkeys, then we
calculate the cost taking pathkeys into account. However, I didn't
notice a similar estimation for the opposite situation.
I see that…
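To make the Append case concrete, a sketch of the two plan shapes in question (schema and data are assumptions):

CREATE TABLE pt (k int, v int) PARTITION BY RANGE (k);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (0) TO (100);
CREATE TABLE pt2 PARTITION OF pt FOR VALUES FROM (100) TO (200);
CREATE INDEX ON pt (k);
-- When the requested pathkeys match the subpaths' ordering, Append can
-- simply preserve it; otherwise an explicit Sort is needed, and its cost
-- should account for the number of sort columns.
EXPLAIN (COSTS OFF) SELECT * FROM pt ORDER BY k;  -- ordering preserved
EXPLAIN (COSTS OFF) SELECT * FROM pt ORDER BY v;  -- Sort required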
Hi!
On Thu, 22 Aug 2024 at 23:46, Andrei Lepikhov wrote:
> Hi,
> I would like to propose a slight elaboration of the sort cost model.
> In practice, we frequently see the choice of suboptimal sortings, which
> degrade performance by 10-50%.
> The key reason here is the optimiser's blindness to the fact that…
Introduce the number of columns in the cost-sort model.
The sort node calls the comparison operator for each pair of attributes for
each pair of tuples. However, the current cost model uses
a '2.0*cpu_operator_cost' multiplier, which performs some sort of averaging.
This technique c…
Hi,
I would like to propose a slight elaboration of the sort cost model.
In practice, we frequently see the choice of suboptimal sortings, which
degrade performance by 10-50%.
The key reason here is the optimiser's blindness to the fact that
sorting calls a comparison operator for each pa…
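For reference, the relevant piece of the current model (the flat charge matches cost_sort in costsize.c for the in-memory case; the adjusted form is a sketch, since the patch text is truncated here):

comparison_cost = 2.0 * cpu_operator_cost          -- current flat charge
run_cost ~= comparison_cost * tuples * log2(tuples)

The flat charge is the same whether the sort key has one column or ten. The proposal is, roughly, to scale the per-comparison charge by the expected number of column comparisons per tuple pair instead of the fixed 2.0 multiplier.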
Currently the cost model totally ignores both the initial partition prune
and the run-time partition prune. This impact includes: 1) the cost of a
Nested Loop path is highly overestimated; 2) the row estimates can also be
very wrong at times. We can use the following cases to demonstrate.
CREATE TABLE…
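A sketch of the kind of case being described (schema and data are assumptions, since the original demonstration is truncated):

CREATE TABLE part (id int, v int) PARTITION BY RANGE (id);
CREATE TABLE part_1 PARTITION OF part FOR VALUES FROM (0) TO (1000);
CREATE TABLE part_2 PARTITION OF part FOR VALUES FROM (1000) TO (2000);
CREATE INDEX ON part (id);
CREATE TABLE driver (id int);
INSERT INTO driver SELECT 1 FROM generate_series(1, 100);
ANALYZE part, driver;
-- Run-time pruning leaves one surviving partition per outer row, but the
-- planner costs the parameterized inner side as if every partition were
-- scanned, inflating the Nested Loop estimate.
SET enable_hashjoin = off;  -- coax a nested loop for the demonstration
EXPLAIN SELECT * FROM driver d JOIN part p ON p.id = d.id;
RESET enable_hashjoin;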
On Mon, Sep 07, 2020 at 01:55:28PM -0700, Jeff Davis wrote:
On Sun, 2020-09-06 at 23:21 +0200, Tomas Vondra wrote:
I've tested the costing changes on the simplified TPC-H query, on two
different machines, and it seems like a clear improvement.
Thank you. Committed.
So yeah, the patched costing is much closer to sort…
On Sun, 2020-09-06 at 23:21 +0200, Tomas Vondra wrote:
> I've tested the costing changes on the simplified TPC-H query, on two
> different machines, and it seems like a clear improvement.
Thank you. Committed.
> So yeah, the patched costing is much closer to sort (from the point of
> this cost/…
Hi,
I've tested the costing changes on the simplified TPC-H query, on two
different machines, and it seems like a clear improvement.
This is using the same cost/duration measure, which I think is a pretty
neat way to look at this. Sure, it's imperfect (depends on which costs
and durations you actually…
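The measure in question, spelled out (a restatement of the method described above, not a formula from the thread): for each plan compute

ratio = total_cost / observed_runtime

on the same machine. If the costing is well calibrated, the ratio should stay roughly constant across plan shapes, so a plan whose ratio is an outlier is the one being mis-costed.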
On Fri, 2020-09-04 at 21:01 +0200, Tomas Vondra wrote:
> Wouldn't it be enough to just use a slot with a smaller tuple
> descriptor?
> All we'd need to do is create the descriptor in ExecInitAgg after
> calling find_hash_columns, use it for rslot/wslot, and then
> "map" the attributes in…
On Fri, Sep 04, 2020 at 11:31:36AM -0700, Jeff Davis wrote:
On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
Those charts show that the CP_SMALL_TLIST resulted in smaller temp files
(per EXPLAIN ANALYZE the difference is ~25%) and also lower query
durations (also in the ~25% range).
I was able to reproduce the problem, thank you.
On Fri, 2020-09-04 at 14:56 +0200, Tomas Vondra wrote:
> Those charts show that the CP_SMALL_TLIST resulted in smaller temp files
> (per EXPLAIN ANALYZE the difference is ~25%) and also lower query
> durations (also in the ~25% range).
I was able to reproduce the problem, thank you.
Only two at…
On Thu, Sep 03, 2020 at 05:53:43PM -0700, Jeff Davis wrote:
On Tue, 2020-09-01 at 23:19 +0200, Tomas Vondra wrote:
FWIW any thoughts about the difference in temp size compared to
CP_SMALL_TLIST?
Are you referring to results from a while ago? In this thread I don't
see what you're referring to.
On Tue, 2020-09-01 at 23:19 +0200, Tomas Vondra wrote:
> FWIW any thoughts about the difference in temp size compared to
> CP_SMALL_TLIST?
Are you referring to results from a while ago? In this thread I don't
see what you're referring to.
I tried in a simple case on REL_13_STABLE, with and without…
On Wed, 2020-09-02 at 17:35 -0700, Peter Geoghegan wrote:
> On Wed, Sep 2, 2020 at 5:18 PM Jeff Davis wrote:
On Wed, Sep 2, 2020 at 5:18 PM Jeff Davis wrote:
> create table text10m(t text collate "C.UTF-8", i int, n numeric);
> insert into text10m select s.g::text, s.g, s.g::numeric from (select
> (random()*10)::int as g from generate_series(1,1000)) s;
> explain analyze select distinct t from text10m;
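A way to see both strategies on that data set (a sketch using the standard planner GUC, not something quoted from the thread):

SET enable_hashagg = off;  -- force the sort-based DISTINCT plan
EXPLAIN ANALYZE SELECT DISTINCT t FROM text10m;
RESET enable_hashagg;

Comparing the two EXPLAIN ANALYZE outputs shows the temp-file usage and runtime that the costing is supposed to arbitrate between.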
On Sun, 2020-08-30 at 17:03 +0200, Tomas Vondra wrote:
> So I'm wondering if the hashagg is not ignoring similar non-I/O costs
> for the spilling case. In particular, the initial section computing
> startup_cost seems to ignore that we may need to do some of the stuff
> repeatedly - for example we'…
On Tue, Sep 1, 2020 at 2:19 AM Tomas Vondra wrote:
> FWIW I suspect some of this difference may be due to logical vs.
> physical I/O. iosnoop only tracks physical I/O sent to the device, but
> maybe we do much more logical I/O and it simply does not expire from
> page cache for the sort. It might…
On Tue, Sep 01, 2020 at 12:58:59PM -0700, Jeff Davis wrote:
On Tue, 2020-09-01 at 11:19 +0200, Tomas Vondra wrote:
Why? I don't think we need to change costing of in-memory HashAgg. My
assumption was we'd only tweak startup_cost for cases with spilling by
adding something like (cpu_operator_cost…
On Tue, 2020-09-01 at 11:19 +0200, Tomas Vondra wrote:
> Why? I don't think we need to change costing of in-memory HashAgg. My
> assumption was we'd only tweak startup_cost for cases with spilling by
> adding something like (cpu_operator_cost * npartitions * ntuples).
The code above (the in-memo…
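For anyone reproducing the spilling case, a sketch (the sizes are assumptions):

SET work_mem = '256kB';  -- small enough to force HashAgg to spill
EXPLAIN ANALYZE
SELECT g % 100000 AS k, count(*)
FROM generate_series(1, 1000000) g
GROUP BY 1;
RESET work_mem;

On v13 the EXPLAIN ANALYZE output reports the HashAgg batch count and disk usage, which is the work the proposed startup_cost term (cpu_operator_cost * npartitions * ntuples) tries to account for.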
How reasonable is it to be making these kinds of changes to the cost
model right now? I think your analysis is solid, but I'm worried about
making more intrusive changes very late in the cycle.
I had originally tried to limit the cost model changes to the new plans
I am introducing -- that is, Has…
> …the two cases, so the input_bytes for sort seems to be a bit
> overestimated.
Hmm, interesting.
How reasonable is it to be making these kinds of changes to the cost
model right now? I think your analysis is solid, but I'm worried about
making more intrusive changes very late in the cycle.
I…
…general concern about
how often we get disk-based hash aggregation when work_mem is
particularly low, and recursion seems unavoidable. This is generally
thought to be a problem in the costing.
We discussed two approaches to tweaking the cost model:
1. Penalize HashAgg disk costs by a constant…
…aggregation when work_mem is
particularly low, and recursion seems unavoidable. This is generally
thought to be a problem in the costing.
We discussed two approaches to tweaking the cost model:
1. Penalize HashAgg disk costs by a constant amount. It seems to be
chosen a little too often, and we can…
> …and recursion seems unavoidable. This is generally
> thought to be a problem in the costing.
We discussed two approaches to tweaking the cost model:
1. Penalize HashAgg disk costs by a constant amount. It seems to be
chosen a little too often, and we can reduce the number of plan
changes.
2. …
We have a Postgres 13 open item for Disk-based hash aggregate, which
is the only non-trivial open item. There is a general concern about
how often we get disk-based hash aggregation when work_mem is
particularly low, and recursion seems unavoidable. This is generally
thought to be a problem in the costing.
On Wed, Sep 25, 2019 at 05:12:26PM -0300, Alvaro Herrera wrote:
> It's not clear to me what the next action should be on this patch. I
> think Jeff got some feedback from Tom, but was that enough to expect a
> new version to be posted? That was in February; should we now (in late
> September) close this as Returned with Feedback?
It's not clear to me what the next action should be on this patch. I
think Jeff got some feedback from Tom, but was that enough to expect a
new version to be posted? That was in February; should we now (in late
September) close this as Returned with Feedback?
--
Álvaro Herrera  https://www.2ndQuadrant.com/
On Fri, Mar 1, 2019 at 7:11 AM Jeff Janes wrote:
> I'm adding it to the commitfest targeting v13. I'm more interested in
> feedback on the conceptual issues rather than stylistic ones, as I would
> probably merge the two functions together before proposing something to
> actually be committed.
Jeff Janes writes:
> Should we be trying to estimate the false positive rate and charging
> cpu_tuple_cost and cpu_operator_cost plus the IO costs for visiting the
> table to recheck and reject those? I don't think other index types do
> that, and I'm inclined to think the burden should be on the user…
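A back-of-the-envelope version of what that charging could look like (the names here are illustrative, not the actual selfuncs/bloom API):

-- expected heap visits caused by bloom false positives
fp_visits    ~= fp_rate * (reltuples - true_matches)
recheck_cost ~= fp_visits * (cpu_tuple_cost + per_qual_op_cost)
total        ~= index_cost + fp_visits * random_page_cost + recheck_cost

where fp_rate would follow from the signature length and the number of bits set per indexed column.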
On Sun, Feb 24, 2019 at 11:09 AM Jeff Janes wrote:
> I've moved this to the hackers list, and added Teodor and Alexander of the
> bloom extension, as I would like to hear their opinions on the costing.
My previous patch had accidentally included a couple of lines from a
different thing I was working on…
I've moved this to the hackers list, and added Teodor and Alexander of the
bloom extension, as I would like to hear their opinions on the costing.
On Tue, Feb 12, 2019 at 4:17 PM Tom Lane wrote:
> It's possible that a good cost model for bloom is so far outside
> genericcostestimate…
On Thu, Dec 21, 2017 at 3:09 PM, Neto BR wrote:
> 2017-12-20 17:32 GMT-02:00 David G. Johnston:
>> On Wed, Dec 20, 2017 at 12:26 PM, neto brpr wrote:
>>> About what you said, that some settings can be configured by Tablespace?
>>> I have already seen this in IBM DB2, but in Postgre…
2017-12-20 17:32 GMT-02:00 David G. Johnston:
> On Wed, Dec 20, 2017 at 12:26 PM, neto brpr wrote:
>> About what you said, that some settings can be configured by Tablespace?
>> I have already seen this in IBM DB2, but in Postgresql as far as I know,
>> for example the Random_page_cost and…
2017-12-20 17:34 GMT-02:00 Andres Freund:
> On 2017-12-20 17:13:31 -0200, neto brpr wrote:
>> Just to explain it better. The idea of differentiating read and write
>> parameters (sequential and random) is exactly so that the access plans
>> can be better chosen by the optimizer. But for th…
On 2017-12-20 17:13:31 -0200, neto brpr wrote:
> Just to explain it better. The idea of differentiating read and write
> parameters (sequential and random) is exactly so that the access plans can
> be better chosen by the optimizer. But for this, the Hash join, merge join,
> sorting and other alg…
On Wed, Dec 20, 2017 at 12:26 PM, neto brpr wrote:
> About what you said, that some settings can be configured by Tablespace?
> I have already seen this in IBM DB2, but in Postgresql as far as I know,
> for example the Random_page_cost and Seq_page_cost parameters are
> configured for the Integ…
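For what it's worth, PostgreSQL does already let the page-cost parameters be overridden per tablespace, which covers part of what is being asked for (the tablespace name and path below are hypothetical):

CREATE TABLESPACE fast_ssd LOCATION '/mnt/ssd/pgdata';
ALTER TABLESPACE fast_ssd
  SET (seq_page_cost = 0.5, random_page_cost = 0.6);
-- Tables placed on this tablespace are then costed with the cheaper page
-- costs, while tables on HDD tablespaces keep the global defaults.

What it does not provide is the thread's actual request: separate read and write costs.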
2017-12-20 16:37 GMT-02:00 David G. Johnston:
> On Wed, Dec 20, 2017 at 11:26 AM, neto brpr wrote:
>> Dear David,
>> I have read the documentation that you sent, but it has only the
>> sequential page cost and random page cost parameters. What I need would
>> be a cost model that differentiates read…
neto brpr wrote:
> Anyway, it seems that there has not yet been any initiative related to
> this in the postgresql community, am I right?
Yes.
--
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
…optimizer, I believe that
differentiating these costs can have a positive impact on the process of
choosing access methods... This is just an opinion, I'm not sure.
> If you said "writes of type X are 100 times as fast as writes of type
> Y", then some useful cost model could perhaps be developed…
On Wed, Dec 20, 2017 at 11:26 AM, neto brpr wrote:
> Dear David,
> I have read the documentation that you sent, but it has only the
> sequential page cost and random page cost parameters. What I need would
> be a cost model that differentiates read/write (sequential and random),
> because on SSDs reads and writes have different costs…
No plan is going to change usefully because of that, because you can't
turn one write into 360 reads or even 1 read.
If you said "writes of type X are 100 times as fast as writes of type
Y", then some useful cost model could perhaps be developed. But that's
not…
Dear David,
I have read the documentation that you sent, but it has only the
sequential page cost and random page cost parameters. What I need would be
a cost model that differentiates read/write (sequential and random),
because on SSDs reads and writes have different costs. If you or someone
knows a…
On Wed, Dec 20, 2017 at 10:29 AM, neto brpr wrote:
> Any comment, hint about it or something, please inform me.
The docs contain this - it seems to cover what you describe:
https://www.postgresql.org/docs/10/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
David J.
Hello All
Anyone know if there is a patch that changes the PostgreSQL cost model in
some way?
I'm working on a tuning technique based on hybrid storage environments
(HDDs and SSDs).
I need to know if there are any patches that allow PostgreSQL to
differentiate, for example, the I/O costs of rea…