Cost Model

2017-12-20 Thread neto brpr
Hello All

Does anyone know of a patch that changes the PostgreSQL cost model in
some way?
I am working on a tuning technique for hybrid storage environments
(HDDs and SSDs).

I need to know whether any patch exists that lets PostgreSQL
differentiate, for example, the I/O costs of reads and writes, and also
the costs of sequential reads versus random reads.

This is because the read and write speeds of SSDs are asymmetric: on an
SSD, reading can be as much as 340 times faster than writing.
On hard disks, sequential read and write speeds are almost the same,
while random reads are much slower than sequential reads.
On SSDs, sequential and random reads perform practically the same,
because the device is purely electronic.
I would like to know whether any solution along these lines already
exists, because then I would not need to develop it myself.

It would be interesting if there were a patch that added new parameters
to the postgresql.conf file along these lines (a sketch follows the list):

- sequential read page cost
- random read page cost
- sequential write page cost
- random write page cost
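
For illustration only, such hypothetical settings might look like this
in postgresql.conf (none of these parameter names exist in PostgreSQL
today; this is just the idea):

# hypothetical parameters - NOT part of PostgreSQL, only a sketch
seq_read_page_cost = 1.0      # baseline, like today's seq_page_cost
random_read_page_cost = 1.1   # near 1.0 on SSD, around 4.0 on HDD
seq_write_page_cost = 2.0     # on SSD, a write costs roughly twice a read
random_write_page_cost = 2.0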

Any comment or hint about this would be appreciated.

Regards,
Neto.





Re: Cost Model

2017-12-20 Thread neto brpr
Dear David,
I have read the documentation you sent, but it only covers the
sequential page cost and random page cost parameters. What I need is a
cost model that differentiates reads from writes (sequential and
random), because on SSDs reads and writes have different costs. Please
tell me if you or anyone else knows of a patch or other solution that
allows configuring individual parameters for the following (for
contrast, the two read-cost settings that do exist are sketched after
the list):

- sequential read page cost
- random read page cost
- sequential write page cost
- random write page cost
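
These are the only two I/O cost settings I found on that page (they are
real settings in PostgreSQL 10; the values below are just illustrative):

-- the existing knobs; note that both model reads only, never writes
SET seq_page_cost = 1.0;      -- cost of a sequentially fetched page
SET random_page_cost = 1.1;   -- often lowered toward 1.0 on SSD storage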

Best Regards
Neto


2017-12-20 15:35 GMT-02:00 David G. Johnston :

> On Wed, Dec 20, 2017 at 10:29 AM, neto brpr  wrote:
>
>> Any comment or hint about this would be appreciated.
>>
>
> The docs contain this - it seems to cover what you describe:
>
> https://www.postgresql.org/docs/10/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>
> David J.
>
>


Re: Cost Model

2017-12-20 Thread neto brpr
2017-12-20 16:35 GMT-02:00 Alvaro Herrera :

> neto brpr wrote:
> > Dear David
> > I have read the documentation you sent, but it only covers the
> > sequential page cost and random page cost parameters. What I need is
> > a cost model that differentiates reads from writes (sequential and
> > random), because on SSDs reads and writes have different costs.
>
> I don't think it matters.  I mean, differentiating seq/random read
> speeds can make the planner choose one plan type over another depending
> on how much each plan intends to read randomly or sequentially.  But why
> does it matter if one write is 360x as expensive as one read?  No plan
> is going to change usefully because of that, because you can't turn one
> write into 360 reads, or even one read.
>
>
Just to explain it better: the idea of differentiating the read and
write parameters (sequential and random) is precisely so that the
optimizer can choose access plans better. For this, though, the costing
of hash join, merge join, sorting, and the other algorithms would also
have to be changed to take these new parameters into account.
Because PostgreSQL uses a cost-based optimizer, I believe that
differentiating these costs could have a positive impact on the choice
of access methods. This is just an opinion; I am not sure.
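
To illustrate: even today one can watch the planner change plans when
the existing read-cost parameters change, and a write cost could steer
it the same way (the table names here are only illustrative):

-- illustration only: plan choice already reacts to the read-cost knobs
SET random_page_cost = 4.0;   -- HDD-like ratio
EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
-- tends toward sequential scans and a hash join

SET random_page_cost = 1.1;   -- SSD-like ratio
EXPLAIN SELECT * FROM orders o JOIN customers c ON c.id = o.customer_id;
-- cheap random reads favor an index nested loop; a write-cost
-- parameter could steer spilling hash joins and sorts the same way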

> If you said "writes of type X are 100 times as fast as writes of type
> Y", then some useful cost model could perhaps be developed.  But that's
> not what you're saying.
>
>
Anyway, it seems that there has not yet been any initiative related to
this in the PostgreSQL community; am I right?

Best Regards
Neto


> --
> Álvaro Herrera                https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>





Re: Cost Model

2017-12-20 Thread neto brpr
2017-12-20 16:37 GMT-02:00 David G. Johnston :

> On Wed, Dec 20, 2017 at 11:26 AM, neto brpr  wrote:
>
>> Dear David
>> I have read the documentation you sent, but it only covers the
>> sequential page cost and random page cost parameters. What I need is
>> a cost model that differentiates reads from writes (sequential and
>> random), because on SSDs reads and writes have different costs.
>> Please tell me if you or anyone else knows of a patch or other
>> solution that allows configuring individual parameters for:
>>
>> - sequential read page cost
>> - random read page cost
>> - sequential write page cost
>> - random write page cost
>>
>>
> Please don't top-post.
>

OK, sorry!

>
> OK... reading more closely, I don't see how "recording/writing" costs
> are important in decision making.  Either you have to write something,
> or you don't.  If you do, you update pages in a buffer and generate
> WAL, and then the system puts the data onto disk where it belongs -
> the database itself doesn't care about that part, and knowing how fast
> or slow it might happen wouldn't impact its behavior.  So PostgreSQL
> provides read settings to be tuned so it can decide between index and
> table scans on the table in question.
>
> The system doesn't understand SSD or HDD but does understand
> tablespaces, and I believe that many of these settings can be
> configured on a per-tablespace (or table?) basis.
>
>
About what you said, that some settings can be configured per
tablespace: I have seen this in IBM DB2, but in PostgreSQL, as far as I
know, parameters such as random_page_cost and seq_page_cost are
configured for the entire database, not per tablespace. Can you or
someone else tell me whether the cost parameters can be configured per
tablespace? This would be useful for me: on a server with a hybrid
storage environment (HDDs and SSDs), I could keep some tables in a
tablespace with HDD-adapted settings and others in one with SSD-adapted
settings, as in the sketch below.
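
If that is indeed possible, I imagine something like this minimal
sketch (tablespace names and paths are only illustrative):

-- one tablespace per storage type, each with its own read costs
CREATE TABLESPACE ssd_space LOCATION '/mnt/ssd/pgdata';
ALTER TABLESPACE ssd_space SET (seq_page_cost = 1.0, random_page_cost = 1.1);

CREATE TABLESPACE hdd_space LOCATION '/mnt/hdd/pgdata';
ALTER TABLESPACE hdd_space SET (seq_page_cost = 1.0, random_page_cost = 4.0);

-- a table then takes on the costs of the tablespace it lives in
CREATE TABLE hot_data (id int PRIMARY KEY, payload text) TABLESPACE ssd_space;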

Regards
Neto

> David J.
>
>



Re: Cost Model

2017-12-20 Thread neto brpr
2017-12-20 17:34 GMT-02:00 Andres Freund :

> On 2017-12-20 17:13:31 -0200, neto brpr wrote:
> > Just to explain it better: the idea of differentiating the read and
> > write parameters (sequential and random) is precisely so that the
> > optimizer can choose access plans better. For this, though, the
> > costing of hash join, merge join, sorting, and the other algorithms
> > would also have to be changed to take these new parameters into
> > account.
>
> I'm doubtful that there's that much benefit. Mergejoin doesn't write,
> hashjoins commonly don't write, and usually if so there aren't that
> many alternatives to batched hashjoins. Similar-ish with sorts,
> although sometimes that can instead be done using ordered index scans.
>

Dear Andres,
Some scientific papers report that the hash join and sort-merge join
algorithms perform better than a nested loop when running on an HDD,
since there the costs of reads and writes are practically the same
(symmetric). On an SSD, however, where a write can cost double a read,
this no longer holds, because both algorithms (sort-merge and hash
join) require some write operations to disk when the data does not fit
in RAM. A nested loop, by contrast, only reads, and if there is an
index on the inner table it can be a good alternative, since reads on
SSDs are many times faster than on HDDs. This is an example of a
situation in which distinguishing read cost from write cost could make
the optimizer choose the nested loop rather than the hash join; a rough
back-of-the-envelope version is sketched below.
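
To make that concrete with entirely made-up numbers: suppose a batched
hash join must spill, writing and then re-reading 50,000 pages, while
an index nested loop over the same data does 80,000 random reads. With
HDD-like costs (sequential read = write = 1.0, random read = 4.0) the
spill adds 100,000 cost units against 320,000 for the random reads, so
the hash join wins. With SSD-like costs (random read = 1.1, write =
2.0) the spill adds 150,000 units against only 88,000 for the random
reads, and the nested loop should win - but the planner can only see
that if it has a separate write cost.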

Regards


> What are the cases you foresee where costing reads/writes differently
> will lead to better plans?
>
> Greetings,
>
> Andres Freund
>

