On 09/28/2017 10:10 PM, Robert Haas wrote:
On Wed, Sep 13, 2017 at 7:00 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
If we do have an option it won't be using fancy mathematical
terminology at all, it would be described in terms of its function,
e.g. recheck_on_update
+1.

I have nothing against renaming "projection" option to "recheck_on_update" or 
whatever else is suggested.
Just let me know the best version. From my point of view "recheck_on_update" is too verbose and still not self-explained (to understand the meaning of this option it is necessary to uunderstand how heap_update works). "projection"/"non-injective"/... are more declarative notions, explaining the characteristic of the index, while "recheck_on_update" is more procedural notion, explaining behavior of heap_update.


Yes, I'd rather not have an option at all, just some simple code with
useful effect, like we have in many other places.
I think the question we need to be able to answer is: What is the
probability that an update that would otherwise be non-HOT can be made
into a HOT update by performing a recheck to see whether the value has
changed?  It doesn't seem easy to figure that out from any of the
statistics we have available today or could easily get, because it
depends not only on the behavior of the expression which appears in
the index definition but also on the application behavior.  For
example, consider a JSON blob representing a bank account.
b->'balance' is likely to change most of the time, but
b->'account_holder_name' only rarely.  That's going to be hard for an
automated system to determine.

We should clearly check as many of the other criteria for a HOT update
as possible before performing a recheck of this type, so that it only
gets performed when it might help.  For example, if column a is
indexed and b->'foo' is indexed, there's no point in checking whether
b->'foo' has changed if we know that a has changed.  I don't know
whether it would be feasible to postpone deciding whether to do a
recheck until after we've figured out whether the page seems to
contain enough free space to allow a HOT update.

Turning non-HOT updates into HOT updates is really good, so it seems
likely that the rechecks will often be worthwhile.  If we avoid a HOT
update in 25% of cases, that's probably easily worth the CPU overhead
of a recheck assuming the function isn't something ridiculously
expensive to compute; the extra CPU cost will be repaid by reduced
bloat.  However, if we avoid a HOT update only one time in a million,
it's probably not worth the cost of recomputing the expression the
other 999,999 times.  I wonder where the crossover point is -- it
seems like something that could be figured out by benchmarking.

While I agree that it would be nice to have this be a completely
automatic determination, I am not sure that will be practical.  I
oppose overloading some other marker (like function_cost>10000) for
this; that's too magical.

I almost agree with you.
Just few remarks: indexes are rarely created for frequently changed attributes, 
like b->'balance'.
So in case of proper database schema design it is possible to expect that most 
of updates are hot updates: do not actually affect any index.
But certainly different attributes may have different probability of been 
updated.
Unfortunately we do not know before check which attribute of JSON field (or any 
other fields used in indexed expression) is changed.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to