On 7/16/24 15:33, David G. Johnston wrote:
On Tue, Jul 16, 2024 at 11:57 AM Joe Conway <m...@joeconway.com> wrote:
> There are two alternative philosophies:
>
> A. By choosing to use a Unicode-based function, the user has opted in
> to the Unicode stability guarantees[2], and it's fine to update Unicode
> occasionally in new major versions as long as we are transparent with
> the user.
>
> B. IMMUTABLE implies some very strict definition of stability, and we
> should never again update Unicode because it changes the results of
> IMMUTABLE functions.
>
> We've been following (A), and that's the de facto policy today[3][4].
> Noah and Laurenz argued[5] that the policy starting in version 18
> should be (B). Given that it's a policy decision that affects more than
> just the builtin collation provider, I'd like to discuss it more
> broadly outside of that subthread.
On the general topic, we have these definitions in the fine manual:
8<-----------------
A VOLATILE function can do anything, ... A query using a volatile
function will re-evaluate the function at every row where its value is
needed.
A STABLE function cannot modify the database and is guaranteed to return
the same results given the same arguments for all rows within a single
statement...
An IMMUTABLE function cannot modify the database and is guaranteed to
return the same results given the same arguments forever.
8<-----------------
As Jeff points out, the IMMUTABLE definition has never really been true.
Even the STABLE definition is not quite right, as there are at least some
STABLE functions that will return the same value for multiple statements
if they are within a transaction block (e.g. "now()" -- TBH I don't
remember offhand if that is true for all stable functions).
Under-specification here doesn't make the meaning of stable incorrect.
We don't have anything that guarantees stability at the transaction
scope because I don't think it can be guaranteed there without
considering whether said transaction is read-committed, repeatable read,
or serializable. The function itself can promise more but the marker
seems correctly scoped for how the system uses it in statement optimization.
The way it is described is still surprising and can bite you if you are
not familiar with the nuances. In particular I have seen now() used in
transaction blocks surprise more than one person over the years.
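
A toy Python model of that surprise, not PostgreSQL code. It assumes, as
PostgreSQL documents, that now() reports the transaction start time while
clock_timestamp() reads the clock on every call. The Transaction class and
its method names are hypothetical and exist only for illustration.

```python
import time
from datetime import datetime, timezone


class Transaction:
    """Toy stand-in for a transaction block (hypothetical)."""

    def __init__(self):
        # Captured once, when the transaction begins.
        self._start = datetime.now(timezone.utc)

    def now(self):
        # Modeled on now(): frozen at transaction start.
        return self._start

    def clock_timestamp(self):
        # Modeled on clock_timestamp(): reads the clock on each call.
        return datetime.now(timezone.utc)


tx = Transaction()
first_now = tx.now()
first_clock = tx.clock_timestamp()

time.sleep(0.05)  # pretend a later statement runs in the same transaction

second_now = tx.now()
second_clock = tx.clock_timestamp()

print(first_now == second_now)     # same value across "statements"
print(second_clock > first_clock)  # the wall clock kept moving
```

Someone who expects a fresh timestamp per statement gets the first
behavior when they wanted the second.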
and allow those to be
used like we do IMMUTABLE except with appropriate warning labels. E.g.
something ("STABLE_VERSION"?) to mean "forever within a major version
lifetime" and something ("STABLE_SYSTEM"?) to mean "as long as you don't
upgrade your OS".
I'd be content cutting "forever" down to "within a given server
configuration". Then just note that immutable functions can depend
implicitly on external server characteristics and so when moving data
between servers re-evaluation of immutable functions may be necessary.
Not so bad for indexes. A bit more problematic for generated values.
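
To make the index and generated-value cases concrete, here is a minimal
Python sketch, not PostgreSQL code. fold_v1 and fold_v2 are hypothetical
stand-ins for one "immutable" function as it behaves on two different
server configurations; the dict plays the role of an expression index and
the list plays the role of a stored generated column.

```python
def fold_v1(s):
    # Hypothetical behavior on the old server: plain lowercasing.
    return s.lower()


def fold_v2(s):
    # Hypothetical behavior on the new server: full case folding.
    return s.casefold()


def build_index(rows, fn):
    # Map fn(value) -> row positions, like an index on fn(col).
    index = {}
    for pos, value in enumerate(rows):
        index.setdefault(fn(value), []).append(pos)
    return index


def lookup(index, fn, probe):
    # Evaluate fn on the probe and search the precomputed keys.
    return index.get(fn(probe), [])


rows = ["Straße", "STRASSE", "Road"]

index = build_index(rows, fold_v1)
generated = [fold_v1(r) for r in rows]     # stored once, like a generated column

before = lookup(index, fold_v1, "straße")  # old server, consistent: [0]
stale = lookup(index, fold_v2, "straße")   # new function, old index: [1]

index = build_index(rows, fold_v2)         # the "REINDEX" step
rebuilt = lookup(index, fold_v2, "straße")  # consistent again: [0, 1]

# Stored generated values stay stale until the rows are rewritten.
drifted = [g != fold_v2(r) for g, r in zip(generated, rows)]

print(before, stale, rebuilt, drifted)
```

The index is repaired by rebuilding it. The stored values need the rows
themselves rewritten, and nothing flags which ones have drifted.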
Yeah I forgot about the configuration controlled ones.
I'm not against adding metadata options here, but for internal functions,
comments and documentation can work. For user-defined functions, I have
my doubts about how trustworthy they would end up being.
People lie all the time for user-defined functions, usually specifically
when they need IMMUTABLE semantics and are willing to live with the risk
and/or apply their own controls to ensure no changes in output.
For the original question, I suggest continuing to behave per "A" and
working on making it clearer to users what that means in terms of server
upgrades.
If we do add metadata to reflect our reality I'd settle on a generic
"STATIC" marker that can be used on those functions that rely on real
world state, whether we are directly calling into the system (e.g.,
hashing) or have chosen to provide the state access management ourselves
(e.g., unicode).
So you are proposing we add STATIC to VOLATILE/STABLE/IMMUTABLE (in the
third position before IMMUTABLE), give it IMMUTABLE semantics, mark
builtin functions that deserve it, and document with suitable caution
statements?
I guess I can live with just one additional level of granularity.
--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com