Re: Implementing SQL ASSERTION

Joe Wildish Mon, 24 Sep 2018 16:05:17 -0700

Hi Peter,

> On 24 Sep 2018, at 15:06, Peter Eisentraut
> <peter.eisentr...@2ndquadrant.com> wrote:
> 
> On 29/04/2018 20:18, Joe Wildish wrote:
>> 
>> Attached is a rebased patch for the prototype.
> 
> I took a look at this.


Thank you for reviewing.

> This has been lying around for a few months, so it will need to be
> rebased again.
> 
> 8< - - - snipped for brevity - - - 8<
> 
> All this new code in constraint.c that checks the assertion expression
> needs more comments and documentation.

All agreed.  I’ll give the patch some TLC and get a new version that
addresses the above.

> Stuff like this isn't going to work:
> 
> static int
> funcMaskForFuncOid(Oid funcOid)
> {
>    char *name = get_func_name(funcOid);
> 
>    if (name == NULL)
>        return OTHER_FUNC;
>    else if (strncmp(name, "min", strlen("min")) == 0)
>        return MIN_AGG_FUNC;
>    else if (strncmp(name, "max", strlen("max")) == 0)
>        return MAX_AGG_FUNC;
> 
> You can assume from the name of a function what it's going to do.
> Solving this properly might be hard.

Agreed. My assumption was that we would record in the data dictionary the
behaviour (or “polarity") of each aggregate function with respect to the
various operators. Column in pg_aggregate? I don’t know how we’d record it
exactly. A bitmask would be a possibility. Also, I don’t know what we’d do
with custom aggregate functions (or indeed custom operators). Allowing end
users to determine the value would potentially lead to assertion checks
being incorrectly skipped. Maybe we’d say that custom aggregates always
have a neutral polarity and are therefore not subject to this
optimisation.

> This ought to be reproducible for you if you build with assertions.

Yes. I shall correct this when I do the aforementioned rebase and
application of TLC.

> My feeling is that if we want to move forward on this topic, we need to
> solve the concurrency question first.  All these optimizations for when
> we don't need to check the assertion are cool, but they are just
> optimizations that we can apply later on, once we have solved the
> critical problems.

I obviously agree that the concurrency issue needs solving. But I don’t
see that at all as a separate matter from the algos. Far from being merely
optimisations, the research indicates we can go a lot further toward
reducing the need for rechecks and, therefore, reducing the chance of
concurrency conflicts from occurring in the first place. This is true
regardless of whatever mechanism we use to enforce correct behaviour under
concurrent modifications -- e.g. a lock on the ASSERTION object itself,
enforced use of SERIALIZABLE, etc.

By way of example (lifted directly from the AM4DP book):

CREATE TABLE employee (
  id INTEGER PRIMARY KEY,
  dept INTEGER NOT NULL,
  job TEXT NOT NULL
);

CREATE ASSERTION department_managers_need_administrators CHECK
  (NOT EXISTS
    (SELECT dept
       FROM employee a
      WHERE EXISTS (SELECT * FROM employee b
                     WHERE a.dept = b.dept
                       AND b.job IN ('Manager', 'Senior Manager'))
        AND NOT EXISTS (SELECT * FROM employee b
                         WHERE a.dept = b.dept
                           AND b.job = 'Administrator')));

The current implementation derives "DELETE(employee), INSERT(employee) and
UPDATE(employee.dept, employee.job)" as the set of invalidating operations
and triggers accordingly. However, in this case, we can supplement the
triggers by having them inspect the transition tables to see if the actual
data from the triggering DML statement could in fact affect the truth of
the expression: specifically, only do the recheck on DELETE of an
"Administrator", INSERT of a "Manager" or "Senior Manager", or UPDATE when
the new job is a "Manager" or "Senior Manager" or the old job was an
"Administrator".

Now, if this is a company with 10,000 employees, and would therefore
presumably only require a handful of managers, right? ;-), then the
potential for a concurrency conflict is massively reduced when compared to
rechecking every time the employee table is touched.

(This optimisation has some caveats and is reliant upon being able to
derive the key of an expression from the underlying base tables plus some
stuff about functional dependencies. I have started work on it but sadly
not had time to progress it in recent months).

Having said all that: there are obviously going to be some expressions
that cannot be proven to have no potential for invalidating the assertion
truth. I guess this is the prime concern from a concurrency PoV? Example:

CREATE TABLE t (
  b BOOLEAN NOT NULL,
  n INTEGER NOT NULL,
  PRIMARY KEY (b, n)
);

CREATE ASSERTION sum_per_b_less_than_10 CHECK
  (NOT EXISTS
    (SELECT FROM (SELECT b, SUM(n)
                    FROM t
                   GROUP BY b) AS v(b, sum_n)
      WHERE sum_n > 10));

Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". I guess the
interesting case, from a concurrency perspective, is how do we avoid an
INSERT WHERE b IS TRUE from blocking an INSERT WHERE B IS FALSE? I don’t
have an answer to that unfortunately. Although my understanding was that
SSI could help in these sorts of cases, but I really haven't read or
looked into the detail (yet). Thoughts?

-Joe

Re: Implementing SQL ASSERTION

Reply via email to