Hi Peter, > On 24 Sep 2018, at 15:06, Peter Eisentraut > <peter.eisentr...@2ndquadrant.com> wrote: > > On 29/04/2018 20:18, Joe Wildish wrote: >> >> Attached is a rebased patch for the prototype. > > I took a look at this.
Thank you for reviewing. > This has been lying around for a few months, so it will need to be > rebased again. > > 8< - - - snipped for brevity - - - 8< > > All this new code in constraint.c that checks the assertion expression > needs more comments and documentation. All agreed. I’ll give the patch some TLC and get a new version that addresses the above. > Stuff like this isn't going to work: > > static int > funcMaskForFuncOid(Oid funcOid) > { > char *name = get_func_name(funcOid); > > if (name == NULL) > return OTHER_FUNC; > else if (strncmp(name, "min", strlen("min")) == 0) > return MIN_AGG_FUNC; > else if (strncmp(name, "max", strlen("max")) == 0) > return MAX_AGG_FUNC; > > You can assume from the name of a function what it's going to do. > Solving this properly might be hard. Agreed. My assumption was that we would record in the data dictionary the behaviour (or “polarity") of each aggregate function with respect to the various operators. Column in pg_aggregate? I don’t know how we’d record it exactly. A bitmask would be a possibility. Also, I don’t know what we’d do with custom aggregate functions (or indeed custom operators). Allowing end users to determine the value would potentially lead to assertion checks being incorrectly skipped. Maybe we’d say that custom aggregates always have a neutral polarity and are therefore not subject to this optimisation. > This ought to be reproducible for you if you build with assertions. Yes. I shall correct this when I do the aforementioned rebase and application of TLC. > My feeling is that if we want to move forward on this topic, we need to > solve the concurrency question first. All these optimizations for when > we don't need to check the assertion are cool, but they are just > optimizations that we can apply later on, once we have solved the > critical problems. I obviously agree that the concurrency issue needs solving. But I don’t see that at all as a separate matter from the algos. Far from being merely optimisations, the research indicates we can go a lot further toward reducing the need for rechecks and, therefore, reducing the chance of concurrency conflicts from occurring in the first place. This is true regardless of whatever mechanism we use to enforce correct behaviour under concurrent modifications -- e.g. a lock on the ASSERTION object itself, enforced use of SERIALIZABLE, etc. By way of example (lifted directly from the AM4DP book): CREATE TABLE employee ( id INTEGER PRIMARY KEY, dept INTEGER NOT NULL, job TEXT NOT NULL ); CREATE ASSERTION department_managers_need_administrators CHECK (NOT EXISTS (SELECT dept FROM employee a WHERE EXISTS (SELECT * FROM employee b WHERE a.dept = b.dept AND b.job IN ('Manager', 'Senior Manager')) AND NOT EXISTS (SELECT * FROM employee b WHERE a.dept = b.dept AND b.job = 'Administrator'))); The current implementation derives "DELETE(employee), INSERT(employee) and UPDATE(employee.dept, employee.job)" as the set of invalidating operations and triggers accordingly. However, in this case, we can supplement the triggers by having them inspect the transition tables to see if the actual data from the triggering DML statement could in fact affect the truth of the expression: specifically, only do the recheck on DELETE of an "Administrator", INSERT of a "Manager" or "Senior Manager", or UPDATE when the new job is a "Manager" or "Senior Manager" or the old job was an "Administrator". Now, if this is a company with 10,000 employees, and would therefore presumably only require a handful of managers, right? ;-), then the potential for a concurrency conflict is massively reduced when compared to rechecking every time the employee table is touched. (This optimisation has some caveats and is reliant upon being able to derive the key of an expression from the underlying base tables plus some stuff about functional dependencies. I have started work on it but sadly not had time to progress it in recent months). Having said all that: there are obviously going to be some expressions that cannot be proven to have no potential for invalidating the assertion truth. I guess this is the prime concern from a concurrency PoV? Example: CREATE TABLE t ( b BOOLEAN NOT NULL, n INTEGER NOT NULL, PRIMARY KEY (b, n) ); CREATE ASSERTION sum_per_b_less_than_10 CHECK (NOT EXISTS (SELECT FROM (SELECT b, SUM(n) FROM t GROUP BY b) AS v(b, sum_n) WHERE sum_n > 10)); Invalidating operations are "INSERT(t) and UPDATE(t.b, t.n)". I guess the interesting case, from a concurrency perspective, is how do we avoid an INSERT WHERE b IS TRUE from blocking an INSERT WHERE B IS FALSE? I don’t have an answer to that unfortunately. Although my understanding was that SSI could help in these sorts of cases, but I really haven't read or looked into the detail (yet). Thoughts? -Joe