As a matter of fact, we are actually better than PosgreSQL here, they are not checking if it is satisfiable at all.
A user was able to specify a check which is not satisfiable but it just proceeded: Table "public.users" Column | Type | Collation | Nullable | Default -------------+------------------------+-----------+----------+----------------------------------- id | integer | | not null | nextval('users_id_seq'::regclass) name | character varying(100) | | not null | age | integer | | | user_status | character varying(10) | | | Indexes: "users_pkey" PRIMARY KEY, btree (id) Check constraints: "users_age_check2" CHECK (age < 22 AND age > 50) "users_user_status_check" CHECK (user_status::text = ANY (ARRAY['active'::character varying, 'inactive'::character varying]::text[])) postgres=# insert into users (id, name, age, user_status) values (1, 'Joe', 20, 'active'); ERROR: new row for relation "users" violates check constraint "users_age_check2" DETAIL: Failing row contains (1, Joe, 20, active). On Wed, Feb 19, 2025 at 1:07 PM Štefan Miklošovič <smikloso...@apache.org> wrote: > As said earlier, I am happy to support just simple "x > 10 AND x < 100" > and be done with it. > > However ... I was looking into that PDF more closely and on page 16 there > is section 6.6.1 describing "Allen's Interval Algebra". Citing: > > "One form of infinite-valued CSP which has been widely studied is the case > where the values taken by the variables are intervals on some totally > ordered set. This setting is used to model the temporal behaviour of > systems, where the intervals represent time intervals during which events > occur. The most popular such formalism is Allen’s Interval Algebra". > > Integers is totally ordered set (right?) and here we have "intervals". So > the question if it is satisfiable basically means, in our case, if there is > some intersection of these intervals. There is even a Java library > implementing Allen's interval algebra. Their IntervalTree is computing > "overlap", according to their GH readme, in O(log n). > > What an interval is doing is here (3) > > x > 10 AND x < 100 - that is interval (10, 100) > > x > 1000 -> that is interval [1000, Integer.MAX_VALUE). > > x >= 5 AND x < 20 would be [5, 20) > > So then you ask a question if [1000, Integer.MAX_VALUE] is overlapping > somewhere with [10, 100] and the answer here is - no, it does not. So that > is not satisfiable. > > But I think that anything more complex (OR ...) would be pretty hard to > evaluate in its general form. > > Anyway, for now, this is not necessary, because we do not support OR (this > is not possible -> (x > 10 AND x < 100) OR (x > 500 and x < 1000)) and we > can not use same operator more than once (x > 10 AND x < 100 AND x > 50 - > not possible), which will basically collapse all complexity into "one > interval" which can be checked "manually" and we do not need any libraries > for that .... > > (1) https://en.wikipedia.org/wiki/Allen's_interval_algebra > (2) https://github.com/Breinify/brein-time-utilities > (3) > https://github.com/Breinify/brein-time-utilities/blob/master/docs/Interval.md > > On Wed, Feb 19, 2025 at 12:30 PM Dmitry Konstantinov <netud...@gmail.com> > wrote: > >> sorry, I was not very clear in my mail. >> I mentioned 2-SAT for 2 reasons: >> 1) to show that the NP-complete topic is very sensitive to definitions >> and limitations, even a small change in task definition (2-clause vs >> 3-clause) can affect the target complexity class a lot. Another example to >> show that a value domain can also have an impact here: real >> number logarithm and discrete logarithm - for real numbers you can >> calculate it efficiently, discrete logarithm calculation is an open >> problem. So we should be very accurate by talking about NP-completeness. At >> the same time I agree that the problem you described in a common form (with >> any combinations of OR and AND) quite possible is NP-complete (again, to >> say for sure we need to define it as a math problem and build a reduction >> of SAT or another NP-complete problem to it). >> >> 2) some techniques built for efficient solving of 2-SAT probably can be >> useful for our case.. >> >> Regarding our problem I think if we speak about the following form of >> constraints: >> simple_condition = column_i > comparable value | column_i < comparable >> value >> (simple_condition_1_1 AND simple_condition_1_2 AND ... AND >> simple_condition_1_n) OR ... OR (simple_condition_m_1 AND >> simple_condition_m_2 AND ... AND simple_condition_m_n) >> then the question "do we have at least one combination of column values >> which satisfies it" looks like solvable efficiently: >> - we can split it by OR clauses to M independent problems: if we have a >> combination of column values for any or them - the original constraint is >> satisfiable. >> - within each (simple_condition_1_1 AND simple_condition_1_2 AND ... AND >> simple_condition_1_n) we can group conditions by column and again check >> each of such groups independently. For each column on this level we have a >> set of simple conditions and we can check if the intersection of these >> ranges is not empty (it looks similar to range tombstones merging..). >> >> At the same time the next form looks quite similar to 3SAT and highly >> probable we cannot check it efficiently: >> (simple_condition_1_1 OR ... OR simple_condition_1_2 OR ... OR >> simple_condition_1_n) AND ... AND (simple_condition_m_1 OR >> simple_condition_m_2 OR ... OR simple_condition_m_n) >> >> The last point: even if a problem is NP-complete it does not eliminate >> the ability to solve it in reality for small numbers of columns or have >> some heuristics. In general I would compare the profit vs efforts here: >> technically it is possible to cover quite sophisticated cases but I am not >> sure if it is worth the effort here ... >> >> Note: it was about 14 years ago when I had a deep dive into computational >> complexity theory last time, so please do not trust my words a lot :-) >> >> >> On Wed, 19 Feb 2025 at 05:45, Štefan Miklošovič <smikloso...@apache.org> >> wrote: >> >>> duplicities => duplicates :D Brandon spotted I was using this word some >>> time ago wrongly. >>> >>> On Wed, Feb 19, 2025 at 6:37 AM Štefan Miklošovič < >>> smikloso...@apache.org> wrote: >>> >>>> Great resources, thanks for that. >>>> >>>> It is not immediately obvious to me that this is 2-SAT but I do agree >>>> that this is a CSP (Constraint satisfaction problem). Merely looking into >>>> that, my gut feeling is that this might be somehow polynomial but the >>>> examples of practically-solvable CSPs are related to finite domains while >>>> here we are dealing with an infinite one (all integers). For the former >>>> cases, there seem to exist some kind of an "intelligent brute-force" or >>>> backtracking etc, not sure how this is applicable for infinite domains >>>> though. >>>> >>>> 2-SAT is also said to be NL-complete (one that can be solved >>>> non-deterministically using a logarithmic amount of storage) and that does >>>> not seem like a lot of fun either. >>>> >>>> The third link you provided, the PDF, is the closest to our problem >>>> here. There is a categorisation of constraint satisfaction problems into >>>> various computational classes with examples but some examples seem to be >>>> missing. E.g. section 6.3 starts with "Example 4", not sure where are >>>> examples 1, 2 and 3. >>>> >>>> Anyway, what we do have here is some kind of very simplified form of >>>> that in such a sense that there is only one variable and we have only >>>> conjunctions. It is hard for me to reduce this problem and fit it into one >>>> of the examples to see what the complexity class is and if it is tractable. >>>> >>>> In general, I think that we should fix the obvious cases and just move >>>> on. We have it way simpler here because of one variable and conjunctions >>>> only as I mentioned. >>>> >>>> We can also be "smart" about that and rule out any duplicities when it >>>> comes to operators, e.g. when a user does this: >>>> >>>> ALTER TABLE ks.tb ALTER column1 CHECK column1 > 10 AND column1 > 100 >>>> and column1 < 50 >>>> >>>> Then we see that there is ">" used twice and we just error out on this >>>> as it seems like a user can not make his mind which it is - > 10 or > 100? >>>> If an operator is used just once, that simplifies that a lot as well. >>>> >>>> We can also propose this problem as a subject of the Google Summer of >>>> Code project for students to deal with. I can imagine that it might be >>>> attractive for scholars who are trying to find a problem to solve with a >>>> lot of theory behind it while it is also practical. I do not know what >>>> plans Bernardo has with this as it is his brain-child. I just stumbled upon >>>> this topic and I am trying to drive it to some reasonable maturity but I do >>>> not plan to spend a significant amount of time on this afterwards ... >>>> >>>> On Tue, Feb 18, 2025 at 11:06 PM Dmitry Konstantinov < >>>> netud...@gmail.com> wrote: >>>> >>>>> Hi, are you sure you want to go that deep? :-) >>>>> >>>>> https://en.wikipedia.org/wiki/Constraint_satisfaction_problem >>>>> https://en.wikipedia.org/wiki/2-satisfiability (an example what quite >>>>> a small change in constraints can change the picture dramatically: once we >>>>> moved from 3SAT to 2SAT - it is polynomial). >>>>> >>>>> https://www.cs.ox.ac.uk/activities/constraints/publications/ComplexityLanguages.pdf >>>>> >>>>> >>>>> Regards, >>>>> Dmitry >>>>> >>>>> On Tue, 18 Feb 2025 at 19:56, Štefan Miklošovič < >>>>> smikloso...@apache.org> wrote: >>>>> >>>>>> Sorry, CEP-42 >>>>>> >>>>>> On Tue, Feb 18, 2025 at 8:54 PM Štefan Miklošovič < >>>>>> smikloso...@apache.org> wrote: >>>>>> >>>>>>> Hi list, >>>>>>> >>>>>>> We are doing good progress together with Bernardo when it comes to >>>>>>> constraints which were merged recently (CEP-24). >>>>>>> >>>>>>> What I do now is that I try to "harden" it a little bit. If you >>>>>>> think about that, a user could do something like this (currently) >>>>>>> >>>>>>> ALTER TABLE ks.tb ALTER column1 CHECK column1 > 10 AND column1 < 5 >>>>>>> >>>>>>> If you take a look at this, this constraint is not satisfiable. >>>>>>> There is no such value for "column1" which would pass both checks. That >>>>>>> means that whatever a user would try to put there, such an insertion >>>>>>> would >>>>>>> fail. >>>>>>> >>>>>>> I have a POC which covers the most obvious cases so it fails when a >>>>>>> constraint is obviously not satisfiable, so we are not storing a check >>>>>>> which is impossible to pass. >>>>>>> >>>>>>> However, we are not completely sure how far we should go with this. >>>>>>> >>>>>>> The problem here is that, in theory, we could also support OR, for >>>>>>> example >>>>>>> >>>>>>> ALTER TABLE ks.tb ALTER column1 CHECK (column1 > 5 AND column < 10) >>>>>>> OR (column1 > 1000) >>>>>>> >>>>>>> You got it. >>>>>>> >>>>>>> The problem with this is that, when it is generalized, I can not >>>>>>> check if this is satisfiable, because this is basically a variation of >>>>>>> SAT >>>>>>> problem, which is ... NP-complete. Right? In its most general form, is >>>>>>> not >>>>>>> this a NP-complete problem? Or am I wrong? >>>>>>> >>>>>>> I do not think that we are going to solve NP-completeness of SAT >>>>>>> here :D so I just want to make sure that we are on the same page when it >>>>>>> comes to this and all we will ever support is just a simple AND (that is >>>>>>> complicated enough to verify if it is satisfiable already). >>>>>>> >>>>>>> What do you think about this? Is this all OK for you? >>>>>>> >>>>>>> Regards >>>>>>> >>>>>>> (1) https://en.wikipedia.org/wiki/Boolean_satisfiability_problem >>>>>>> >>>>>> >>>>> >>>>> -- >>>>> Dmitry Konstantinov >>>>> >>>> >> >> -- >> Dmitry Konstantinov >> >