Re: Probable NP-completness of constraints satisfiability check

Štefan Miklošovič Wed, 19 Feb 2025 07:57:19 -0800

As a matter of fact, we are actually better than PosgreSQL here, they are
not checking if it is satisfiable at all.


A user was able to specify a check which is not satisfiable but it just
proceeded:

                                      Table "public.users"
   Column    |          Type          | Collation | Nullable |
 Default
-------------+------------------------+-----------+----------+-----------------------------------
 id          | integer                |           | not null |
nextval('users_id_seq'::regclass)
 name        | character varying(100) |           | not null |
 age         | integer                |           |          |
 user_status | character varying(10)  |           |          |
Indexes:
    "users_pkey" PRIMARY KEY, btree (id)
Check constraints:
    "users_age_check2" CHECK (age < 22 AND age > 50)
    "users_user_status_check" CHECK (user_status::text = ANY
(ARRAY['active'::character varying, 'inactive'::character varying]::text[]))

postgres=# insert into users (id, name, age, user_status) values (1, 'Joe',
20, 'active');
ERROR:  new row for relation "users" violates check constraint
"users_age_check2"
DETAIL:  Failing row contains (1, Joe, 20, active).

On Wed, Feb 19, 2025 at 1:07 PM Štefan Miklošovič <smikloso...@apache.org>
wrote:

> As said earlier, I am happy to support just simple "x > 10 AND x < 100"
> and be done with it.
>
> However ... I was looking into that PDF more closely and on page 16 there
> is section 6.6.1 describing "Allen's Interval Algebra". Citing:
>
> "One form of infinite-valued CSP which has been widely studied is the case
> where the values taken by the variables are intervals on some totally
> ordered set. This setting is used to model the temporal behaviour of
> systems, where the intervals represent time intervals during which events
> occur. The most popular such formalism is Allen’s Interval Algebra".
>
> Integers is totally ordered set (right?) and here we have "intervals". So
> the question if it is satisfiable basically means, in our case, if there is
> some intersection of these intervals. There is even a Java library
> implementing Allen's interval algebra. Their IntervalTree is computing
> "overlap", according to their GH readme, in O(log n).
>
> What an interval is doing is here (3)
>
> x > 10 AND x < 100 - that is interval (10, 100)
>
> x > 1000 -> that is interval [1000, Integer.MAX_VALUE).
>
> x >= 5 AND x < 20 would be [5, 20)
>
> So then you ask a question if [1000, Integer.MAX_VALUE] is overlapping
> somewhere with [10, 100] and the answer here is - no, it does not. So that
> is not satisfiable.
>
> But I think that anything more complex (OR ...) would be pretty hard to
> evaluate in its general form.
>
> Anyway, for now, this is not necessary, because we do not support OR (this
> is not possible -> (x > 10 AND x < 100) OR (x > 500 and x < 1000)) and we
> can not use same operator more than once (x > 10 AND x < 100 AND x > 50 -
> not possible), which will basically collapse all complexity into "one
> interval" which can be checked "manually" and we do not need any libraries
> for that ....
>
> (1) https://en.wikipedia.org/wiki/Allen's_interval_algebra
> (2) https://github.com/Breinify/brein-time-utilities
> (3)
> https://github.com/Breinify/brein-time-utilities/blob/master/docs/Interval.md
>
> On Wed, Feb 19, 2025 at 12:30 PM Dmitry Konstantinov <netud...@gmail.com>
> wrote:
>
>> sorry, I was not very clear in my mail.
>> I mentioned 2-SAT for 2 reasons:
>> 1) to show that the NP-complete topic is very sensitive to definitions
>> and limitations, even a small change in task definition (2-clause vs
>> 3-clause) can affect the target complexity class a lot. Another example to
>> show that a value domain can also have an impact here: real
>> number logarithm and discrete logarithm - for real numbers you can
>> calculate it efficiently, discrete logarithm calculation is an open
>> problem. So we should be very accurate by talking about NP-completeness. At
>> the same time I agree that the problem you described in a common form (with
>> any combinations of OR and AND) quite possible is NP-complete (again, to
>> say for sure we need to define it as a math problem and build a reduction
>> of SAT or another NP-complete problem to it).
>>
>> 2) some techniques built for efficient solving of 2-SAT probably can be
>> useful for our case..
>>
>> Regarding our problem I think if we speak about the following form of
>> constraints:
>> simple_condition = column_i > comparable value | column_i < comparable
>> value
>> (simple_condition_1_1 AND simple_condition_1_2 AND ... AND
>> simple_condition_1_n) OR ... OR (simple_condition_m_1 AND
>> simple_condition_m_2  AND ... AND simple_condition_m_n)
>> then the question "do we have at least one combination of column values
>> which satisfies it" looks like solvable efficiently:
>> - we can split it by OR clauses to M independent problems: if we have a
>> combination of column values for any or them - the original constraint is
>> satisfiable.
>> - within each (simple_condition_1_1 AND simple_condition_1_2 AND ... AND
>> simple_condition_1_n)  we can group conditions by column and again check
>> each of such groups independently. For each column on this level we have a
>> set of simple conditions and we can check if the intersection of these
>> ranges is not empty (it looks similar to range tombstones merging..).
>>
>> At the same time the next form looks quite similar to 3SAT and highly
>> probable we cannot check it efficiently:
>> (simple_condition_1_1 OR ... OR simple_condition_1_2 OR ... OR
>> simple_condition_1_n) AND ... AND (simple_condition_m_1 OR
>> simple_condition_m_2 OR ... OR simple_condition_m_n)
>>
>> The last point: even if a problem is NP-complete it does not eliminate
>> the ability to solve it in reality for small numbers of columns or have
>> some heuristics. In general I would compare the profit vs efforts here:
>> technically it is possible to cover quite sophisticated cases but I am not
>> sure if it is worth the effort here ...
>>
>> Note: it was about 14 years ago when I had a deep dive into computational
>> complexity theory last time, so please do not trust my words a lot :-)
>>
>>
>> On Wed, 19 Feb 2025 at 05:45, Štefan Miklošovič <smikloso...@apache.org>
>> wrote:
>>
>>> duplicities => duplicates :D Brandon spotted I was using this word some
>>> time ago wrongly.
>>>
>>> On Wed, Feb 19, 2025 at 6:37 AM Štefan Miklošovič <
>>> smikloso...@apache.org> wrote:
>>>
>>>> Great resources, thanks for that.
>>>>
>>>> It is not immediately obvious to me that this is 2-SAT but I do agree
>>>> that this is a CSP (Constraint satisfaction problem). Merely looking into
>>>> that, my gut feeling is that this might be somehow polynomial but the
>>>> examples of practically-solvable CSPs are related to finite domains while
>>>> here we are dealing with an infinite one (all integers). For the former
>>>> cases, there seem to exist some kind of an "intelligent brute-force" or
>>>> backtracking etc, not sure how this is applicable for infinite domains
>>>> though.
>>>>
>>>> 2-SAT is also said to be NL-complete (one that can be solved
>>>> non-deterministically using a logarithmic amount of storage) and that does
>>>> not seem like a lot of fun either.
>>>>
>>>> The third link you provided, the PDF, is the closest to our problem
>>>> here. There is a categorisation of constraint satisfaction problems into
>>>> various computational classes with examples but some examples seem to be
>>>> missing. E.g. section 6.3 starts with "Example 4", not sure where are
>>>> examples 1, 2 and 3.
>>>>
>>>> Anyway, what we do have here is some kind of very simplified form of
>>>> that in such a sense that there is only one variable and we have only
>>>> conjunctions. It is hard for me to reduce this problem and fit it into one
>>>> of the examples to see what the complexity class is and if it is tractable.
>>>>
>>>> In general, I think that we should fix the obvious cases and just move
>>>> on. We have it way simpler here because of one variable and conjunctions
>>>> only as I mentioned.
>>>>
>>>> We can also be "smart" about that and rule out any duplicities when it
>>>> comes to operators, e.g. when a user does this:
>>>>
>>>> ALTER TABLE ks.tb ALTER column1 CHECK column1 > 10 AND column1 > 100
>>>> and column1 < 50
>>>>
>>>> Then we see that there is ">" used twice and we just error out on this
>>>> as it seems like a user can not make his mind which it is - > 10 or > 100?
>>>> If an operator is used just once, that simplifies that a lot as well.
>>>>
>>>> We can also propose this problem as a subject of the Google Summer of
>>>> Code project for students to deal with. I can imagine that it might be
>>>> attractive for scholars who are trying to find a problem to solve with a
>>>> lot of theory behind it while it is also practical. I do not know what
>>>> plans Bernardo has with this as it is his brain-child. I just stumbled upon
>>>> this topic and I am trying to drive it to some reasonable maturity but I do
>>>> not plan to spend a significant amount of time on this afterwards ...
>>>>
>>>> On Tue, Feb 18, 2025 at 11:06 PM Dmitry Konstantinov <
>>>> netud...@gmail.com> wrote:
>>>>
>>>>> Hi, are you sure you want to go that deep? :-)
>>>>>
>>>>> https://en.wikipedia.org/wiki/Constraint_satisfaction_problem
>>>>> https://en.wikipedia.org/wiki/2-satisfiability (an example what quite
>>>>> a small change in constraints can change the picture dramatically: once we
>>>>> moved from 3SAT to 2SAT - it is polynomial).
>>>>>
>>>>> https://www.cs.ox.ac.uk/activities/constraints/publications/ComplexityLanguages.pdf
>>>>>
>>>>>
>>>>> Regards,
>>>>> Dmitry
>>>>>
>>>>> On Tue, 18 Feb 2025 at 19:56, Štefan Miklošovič <
>>>>> smikloso...@apache.org> wrote:
>>>>>
>>>>>> Sorry, CEP-42
>>>>>>
>>>>>> On Tue, Feb 18, 2025 at 8:54 PM Štefan Miklošovič <
>>>>>> smikloso...@apache.org> wrote:
>>>>>>
>>>>>>> Hi list,
>>>>>>>
>>>>>>> We are doing good progress together with Bernardo when it comes to
>>>>>>> constraints which were merged recently (CEP-24).
>>>>>>>
>>>>>>> What I do now is that I try to "harden" it a little bit. If you
>>>>>>> think about that, a user could do something like this (currently)
>>>>>>>
>>>>>>> ALTER TABLE ks.tb ALTER column1 CHECK column1 > 10 AND column1 < 5
>>>>>>>
>>>>>>> If you take a look at this, this constraint is not satisfiable.
>>>>>>> There is no such value for "column1" which would pass both checks. That
>>>>>>> means that whatever a user would try to put there, such an insertion 
>>>>>>> would
>>>>>>> fail.
>>>>>>>
>>>>>>> I have a POC which covers the most obvious cases so it fails when a
>>>>>>> constraint is obviously not satisfiable, so we are not storing a check
>>>>>>> which is impossible to pass.
>>>>>>>
>>>>>>> However, we are not completely sure how far we should go with this.
>>>>>>>
>>>>>>> The problem here is that, in theory, we could also support OR, for
>>>>>>> example
>>>>>>>
>>>>>>> ALTER TABLE ks.tb ALTER column1 CHECK (column1 > 5 AND column < 10)
>>>>>>> OR (column1 > 1000)
>>>>>>>
>>>>>>> You got it.
>>>>>>>
>>>>>>> The problem with this is that, when it is generalized, I can not
>>>>>>> check if this is satisfiable, because this is basically a variation of 
>>>>>>> SAT
>>>>>>> problem, which is ... NP-complete. Right? In its most general form, is 
>>>>>>> not
>>>>>>> this a NP-complete problem? Or am I wrong?
>>>>>>>
>>>>>>> I do not think that we are going to solve NP-completeness of SAT
>>>>>>> here :D so I just want to make sure that we are on the same page when it
>>>>>>> comes to this and all we will ever support is just a simple AND (that is
>>>>>>> complicated enough to verify if it is satisfiable already).
>>>>>>>
>>>>>>> What do you think about this? Is this all OK for you?
>>>>>>>
>>>>>>> Regards
>>>>>>>
>>>>>>> (1) https://en.wikipedia.org/wiki/Boolean_satisfiability_problem
>>>>>>>
>>>>>>
>>>>>
>>>>> --
>>>>> Dmitry Konstantinov
>>>>>
>>>>
>>
>> --
>> Dmitry Konstantinov
>>
>

Re: Probable NP-completness of constraints satisfiability check

Reply via email to