sorry, I was not very clear in my mail. I mentioned 2-SAT for 2 reasons: 1) to show that the NP-complete topic is very sensitive to definitions and limitations, even a small change in task definition (2-clause vs 3-clause) can affect the target complexity class a lot. Another example to show that a value domain can also have an impact here: real number logarithm and discrete logarithm - for real numbers you can calculate it efficiently, discrete logarithm calculation is an open problem. So we should be very accurate by talking about NP-completeness. At the same time I agree that the problem you described in a common form (with any combinations of OR and AND) quite possible is NP-complete (again, to say for sure we need to define it as a math problem and build a reduction of SAT or another NP-complete problem to it).
2) some techniques built for efficient solving of 2-SAT probably can be useful for our case.. Regarding our problem I think if we speak about the following form of constraints: simple_condition = column_i > comparable value | column_i < comparable value (simple_condition_1_1 AND simple_condition_1_2 AND ... AND simple_condition_1_n) OR ... OR (simple_condition_m_1 AND simple_condition_m_2 AND ... AND simple_condition_m_n) then the question "do we have at least one combination of column values which satisfies it" looks like solvable efficiently: - we can split it by OR clauses to M independent problems: if we have a combination of column values for any or them - the original constraint is satisfiable. - within each (simple_condition_1_1 AND simple_condition_1_2 AND ... AND simple_condition_1_n) we can group conditions by column and again check each of such groups independently. For each column on this level we have a set of simple conditions and we can check if the intersection of these ranges is not empty (it looks similar to range tombstones merging..). At the same time the next form looks quite similar to 3SAT and highly probable we cannot check it efficiently: (simple_condition_1_1 OR ... OR simple_condition_1_2 OR ... OR simple_condition_1_n) AND ... AND (simple_condition_m_1 OR simple_condition_m_2 OR ... OR simple_condition_m_n) The last point: even if a problem is NP-complete it does not eliminate the ability to solve it in reality for small numbers of columns or have some heuristics. In general I would compare the profit vs efforts here: technically it is possible to cover quite sophisticated cases but I am not sure if it is worth the effort here ... Note: it was about 14 years ago when I had a deep dive into computational complexity theory last time, so please do not trust my words a lot :-) On Wed, 19 Feb 2025 at 05:45, Štefan Miklošovič <smikloso...@apache.org> wrote: > duplicities => duplicates :D Brandon spotted I was using this word some > time ago wrongly. > > On Wed, Feb 19, 2025 at 6:37 AM Štefan Miklošovič <smikloso...@apache.org> > wrote: > >> Great resources, thanks for that. >> >> It is not immediately obvious to me that this is 2-SAT but I do agree >> that this is a CSP (Constraint satisfaction problem). Merely looking into >> that, my gut feeling is that this might be somehow polynomial but the >> examples of practically-solvable CSPs are related to finite domains while >> here we are dealing with an infinite one (all integers). For the former >> cases, there seem to exist some kind of an "intelligent brute-force" or >> backtracking etc, not sure how this is applicable for infinite domains >> though. >> >> 2-SAT is also said to be NL-complete (one that can be solved >> non-deterministically using a logarithmic amount of storage) and that does >> not seem like a lot of fun either. >> >> The third link you provided, the PDF, is the closest to our problem here. >> There is a categorisation of constraint satisfaction problems into various >> computational classes with examples but some examples seem to be missing. >> E.g. section 6.3 starts with "Example 4", not sure where are examples 1, 2 >> and 3. >> >> Anyway, what we do have here is some kind of very simplified form of that >> in such a sense that there is only one variable and we have only >> conjunctions. It is hard for me to reduce this problem and fit it into one >> of the examples to see what the complexity class is and if it is tractable. >> >> In general, I think that we should fix the obvious cases and just move >> on. We have it way simpler here because of one variable and conjunctions >> only as I mentioned. >> >> We can also be "smart" about that and rule out any duplicities when it >> comes to operators, e.g. when a user does this: >> >> ALTER TABLE ks.tb ALTER column1 CHECK column1 > 10 AND column1 > 100 and >> column1 < 50 >> >> Then we see that there is ">" used twice and we just error out on this as >> it seems like a user can not make his mind which it is - > 10 or > 100? If >> an operator is used just once, that simplifies that a lot as well. >> >> We can also propose this problem as a subject of the Google Summer of >> Code project for students to deal with. I can imagine that it might be >> attractive for scholars who are trying to find a problem to solve with a >> lot of theory behind it while it is also practical. I do not know what >> plans Bernardo has with this as it is his brain-child. I just stumbled upon >> this topic and I am trying to drive it to some reasonable maturity but I do >> not plan to spend a significant amount of time on this afterwards ... >> >> On Tue, Feb 18, 2025 at 11:06 PM Dmitry Konstantinov <netud...@gmail.com> >> wrote: >> >>> Hi, are you sure you want to go that deep? :-) >>> >>> https://en.wikipedia.org/wiki/Constraint_satisfaction_problem >>> https://en.wikipedia.org/wiki/2-satisfiability (an example what quite a >>> small change in constraints can change the picture dramatically: once we >>> moved from 3SAT to 2SAT - it is polynomial). >>> >>> https://www.cs.ox.ac.uk/activities/constraints/publications/ComplexityLanguages.pdf >>> >>> >>> Regards, >>> Dmitry >>> >>> On Tue, 18 Feb 2025 at 19:56, Štefan Miklošovič <smikloso...@apache.org> >>> wrote: >>> >>>> Sorry, CEP-42 >>>> >>>> On Tue, Feb 18, 2025 at 8:54 PM Štefan Miklošovič < >>>> smikloso...@apache.org> wrote: >>>> >>>>> Hi list, >>>>> >>>>> We are doing good progress together with Bernardo when it comes to >>>>> constraints which were merged recently (CEP-24). >>>>> >>>>> What I do now is that I try to "harden" it a little bit. If you think >>>>> about that, a user could do something like this (currently) >>>>> >>>>> ALTER TABLE ks.tb ALTER column1 CHECK column1 > 10 AND column1 < 5 >>>>> >>>>> If you take a look at this, this constraint is not satisfiable. There >>>>> is no such value for "column1" which would pass both checks. That means >>>>> that whatever a user would try to put there, such an insertion would fail. >>>>> >>>>> I have a POC which covers the most obvious cases so it fails when a >>>>> constraint is obviously not satisfiable, so we are not storing a check >>>>> which is impossible to pass. >>>>> >>>>> However, we are not completely sure how far we should go with this. >>>>> >>>>> The problem here is that, in theory, we could also support OR, for >>>>> example >>>>> >>>>> ALTER TABLE ks.tb ALTER column1 CHECK (column1 > 5 AND column < 10) OR >>>>> (column1 > 1000) >>>>> >>>>> You got it. >>>>> >>>>> The problem with this is that, when it is generalized, I can not check >>>>> if this is satisfiable, because this is basically a variation of SAT >>>>> problem, which is ... NP-complete. Right? In its most general form, is not >>>>> this a NP-complete problem? Or am I wrong? >>>>> >>>>> I do not think that we are going to solve NP-completeness of SAT here >>>>> :D so I just want to make sure that we are on the same page when it comes >>>>> to this and all we will ever support is just a simple AND (that is >>>>> complicated enough to verify if it is satisfiable already). >>>>> >>>>> What do you think about this? Is this all OK for you? >>>>> >>>>> Regards >>>>> >>>>> (1) https://en.wikipedia.org/wiki/Boolean_satisfiability_problem >>>>> >>>> >>> >>> -- >>> Dmitry Konstantinov >>> >> -- Dmitry Konstantinov