Re: [gentoo-dev] Pre-GLEP RFC: Automated enforcing of REQUIRED_USE constraints

Michał Górny Sat, 08 Jul 2017 14:56:41 -0700

On sob, 2017-07-08 at 22:34 +0200, Alexis Ballier wrote:
> On Sat, 08 Jul 2017 20:44:24 +0200
> Michał Górny <mgo...@gentoo.org> wrote:
> 
> > On sob, 2017-07-08 at 16:12 +0200, Alexis Ballier wrote:
> > > On Sat, 08 Jul 2017 11:43:39 +0200
> > > Michał Górny <mgo...@gentoo.org> wrote:
> > >   
> > > > Hi, everyone.
> > > > 
> > > > I think the affairs have settled enough and I've finished filling
> > > > in the pre-GLEP for REQUIRED_USE auto-enforcing. It's got all
> > > > the algorithms, rationale and separated reference implementation.
> > > > 
> > > > If there are no major concerns raised, I will soon start working
> > > > on writing an optimized implementation for pkgcore/pkgcheck
> > > > and integrating the verification algos with the CI.
> > > > 
> > > > The pre-GLEP for review is here:
> > > > 
> > > > https://wiki.gentoo.org/wiki/User:MGorny/GLEP:ReqUse  
> > > 
> > > 
> > > Constraint group reordering algorithm
> > > 
> > > I really think we should only consider REQUIRED_USE with
> > > forced/masked useflags instantiated there. And ban (in repoman)
> > > REQUIRED_USE that contain some "False": "a? ( b )" with 'a' free
> > > and 'b' masked is perfectly ok now but it hides a serious problem
> > > in the package/profile. Instantiating this would give: "a? ( False
> > > )" and be an error just like we have depend.bad & co. This is
> > > independent of auto solving or not, it's already wrong.  
> > 
> > As I've already explained you multiple times, this obtains *exactly
> > the same* effect. However, it's much simpler when it's done like this
> > because it makes it possible to reuse the already defined algorithms
> > instead of having to precisely define how to preprocess REQUIRED_USE
> > for this and cover all the corner cases.
> 
> Simpler??? I don't think so. What I wrote clearly pinpoints that:
> When you'll write the algorithm for "Verifying that the constraints do
> not alter immutable flags" you'll notice this is exactly that and can
> be put as a preprocessing step and then you can drop all the corner
> cases considerations for immutable flags. I never understood why you're
> insisting that much on immutables: they're really not natural, not
> simple, not standard, and carrying them all over seems to be a burden
> to me.


I wrote the algorithms, and they're simple. This specific check is
the combination of three simple steps:

a. reordering the groups based on immutables,

b. transforming the AST into flat form,

c. verifying each flat constraint.

The first step is trivial -- it's basically 'move true to front, false
to back'. The second step is more complex but it's needed anyway,
and quite well-defined, especially with the assumption that all
the groups always have at least one flag inside. The third step is
trivial again because it's just checking the conditions and constraints
against a list.

The alternative to reordering is altering the groups. Altering means we
need to have separate logic for every type of group while sorting works
the same in all of them. Altering means we need to explicitly special
case forcing 1 and >1 items, and masking all items, for each group.
Again, sorting does not need to be concerned about that because
the check following it (also trivial) will catch it anyway.

Of course, you could say that you will get a little better error
message, like 'all flags inside || are masked' instead of '!b -> a' will
alter immutable flag. But that's purely an implementation detail. It's
not worth making the reference algorithms longer.

> > > Reordering is a dangerous path as we've already seen since it can
> > > create unexpected loops for the solver.  
> > 
> > Freeform reordering is dangerous, and I've removed that already.
> > Reordering restricted to immutables can not cause any issues that any
> > other solution wouldn't cause.
> 
> You're very likely right there. Any proof? Esp. any proof that the
> checker still guarantees the existence of a solution in all cases?
> I'm not asking for a formal proof, but simply a bit more details than
> just an assertion saying it's fine.

The case for checker is just the same as with any other kind of
immutability processing. We need to run the reordering, transform
and verification separately for every possible combination of immutable
flags.

The reordering explicitly alters the results of the transform, and with
the trivial implication form of the flattened constraints
the verification stage checks will find any problems that may arise from
it, just like they would find any problem from doing a similar thing
verbatim.

> 
> > > Working on instantiated REQUIRED_USE constraints would probably
> > > simplify quite a bit your GLEP too: you already have the "Verifying
> > > that the constraints do not alter immutable flags" part that roughly
> > > does the same thing as instantiating, except if you assume it's
> > > already true you can skip the reordering.  
> > 
> > Except that the reordering can be described in 2 points, and so can be
> > the immutability verification. Please prove that you can provide
> > a simpler explanation that doesn't fail in any of the corner cases.
> 
> Except reordering is an invention and immutable checking is simply
> applying boolean logic rules to your implication and check that no
> "False" can appear. You can simply start by applying boolean logic and
> forget about reordering.

No, it is not. You do not have the values of all the items inside
the group, just some of them. Depending on how many of them do you have
and what are them, you need to transform the group appropriately, e.g.
by removing items, replacing the group or failing entirely.

> > > One big advantage of working on ASTs is that it becomes trivial to
> > > suggest a proper reordering.  
> > 
> > Reordering is never a trivial problem. Unless I'm missing something,
> > it is technically possible that a 'reordering' will actually require
> > a sub- item being moved out of the containing group.
> 
> Not if done at the AST level.
> 
> > And to be honest, I find the output of the verification script in this
> > regard quite useful. That is, it saying 'X affects Y, so it needs to
> > go before it' is quite clear to me. I don't think most developers
> > would actually need to script to pinpoint a specific location for
> > every single constraint.
> 
> In most cases this is sufficient.
> Think of a more complex case:
> A -> B
> B -> C
> A -> D
> D -> C
> 
>    |-> B -|
> A -|      |->C
>    |-> D -|
> 
> It's starting to be a more complex mental exercise to get the proper
> ordering when given the 1st form only.
> 
> 
> Actually, considering people rant against git merges because they want
> linear history in the graph log but fail to understand 'git log' is
> precisely about displaying such a linear ordering, I'm ready to bet
> someone will rant :)

We can discuss this when you have a working algorithm. Right now, it's
a purely theoretical exercise unless someone can come up with
a reasonable way of implementing it.

> > > -------
> > > 
> > > Restrictions on REQUIRED_USE format
> > > 
> > > I still fail to see the point here. One can simply apply the
> > > rewriting you suggest below and be done with it. Rationale is not
> > > very convincing to me:
> > > 
> > > - avoiding unpredictable results of automatic flag adjustments:
> > >   A deterministic algorithm is, by definition, predictable.  
> > 
> > s/unpredictable/surprising/?
> > 
> > The goal is for it do something that the developer *not reading
> > the spec* could reasonably predict happening.
> 
> 
> There is a huge gap between failing to solve a constraint voluntarily
> because it has one too much nesting level and having a repoman warning
> telling 'it is not recommended you write it that way'. The latter
> ensures said developer is able to predict what is happening, the former
> just adds annoyance onto users for no real reason.

Except it doesn't because it's extremely uncommon (and even unlikely)
and I am successfully exterminating the last occurrences. Implementing
support for something that will be never used is a waste of time.

> > > - improving readability of REQUIRED_USE constraints:
> > >   No need for a restriction for that. If people want to shoot
> > >   themselves in the foot, it is not a PMS problem. I see that
> > >   like proposing death penalty for those who commit
> > > suicide :)  
> > 
> > This is not PMS. This is a GLEP which serves both the purpose of
> > a technical specification with extended rationale and a policy
> > document.
> 
> Isn't the goal of it to have it in a future EAPI ?

I don't see how that's relevant. Nobody will be copying the whole GLEP
verbatim into the PMS.

> > > - keeping the specification and implementation relatively simple:
> > >   You already define everything for working without
> > > restriction. Plus, unlimited implication nesting has the same
> > > complexity.  
> > 
> > No, I don't. I don't cover the meaning of nested groups and things
> > just explode when they come into game.
> 
> 
> You mostly do with the rewriting part, you're only refusing to admit
> it :)

You mean the transform? It doesn't cover the possibility of those groups
containing anything but plain flags. As we've already established,
the results become non-trivial when they do and those cases are
certainly not covered here.

> > > -------
> > > 
> > > 
> > > Do you have numbers on the checker run on all inputs from
> > > gentoo-x86 ? Since we're dealing with heuristics those are
> > > particularly important to validate we're not rejecting too many
> > > constructs.  
> > 
> > I think I've mentioned that a few times in the GLEP. The validation
> > and verification bits were tested on all REQUIRED_USE cases from a few
> > days ago. For the former, all cases are listed in rationale. For
> > the latter, the reference implementation has test cases for every
> > construct that triggered an issue, including the false positives that
> > were fixed. I've only skipped a few that were completely redundant
> > (e.g. different versions of the same package that added 1-2
> > irrelevant items).
> 
> Unless I'm missing something, rationale seems more about cases rejected
> by the restricted syntax. Numbers I'm talking about is the # of rejected
> constraints vs accepted (and assumed solvable) ones.

I'll crunch some fresh numbers based on today's repository.

-- 
Best regards,
Michał Górny

signature.asc
Description: This is a digitally signed message part

Re: [gentoo-dev] Pre-GLEP RFC: Automated enforcing of REQUIRED_USE constraints

Reply via email to