On 2/10/21 11:11 AM, Rich Freeman wrote:
> On Wed, Feb 10, 2021 at 12:57 PM Andreas K. Hüttel <dilfri...@gentoo.org> 
> wrote:
>>
>> * what portage features are still needed or need improvements (e.g. binpkg
>> signing and verification)
>> * how should hosting look like
> 
> Some ideas for portage enhancements:
> 
> 1.  Ability to fetch binary packages from some kind of repo.

The old PORTAGE_BINHOST functionality has been replaced with a
binrepos.conf file that's very similar to repos.conf:

https://bugs.gentoo.org/668334

It doesn't have explicit support for multiple local binary package
repositories yet, but somebody got it working with src-uri set to a
file:/ uri as described in comments on this bug:

https://bugs.gentoo.org/768957

> 2.  Ability to have multiple binary packages co-exist in a repo (local
> or remote) with different build attributes (arch, USE, CFLAGS,
> DEPENDS, whatever).

We can now enable FEATURES=binpkg-multi-instance by default now that
this bug is fixed:

https://bugs.gentoo.org/571126

> 3.  Ability to pick the most appropriate binary packages to use based
> on user preferences (with a mix of hard and soft preferences).

Current package selection logic for binary packages is basically the
same as for ebuilds. These are the notable differences:

1) Binary packages are sorted in descending order by (version, mtime),
so then most recent builds are preferred when the versions are identical.

2) The --binpkg-respect-use option rejects binary packages what would
need to be rebuilt in order to match local USE settings.

> One idea I've had around how #2-3 might be implemented is:
> 1.  Binary packages already contain data on how they were built (USE
> flags, dependencies, etc).  Place this in a file using a deterministic
> sorting/etc order so that two builds with the same settings will have
> the same results.

This would only be needed to multi-profile binhosts that provide a
variety of configurations for the same package.

Features like this are not necessary if the binhost only intends to
provide packages for a single profile.

> 2.  Generate a hash of the file contents - this can go in the filename
> so that the file can co-exist with other files, and be located
> assuming you have a full matching set of metadata.

For FEATURES=binpkg-multi-instance we currently use an integer BUILD_ID
ensure that file names are unique.

> 3.  Start dropping attributes from the file based on a list of
> priorities and generate additional hashes.  Create symlinked files to
> the original file using these hashes (overwriting or not existing
> symlinks based on policy).  This allows the binary package to be found
> using either an exact set of attributes or a subset of higher-priority
> attributes.  This is analogous to shared object symlinking.
> 4.  The package manager will look for a binary package first using the
> user's full config, and then by dropping optional elements of the
> config (so maybe it does the search without CFLAGs, then without USE
> flags).  Eventually it aborts based on user prefs (maybe the user only
> wants an exact match, or is willing to accept alternate CFLAGs but not
> USE flags, or maybe anything for the arch is selected> 5.  As always the 
> final selected binary package still gets evaluated
> like any other binary package to ensure it is usable.
> 
> Such a system can identify whether a potentially usable file exists
> using only filename, cutting down on fetching.  In the interests of
> avoiding useless fetches we would only carry step 3 reasonably far -
> packages would have to match based on architecture and any dynamic
> linking requirements.  So we wouldn't generate hashes that didn't
> include at least those minimums, and the package manager wouldn't
> search for them.
> 
> Obviously you could do more (if you have 5 combinations of use flags,
> look for the set that matches most closely).  That couldn't be done
> using hashes alone in an efficient way.  You could have a small
> manifest file alongside the binary package that could be fetched
> separately if the package manager wants to narrow things down and
> fetch a few of those to narrow it down further.

All of the above is oriented toward multi-profile binhosts, so we'll
have to do a cost/benefit analysis to determine whether it's worth the
effort to introduce the complexity that multi-profile binhosts add.

> Or you could skip the hash searching and just fetch all the manifests
> for a particular package/arch and just search all of those, but that
> is more data to transfer just to do a query.  A metadata cache of some
> kind of might be another solution.  Content hashes would probably
> still be useful just to allow co-existence of alternate builds.

This also relates to the centralized Packages file that's currently used
to distribute the package metadata for all packages in a binhost. We can
make it scale better if we split out a separate index per package, not
unlike a pypi simple index:

https://pypi.org/simple/
-- 
Thanks,
Zac

Attachment: signature.asc
Description: OpenPGP digital signature

Reply via email to