Re: [OE-core] Core workflow: sstate for all, bblock/bbunlock, tools for why is sstate not being reused?

Adrian Freihofer Wed, 01 Nov 2023 07:18:30 -0700

Hi Alex, hi Richard

The discussion looks really interesting. I would like to contribute
some comments from the point of view of a rather naive user, and to
understand, in the bigger picture, which workflows would benefit from
these improvements.



On Thu, 2023-09-14 at 20:51 +0200, Alexander Kanavin wrote:
> On Thu, 14 Sept 2023 at 14:56, Richard Purdie
> <richard.pur...@linuxfoundation.org> wrote:
> > For the task signatures, we need to think about some questions. If I
> > make a change locally, can I query how much will rebuild and how much
> > will be reused? There is bitbake --dry-run but perhaps it is time for
> > an option (or dedicated separate command?) to give some statistics
> > about what bitbake would do? How much sstate would be reused?
> > 
> > That then logically leads into the questions, can we tell what has
> > changed? Why isn't my sstate being reused? For that we perhaps should
> > define some existing scenarios where it is currently very difficult to
> > work this out and then work out how we can report that information to
> > the user. These could become test cases?
> 
> So I think there are two questions here that the tools should answer:
> 
> 1. If I would run a build, what would be missing in the cache and need
> to be built? The missing cache objects are in a dependency hierarchy,
> so only those missing objects with no dependencies on other missing
> objects would be printed. That should be comparatively easy to add as
> bitbake already does those checks all the time. Is there something
> else that's easily done and useful to print?
> 
> 2. Then there's the question of *why* they are missing, which is
> harder to answer. If, say, curl:do_package is not in the cache, then
> the tool would have to walk the cache tree (I/O heavy operation as
> there is no index), make a list of all curl:do_package objects that
> are there, and do a recursive bitbake-diffsig (going up the task tree)
> on them vs the one we want. Then print them starting with the newest.
> Something like:
> 

We are currently experimenting with replacing the eSDK installer with
the bitbake build environment for our users. Part of this
transformation is, of course, the shared sstate-cache, for which this
discussion seems quite relevant. The workflow we are aiming for is as
follows:

   1. Set up the layers and build config (out of scope here)
   2. Download the sstate for a particular recipe (usually an image
      recipe).
      
      Note: Relying on SSTATE_MIRRORS alone does not work very well for
      us because bitbake connects to the sstate server far too often.
      So we started to develop a script which downloads the sstate
      artifacts into SSTATE_DIR to get the SDK set up.
      
      Your point 1 is basically what our (somewhat hacky) download
      script does in its --dry-run mode: it prints all the required
      sstate artifacts. I think as a first step, it would be very
      valuable to have a function or a tinfoil API in the core that
      returns a list of sstate artifacts for a given recipe.
      
      As a second step, a tool or a new feature of an existing tool
      could download the artifacts into SSTATE_DIR. This would also be
      a great successor for the not very actively maintained
      devtool sdk-update command.
   3. At this point, the user is basically in the same situation as
      after running the eSDK installer. devtool and now also bitbake
      are available, and the sstate-cache is fully populated.
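
As an illustration of point 1 quoted above, here is a minimal sketch of
the "print only the missing objects that do not depend on other missing
objects" filter. It is plain Python, not bitbake or tinfoil code: the
function name root_missing, the deps mapping and the task names in the
example are made up for illustration, and it assumes that the dependency
graph and the set of missing objects have already been obtained somehow.

```python
def root_missing(missing, deps):
    """Return the missing objects that do not depend, directly or
    transitively, on any other missing object.

    missing: iterable of task identifiers (hypothetical "recipe:task" names)
    deps:    dict mapping a task identifier to the tasks it depends on
    """
    missing = set(missing)

    def reaches_missing(task, seen):
        # Depth-first walk over the dependencies of `task`.
        for dep in deps.get(task, ()):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in missing or reaches_missing(dep, seen):
                return True
        return False

    return sorted(t for t in missing if not reaches_missing(t, {t}))


# Example with an invented dependency graph.
example_deps = {
    "image:do_rootfs": ["curl:do_package"],
    "curl:do_package": ["curl:do_install"],
    "curl:do_install": ["openssl:do_populate_sysroot"],
}
example_missing = [
    "image:do_rootfs",
    "curl:do_package",
    "openssl:do_populate_sysroot",
]
print(root_missing(example_missing, example_deps))
# prints ['openssl:do_populate_sysroot']
```

Only the openssl object is reported in this example, since the other two
missing objects would have to be rebuilt anyway once it is rebuilt.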

Maintaining such an SDK and sstate mirror infrastructure brings us to
your point 2. Tooling for maintaining the sstate cache becomes even
more important than it is now. Also, locking the sstate cache and
treating missing artifacts as errors seems to be an important feature. 

But I would consider the locking/unlocking more relevant for testing
rather than for deploying locked SDKs. Unlocked SDKs allow the user to
switch the branches of the layers to commits where the sstate is not
available. If the user is in a full-featured bitbake environment rather
than the constrained eSDK installer environment, this is perfectly fine:
bitbake can just compile the missing recipes.
Such an SDK would combine all the advantages of the current eSDK
installer and the much more flexible Bitbake environment. This would
imply that the sstate download script should just try to download
what's available on the mirror and maybe print a warning for artifacts
which cannot be downloaded. But it should not abort with an error for
missing artifacts.
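
A minimal sketch of that "download what is available, warn about the
rest" behaviour could look like the following. This is not our actual
script: the function name fetch_available is invented for this example,
and a local directory stands in for the sstate mirror so that the sketch
stays self-contained (a real tool would presumably fetch over HTTP).

```python
import logging
import shutil
from pathlib import Path

log = logging.getLogger("sstate-fetch")


def fetch_available(artifacts, mirror_dir, sstate_dir):
    """Copy every artifact that exists in mirror_dir into sstate_dir.

    artifacts is a list of paths relative to the mirror root. Artifacts
    that are not on the mirror are logged as warnings and returned in
    the second list; they never cause an error.
    """
    mirror_dir, sstate_dir = Path(mirror_dir), Path(sstate_dir)
    fetched, missing = [], []
    for rel in artifacts:
        src = mirror_dir / rel
        if not src.is_file():
            # Not fatal: bitbake can build this one locally later.
            log.warning("sstate artifact not on mirror: %s", rel)
            missing.append(rel)
            continue
        dst = sstate_dir / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
        fetched.append(rel)
    return fetched, missing
```

Returning the list of missing artifacts instead of raising leaves it to
the caller to decide whether a locked setup should treat them as errors.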

Does that make sense?

Regards,
Adrian

> Existing cache objects are not suitable because:
> <object id 1> was built on <date> and has a mismatching SRCREV
> <object id 2> was built on <earlier date> and has a different do_compile()
> 
> > One of the big problems in the past was that we lost much of the hash
> > information after parsing completed. This meant that if the hashes then
> > didn't match, we couldn't tell why as the original computation was
> > lost. I did some work on allowing us to retain more of the information
> > so that we didn't have to recompute it every time to be able to do
> > processing with it. I have to admit I've totally lost track of where I
> > got to with that.
> 
> Here's an idea I can't get out of my head. Right now, the cache is
> simply an amorphous mass of objects, with no information regarding how
> they were created. How about storing complete build configurations as
> well into the same directory? There would be a dedicated, separate
> area for each configuration that placed objects into the cache,
> containing:
> - list of layers and revisions
> - config template used
> - complete content of build/conf
> - bitbake invocation (e.g. targets and prefixed variables like
> MACHINE etc.)
> - complete list of sstate objects that were produced as a result, so
> they can be checked for existence
> 
> This would be written into the cache dir at the very end of the build
> when everything else is already there.
> 
> Right now, everyone sets up their own builds first, then points
> local.conf or site.conf to the cache, and hopes for the best regarding
> hit rates. Having stored build configs would allow inverting the
> workflow, so that you first ask from the cache what it can provide
> (e.g. it can provide mickledore or kirkstone core-image-minimal for
> qemux86, and that's exactly what you want as a starting point), then
> you use the build config stored in the cache to set up a build, and
> run it - and that would guarantee complete sstate reuse and getting to
> a functional image as soon as possible. Kind of like binary distro,
> but implemented with sstate.
> 
> Alex
