On Sat, 2023-11-04 at 11:09 +0000, Richard Purdie wrote:
> On Sat, 2023-11-04 at 11:29 +0100, adrian.freiho...@gmail.com wrote:
> > Hi Alex, hi Richard
> >
> > After some internal discussions, I would like to clarify my previous
> > answers on this topic.
> >
> > * Usually there are two different workflows
> >   - application developers: could use an SDK with a locked
> >     sstate-cache.
> >   - Yocto/BSP developers: need an unlocked SDK. They change the
> >     recipes.
> > * A locked SDK
> >   - can work with setscene from SSTATE_MIRRORS
> >   - setscene does caching in the SSTATE_DIR (no issue about that)
> >   - But network problems can occur during the initial build because
> >     bitbake executes many independent setscene tasks. Opening so many
> >     independent connections slows down the build, especially if the
> >     server treats them as a denial of service attack.
> >   - The denial of service problem is difficult to solve because each
> >     setscene task runs in its own bitbake task. Reusing a connection
> >     to download multiple sstate artifacts seems almost impossible.
> >     This is much easier to solve with a separate sstate download
> >     script.
>
> FWIW, we did have a similar issue with do_fetch overloading
> servers/proxies/ISPs and added:
>
> do_fetch[number_threads] = "4"
>
> Finding the right place to put a thread limit on overall setscene
> tasks is harder but in theory possible. Or perhaps a "network capable
> tasks" thread limit?
>
> Is the overload caused by the initial query of sstate presence, or,
> does it happen when the setscene tasks themselves run?

The most extreme situation is probably bitbake --setscene-only with an
empty TMPDIR. Each setscene task establishes a new connection, and the
server receives so many of them that it treats them as a denial of
service attack and starts throttling. A separate script would allow the
same connection to be reused to download all the required artifacts.
Limiting the number of threads does not really solve the issue because
the same number of connections is still opened in quick succession.
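For illustration, a minimal sketch of what such a prefetch script could
look like, assuming paramiko for the sftp transport; the mirror host,
paths and artifact list are placeholders and would in practice come
from the build (e.g. the setscene task list):

#!/usr/bin/env python3
# Sketch only: download a list of sstate artifacts over a single reused
# SSH/SFTP session instead of one connection per setscene task.
import os
import paramiko

MIRROR_HOST = "sstate.example.com"        # hypothetical sftp mirror
MIRROR_BASE = "/srv/sstate-cache"         # hypothetical remote SSTATE_DIR
LOCAL_BASE = os.path.expanduser("~/sstate-cache")

def prefetch(artifacts):
    """Fetch all artifacts over one connection; skip missing ones."""
    client = paramiko.SSHClient()
    client.load_system_host_keys()
    client.connect(MIRROR_HOST)           # authenticates via the ssh agent
    sftp = client.open_sftp()
    try:
        for relpath in artifacts:
            local = os.path.join(LOCAL_BASE, relpath)
            os.makedirs(os.path.dirname(local), exist_ok=True)
            try:
                sftp.get(MIRROR_BASE + "/" + relpath, local)
            except IOError:
                # Not on the mirror: bitbake will build it from source.
                pass
    finally:
        sftp.close()
        client.close()

if __name__ == "__main__":
    # Placeholder list; a real script would generate it from bitbake.
    prefetch(["universal/ab/cd/sstate:example-artifact.tar.zst"])

Throttling and retries are left out on purpose; the point is only that
one authenticated session serves all downloads.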
> > * An unlocked SDK
> >   - Tries to download the sstate cache for changed recipes and their
> >     dependencies, which obviously can't work.
> >   - The useless download requests slow down the build considerably
> >     and cause a high load on the servers without any benefit.
>
> Is this sstate over http(s) or something else? I seem to remember you
> mentioning sftp. If this were using sftp, it would be horribly slow as
> it was designed for a light overhead "does this exist?" check which
> http(s) can manage well.

Yes, we are evaluating sftp. You are right, it is not optimal from a
performance point of view; S3, for example, is much faster. A
compromise is to set up a limited number of parallel sftp connections,
which has worked very well so far.

The question of why we use sftp at all brings us to a larger topic that
is probably relevant for almost all Yocto users, but not for the Yocto
project itself: security. There is usually a Git server infrastructure
that makes it possible to protect Git repositories with fine-grained
access policies. As the sstate-cache contains the same source code, the
protection concept for the Git repositories must also be applied to the
sstate-cache artifacts.

First of all, user authentication is required for the sstate mirror. An
obvious idea is to use the same user authentication for the
sstate-cache server as for the Git server. In addition to https, ssh is
also often used for Git repositories, and ssh even offers some
advantages in terms of user-friendliness and security (if an ssh agent
is used). This consideration finally leads us to use the sftp protocol
for the sstate mirror. It is also relatively easy to administer: simply
copy the users' public ssh keys from the Git server to the sftp server.

If one then wants to scale an sstate-cache server for many different
projects and users, one quickly wishes for authorization at artifact
level. Ideally, the access rights to the source code would be carried
over completely to the associated sstate artifacts. For such an
authorization the sstate mirror server would need the SRC_URI that was
used to build each sstate artifact. With this information it could ask
the Git server whether or not a user has access to all the source code
repositories, and grant or deny access to the artifact accordingly. It
should not be forgotten that the access rights to the Git repositories
can change over time.

> Recently we've been wondering about teaching the hashequiv server
> about "presence", which would then mean the build would only query
> things that stood a good chance of existing.

Yes, that sounds very interesting. There is probably even more metadata
of this kind which the hashserver could provide to improve the
management of a shared sstate mirror. Would it make sense to include
e.g. the SRC_URI in the hashserv database and extend the hashserver's
API to also provide metadata, e.g. for the authorization of the sstate
mirror? Or is security and authorization something which should be
handled independently from hash equivalence?
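Just to make the idea concrete, here is a purely hypothetical sketch of
such an artifact-level authorization check; the two lookup tables stand
in for the hashserv metadata and the Git server's permission API,
neither of which exists in this form today:

# Hypothetical sketch of artifact-level authorization for an sstate
# mirror: an artifact may be downloaded only if the user can read every
# Git repository listed in the SRC_URIs it was built from.

# artifact -> SRC_URIs recorded at build time (would come from hashserv)
ARTIFACT_SRC_URIS = {
    "sstate:example.tar.zst": [
        "git://git.example.com/meta-layer.git",
        "git://git.example.com/proprietary-app.git",
    ],
}

# user -> repositories the Git server lets them read; would have to be
# queried live (or cached only briefly), since access rights can change
USER_REPOS = {
    "alice": {"git://git.example.com/meta-layer.git",
              "git://git.example.com/proprietary-app.git"},
    "bob":   {"git://git.example.com/meta-layer.git"},
}

def may_download(user, artifact):
    """Grant access only if the user may read every source repository."""
    uris = ARTIFACT_SRC_URIS.get(artifact, [])
    allowed = USER_REPOS.get(user, set())
    return bool(uris) and all(uri in allowed for uri in uris)

assert may_download("alice", "sstate:example.tar.zst") is True
assert may_download("bob", "sstate:example.tar.zst") is False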
Another topic where additional metadata about the sstate-cache seems to
be beneficial is sstate-mirror retention. Knowing which artifact was
built for which tag or commit of the layer repositories could help to
wipe out artifacts which are not needed anymore.

> > - A script which gets a list of sstate artifacts from bitbake and
> >   then does an upfront download works much better
> >   + The script runs only when the user calls it or the SDK gets
> >     boot-strapped
> >   + The script uses a reasonable number of parallel connections
> >     which are re-used for more than one artifact download
>
> Explaining to users they need to do X before Y quickly gets tiring,
> both for people explaining it and the people doing it trying to
> remember. I'd really like to get to a point where the system "does the
> right thing" if we can.
>
> I don't believe the problems you describe are insurmountable. If you
> are using sftp, that is going to be a big chunk of the problem as the
> system assumes something faster is available. Yes, I've taken patches
> to make sftp work but it isn't recommended at all. I appreciate there
> would be reasons why you use sftp but if it is possible to get a list
> of "available sstate" via other means, it would improve things.
>
> > * Idea for a smart lock/unlock implementation
> >   - From a user's perspective a locked vs. an unlocked SDK does not
> >     make much sense. It makes more sense if the SDK would
> >     automatically download the sstate-cache if it is expected to be
> >     available. Let's think about an implementation (which allows to
> >     override the logic) to switch from automatic to manual mode:
> >
> >     SSTATE_MIRRORS_ENABLED ?= "${@is_sstate_mirror_available()}"
>
> What determines this availability? I worry that is something very
> fragile and specific to your use case. It is also not an all or
> nothing binary thing.

It would probably be better to query a hashserver whether an artifact
is present.
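For comparison, this is roughly the shape of the per-object existence
test a build ends up doing against an http(s) mirror today (URL and
object names are made up); a presence-aware hashserver could answer the
same question for the whole task list in a single query:

# Sketch only: per-object presence checks against an http(s) sstate
# mirror. The mirror URL and the object names below are made up.
import urllib.error
import urllib.request

MIRROR = "https://sstate.example.com/"    # hypothetical SSTATE_MIRRORS entry

def object_exists(relpath):
    req = urllib.request.Request(MIRROR + relpath, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=10):
            return True
    except urllib.error.URLError:
        # 404, connection refused, etc. all count as "not available"
        return False

if __name__ == "__main__":
    for obj in ("ab/cd/sstate:example-artifact.tar.zst",):
        print(obj, "present" if object_exists(obj) else "missing")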
> > In our case the sstate mirror is expected to provide all artifacts
> > for tagged commits and for some git branches of the layer
> > repositories. The sstate is obviously not usable for a "dirty" git
> > layer repository.
>
> That isn't correct and isn't going to work. If I make a single change
> locally, there is a good chance that 99.9% of the sstate could still
> be valid in some cases. Forcing the user through 10 hours of rebuild
> when potentially that much was available is a really really bad user
> experience.

Maybe there is a better idea.

> > That's what the is_sstate_mirror_available function could check to
> > automatically enable and disable lazy downloads.
> >
> > - If is_sstate_mirror_available() returns false, it should still be
> >   possible to initiate a sstate-cache download manually.
> >
> > * Terminology
> >   - Older Yocto Releases:
> >     + eSDK means an installer which provides a different environment
> >       with different tools
> >     + The eSDK was static, with a locked sstate cache
> >     + Was for one MACHINE, for one image...
> >   - Newer Yocto Releases:
> >     + The bitbake environment offers all features of the eSDK
> >       installer. I consider this as already implemented with
> >       meta-ide-support and build-sysroots.
>
> Remember bblock and bbunlock too. These provide a way to fix or unlock
> specific sections of the codebase. Usually a developer has a pretty
> good idea of which bits they want to allow to change. I don't think
> people have yet realised/explored the potential these offer.

Yes, I have also started thinking about the possibilities we would get
for the SDK if there were a hashserver, or an even more generic
metadata server for the sstate-cache, in the middle of the
infrastructure picture. It would probably solve some challenges for
which I could not find a solution so far.

Thank you for your response.

Adrian

> Cheers,
>
> Richard