> -----Original Message-----
> From: Richard Purdie [mailto:richard.pur...@linuxfoundation.org]
> Sent: den 30 augusti 2017 10:03
> To: Peter Kjellerstedt <peter.kjellerst...@axis.com>; Andre McCurdy
> <armccu...@gmail.com>
> Cc: OE Core mailing list <openembedded-core@lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH 0/2] Avoid build failures due to setscene
> errors
>
> On Wed, 2017-08-30 at 06:44 +0000, Peter Kjellerstedt wrote:
> > > I have left this code as an error deliberately as this kind of
> > > thing should not happen and if it does, there is really something
> > > wrong which you need to figure out. It means that at one point
> > > bitbake thinks the sstate is present and valid, then later it
> > > isn't.
> >
> > True, but since the operations of checking if an sstate file exists
> > and retrieving it is not an atomic operation, there are always
> > problems that can occur. Some may be fixable, some may not. However,
> > using a build failure to detect these kind of problems is a bit harsh
> > on the developers who only sees their builds complete only to get an
> > error for something that is not their fault. We have better ways to
> > detect these kinds of problems, e.g., through log monitoring, without
> > having to cause unnecessary grief amongst the developers.
>
> Files are randomly disappearing from your sstate source. So far you've
> been lucky and these are not causing corruption, but they could.

Somehow I fail to see how missing sstate cache files can cause
corruption. If they are missing, the real task is run and all is well.
Also, I do not actually know whether the files disappear permanently or
temporarily, because by the time I look at the global sstate cache the
files are there, newly created because the build continued and let the
real task run. My guess is that the files only disappeared temporarily
due to some network glitch, but currently I cannot verify it.

Regardless of whether my proposed changes are accepted or not, if you
want to keep the default behavior that a failed setscene task will
eventually cause the build to fail, then we should change it to fail
immediately instead. Continuing the build when you know it will fail
makes no sense at all.

> Please figure out and fix your sstate infrastructure, not hack the code
> to avoid the errors.

As Martin Jansa mentioned in another response, the problem may be due
to NFS or general network disturbances, and I see no way to protect
ourselves from them. Apparently we are not alone in seeing these kinds
of transient errors.

> I do appreciate its painful, we did once see this issue on the
> autobuilder. There was a real error in the sstate cleanup scripts and
> we fixed that but it took some work to find it.

Are your sstate cache cleanup scripts available somewhere? Obviously it
is not trivial to get this right, and since keeping the sstate cache
clean is something that I expect many want to do, having a common
script for it seems like a good thing. Otherwise I can contribute our
script. If nothing else it would probably be good to have it reviewed
by someone who is an expert on the sstate cache.
It currently features:

* configurable retention period (default is 10 days)
* removes related .tgz and .tgz.siginfo files as one
* can remove stale symbolic links (typically wanted for a local sstate
  cache whose links point into a global sstate cache where the actual
  files have been cleaned away)
* dry run mode
* quiet mode (only prints a summary stating how much was cleaned up and
  the current size of the sstate cache; very nice for running it as a
  cronjob)

> Also, with changes like this you can end up in a state where sstate can
> completely stop working and the only way you'd tell is by increased
> build time.

As I mentioned, we have monitoring of our builds in place and would
definitely notice if the global sstate cache is not used as expected.

> > > I'm not convinced patching out the errors is the right solution
> > > here...
> >
> > How about I make it conditional by adding an IGNORE_SETSCENE_ERRORS?
> > That way it can default to "0", but we can set it to "1" to
> > prioritize the production builds.
>
> I'm still not convinced, sorry.
>
> [The reason being complexity. I don't like having multiple ways of
> doing things if we can help it, particularly when one of them is a
> workaround for a problem elsewhere. One of the codepaths in a case like
> this is unlikely to get well tested.]

Well, as long as the conditional path is clearly marked as "only enable
this if you know what you are doing", I do not see a problem with that
path receiving less or no testing by you. It should get enough testing
by those of us who rely on it.

The problem for me in these kinds of situations is that we do not want
to make changes to anything inside the Poky repository (which would
effectively fork it), because down that route lies madness. So instead
we rely on making all adaptations in our own layers. Making changes to
recipes is easy as we can use .bbappends in our layers.
Making changes to classes or configuration files works by copying them
to our layers and changing them there, even though I personally hate it
because it causes extra maintenance for me, since I often need to build
with a newer version of Poky than our layers are currently adapted for
in preparation for updating to the next Poky release. However, changes
to anything inside bitbake are nearly impossible. The same goes for
changes to anything in meta/lib/oe. Thus we rely on being able to find
a way to get these kinds of changes integrated upstream.

> Cheers,
>
> Richard

And in case any of the above sounds as if I am trying to force a
feature down your throat that you do not like, then I beg for
forgiveness. We really do appreciate your expertise and dedication to
the OE community, and I hope we can work this into something that you
can accept and that we can use.

//Peter
--
_______________________________________________
Openembedded-core mailing list
Openembedded-core@lists.openembedded.org
http://lists.openembedded.org/mailman/listinfo/openembedded-core