I'd wait to see if Vanessa can shed light on reasoning there. Sometimes things that look odd are very smart but obscure.

Ed

On Thu, Jan 19, 2017 at 9:06 PM, Thanh Ha <thanh...@linuxfoundation.org> wrote:

> If we have consensus that apt-get update during vm init is a bad idea then
> this patch might be a good quick solution [0].
>
> Regards,
> Thanh
>
> [0] https://gerrit.fd.io/r/4797
>
>
> On Thu, Jan 19, 2017 at 10:47 PM, Ed Warnicke <hagb...@gmail.com> wrote:
>
>> Thanh,
>>
>> I'm not quite sure the logic of having it at that particular point
>> either. Something to investigate.
>>
>> Ed
>>
>> On Thu, Jan 19, 2017 at 8:44 PM, Thanh Ha <thanh...@linuxfoundation.org>
>> wrote:
>>
>>> FWIW in OpenDaylight we don't typically run yum update or apt-get update
>>> in our init-scripts on VM spinup. At the job level we only install
>>> dependencies needed by the build. I'm not sure why fd.io is running
>>> upgrades but it was existing in the script when I looked at it. System
>>> upgrades during VM spinup is not something the OpenDaylight project does at
>>> least.
>>>
>>> Regards,
>>> Thanh
>>>
>>>
>>> On Thu, Jan 19, 2017 at 10:38 PM, Dave Wallace <dwallac...@gmail.com>
>>> wrote:
>>>
>>>> Ed, Thanh, Vanessa,
>>>>
>>>> IMHO, updating the ubuntu packages every time a VM is spun up is a bug
>>>> wrt. being able to reproduce some (hopefully rare) build/test issues.
>>>> Since every VM is potentially running with different versions of OS
>>>> components, when a failure occurs (e.g. in "make test"), then it may be
>>>> necessary to recreate the exact run-time environment in order to reproduce
>>>> the failure. Unless the complete package list is being archived for every
>>>> VM instance that is spun up, this may not be possible.
>>>>
>>>> My experience is that those rare cases where a tool or environment
>>>> issue causes a failure, the cost to find the issue is extraordinarily high
>>>> if you do not have the ability to recreate the EXACT build/run-time
>>>> environment.
>>>> This is why CSIT does not update OS components in the VM
>>>> initialization scripts and the VM images are built from a specific package
>>>> list instead of pulling the latest versions from the apt repositories.
>>>>
>>>> My recommendation is that the VM images be updated periodically (weekly
>>>> or whenever a new security update is released) and the package lists
>>>> archived for each VM image version. Each VM image should also be verified
>>>> against a known good VPP commit version as is done with CSIT branches.
>>>> Ideally we should build a fully automated continuous deployment model to
>>>> reduce the amount of work to update the VM images to running a Jenkins job
>>>> to build/test/deploy a new VM image from the latest package versions.
>>>>
>>>> With that automation in place, this mechanism could be extended for use
>>>> by CSIT as well as "make test", thus ensuring that all of our testing was
>>>> done with the same OS component version. Ideally, all projects should be
>>>> using the same OS components to ensure that everything is tested in the
>>>> same run-time environment.
>>>>
>>>> Thanks,
>>>> -daw-
>>>>
>>>> On 1/19/2017 8:31 PM, Thanh Ha via RT wrote:
>>>>
>>>> The issue with the 16.04 Ubuntu image is fixed now (but we may require
>>>> some additional actions which I'll send to Vanessa in case this issue
>>>> comes up again). We fixed this issue tonight by rebuilding ubuntu1604 and
>>>> deploying the new image.
>>>>
>>>> I'm going to close this ticket as resolved and we'll take the additional
>>>> task to find a way to ensure this doesn't appear again off of this ticket.
>>>>
>>>> If you're not interested in the detailed analysis you can stop reading now.
>>>>
>>>> For those interested I suspect that the lock issue will appear again
>>>> (although I could be wrong). The reason I believe so is that our vm init
>>>> script runs "apt-get update" as an initialization step when the VM boots
>>>> up at creation time via this script [0].
>>>> Ed mentioned that we didn't see
>>>> this in the past and it only started to appear again recently as we deployed
>>>> another patch to disable Ubuntu's unattended updates.
>>>>
>>>> I believe a possible reason we will see this issue appear again due to [0]
>>>> is that we switched from the JClouds to the OpenStack Jenkins plugin
>>>> for node spinup, and there's a difference in how the init-script is executed
>>>> depending on which plugin is being used.
>>>>
>>>> JClouds Plugin:
>>>>
>>>> 1) boot vm
>>>> 2) wait for ssh access
>>>> 3) copies init-script into vm via ssh
>>>> 4) executes init-script, and doesn't continue processing until script is
>>>> complete
>>>> 5) once init-script is complete, passes vm over to job and job starts
>>>>
>>>> OpenStack Plugin:
>>>>
>>>> 1) boot vm and passes init-script in as User Data
>>>> 2) init-script runs inside vm without Jenkins intervention, thus is a
>>>> non-blocking function
>>>> 3) in parallel jenkins waits for ssh access to vm
>>>> 4) ssh's into vm and passes vm over to job and job starts running
>>>>
>>>> In the OpenStack plugin case step 4 can execute while step 2 is still
>>>> running apt-get update in the background because it was a non-blocking
>>>> function.
>>>>
>>>> A few ideas I have to get around this.
>>>>
>>>> a) Allow init-script to continue running apt-get update however have a
>>>> shell script at the start of Ubuntu jobs that waits for the lock to get
>>>> released before allowing the job to start
>>>>
>>>> b) Remove apt-get update from init-script and make the job run apt-get
>>>> update at the beginning of its execution
>>>>
>>>> c) Regularly update VMs to ensure that apt-get update always runs quickly
>>>>
>>>> Regards,
>>>> Thanh
>>>>
>>>> [0]
>>>> https://git.fd.io/ci-management/tree/jenkins-scripts/basic_settings.sh#n14
>>>>
>>>>
>>>> On Thu Jan 19 19:23:59 2017, hagbard wrote:
>>>>
>>>> FYI...
>>>> helpdesk is on it, and it's being worked in #fdio-infra on IRC
>>>>
>>>> Ed
>>>>
>>>> On Thu, Jan 19, 2017 at 4:31 PM, Ed Warnicke <hagb...@gmail.com> wrote:
>>>>
>>>>
>>>> Looping in help desk.
>>>> On Thu, Jan 19, 2017 at 4:16 PM Dave Barach (dbarach) <dbar...@cisco.com>
>>>> wrote:
>>>>
>>>>
>>>> Folks,
>>>>
>>>> See https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/3378/console
>>>>
>>>> 11:00:46 E: Could not get lock /var/lib/dpkg/lock - open (11: Resource
>>>> temporarily unavailable)
>>>>
>>>> 11:00:46 E: Unable to lock the administration directory (/var/lib/dpkg/),
>>>> is another process using it?
>>>>
>>>> I recognize this failure from my own Ubuntu 16.04 system: a cron-job
>>>> starts “apt-get -q”, which for whatever reason does not terminate. As a
>>>> workaround, “sudo killall apt-get || true” before trying to acquire build
>>>> dependencies...
>>>>
>>>> HTH... Dave
>>>>
>>>>
>>>> _______________________________________________
>>>>
>>>> vpp-dev mailing list
>>>> vpp-dev@lists.fd.io
>>>> https://lists.fd.io/mailman/listinfo/vpp-dev
>>>>
>>>
>>
>
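
Idea (a) in Thanh's analysis, holding the job until the dpkg lock is released, could be sketched roughly as follows. This is only an illustration in Python, not the change that was deployed or the patch under review. The function name wait_for_lock and the timeout and poll values are hypothetical. It assumes the lock is an fcntl-style advisory lock on /var/lib/dpkg/lock and that the caller may open that file read-write (normally root).

```python
import errno
import fcntl
import os
import time


def wait_for_lock(path="/var/lib/dpkg/lock", timeout=600.0, poll=5.0):
    """Return True once no other process holds the lock on `path`.

    Returns False if the lock is still held when `timeout` seconds have passed.
    (Hypothetical helper; name and defaults are assumptions for illustration.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            fd = os.open(path, os.O_RDWR)
        except FileNotFoundError:
            return True  # no lock file, so nothing can be holding it
        try:
            # Probe with a non-blocking exclusive lock; this fails at once
            # if another process (e.g. a background apt-get) holds it.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
            return True
        except OSError as exc:
            if exc.errno not in (errno.EACCES, errno.EAGAIN):
                raise  # unexpected failure, do not mask it
        finally:
            os.close(fd)  # closing the descriptor drops our probe lock
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
```

A job could call wait_for_lock() before installing its build dependencies and fail early if it returns False. Note that the probe releases the lock immediately, so another apt run could still start between the check and the job's own apt-get; the check narrows the race described above but does not remove it.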