I'd wait to see if Vanessa can shed light on the reasoning there.  Sometimes
things that look odd are very smart but obscure.

Ed

On Thu, Jan 19, 2017 at 9:06 PM, Thanh Ha <thanh...@linuxfoundation.org>
wrote:

> If we have consensus that apt-get update during vm init is a bad idea then
> this patch might be a good quick solution [0].
>
> Regards,
> Thanh
>
> [0] https://gerrit.fd.io/r/4797
>
>
> On Thu, Jan 19, 2017 at 10:47 PM, Ed Warnicke <hagb...@gmail.com> wrote:
>
>> Thanh,
>>
>> I'm not quite sure of the logic of having it at that particular point
>> either.  Something to investigate.
>>
>> Ed
>>
>> On Thu, Jan 19, 2017 at 8:44 PM, Thanh Ha <thanh...@linuxfoundation.org>
>> wrote:
>>
>>> FWIW in OpenDaylight we don't typically run yum update or apt-get update
>>> in our init-scripts on VM spinup. At the job level we only install
>>> dependencies needed by the build. I'm not sure why fd.io is running
>>> upgrades, but it was already in the script when I looked at it. System
>>> upgrades during VM spinup are not something the OpenDaylight project does,
>>> at least.
>>>
>>> Regards,
>>> Thanh
>>>
>>>
>>> On Thu, Jan 19, 2017 at 10:38 PM, Dave Wallace <dwallac...@gmail.com>
>>> wrote:
>>>
>>>> Ed, Thanh, Vanessa,
>>>>
>>>> IMHO, updating the ubuntu packages every time a VM is spun up is a bug
>>>> wrt. being able to reproduce some (hopefully rare) build/test issues.
>>>> Since every VM is potentially running with different versions of OS
>>>> components, when a failure occurs (e.g. in "make test"), then it may be
>>>> necessary to recreate the exact run-time environment in order to reproduce
>>>> the failure. Unless the complete package list is being archived for every
>>>> VM instance that is spun up, this may not be possible.
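
To make the "archive the complete package list" point concrete, here is a rough sketch in Python. It is only an illustration and not something taken from ci-management or CSIT: the function names are made up, and it assumes a Debian/Ubuntu image where `dpkg-query -W` is available. The idea is to record a name=version manifest for each image and diff two manifests when a failure needs explaining.

```python
import subprocess


def parse_manifest(text):
    """Turn 'name=version' lines into a {name: version} dict."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, version = line.partition("=")
        packages[name] = version
    return packages


def capture_manifest():
    """Record installed packages on this machine (assumes dpkg-query exists)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${binary:Package}=${Version}\\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_manifest(result.stdout)


def diff_manifests(old, new):
    """Map each package that differs to its (old_version, new_version) pair.

    A version of None means the package is absent from that manifest.
    """
    changed = {}
    for name in sorted(set(old) | set(new)):
        if old.get(name) != new.get(name):
            changed[name] = (old.get(name), new.get(name))
    return changed
```

Saving the output of capture_manifest() alongside each image build (or each job run, if packages are still updated at spinup) would let diff_manifests() show which OS components differ between a passing and a failing environment.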
>>>>
>>>> My experience is that in those rare cases where a tool or environment
>>>> issue causes a failure, the cost of finding the issue is extraordinarily
>>>> high if you cannot recreate the EXACT build/run-time environment.  This is
>>>> why CSIT does not update OS components in the VM initialization scripts,
>>>> and why the VM images are built from a specific package list instead of
>>>> pulling the latest versions from the apt repositories.
>>>>
>>>> My recommendation is that the VM images be updated periodically (weekly
>>>> or whenever a new security update is released) and the package lists
>>>> archived for each VM image version.  Each VM image should also be verified
>>>> against a known good VPP commit version as is done with CSIT branches.
>>>> Ideally we should build a fully automated continuous deployment model to
>>>> reduce the work of updating the VM images to running a Jenkins job that
>>>> builds/tests/deploys a new VM image from the latest package versions.
>>>>
>>>> With that automation in place, this mechanism could be extended for use
>>>> by CSIT as well as "make test", thus ensuring that all of our testing is
>>>> done with the same OS component versions.  Ideally, all projects should be
>>>> using the same OS components to ensure that everything is tested in the
>>>> same run-time environment.
>>>>
>>>> Thanks,
>>>> -daw-
>>>>
>>>> On 1/19/2017 8:31 PM, Thanh Ha via RT wrote:
>>>>
>>>> The issue with the 16.04 Ubuntu image is fixed now (but we may require
>>>> some additional actions, which I'll send to Vanessa in case this issue
>>>> comes up again). We fixed this issue tonight by rebuilding ubuntu1604 and
>>>> deploying the new image.
>>>>
>>>> I'm going to close this ticket as resolved, and we'll track the additional
>>>> task of ensuring this doesn't appear again separately from this ticket.
>>>>
>>>> If you're not interested in the detailed analysis you can stop reading now.
>>>>
>>>> For those interested, I suspect that the lock issue will appear again
>>>> (although I could be wrong). The reason I believe so is that our vm init
>>>> script runs "apt-get update" as an initialization step when the VM boots
>>>> up at creation time via this script [0]. Ed mentioned that we didn't see
>>>> this in the past and that it only started to appear again recently as we
>>>> deployed another patch to disable Ubuntu's unattended updates.
>>>>
>>>> I believe a possible reason we will see this issue appear again due to [0]
>>>> is that we switched from the JClouds to the OpenStack Jenkins plugin for
>>>> node spinup, and there's a difference in how the init-script is executed
>>>> depending on which plugin is being used.
>>>>
>>>> JClouds Plugin:
>>>>
>>>> 1) boot vm
>>>> 2) wait for ssh access
>>>> 3) copies init-script into vm via ssh
>>>> 4) executes init-script, and doesn't continue processing until script is 
>>>> complete
>>>> 5) once init-script is complete, passes vm over to job and job starts
>>>>
>>>> OpenStack Plugin:
>>>>
>>>> 1) boot vm and passes init-script in as User Data
>>>> 2) init-script runs inside vm without Jenkins intervention, thus is a 
>>>> non-blocking function
>>>> 3) in parallel jenkins waits for ssh access to vm
>>>> 4) ssh's into vm and passes vm over to job and job starts running
>>>>
>>>> In the OpenStack plugin case step 4 can execute while step 2 is still 
>>>> running apt-get update in the background because it was a non-blocking 
>>>> function.
>>>>
>>>> A few ideas I have to get around this.
>>>>
>>>> a) Allow init-script to continue running apt-get update, but have a
>>>> shell script at the start of Ubuntu jobs that waits for the lock to be
>>>> released before allowing the job to start
>>>>
>>>> b) Remove apt-get update from init-script and make the job run apt-get
>>>> update at the beginning of its execution
>>>>
>>>> c) Regularly update VMs to ensure that apt-get update always runs quickly
>>>>
>>>> Regards,
>>>> Thanh
>>>>
>>>> [0] 
>>>> https://git.fd.io/ci-management/tree/jenkins-scripts/basic_settings.sh#n14
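
For what it's worth, option (a) above could be sketched roughly as follows. This is only an illustration in Python, not what ci-management actually runs: the function names, the timeout and the polling interval are made up, and it assumes that dpkg takes an fcntl-style lock on /var/lib/dpkg/lock and that the job runs as root so it can open that file for writing.

```python
import errno
import fcntl
import time


def lock_is_free(path):
    """Return True if no other process holds a POSIX lock on `path`."""
    # An exclusive fcntl lock needs a writable descriptor; on a real
    # system that means running as root. Append mode avoids truncation.
    with open(path, "a") as handle:
        try:
            fcntl.lockf(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError as err:
            if err.errno in (errno.EACCES, errno.EAGAIN):
                return False  # someone else (e.g. apt-get) holds it
            raise
        # Release immediately; we only wanted to probe the lock.
        fcntl.lockf(handle, fcntl.LOCK_UN)
        return True


def wait_for_lock(path="/var/lib/dpkg/lock", timeout=300.0, interval=5.0):
    """Poll until the lock is free; return False if `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if lock_is_free(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A job would call wait_for_lock() before its first apt-get invocation and fail early if it returns False. The probe is inherently racy: another process can take the lock between the check and the apt-get call, so a retry around apt-get itself may still be needed.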
>>>>
>>>>
>>>> On Thu Jan 19 19:23:59 2017, hagbard wrote:
>>>>
>>>> FYI... helpdesk is on it, and it's being worked in #fdio-infra on IRC
>>>>
>>>> Ed
>>>>
>>>> On Thu, Jan 19, 2017 at 4:31 PM, Ed Warnicke <hagb...@gmail.com> wrote:
>>>>
>>>> Looping in help desk.
>>>>
>>>> On Thu, Jan 19, 2017 at 4:16 PM Dave Barach (dbarach) <dbar...@cisco.com>
>>>> wrote:
>>>>
>>>> Folks,
>>>>
>>>> See https://jenkins.fd.io/job/vpp-verify-master-ubuntu1604/3378/console
>>>>
>>>> 11:00:46 E: Could not get lock /var/lib/dpkg/lock - open (11: Resource temporarily unavailable)
>>>> 11:00:46 E: Unable to lock the administration directory (/var/lib/dpkg/), is another process using it?
>>>>
>>>> I recognize this failure from my own Ubuntu 16.04 system: a cron-job
>>>> starts “apt-get -q”, which for whatever reason does not terminate. As a
>>>> workaround, “sudo killall apt-get || true” before trying to acquire build
>>>> dependencies...
>>>>
>>>>
>>>>
>>>> HTH... Dave
>>>>
>>>>
>>>> _______________________________________________
>>>>
>>>> vpp-dev mailing list
>>>> vpp-dev@lists.fd.io
>>>> https://lists.fd.io/mailman/listinfo/vpp-dev
>>>>
>>>>
>>>>
>>>>
>>>
>>
>
