On 11/28/18 2:58 PM, Dan Prince wrote:
On Wed, 2018-11-28 at 12:45 +0100, Bogdan Dobrelya wrote:
To follow up and explain the patches for code review:
The "header" patch https://review.openstack.org/620310 -> (requires)
https://review.rdoproject.org/r/#/c/17534/, and also
https://review.openstack.org/620061 -> (which in turn requires)
https://review.openstack.org/619744 -> (Kolla change, the 1st to go)
https://review.openstack.org/619736
This email was cross-posted to multiple lists, and I think we may have
lost some of the context in the process as the subject was changed.

Most of the suggestions and patches are about making our base
container(s) smaller in size, and the means by which the patches do
that is to share binaries/applications across containers with custom
mounts/volumes. I've -2'd most of them. What concerns me, however, is
that some of the TripleO cores seemed open to this idea yesterday on
IRC. Perhaps I've misread things, but what you appear to be doing here
is quite drastic, and I think we need to consider it carefully
before proceeding with any of it.
Please also read the commit messages; I tried to explain all the "whys"
very carefully. Just to sum it up here as well:

The current self-contained (config and runtime bits) architecture of
containers badly affects:

* the size of the base layer and of all container images, adding an
extra 300MB (an extra ~30% in size).
You are accomplishing this by removing Puppet from the base container,
but you are also creating another container in the process. Puppet would
still be required on all nodes, as it is our config tool, so you
would still be downloading some of this data anyway. I understand your
reason for doing this is that it avoids rebuilding all containers
when there is a change to any of these packages in the base container.
What you are missing, however, is: how often is Puppet updated when
something else in the base container isn't?
For CI jobs updating all containers, it's quite common to have changes
in the openstack/tripleo puppet modules to pull in. IIUC, that automatically
picks up any updates for all of their dependencies, and for the
dependencies of those dependencies, all of that multiplied by the roughly
one hundred containers that need to be updated. That is a *pain* we're
used to these days, with CI jobs quite often timing out because of it...
Of course, the main cause is delayed promotions, though.
For real deployments, I have no data on the cadence of minor updates to
puppet and to the tripleo and openstack modules for it, so let's ask
operators (as we happen to be on the merged openstack-discuss list)? For
its dependencies, though, like systemd and ruby, I'm pretty sure CVEs are
fixed there quite often. So I expect that delivering "in the field"
security fixes for those might bring some unwanted hassle for the
long-term maintenance of LTS releases. As Tengu noted on IRC:
"well, between systemd, puppet and ruby, there are many security
concerns, almost every month... and also, what's the point of keeping them
in runtime containers when they are useless?"
I would wager that it is rarer than you'd think. Perhaps looking at
the history of an OpenStack distribution would be a valid way to assess
this more critically. Without data to back up the numbers, I'm
afraid what you are doing here falls into "pre-optimization" territory
for me, and I don't think the benefits you mention here warrant the
means used in the patches.
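If someone wants to gather that data, one rough way to do it (a sketch only;
the package names and repos to check depend on the distribution being
assessed) is to compare RPM changelog cadence for the suspects:

  # count changelog entries (one per packaged update) for a few suspects
  rpm -q --changelog puppet | grep -c '^\*'
  rpm -q --changelog systemd | grep -c '^\*'
  # show the dates of the most recent updates
  rpm -q --changelog ruby | grep '^\*' | head -n 5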
* Edge cases, where we have container images to be distributed, at
least once to reach local registries, over high-latency, limited-bandwidth,
highly unreliable WAN connections.
* the number of packages to update in CI for all containers for all
services (CI jobs do not rebuild containers, so each container gets
updated for those 300MB of extra size).
It would seem to me there are other ways to solve the CI container-update
problem. Rebuilding the base layer more often would solve this,
right? If we always build our service containers off of a recent base
layer, there should be no updates to the system/puppet packages left to
apply in our CI pipelines.
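A rough sketch of that ordering (hypothetical image names and build layout;
the real builds go through kolla rather than plain docker build):

  # Rebuild the shared base right before the service images so the service
  # layers already contain current system/puppet packages, leaving a CI-time
  # yum update with nothing to pull:
  docker build -t tripleo-base:latest base/
  docker build -t tripleo-nova-compute:latest nova-compute/  # FROM tripleo-base:latest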
* security and the attack surface, by introducing systemd et al. as
additional subjects of CVE fixes to maintain for all containers.
We aren't actually using systemd within our containers. I think those
packages are getting pulled in by an RPM dependency elsewhere. So
rather than using 'rpm -ev --nodeps' to remove it, we could create a
sub-package for containers in those cases and install that instead. In
short, rather than hacking around this to remove them, why not pursue a
proper packaging fix?
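To confirm what is actually dragging systemd in, a quick check (just a
sketch; run it inside the base image, or against the build repos) would be:

  # which installed packages require systemd
  rpm -q --whatrequires systemd
  # or ask the repos without installing anything (yum-utils)
  repoquery --whatrequires systemd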
In general I am a fan of getting things we don't need out of the base
container... so yeah, let's do this. But let's do it properly.
* service uptime, due to additional restarts of services caused by
security maintenance of components irrelevant to OpenStack, sitting
as dead weight in container images forever.
Like I said above, how often is it that these packages actually change
while something else in the base container doesn't? Perhaps we should
get more data here before blindly implementing a solution we aren't
sure really helps out in the real world.
On 11/27/18 4:08 PM, Bogdan Dobrelya wrote:
Changing the topic to follow the subject.

[tl;dr] it's time to rearchitect container images to stop including
config-time-only (puppet et al.) bits, which are not needed at runtime
and pose security issues, like CVEs, to maintain daily.
Background: 1) For the Distributed Compute Node edge case, there are
potentially tens of thousands of single-compute-node remote edge sites
connected over WAN to a single control plane, which has high latency,
like 100ms or so, and limited bandwidth.
2) For a generic security case,
3) TripleO CI updates all
Challenge:
Here is a related bug [0] and implementation [1] for that. PTAL folks!
[0] https://bugs.launchpad.net/tripleo/+bug/1804822
[1]
https://review.openstack.org/#/q/topic:base-container-reduction
Let's also think of removing puppet-tripleo from the base container.
It really brings the world in (and yum updates in CI!) for each job and
each container!

So if we did that, we should then either install puppet-tripleo and co.
on the host and bind-mount it for the docker-puppet deployment task
steps (a bad idea IMO), OR use the magical --volumes-from
<a-side-car-container> option to mount volumes from some
"puppet-config" sidecar container into each of the containers being
launched by the docker-puppet tooling.
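A minimal sketch of that sidecar idea (image and container names are
hypothetical, and this is not how docker-puppet is wired today):

  # a long-lived "puppet-config" sidecar exposing the puppet modules on a volume
  docker run -d --name puppet-config -v /usr/share/openstack-puppet/modules \
    tripleo-puppet-config sleep infinity
  # each config container then reuses those volumes instead of shipping puppet itself
  docker run --rm --volumes-from puppet-config tripleo-nova-base puppet apply ...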
On Wed, Oct 31, 2018 at 11:16 AM Harald Jensås <hjensas at redhat.com> wrote:
We add this to all images:
https://github.com/openstack/tripleo-common/blob/d35af75b0d8c4683a677660646e535cf972c98ef/container-images/tripleo_kolla_template_overrides.j2#L35
/bin/sh -c yum -y install iproute iscsi-initiator-utils lvm2 python socat
  sudo which openstack-tripleo-common-container-base rsync cronie crudini
  openstack-selinux ansible python-shade puppet-tripleo python2-kubernetes
  && yum clean all && rm -rf /var/cache/yum    (276 MB)
Is the additional 276 MB reasonable here?
openstack-selinux <- this package runs relabeling; does that kind of
touching of the filesystem impact the size due to docker layers?

Also: python2-kubernetes is a fairly large package (18007990 bytes); do
we use that in every image? I don't see any tripleo-related repos
importing from it when searching on Hound. The original commit
message [1] adding it states it is for future convenience.

On my undercloud we have 101 images; if we are downloading an extra 18 MB
per image, that's almost 1.8 GB for a package we don't use? (I hope it's
not like this? With docker layers, do we only download that 276 MB
transaction once? Or?)
[1] https://review.openstack.org/527927
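One way to check whether that layer really is shared (a sketch; the image
names below are only examples) is to compare layer digests across images:

  # identical digests in RootFS.Layers mean the blob is stored and downloaded once
  docker inspect --format '{{.RootFS.Layers}}' tripleo/centos-binary-nova-compute
  docker inspect --format '{{.RootFS.Layers}}' tripleo/centos-binary-neutron-server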
--
Best regards,
Bogdan Dobrelya,
Irc #bogdando