Thanks for answering these. I have been using ceph since Kraken and am now on 
Nautilus. Before joining this discussion I thought I would watch this video[1] on 
cephadm, but it seems to be mostly about what console commands to type. So please 
indulge my rookie comments.

> the cephadm
> team isn't yet swayed by the anti-container arguments, so there would
> be some lobbying and discussion to be done first!

To be honest, the same goes the other way: I have yet to see convincing arguments 
for the cephadm approach.


Remarks about your cephadm approach/design:

1. I am not interested in learning podman, rook or kubernetes. I am using mesos, 
which also runs on my osd nodes to use the extra available memory and cores. 
Furthermore, your cephadm OC is limited to only the ceph nodes, while my mesos OC 
is spread across a larger cluster and has rules for when, and when not, to run 
tasks on the osd nodes. You incorrectly assume that rgw, grafana, prometheus and 
haproxy are going to be run on your ceph OC.
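
Just to illustrate the kind of placement rule I mean, a rough Marathon sketch; the 
hostname pattern, URL and image are made up, not my actual config:

curl -s -X POST http://marathon.example.com:8080/v2/apps \
  -H 'Content-Type: application/json' -d '{
    "id": "/monitoring/grafana",
    "cpus": 1, "mem": 1024, "instances": 1,
    "container": { "type": "DOCKER", "docker": { "image": "grafana/grafana" } },
    "constraints": [ ["hostname", "UNLIKE", "osd-node.*"] ]
  }'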

2. Nico pointed out that you do not have alpine linux container images. I did 
not even know you were using container images. So how big are these? Where are 
they stored? And why are they not as small as they can be? Such an osd 
container image should be 20MB or so at most. I would even expect a statically 
built binary in the container image; why even a tiny os?
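
For reference, this is how I would check the image size myself; the tag is only 
an example of what is on docker hub:

docker pull ceph/ceph:v14
docker image ls ceph/ceph --format '{{.Repository}}:{{.Tag}}  {{.Size}}'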

3. Why, with this cephadm, is there still so much talk about systemd? Your 
orchestrator should handle restarts, namespaces and failed tasks, should it not? 
There should be no need for a systemd dependency; at least I have not seen any 
container images relying on this.
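
What I would expect instead is that the container runtime or the OC handles 
restarts, something like this with plain docker (the image name is just a 
placeholder):

docker run -d --name rgw1 --restart unless-stopped some/rgw-image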

4. Ok, I found the container images[2] (I think). Sorry, but this has 'nothing' 
to do with container thinking. I expected to find separate, smaller container 
images for osd, mds and rgw. This looks more like an OS deployment.

5. I have written about this previously on the mailing list. Does each rgw 
still require its own dedicated client id? Is it still true that if you want 
to spawn 3 rgw instances, they need to authenticate as client.rgw1, client.rgw2 
and client.rgw3?
This does not allow for auto scaling. The idea of using an OC is that you 
launch a task and can then scale that task automatically when necessary, so you 
would get multiple instances of rgw1. If this is still an issue with 
rgw, mds, mgr etc., why even bother doing something with an OC and containers?
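
To be concrete, this is roughly what I mean; the caps are from memory of setting 
up radosgw by hand, so treat them as an example rather than the exact required caps:

ceph auth get-or-create client.rgw1 mon 'allow rw' osd 'allow rwx'
ceph auth get-or-create client.rgw2 mon 'allow rw' osd 'allow rwx'
ceph auth get-or-create client.rgw3 mon 'allow rw' osd 'allow rwx'
# while an OC that scales one task to 3 instances would want them all to run under the same client id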

6. As I wrote before, I do not want my rgw or haproxy running in an OC that has 
the ability to give tasks the SYS_ADMIN capability. So that would mean I have to 
run my osd daemons/containers separately.
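
Roughly the distinction I mean; whether osd containers really need all of this 
depends on the setup, and the image names are placeholders:

# osd containers typically need device access and extra privileges
docker run -d --privileged --device /dev/sdb some/osd-image
# rgw or haproxy should never need anything like --cap-add SYS_ADMIN
docker run -d --cap-drop ALL some/rgw-image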

7. If you are not setting cpu and memory limits on your cephadm containers, 
then again there is the question of why even use containers.
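
What I would expect to be able to do, with placeholder values and image name:

docker run -d --cpus 2 --memory 4g --memory-swap 4g some/osd-image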

8. I still see lots of comments on the mailing list about accessing logs. I 
have all my containers log to a remote syslog server. If your ceph daemons 
still cannot do this (correctly), what is the point of even going to 
containers?
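
The kind of setup I mean, shown with plain docker; the syslog address and image 
name are obviously placeholders:

docker run -d --log-driver syslog \
  --log-opt syslog-address=udp://syslog.example.com:514 \
  --log-opt tag=rgw1 some/rgw-image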

9. I am updating my small cluster something like this:

ssh root@c01 "ceph osd set noout  ; ceph osd set noscrub ; ceph osd set 
nodeep-scrub" 
ssh root@c01 "ceph tell osd.* injectargs '--osd_max_scrubs=0'" 

ssh root@c01 "yum update 'ceph-*' -y" 
...

ssh root@c01 "service ceph-mon@a restart" 
...

ssh root@c01 "service ceph-mgr@a restart" 
...

# wait for up and recovery to finish
ssh root@c01  "systemctl restart 'ceph-osd@*'" 
…

I am never going to run a 'ceph orch upgrade start --ceph-version 16.2.0'. I 
want to see if everything is ok after each command I issue: whether scrubbing 
really stopped, whether the osds have correctly accepted the new config.
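
These are the kind of checks I mean between steps; commands from memory, so take 
them as an example:

ssh root@c01 "ceph -s"            # health, recovery, and flags like noout/noscrub
ssh root@c01 "ceph versions"      # which daemons are already running the new version
ssh root@c01 "ceph tell osd.0 version"
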
I have a small cluster, so I do not see this procedure as a waste of time. If I 
look at your telemetry data[3], I see 600 clusters with 35k osds, which is an 
average of around 60 osds per cluster. So these are quite small clusters, and I 
would think their admins have a similar point of view to mine.

That leaves the big clusters of >3k osds. I wonder what those admins require; 
are they at CERN really waiting for something like cephadm?


I am rather getting the impression that you need an easy deployment tool for 
ceph more than that you really want to utilize containers. First there was 
ceph-deploy and ceph-ansible, both of which I luckily skipped, and now there is 
cephadm.
I am not anti-container, and I think many people here are not anti-container 
either, but you are using this as an argument to push the cephadm work.
The ceph daemons do not seem to be prepared for container use, ceph containers 
cannot use cpu/memory limits, and the container images are not what they should 
be. And last but not least, you totally bypass the fact that the (ceph) admin 
should choose the OC platform, not you, because he probably has more than just 
ceph nodes.

So my question to you: what problem is your cephadm dev team actually trying to 
solve? That is not clear to me.

[1]
https://www.youtube.com/watch?v=A2LK--zs4Io

[2]
https://hub.docker.com/r/ceph/ceph/tags?page=1&ordering=last_updated

[3]
https://telemetry-public.ceph.com/d/ZFYuv1qWz/telemetry?orgId=1

