Frank,

Thank you for the explanation. These are freshly installed machines and did
not have ceph on them. I checked one of the other OSD nodes and there is no
ceph user in /etc/passwd, nor is UID 167 allocated to any user. I did
install ceph-common from the 18.04 repos before realizing that deploying
ceph in containers did not update the host's /etc/apt/sources.list (or add
an entry in /etc/apt/sources.list.d/). I manually added the repo for
nautilus and upgraded the packages. So, I don't know if that had anything
to do with it. Maybe Ubuntu packages ceph under UID 64045 and upgrading to
the Ceph-distributed packages didn't change the UID.

Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Aug 29, 2019 at 12:33 AM Frank Schilder <fr...@dtu.dk> wrote:

> Hi Robert,
>
> this is a bit less trivial than it might look right now. The ceph user is
> usually created by installing the package ceph-common. By default it will
> use UID 167. If the ceph user already exists, I would assume it will use the
> existing user to allow an operator to avoid UID collisions (if 167 is used
> already).
>
> If you use docker, the ceph UID on the host and inside the container
> should match (or need to be translated). If they don't, you will have a lot
> of fun re-owning stuff all the time, because deployments will use the
> symbolic name ceph, which has different UIDs on the host and inside the
> container in your case.
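>
> For example, since the journal output quoted below shows the container side
> resolving to 167:167 ("set uid:gid to 167:167 (ceph:ceph)"), a quick host-side
> check is just:
>
> id -u ceph; id -g ceph
>
> Anything other than 167 for either means the symbolic name on the host and the
> numeric IDs the container writes to disk refer to different accounts.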
>
> I would recommend removing this discrepancy as soon as possible:
>
> 1) Find out why there was a ceph user with UID different from 167 before
> installation of ceph-common.
>    Did you create it by hand? Was UID 167 allocated already?
> 2) If you can safely change the GID and UID of ceph to 167, just do
> groupmod+usermod with the new GID and UID (see the sketch after this list).
> 3) If 167 is used already by another service, you will have to map the
> UIDs between host and container.
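>
> For option 2 the change itself is only a few commands; stop the ceph
> containers on the node first, since running daemons keep files open under the
> old IDs. A rough sketch (adjust the paths to whatever ceph owns on your hosts):
>
> groupmod -g 167 ceph
> usermod -u 167 -g 167 ceph
> chown -R 167:167 /var/lib/ceph   # re-own anything created under the old IDs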
>
> To prevent ansible from deploying dockerized ceph with mismatching user ID
> for ceph, add these tasks to an appropriate part of your deployment
> (general host preparation or so):
>
> - name: "Create group 'ceph'."
>   group:
>     name: ceph
>     gid: 167
>     local: yes
>     state: present
>     system: yes
>
> - name: "Create user 'ceph'."
>   user:
>     name: ceph
>     password: "!"
>     comment: "ceph-container daemons"
>     uid: 167
>     group: ceph
>     shell: "/sbin/nologin"
>     home: "/var/lib/ceph"
>     create_home: no
>     local: yes
>     state: present
>     system: yes
>
> These tasks should fail if a ceph group and user already exist with IDs
> different from 167.
>
> Best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Robert
> LeBlanc <rob...@leblancnet.us>
> Sent: 28 August 2019 23:23:06
> To: ceph-users
> Subject: Re: [ceph-users] Failure to start ceph-mon in docker
>
> Turns out /var/lib/ceph was owned ceph.ceph and not 167.167; chowning it made
> things work. I guess only the monitor needs that permission, since rgw, mgr
> and osd are all happy without it being 167.167.
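>
> For the archives: the fix was just re-owning the top of the tree to the
> container's numeric IDs, e.g.:
>
> chown 167:167 /var/lib/ceph
>
> With /var/lib/ceph itself at mode drwxr-x--- and owned by the host's ceph user
> (which is not 167 here), UID 167 could not traverse into /var/lib/ceph/mon at
> all, which is exactly the stat() failure in the log below.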
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Aug 28, 2019 at 1:45 PM Robert LeBlanc <rob...@leblancnet.us> wrote:
> We are trying to set up a new Nautilus cluster using ceph-ansible with
> containers. We got things deployed, but I couldn't run `ceph -s` on the host,
> so I decided to `apt install ceph-common` and installed the Luminous version
> from the Ubuntu 18.04 repos. For some reason the docker container that was
> running the monitor restarted and now won't start. I added the repo for Nautilus and
> upgraded ceph-common, but the problem persists. The Manager and OSD docker
> containers don't seem to be affected at all. I see this in the journal:
>
> Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Starting Ceph Monitor...
> Aug 28 20:40:55 sun-gcs02-osd01 docker[2926]: Error: No such container:
> ceph-mon-sun-gcs02-osd01
> Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Started Ceph Monitor.
> Aug 28 20:40:55 sun-gcs02-osd01 docker[2949]: WARNING: Your kernel does
> not support swap limit capabilities or the cgroup is not mounted. Memory
> limited without swap.
> Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:40:56
> /opt/ceph-container/bin/entrypoint.sh: Existing mon, trying to rejoin
> cluster...
> Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: warning: line 41:
> 'osd_memory_target' in section 'osd' redefined
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03
> /opt/ceph-container/bin/entrypoint.sh: /etc/ceph/ceph.conf is already
> memory tuned
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03
> /opt/ceph-container/bin/entrypoint.sh: SUCCESS
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: PID 368: spawning
> /usr/bin/ceph-mon --cluster ceph --default-log-to-file=false
> --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -d
> --mon-cluster-log-to-stderr --log-stderr-prefix=debug  -i sun-gcs02-osd01
> --mon-data /var/lib/ceph/mon/ceph-sun-gcs02-osd01 --public-addr 10.65.101.21
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: Waiting 368 to quit
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: warning: line 41:
> 'osd_memory_target' in section 'osd' redefined
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28
> 20:41:03.835 7f401283c180  0 set uid:gid to 167:167 (ceph:ceph)
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28
> 20:41:03.835 7f401283c180  0 ceph version 14.2.2
> (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process
> ceph-mon, pid 368
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28
> 20:41:03.835 7f401283c180 -1 stat(/var/lib/ceph/mon/ceph-sun-gcs02-osd01)
> (13) Permission denied
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28
> 20:41:03.835 7f401283c180 -1 error accessing monitor data directory at
> '/var/lib/ceph/mon/ceph-sun-gcs02-osd01': (13) Permission denied
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: managing teardown
> after SIGCHLD
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Waiting PID 368 to
> terminate
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Process 368 is
> terminated
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Bye Bye, container
> will die with return code -1
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: if you don't want
> me to die and have access to a shell to debug this situation, next time run
> me with '-e DEBUG=stayalive'
> Aug 28 20:41:04 sun-gcs02-osd01 systemd[1]:
> ceph-mon@sun-gcs02-osd01.service: Main process exited, code=exited,
> status=255/n/a
> Aug 28 20:41:04 sun-gcs02-osd01 systemd[1]:
> ceph-mon@sun-gcs02-osd01.service: Failed with result 'exit-code'.
>
> The directories for the monitor are owned by 167.167, which matches the
> UID.GID that the container reports.
>
> root@sun-gcs02-osd01:~# ls -lhd /var/lib/ceph/
> drwxr-x--- 14 ceph ceph 4.0K Jul 30 22:15 /var/lib/ceph/
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/
> total 56K
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-mds
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-mgr
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-osd
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rbd
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rbd-mirror
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rgw
> drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mds
> drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mgr
> drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mon
> drwxr-xr-x  14 167 167 4.0K Jul 30 22:28 osd
> drwxr-xr-x   4 167 167 4.0K Aug  1 23:36 radosgw
> drwxr-xr-x 254 167 167  12K Aug 28 20:44 tmp
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/
> total 4.0K
> drwxr-xr-x 3 167 167 4.0K Jul 30 22:16 ceph-sun-gcs02-osd01
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/ceph-sun-gcs02-osd01/
> total 16K
> -rw------- 1 167 167   77 Jul 30 22:15 keyring
> -rw-r--r-- 1 167 167    8 Jul 30 22:15 kv_backend
> -rw-r--r-- 1 167 167    3 Jul 30 22:16 min_mon_release
> drwxr-xr-x 2 167 167 4.0K Aug 28 19:16 store.db
> root@sun-gcs02-osd01:~# ls -lh
> /var/lib/ceph/mon/ceph-sun-gcs02-osd01/store.db/
> total 149M
> -rw-r--r-- 1 167 167 1.7M Aug 28 19:16 050225.log
> -rw-r--r-- 1 167 167  65M Aug 28 19:16 050227.sst
> -rw-r--r-- 1 167 167  45M Aug 28 19:16 050228.sst
> -rw-r--r-- 1 167 167   16 Aug 16 07:40 CURRENT
> -rw-r--r-- 1 167 167   37 Jul 30 22:15 IDENTITY
> -rw-r--r-- 1 167 167    0 Jul 30 22:15 LOCK
> -rw-r--r-- 1 167 167 1.3M Aug 28 19:16 MANIFEST-027846
> -rw-r--r-- 1 167 167 4.7K Aug  1 23:38 OPTIONS-002825
> -rw-r--r-- 1 167 167 4.7K Aug 16 07:40 OPTIONS-027849
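>
> (A quick way to reproduce the stat() the mon performs, as the container's UID,
> is something like:
>
> sudo -u '#167' stat /var/lib/ceph/mon/ceph-sun-gcs02-osd01
>
> sudo accepts a numeric UID prefixed with '#', though newer sudo releases may
> refuse a UID that has no passwd entry unless runas_allow_unknown_id is set. If
> this is denied on the host as well, the problem is in host-side permissions
> rather than anything inside the container.)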
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
