Frank,

Thank you for the explanation. These are freshly installed machines and did not have Ceph on them before. I checked one of the other OSD nodes and there is no ceph user in /etc/passwd, nor is UID 167 allocated to any user. I did install ceph-common from the Ubuntu 18.04 repos before realizing that deploying Ceph in containers does not update the host's /etc/apt/sources.list (or add an entry in /etc/apt/sources.list.d/). I then manually added the Nautilus repo and upgraded the packages, so I don't know whether that had anything to do with it. Maybe Ubuntu packages ceph under UID 64045 and upgrading to the packages distributed by Ceph didn't change the UID.

Thanks,
Robert LeBlanc
----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
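A quick way to compare what the host and the container each think the ceph user's numeric IDs are. This is only a sketch: the container name is taken from the journal output quoted below and must actually be running, so substitute whichever Ceph container is up on the node.

    # On the host: does a ceph user/group exist, and with which numeric IDs?
    getent passwd ceph
    getent group ceph

    # Inside a running Ceph container (mon container name from the journal
    # below is used as an example; adjust to your naming):
    docker exec ceph-mon-sun-gcs02-osd01 getent passwd ceph

    # Which numeric IDs actually own the mon data directory on the host?
    stat -c '%u:%g %n' /var/lib/ceph/mon/ceph-sun-gcs02-osd01

If the two getent results disagree (e.g. 64045 on the host vs 167 in the container), that mismatch is the discrepancy discussed below.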
On Thu, Aug 29, 2019 at 12:33 AM Frank Schilder <fr...@dtu.dk> wrote:
> Hi Robert,
>
> this is a bit less trivial than it might look right now. The ceph user is usually created by installing the package ceph-common. By default it will use id 167. If the ceph user already exists, I would assume it will use the existing user to allow an operator to avoid UID collisions (if 167 is used already).
>
> If you use docker, the ceph UID on the host and inside the container should match (or need to be translated). If they don't, you will have a lot of fun re-owning stuff all the time, because deployments will use the symbolic name ceph, which has different UIDs on the host and inside the container in your case.
>
> I would recommend removing this discrepancy as soon as possible:
>
> 1) Find out why there was a ceph user with a UID different from 167 before the installation of ceph-common. Did you create it by hand? Was UID 167 allocated already?
> 2) If you can safely change the GID and UID of ceph to 167, just do groupmod+usermod with the new GID and UID.
> 3) If 167 is used already by another service, you will have to map the UIDs between host and container.
>
> To prevent ansible from deploying dockerized ceph with a mismatching user ID for ceph, add these tasks to an appropriate part of your deployment (general host preparation or so):
>
> - name: "Create group 'ceph'."
>   group:
>     name: ceph
>     gid: 167
>     local: yes
>     state: present
>     system: yes
>
> - name: "Create user 'ceph'."
>   user:
>     name: ceph
>     password: "!"
>     comment: "ceph-container daemons"
>     uid: 167
>     group: ceph
>     shell: "/sbin/nologin"
>     home: "/var/lib/ceph"
>     create_home: no
>     local: yes
>     state: present
>     system: yes
>
> These tasks should fail if a ceph group and user already exist with IDs different from 167.
>
> Best regards,
>
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: ceph-users <ceph-users-boun...@lists.ceph.com> on behalf of Robert LeBlanc <rob...@leblancnet.us>
> Sent: 28 August 2019 23:23:06
> To: ceph-users
> Subject: Re: [ceph-users] Failure to start ceph-mon in docker
>
> Turns out /var/lib/ceph was ceph.ceph and not 167.167; chowning it made things work. I guess only the monitor needs that permission; rgw, mgr and osd are all happy without it being 167.167.
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Aug 28, 2019 at 1:45 PM Robert LeBlanc <rob...@leblancnet.us <mailto:rob...@leblancnet.us>> wrote:
> We are trying to set up a new Nautilus cluster using ceph-ansible with containers. We got things deployed, but I couldn't run `ceph -s` on the host, so I decided to `apt install ceph-common` and installed the Luminous version from Ubuntu 18.04. For some reason the docker container that was running the monitor restarted and now won't start. I added the repo for Nautilus and upgraded ceph-common, but the problem persists. The Manager and OSD docker containers don't seem to be affected at all. I see this in the journal:
>
> Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Starting Ceph Monitor...
> Aug 28 20:40:55 sun-gcs02-osd01 docker[2926]: Error: No such container: ceph-mon-sun-gcs02-osd01
> Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Started Ceph Monitor.
> Aug 28 20:40:55 sun-gcs02-osd01 docker[2949]: WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
> Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:40:56 /opt/ceph-container/bin/entrypoint.sh: Existing mon, trying to rejoin cluster...
> Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: warning: line 41: 'osd_memory_target' in section 'osd' redefined
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03 /opt/ceph-container/bin/entrypoint.sh: /etc/ceph/ceph.conf is already memory tuned
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03 /opt/ceph-container/bin/entrypoint.sh: SUCCESS
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: PID 368: spawning /usr/bin/ceph-mon --cluster ceph --default-log-to-file=false --default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -d --mon-cluster-log-to-stderr --log-stderr-prefix=debug -i sun-gcs02-osd01 --mon-data /var/lib/ceph/mon/ceph-sun-gcs02-osd01 --public-addr 10.65.101.21
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: Waiting 368 to quit
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: warning: line 41: 'osd_memory_target' in section 'osd' redefined
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180  0 set uid:gid to 167:167 (ceph:ceph)
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180  0 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable), process ceph-mon, pid 368
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180 -1 stat(/var/lib/ceph/mon/ceph-sun-gcs02-osd01) (13) Permission denied
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 7f401283c180 -1 error accessing monitor data directory at '/var/lib/ceph/mon/ceph-sun-gcs02-osd01': (13) Permission denied
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: managing teardown after SIGCHLD
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Waiting PID 368 to terminate
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Process 368 is terminated
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: Bye Bye, container will die with return code -1
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: teardown: if you don't want me to die and have access to a shell to debug this situation, next time run me with '-e DEBUG=stayalive'
> Aug 28 20:41:04 sun-gcs02-osd01 systemd[1]: ceph-mon@sun-gcs02-osd01.service: Main process exited, code=exited, status=255/n/a
> Aug 28 20:41:04 sun-gcs02-osd01 systemd[1]: ceph-mon@sun-gcs02-osd01.service: Failed with result 'exit-code'.
>
> The directories for the monitor are owned by 167.167 and match the UID.GID that the container reports.
>
> root@sun-gcs02-osd01:~# ls -lhd /var/lib/ceph/
> drwxr-x--- 14 ceph ceph 4.0K Jul 30 22:15 /var/lib/ceph/
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/
> total 56K
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-mds
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-mgr
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-osd
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rbd
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rbd-mirror
> drwxr-xr-x   2 167 167 4.0K Jul 30 22:16 bootstrap-rgw
> drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mds
> drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mgr
> drwxr-xr-x   3 167 167 4.0K Jul 30 22:15 mon
> drwxr-xr-x  14 167 167 4.0K Jul 30 22:28 osd
> drwxr-xr-x   4 167 167 4.0K Aug  1 23:36 radosgw
> drwxr-xr-x 254 167 167  12K Aug 28 20:44 tmp
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/
> total 4.0K
> drwxr-xr-x 3 167 167 4.0K Jul 30 22:16 ceph-sun-gcs02-osd01
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/ceph-sun-gcs02-osd01/
> total 16K
> -rw------- 1 167 167   77 Jul 30 22:15 keyring
> -rw-r--r-- 1 167 167    8 Jul 30 22:15 kv_backend
> -rw-r--r-- 1 167 167    3 Jul 30 22:16 min_mon_release
> drwxr-xr-x 2 167 167 4.0K Aug 28 19:16 store.db
> root@sun-gcs02-osd01:~# ls -lh /var/lib/ceph/mon/ceph-sun-gcs02-osd01/store.db/
> total 149M
> -rw-r--r-- 1 167 167 1.7M Aug 28 19:16 050225.log
> -rw-r--r-- 1 167 167  65M Aug 28 19:16 050227.sst
> -rw-r--r-- 1 167 167  45M Aug 28 19:16 050228.sst
> -rw-r--r-- 1 167 167   16 Aug 16 07:40 CURRENT
> -rw-r--r-- 1 167 167   37 Jul 30 22:15 IDENTITY
> -rw-r--r-- 1 167 167    0 Jul 30 22:15 LOCK
> -rw-r--r-- 1 167 167 1.3M Aug 28 19:16 MANIFEST-027846
> -rw-r--r-- 1 167 167 4.7K Aug  1 23:38 OPTIONS-002825
> -rw-r--r-- 1 167 167 4.7K Aug 16 07:40 OPTIONS-027849
>
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
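For reference, a rough shell equivalent of Frank's step 2 above, combined with the re-owning of /var/lib/ceph that fixed the monitor's permission errors. This is a sketch only: it assumes UID/GID 167 are free on the host, that the Ceph containers on the node are stopped first, and that the old UID really is Ubuntu's 64045 (which is speculation in this thread, so verify before running). Unit and host names are taken from the journal output above.

    # Stop the Ceph units on this node first (unit name follows the
    # ceph-ansible convention seen in the journal; adjust to your host).
    systemctl stop ceph-mon@sun-gcs02-osd01.service

    # Remap the existing ceph group and user to the upstream IDs (step 2).
    groupmod -g 167 ceph
    usermod -u 167 -g 167 ceph

    # Re-own anything still carrying the old numeric IDs, e.g. if the old
    # UID/GID was 64045 (verify with getent before assuming this):
    find /var/lib/ceph -xdev \( -uid 64045 -o -gid 64045 \) -exec chown -h ceph:ceph {} +

    # Or, as in the fix that worked here, re-own the tree wholesale:
    chown -R 167:167 /var/lib/ceph

    systemctl start ceph-mon@sun-gcs02-osd01.service

If 167 is already taken by another service on the host (Frank's step 3), the IDs have to be translated between host and container instead of changed with usermod, for example via Docker's user namespace remapping.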