Re: [ceph-users] Failure to start ceph-mon in docker

2019-08-29 Thread Frank Schilder
Hi Robert,

this is a bit less trivial than it might look right now. The ceph user is 
usually created by installing the package ceph-common. By default it will use 
UID 167. If a ceph user already exists, I would assume the package keeps the 
existing user, to allow an operator to avoid UID collisions (if 167 is already 
taken).

If you use docker, the ceph UID on the host and inside the container should 
match (or needs to be translated). If they don't, you will have a lot of fun 
re-owning stuff all the time, because deployments use the symbolic name ceph, 
which in your case resolves to different UIDs on the host and inside the 
container.

I would recommend removing this discrepancy as soon as possible:

1) Find out why there was a ceph user with a UID different from 167 before 
   the installation of ceph-common.
   Did you create it by hand? Was UID 167 already allocated?
2) If you can safely change the GID and UID of ceph to 167, just do 
   groupmod+usermod with the new GID and UID (a rough sketch follows below).
3) If 167 is already used by another service, you will have to map the UIDs 
   between host and container.
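
For 2), an untested sketch of what that could look like (assuming the old
UID/GID is 64045 and that nothing else on the host already uses 167; stop the
Ceph containers first):

    # switch the ceph group and user to the IDs the containers expect
    groupmod -g 167 ceph
    usermod -u 167 -g 167 ceph
    # re-own anything still belonging to the old numeric IDs
    find /var/lib/ceph /var/log/ceph -uid 64045 -exec chown -h 167:167 {} +
    find /var/lib/ceph /var/log/ceph -gid 64045 -exec chgrp -h 167 {} +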

To prevent ansible from deploying dockerized ceph with mismatching user ID for 
ceph, add these tasks to an appropriate part of your deployment (general host 
preparation or so):

- name: "Create group 'ceph'."
  group:
name: ceph
gid: 167
local: yes
state: present
system: yes

- name: "Create user 'ceph'."
  user:
name: ceph
password: "!"
comment: "ceph-container daemons"
uid: 167
group: ceph
shell: "/sbin/nologin"
home: "/var/lib/ceph"
create_home: no
local: yes
state: present
system: yes

This should fail with an error if a group or user ceph already exists with an 
ID different from 167.

Best regards,

=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: ceph-users  on behalf of Robert 
LeBlanc 
Sent: 28 August 2019 23:23:06
To: ceph-users
Subject: Re: [ceph-users] Failure to start ceph-mon in docker

Turns out /var/lib/ceph was ceph.ceph and not 167.167; chowning it made things 
work. I guess only the monitor needs that permission, since rgw, mgr and osd are 
all happy without it being 167.167.
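
For reference, the fix was essentially along these lines (an illustrative
sketch, assuming the default data directory; run it with the containers
stopped):

    # hand the data directory to the UID/GID used inside the containers
    chown -R 167:167 /var/lib/ceph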

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Wed, Aug 28, 2019 at 1:45 PM Robert LeBlanc <rob...@leblancnet.us> wrote:
We are trying to set up a new Nautilus cluster using ceph-ansible with 
containers. We got things deployed, but I couldn't run `ceph -s` on the host, so 
I decided to `apt install ceph-common` and installed the Luminous version from 
Ubuntu 18.04. For some reason the docker container that was running the monitor 
restarted and now won't start. I added the repo for Nautilus and upgraded 
ceph-common, but the problem persists. The Manager and OSD docker containers 
don't seem to be affected at all. I see this in the journal:

Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Starting Ceph Monitor...
Aug 28 20:40:55 sun-gcs02-osd01 docker[2926]: Error: No such container: 
ceph-mon-sun-gcs02-osd01
Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Started Ceph Monitor.
Aug 28 20:40:55 sun-gcs02-osd01 docker[2949]: WARNING: Your kernel does not 
support swap limit capabilities or the cgroup is not mounted. Memory limited 
without swap.
Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:40:56  
/opt/ceph-container/bin/entrypoint.sh: Existing mon, trying to rejoin cluster...
Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: warning: line 41: 
'osd_memory_target' in section 'osd' redefined
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03  
/opt/ceph-container/bin/entrypoint.sh: /etc/ceph/ceph.conf is already memory 
tuned
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03  
/opt/ceph-container/bin/entrypoint.sh: SUCCESS
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: PID 368: spawning 
/usr/bin/ceph-mon --cluster ceph --default-log-to-file=false 
--default-mon-cluster-log-to-file=false --setuser ceph --setgroup ceph -d 
--mon-cluster-log-to-stderr --log-stderr-prefix=debug  -i sun-gcs02-osd01 
--mon-data /var/lib/ceph/mon/ceph-sun-gcs02-osd01 --public-addr 10.65.101.21
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: Waiting 368 to quit
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: warning: line 41: 
'osd_memory_target' in section 'osd' redefined
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 
7f401283c180  0 set uid:gid to 167:167 (ceph:ceph)
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 
7f401283c180  0 ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) 
nautilus (stable), process ceph-mon, pid 368
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 
7f401283c180 -1 stat(/var/lib/ceph/mon/ceph-sun-gcs02-osd01) (13) Permission 
denied
Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: debug 2019-08-28 20:41:03.835 
7f401283c180 -1 error accessing monitor 

[ceph-users] RGW: Upgrade from mimic 13.2.6 -> nautilus 14.2.2 causes Bad Requests on some buckets

2019-08-29 Thread Jacek Suchenia
Hi

Recently, after a few weeks of testing Nautilus on our clusters, we decided
to upgrade our oldest one (installed in 2012 as a bobtail release). After the
gateway upgrade we found that, for some buckets only (40% of ~2000), the
same request is handled differently: with a mimic RGW - OK (200), with a
nautilus RGW - Bad Request (400).
All requests (GET, HEAD, PUT) for an affected bucket are refused, while
requests for other buckets are handled correctly. So currently we are running a
nautilus cluster with mimic gateways to bypass the problem.

All the failing buckets use a wrong index pool in their explicit_placement
settings. We are using two storage classes, and those buckets have the same
*data_pool* and *index_pool*: *.rgw.buckets*, whereas we normally use
*.rgw.buckets.index*. Also, the *data extra pool* field is always empty for
those buckets. I've checked, and the *.dir.* objects for these buckets
are stored in the data pool, not the index pool - consistent with the
*explicit_placement* settings.

How can I move them, with all metadata keys, and update the bucket settings?
Where is the information about *explicit_placement* stored?
Additionally, is there a way to find out when a bucket was created? The
*radosgw-admin bucket stats* command does not provide it, but maybe it's
available from a timestamp of some objects or their keys?
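
My current guess is that the explicit_placement block lives in the
bucket-instance metadata, and that the bucket entrypoint metadata carries a
creation_time; a rough, untested way to inspect both (bucket name and ID are
placeholders):

    # bucket entrypoint: shows bucket_id and creation_time
    radosgw-admin metadata get bucket:<bucket-name>

    # bucket instance: shows bucket_info incl. placement_rule / explicit_placement
    radosgw-admin metadata get bucket.instance:<bucket-name>:<bucket-id>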

Jacek
-- 
Jacek Suchenia
jacek.suche...@gmail.com


[ceph-users] Multisite replication lag

2019-08-29 Thread Płaza Tomasz
Hi all,

I have two Ceph 13.2.6 clusters in a multisite setup on HDD disks, with ~466.0 M 
objects and rather low usage: 63 MiB/s rd, 1.5 MiB/s wr, 978 op/s rd, 308 op/s 
wr.
In each cluster there are two dedicated RGWs for replication (set as zone 
endpoints; the other RGWs have "rgw run sync thread = false").

Replication lag is about 15 seconds:
Master at Thu Aug 29 09:30:21 CEST 2019:
  metadata sync no sync (zone is master)
  data sync source: master_zone
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 35 shards
behind shards: 
[8,10,11,29,34,37,45,46,52,54,56,57,58,59,60,62,70,77,78,80,81,82,88,89,90,94,96,97,105,109,112,119,122,125,127]
oldest incremental change not applied: 2019-08-29 
09:30:03.0.537216s
70 shards are recovering
recovering shards: 
[0,1,2,5,6,7,9,10,11,12,13,14,16,18,20,21,22,23,25,28,29,34,35,36,37,39,43,48,49,52,53,54,55,56,57,58,59,60,61,63,65,67,68,69,70,72,75,76,83,84,86,91,92,97,99,101,104,105,109,110,111,112,115,116,117,119,120,121,122,126]

Slave at Thu Aug 29 09:30:22 CEST 2019:
  metadata sync syncing
full sync: 0/64 shards
incremental sync: 64/64 shards
metadata is caught up with master
  data sync source: slave_zone
syncing
full sync: 0/128 shards
incremental sync: 128/128 shards
data is behind on 24 shards
behind shards: 
[11,12,15,18,20,35,51,57,59,60,67,82,83,84,86,89,93,97,105,108,120,122,125,127]
oldest incremental change not applied: 2019-08-29 
09:30:11.0.755569s
64 shards are recovering
recovering shards: 
[0,1,2,3,6,7,8,9,10,11,13,14,15,16,20,21,22,23,25,27,28,29,35,36,37,38,39,43,46,48,49,52,56,59,60,61,62,63,65,67,68,69,70,76,79,83,84,85,88,90,91,97,100,104,105,109,110,111,113,117,118,120,122,123]


Is there anything that can be done to speed up replication? Should I enable 
"rgw run sync thread" on all RGWs, not just the zone endpoints?
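
For context, the setting in question would be applied roughly like this (a
sketch, assuming ceph.conf-based RGW configuration; the section name is a
placeholder), and sync progress can be watched with radosgw-admin:

    # on each additional RGW that should also run sync threads
    [client.rgw.<name>]
    rgw_run_sync_thread = true

    # check replication progress from either zone
    radosgw-admin sync status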


Best Regards, Tom


Re: [ceph-users] Multisite replication lag

2019-08-29 Thread Wesley Peng

Hi

on 2019/8/29 15:50, Płaza Tomasz wrote:
Is there anything to speed-up replication? Should I enable "rgw run sync 
thread" on all rgws not just zone endpoints?




Did you check the network? A faster network connection should be helpful.
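
A quick way to sanity-check the link between the two zones (an illustrative
sketch; host names are placeholders):

    # raw TCP throughput between the sync endpoints
    iperf3 -s                      # on one zone's sync RGW
    iperf3 -c <remote-sync-rgw>    # from the other zone

    # round-trip latency
    ping -c 10 <remote-sync-rgw>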

regards.


Re: [ceph-users] Multisite replication lag

2019-08-29 Thread Płaza Tomasz
On 29.08.2019 16∶05 +0800, Wesley Peng wrote:

Hi


on 2019/8/29 15:50, Płaza Tomasz wrote:

Is there anything to speed-up replication? Should I enable "rgw run sync

thread" on all rgws not just zone endpoints?



Did you check the network? A faster network connection should be helpful.

Latency is ~4.8 ms and bandwidth is ~70 Mb/s in total (sum of in and out) between 
the clusters. To me it looks like the network is sufficient. Migrating a bucket 
with 1.5 TB of video files took about an hour, but a 5 GB bucket with files 
1-2 kB in size took about 3 hours.
On average, newly added files are between 4 and 16 kB in size.



regards.



[ceph-users] Howto add DB (aka RockDB) device to existing OSD on HDD

2019-08-29 Thread 74cmonty

Hi,

I have created OSDs on HDD without putting the DB on a faster drive.

In order to improve performance I now have a single SSD drive with 3.8 TB.

I modified /etc/ceph/ceph.conf by adding this in [global]:
bluestore_block_db_size = 53687091200
This should create a RocksDB device with a size of 50 GiB.
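
The value does correspond to 50 GiB:

    $ echo $((50 * 1024 * 1024 * 1024))
    53687091200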

Then I tried to move DB to a new device (SSD) that is not formatted:
root@ld5505:~# ceph-bluestore-tool bluefs-bdev-new-db –-path 
/var/lib/ceph/osd/ceph-76 --dev-target /dev/sdbk

too many positional options have been specified on the command line

Checking the content of /var/lib/ceph/osd/ceph-76 it appears that 
there's no link to block.db:

root@ld5505:~# ls -l /var/lib/ceph/osd/ceph-76/
insgesamt 52
-rw-r--r-- 1 ceph ceph 418 Aug 27 11:08 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Aug 27 11:08 block -> 
/dev/ceph-8cd045dc-9eb2-47ad-9668-116cf425a66a/osd-block-9c51bde1-3c75-4767-8808-f7e7b58b8f97

-rw-r--r-- 1 ceph ceph 2 Aug 27 11:08 bluefs
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 fsid
-rw--- 1 ceph ceph 56 Aug 27 11:08 keyring
-rw-r--r-- 1 ceph ceph 8 Aug 27 11:08 kv_backend
-rw-r--r-- 1 ceph ceph 21 Aug 27 11:08 magic
-rw-r--r-- 1 ceph ceph 4 Aug 27 11:08 mkfs_done
-rw-r--r-- 1 ceph ceph 41 Aug 27 11:08 osd_key
-rw-r--r-- 1 ceph ceph 6 Aug 27 11:08 ready
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 require_osd_release
-rw-r--r-- 1 ceph ceph 10 Aug 27 11:08 type
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 whoami

root@ld5505:~# more /var/lib/ceph/osd/ceph-76/bluefs
1

Questions:
How can I add a DB device on this new SSD drive for every single existing OSD?
How can I increase the DB size later in case it turns out to be insufficient?

THX


Re: [ceph-users] Howto add DB (aka RockDB) device to existing OSD on HDD

2019-08-29 Thread Eugen Block

Hi,


Then I tried to move DB to a new device (SSD) that is not formatted:
root@ld5505:~# ceph-bluestore-tool bluefs-bdev-new-db –-path  
/var/lib/ceph/osd/ceph-76 --dev-target /dev/sdbk

too many positional options have been specified on the command line


I think you're trying the wrong option. The ceph-bluestore-tool man page says 
for bluefs-bdev-new-db:

   bluefs-bdev-new-db --path osd path --dev-target new-device
  Adds DB device to BlueFS, fails if DB device already exists.

If you want to move an existing DB you should use bluefs-bdev-migrate  
instead. I haven't tried it yet, though.
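
For completeness, the syntax for that (from the same man page; untested here, 
device paths are placeholders) would look roughly like:

    ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-76 \
        --devs-source <current-db-or-block-device> --dev-target <new-db-device>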



How can I increase the DB size later in case it's insufficient?


There's also a bluefs-bdev-expand command to resize the db if the  
underlying device has more space available. It depends on your ceph  
version, of course. This was not possible in Luminous, I'm not sure  
about Mimic but it works in Nautilus.
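
A sketch of what that could look like once the underlying LV has been grown 
(untested; VG/LV names are placeholders):

    lvextend -L +20G <vg>/<db-lv>
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-76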


Regards,
Eugen


Zitat von 74cmo...@gmail.com:


Hi,

I have created OSD on HDD w/o putting DB on faster drive.

In order to improve performance I have now a single SSD drive with 3.8TB.

I modified /etc/ceph/ceph.conf by adding this in [global]:
bluestore_block_db_size = 53687091200
This should create RockDB with size 50GB.

Then I tried to move DB to a new device (SSD) that is not formatted:
root@ld5505:~# ceph-bluestore-tool bluefs-bdev-new-db –-path  
/var/lib/ceph/osd/ceph-76 --dev-target /dev/sdbk

too many positional options have been specified on the command line

Checking the content of /var/lib/ceph/osd/ceph-76 it appears that  
there's no link to block.db:

root@ld5505:~# ls -l /var/lib/ceph/osd/ceph-76/
insgesamt 52
-rw-r--r-- 1 ceph ceph 418 Aug 27 11:08 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Aug 27 11:08 block ->  
/dev/ceph-8cd045dc-9eb2-47ad-9668-116cf425a66a/osd-block-9c51bde1-3c75-4767-8808-f7e7b58b8f97

-rw-r--r-- 1 ceph ceph 2 Aug 27 11:08 bluefs
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 fsid
-rw--- 1 ceph ceph 56 Aug 27 11:08 keyring
-rw-r--r-- 1 ceph ceph 8 Aug 27 11:08 kv_backend
-rw-r--r-- 1 ceph ceph 21 Aug 27 11:08 magic
-rw-r--r-- 1 ceph ceph 4 Aug 27 11:08 mkfs_done
-rw-r--r-- 1 ceph ceph 41 Aug 27 11:08 osd_key
-rw-r--r-- 1 ceph ceph 6 Aug 27 11:08 ready
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 require_osd_release
-rw-r--r-- 1 ceph ceph 10 Aug 27 11:08 type
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 whoami

root@ld5505:~# more /var/lib/ceph/osd/ceph-76/bluefs
1

Questions:
How can I add DB device for every single existing OSD to this new SSD drive?
How can I increase the DB size later in case it's insufficient?

THX






Re: [ceph-users] Howto add DB (aka RockDB) device to existing OSD on HDD

2019-08-29 Thread Eugen Block
Sorry, I misread; your option is correct, of course, since there was no 
external DB device yet.

This worked for me:

ceph-2:~ # CEPH_ARGS="--bluestore-block-db-size 1048576"  
ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-1 bluefs-bdev-new-db  
--dev-target /dev/sdb

inferring bluefs devices from bluestore path
DB device added /dev/sdb

ceph-2:~ # ll /var/lib/ceph/osd/ceph-1/block*
lrwxrwxrwx 1 ceph ceph 93 31. Jul 15:04 /var/lib/ceph/osd/ceph-1/block  
->  
/dev/ceph-d1f349d6-70ba-40d3-a510-3e5afb585782/osd-block-7523a676-a9de-4ed9-890c-197c6cd2d6d1
lrwxrwxrwx 1 root root  8 29. Aug 12:14  
/var/lib/ceph/osd/ceph-1/block.db -> /dev/sdb
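
To do this for every existing OSD against the single 3.8 TB SSD, the SSD would 
first have to be split into one LV per OSD. A rough, untested sketch (VG name, 
OSD IDs and sizes are placeholders; on ceph-volume deployments the LVM tags may 
also need updating afterwards so the mapping survives reboots):

    vgcreate ceph-db /dev/sdX            # the new SSD
    for osd in 76 77 78; do
        lvcreate -L 50G -n db-$osd ceph-db
        systemctl stop ceph-osd@$osd
        CEPH_ARGS="--bluestore-block-db-size 53687091200" \
            ceph-bluestore-tool bluefs-bdev-new-db \
            --path /var/lib/ceph/osd/ceph-$osd --dev-target /dev/ceph-db/db-$osd
        chown -h ceph:ceph /var/lib/ceph/osd/ceph-$osd/block.db
        chown ceph:ceph /dev/ceph-db/db-$osd
        systemctl start ceph-osd@$osd
    done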


Regards,
Eugen


Zitat von 74cmo...@gmail.com:


Hi,

I have created OSD on HDD w/o putting DB on faster drive.

In order to improve performance I have now a single SSD drive with 3.8TB.

I modified /etc/ceph/ceph.conf by adding this in [global]:
bluestore_block_db_size = 53687091200
This should create RockDB with size 50GB.

Then I tried to move DB to a new device (SSD) that is not formatted:
root@ld5505:~# ceph-bluestore-tool bluefs-bdev-new-db –-path  
/var/lib/ceph/osd/ceph-76 --dev-target /dev/sdbk

too many positional options have been specified on the command line

Checking the content of /var/lib/ceph/osd/ceph-76 it appears that  
there's no link to block.db:

root@ld5505:~# ls -l /var/lib/ceph/osd/ceph-76/
insgesamt 52
-rw-r--r-- 1 ceph ceph 418 Aug 27 11:08 activate.monmap
lrwxrwxrwx 1 ceph ceph 93 Aug 27 11:08 block ->  
/dev/ceph-8cd045dc-9eb2-47ad-9668-116cf425a66a/osd-block-9c51bde1-3c75-4767-8808-f7e7b58b8f97

-rw-r--r-- 1 ceph ceph 2 Aug 27 11:08 bluefs
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 ceph_fsid
-rw-r--r-- 1 ceph ceph 37 Aug 27 11:08 fsid
-rw--- 1 ceph ceph 56 Aug 27 11:08 keyring
-rw-r--r-- 1 ceph ceph 8 Aug 27 11:08 kv_backend
-rw-r--r-- 1 ceph ceph 21 Aug 27 11:08 magic
-rw-r--r-- 1 ceph ceph 4 Aug 27 11:08 mkfs_done
-rw-r--r-- 1 ceph ceph 41 Aug 27 11:08 osd_key
-rw-r--r-- 1 ceph ceph 6 Aug 27 11:08 ready
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 require_osd_release
-rw-r--r-- 1 ceph ceph 10 Aug 27 11:08 type
-rw-r--r-- 1 ceph ceph 3 Aug 27 11:08 whoami

root@ld5505:~# more /var/lib/ceph/osd/ceph-76/bluefs
1

Questions:
How can I add DB device for every single existing OSD to this new SSD drive?
How can I increase the DB size later in case it's insufficient?

THX






[ceph-users] FileStore OSD, journal direct symlinked, permission troubles.

2019-08-29 Thread Marco Gaiarin


I've just finished a double upgrade on my ceph cluster (PVE-based), from hammer
to jewel and from jewel to luminous.

All went well, except that the OSDs do not restart automatically
because of permission troubles on the journal:

 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: starting osd.2 at - osd_data 
/var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.449886 
7fa505a43e00 -1 filestore(/var/lib/ceph/osd/ceph-2) mount(1822): failed to open 
journal /var/lib/ceph/osd/ceph-2/journal: (13) Permission denied
 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453524 
7fa505a43e00 -1 osd.2 0 OSD:init: unable to mount object store
 Aug 28 14:41:55 capitanmarvel ceph-osd[6645]: 2019-08-28 14:41:55.453535 
7fa505a43e00 -1 #033[0;31m ** ERROR: osd init failed: (13) Permission 
denied#033[0m


A quick rewind: when I set up the cluster I used some 'old'
servers, with a couple of SSD disks used for the OS and as journals.
Because the servers were old, I was forced to partition the boot disk in
DOS (MBR) mode, not GPT.

While creating the OSDs, I received some warnings:

WARNING:ceph-disk:Journal /dev/sdaX was not prepared with ceph-disk. 
Symlinking directly.


Looking at the cluster now, it seems to me that the OSD init scripts try to
identify the journal based on GPT partition labels/info, and clearly fail.


Note that if I run, on the servers that hold OSDs:

for l in $(readlink -f /var/lib/ceph/osd/ceph-*/journal); do chown 
ceph: $l; done

the OSDs start flawlessly.


Is there something I can do? Thanks.
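
A persistent alternative to re-running the chown might be a udev rule that 
fixes the ownership of the journal partitions at boot (an untested sketch; the 
rule file name and device names are placeholders, adjust to your partitions):

    # /etc/udev/rules.d/99-ceph-journal.rules
    KERNEL=="sda2", OWNER="ceph", GROUP="ceph", MODE="0660"
    KERNEL=="sdb2", OWNER="ceph", GROUP="ceph", MODE="0660"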

-- 
dott. Marco Gaiarin GNUPG Key ID: 240A3D66
  Associazione ``La Nostra Famiglia''  http://www.lanostrafamiglia.it/
  Polo FVG   -   Via della Bontà, 7 - 33078   -   San Vito al Tagliamento (PN)
  marco.gaiarin(at)lanostrafamiglia.it   t +39-0434-842711   f +39-0434-842797

Dona il 5 PER MILLE a LA NOSTRA FAMIGLIA!
  http://www.lanostrafamiglia.it/index.php/it/sostienici/5x1000
(cf 00307430132, categoria ONLUS oppure RICERCA SANITARIA)


[ceph-users] Help understanding EC object reads

2019-08-29 Thread Thomas Byrne - UKRI STFC
Hi all,

I'm investigating an issue with the (non-Ceph) caching layers of our large EC 
cluster. They seem to be turning users' requests for whole objects into lots of 
small byte-range requests reaching the OSDs, but I'm not sure how inefficient 
this behaviour is in reality.

My limited understanding of an EC object partial read is that the entire object 
is reconstructed on the primary OSD, and then the requested byte range is sent 
to the client before the primary discards the reconstructed object.

Assuming this is correct, do multiple reads for different byte ranges of the 
same object at effectively the same time result in the entire object being 
reconstructed once for each request, or does the primary do something clever 
and use the same reconstructed object for multiple requests before discarding 
it?

If I'm completely off the mark with what is going on under the hood here, a 
nudge in the right direction would be appreciated!

Cheers,
Tom


[ceph-users] OSD Down After Reboot

2019-08-29 Thread Thomas Sumpter
Hi Folks,

I have found similar reports of this problem in the past but can't seem to find 
any solution to it.
We have a Ceph filesystem running mimic, version 13.2.5.
The OSDs are running on AWS EC2 instances with CentOS 7. The OSD disk is an AWS 
NVMe device.

The problem: sometimes when rebooting an OSD instance, the OSD volume fails to 
mount and the OSD cannot start.

ceph-volume.log repeats the following
[2019-08-28 09:10:42,061][ceph_volume.main][INFO  ] Running command: 
ceph-volume  lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,063][ceph_volume.process][INFO  ] Running command: 
/usr/sbin/lvs --noheadings --readonly --separator=";" -o 
lv_tags,lv_path,lv_name,vg_name,lv_uuid,lv_size
[2019-08-28 09:10:42,074][ceph_volume][ERROR ] exception caught by decorator
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 59, 
in newfunc
   return f(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/main.py", line 148, in main
terminal.dispatch(self.mapper, subcommand_args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/main.py", line 
40, in main
terminal.dispatch(self.mapper, self.argv)
  File "/usr/lib/python2.7/site-packages/ceph_volume/terminal.py", line 182, in 
dispatch
instance.main()
 File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, in 
is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/trigger.py", 
line 70, in main
Activate(['--auto-detect-objectstore', osd_id, osd_uuid]).main()
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", 
line 339, in main
self.activate(args)
  File "/usr/lib/python2.7/site-packages/ceph_volume/decorators.py", line 16, 
in is_root
return func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/ceph_volume/devices/lvm/activate.py", 
line 249, in activate
raise RuntimeError('could not find osd.%s with fsid %s' % (osd_id, 
osd_fsid))
RuntimeError: could not find osd.0 with fsid 
fcaffe93-4c03-403c-9702-7f1ec694a578

ceph-volume-systemd.log repeats
[2019-08-28 09:10:41,877][systemd][INFO  ] raw systemd input received: 
lvm-0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,877][systemd][INFO  ] parsed sub-command: lvm, extra data: 
0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:41,926][ceph_volume.process][INFO  ] Running command: 
/usr/sbin/ceph-volume lvm trigger 0-fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,077][ceph_volume.process][INFO  ] stderr -->  
RuntimeError: could not find osd.0 with fsid 
fcaffe93-4c03-403c-9702-7f1ec694a578
[2019-08-28 09:10:42,084][systemd][WARNING] command returned non-zero exit 
status: 1
[2019-08-28 09:10:42,084][systemd][WARNING] failed activating OSD, retries 
left: 30

To recover I destroy the OSD, zap the disk and create it again.
# ceph osd destroy 0 --yes-i-really-mean-it
# ceph-volume lvm zap /dev/nvme1n1 --destroy
# ceph-volume lvm create --osd-id 0 --data /dev/nvme1n1
# systemctl start ceph-osd@0

Is there something I need to do so that the OSD can boot without these problems?
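
When this happens it may be worth checking, before destroying anything, whether 
the LVs and the Ceph tags ceph-volume relies on are visible at all (an 
illustrative sketch):

    # what ceph-volume knows about, and the raw LVM tags it reads
    ceph-volume lvm list
    lvs -o lv_name,vg_name,lv_tags

    # retry activation manually once the volumes are visible
    ceph-volume lvm activate --all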

Thank you!
Tom


Attachments: ceph-volume.log, ceph-volume-systemd.log


Re: [ceph-users] Failure to start ceph-mon in docker

2019-08-29 Thread Robert LeBlanc
Frank,

Thank you for the explanation. These are freshly installed machines and did
not have ceph on them. I checked one of the other OSD nodes and there is no
ceph user in /etc/passwd, nor is UID 167 allocated to any user. I did
install ceph-common from the 18.04 repos before realizing that deploying
ceph in containers did not update the host's /etc/apt/sources.list (or add
an entry in /etc/apt/sources.list.d/). I manually added the repo for
nautilus and upgraded the packages. So, I don't know if that had anything
to do with it. Maybe Ubuntu packages ceph under UID 64045 and upgrading to
the Ceph distributed packages didn't change the UID.

Thanks,
Robert LeBlanc

Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1


On Thu, Aug 29, 2019 at 12:33 AM Frank Schilder  wrote:

> Hi Robert,
>
> this is a bit less trivial than it might look right now. The ceph user is
> usually created by installing the package ceph-common. By default it will
> use id 167. If the ceph user already exists, I would assume it will use the
> existing user to allow an operator to avoid UID collisions (if 167 is used
> already).
>
> If you use docker, the ceph UID on the host and inside the container
> should match (or need to be translated). If they don't, you will have a lot
> of fun re-owning stuff all the time, because deployments will use the
> symbolic name ceph, which has different UIDs on the host and inside the
> container in your case.
>
> I would recommend removing this discrepancy as soon as possible:
>
> 1) Find out why there was a ceph user with UID different from 167 before
> installation of ceph-common.
>Did you create it by hand? Was UID 167 allocated already?
> 2) If you can safely change the GID and UID of ceph to 167, just do
> groupmod+usermod with new GID and UID.
> 3) If 167 is used already by another service, you will have to map the
> UIDs between host and container.
>
> To prevent ansible from deploying dockerized ceph with mismatching user ID
> for ceph, add these tasks to an appropriate part of your deployment
> (general host preparation or so):
>
> - name: "Create group 'ceph'."
>   group:
> name: ceph
> gid: 167
> local: yes
> state: present
> system: yes
>
> - name: "Create user 'ceph'."
>   user:
> name: ceph
> password: "!"
> comment: "ceph-container daemons"
> uid: 167
> group: ceph
> shell: "/sbin/nologin"
> home: "/var/lib/ceph"
> create_home: no
> local: yes
> state: present
> system: yes
>
> This should err if a group and user ceph already exist with IDs different
> from 167.
>
> Best regards,
>
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: ceph-users  on behalf of Robert
> LeBlanc 
> Sent: 28 August 2019 23:23:06
> To: ceph-users
> Subject: Re: [ceph-users] Failure to start ceph-mon in docker
>
> Turns out /var/lib/ceph was ceph.ceph and not 167.167, chowning it made
> things work. I guess only monitor needs that permission, rgw,mgr,osd are
> all happy without needing it to be 167.167.
> 
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1
>
>
> On Wed, Aug 28, 2019 at 1:45 PM Robert LeBlanc  > wrote:
> We are trying to set up a new Nautilus cluster using ceph-ansible with
> containers. We got things deployed, but I couldn't run `ceph s` on the host
> so decided to `apt install ceph-common and installed the Luminous version
> from Ubuntu 18.04. For some reason the docker container that was running
> the monitor restarted and won't restart. I added the repo for Nautilus and
> upgraded ceph-common, but the problem persists. The Manager and OSD docker
> containers don't seem to be affected at all. I see this in the journal:
>
> Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Starting Ceph Monitor...
> Aug 28 20:40:55 sun-gcs02-osd01 docker[2926]: Error: No such container:
> ceph-mon-sun-gcs02-osd01
> Aug 28 20:40:55 sun-gcs02-osd01 systemd[1]: Started Ceph Monitor.
> Aug 28 20:40:55 sun-gcs02-osd01 docker[2949]: WARNING: Your kernel does
> not support swap limit capabilities or the cgroup is not mounted. Memory
> limited without swap.
> Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:40:56
> /opt/ceph-container/bin/entrypoint.sh: Existing mon, trying to rejoin
> cluster...
> Aug 28 20:40:56 sun-gcs02-osd01 docker[2949]: warning: line 41:
> 'osd_memory_target' in section 'osd' redefined
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03
> /opt/ceph-container/bin/entrypoint.sh: /etc/ceph/ceph.conf is already
> memory tuned
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: 2019-08-28 20:41:03
> /opt/ceph-container/bin/entrypoint.sh: SUCCESS
> Aug 28 20:41:03 sun-gcs02-osd01 docker[2949]: exec: PID 368: spawning
> /usr/bin/ceph-mon --cluster ceph --default-log-to-file=false
> --default-mon-cluster-log-to-fi

Re: [ceph-users] iostat and dashboard freezing

2019-08-29 Thread Reed Dier
See responses below.

> On Aug 28, 2019, at 11:13 PM, Konstantin Shalygin  wrote:
>> Just a follow up 24h later, and the mgr's seem to be far more stable, and 
>> have had no issues or weirdness after disabling the balancer module.
>> 
>> Which isn't great, because the balancer plays an important role, but after 
>> fighting distribution for a few weeks and getting it 'good enough' I'm 
>> taking the stability.
>> 
>> Just wanted to follow up with another 2¢.
> What is your balancer settings (`ceph config-key ls`)? Your mgr running in 
> virtual environment or on bare metal?

bare metal
>> $ ceph config-key ls | grep balance
>> "config/mgr/mgr/balancer/active",
>> "config/mgr/mgr/balancer/max_misplaced",
>> "config/mgr/mgr/balancer/mode",
>> "config/mgr/mgr/balancer/pool_ids",
>> "mgr/balancer/active",
>> "mgr/balancer/max_misplaced",
>> "mgr/balancer/mode",


> How much pools you have? Please also paste `ceph osd tree` & `ceph osd df 
> tree`. 

$ ceph osd pool ls detail
>> pool 16 replicated crush_rule 1 object_hash rjenkins pg_num 4
>> autoscale_mode warn last_change 157895 lfor 0/157895/157893 flags 
>> hashpspool,nodelete stripe_width 0 application cephfs
>> pool 17 replicated crush_rule 0 object_hash rjenkins pg_num 1024 
>> autoscale_mode warn last_change 174817 flags hashpspool,nodelete 
>> stripe_width 0 compression_algorithm snappy compression_mode aggressive 
>> application cephfs
>> pool 20 replicated crush_rule 2 object_hash rjenkins pg_num 4096 
>> autoscale_mode warn last_change 174817 flags hashpspool,nodelete 
>> stripe_width 0 application freeform
>> pool 24 replicated crush_rule 0 object_hash rjenkins pg_num 16   
>> autoscale_mode warn last_change 174817 lfor 0/157704/157702 flags hashpspool 
>> stripe_width 0 compression_algorithm snappy compression_mode none 
>> application freeform
>> pool 29 replicated crush_rule 2 object_hash rjenkins pg_num 128  
>> autoscale_mode warn last_change 174817 lfor 0/0/142604 flags 
>> hashpspool,selfmanaged_snaps stripe_width 0 application rbd
>> pool 30 replicated crush_rule 0 object_hash rjenkins pg_num 1
>> autoscale_mode warn last_change 174817 flags hashpspool stripe_width 0 
>> pg_num_min 1 application mgr_devicehealth
>> pool 31 replicated crush_rule 2 object_hash rjenkins pg_num 16   
>> autoscale_mode warn last_change 174926 flags hashpspool,selfmanaged_snaps 
>> stripe_width 0 application rbd

https://pastebin.com/bXPs28h1

> Measure time of balancer plan creation: `time ceph balancer optimize new`.
I hadn't seen this optimize command yet; I was always doing `balancer eval 
$plan`, `balancer execute $plan`.
>> $ time ceph balancer optimize newplan1
>> Error EALREADY: Unable to find further optimization, or pool(s)' pg_num is 
>> decreasing, or distribution is already perfect
>> 
>> real3m10.627s
>> user0m0.352s
>> sys 0m0.055s

Reed


