[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Zhenshi Zhou
Hi Eugen,

Thanks for the reply. If rbd-mirror constantly synchronizes changes,
at what frequency does it replay them? I can't find any options I can configure.

Eugen Block  于2020年6月4日周四 下午2:54写道:

> Hi,
>
> that's the point of rbd-mirror, to constantly replay changes from the
> primary image to the remote image (if the rbd journal feature is
> enabled).
>
>
> Zitat von Zhenshi Zhou :
>
> > Hi all,
> >
> > I'm gonna deploy a rbd-mirror in order to sync image from clusterA to
> > clusterB.
> > The image will be used while syncing. I'm not sure if the rbd-mirror will
> > sync image
> > continuously or not. If not, I will inform clients not to write data in
> it.
> >
> > Thanks. Regards
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Eugen Block
The initial sync is a full image sync, the rest is based on the object  
sets created. There are several options to control the mirroring, for  
example:


rbd_journal_max_concurrent_object_sets
rbd_mirror_concurrent_image_syncs
rbd_mirror_leader_max_missed_heartbeats

and many more. I'm not sure I fully understand what you're asking,  
maybe you could rephrase your question?
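
For reference, these are ordinary config options and the replay state can be
inspected at runtime. A minimal sketch (pool and image names are placeholders,
and the config section used by the rbd-mirror daemon may differ in your setup):

$ rbd mirror pool status rbd --verbose
$ rbd mirror image status rbd/myimage
$ ceph config set client rbd_mirror_concurrent_image_syncs 5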



Zitat von Zhenshi Zhou :


Hi Eugen,

Thanks for the reply. If rbd-mirror constantly synchronize changes,
what frequency to replay once? I don't find any options I can config.

Eugen Block  于2020年6月4日周四 下午2:54写道:


Hi,

that's the point of rbd-mirror, to constantly replay changes from the
primary image to the remote image (if the rbd journal feature is
enabled).


Zitat von Zhenshi Zhou :

> Hi all,
>
> I'm gonna deploy a rbd-mirror in order to sync image from clusterA to
> clusterB.
> The image will be used while syncing. I'm not sure if the rbd-mirror will
> sync image
> continuously or not. If not, I will inform clients not to write data in
it.
>
> Thanks. Regards
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Zhenshi Zhou
My situation is that the primary image is in use while rbd-mirror is syncing.
I want to know the interval between two successive rbd-mirror transfers of the
incremental data.
I will look into those options you provided, thanks a lot :)

Eugen Block  于2020年6月4日周四 下午3:28写道:

> The initial sync is a full image sync, the rest is based on the object
> sets created. There are several options to control the mirroring, for
> example:
>
> rbd_journal_max_concurrent_object_sets
> rbd_mirror_concurrent_image_syncs
> rbd_mirror_leader_max_missed_heartbeats
>
> and many more. I'm not sure I fully understand what you're asking,
> maybe you could rephrase your question?
>
>
> Zitat von Zhenshi Zhou :
>
> > Hi Eugen,
> >
> > Thanks for the reply. If rbd-mirror constantly synchronize changes,
> > what frequency to replay once? I don't find any options I can config.
> >
> > Eugen Block  于2020年6月4日周四 下午2:54写道:
> >
> >> Hi,
> >>
> >> that's the point of rbd-mirror, to constantly replay changes from the
> >> primary image to the remote image (if the rbd journal feature is
> >> enabled).
> >>
> >>
> >> Zitat von Zhenshi Zhou :
> >>
> >> > Hi all,
> >> >
> >> > I'm gonna deploy a rbd-mirror in order to sync image from clusterA to
> >> > clusterB.
> >> > The image will be used while syncing. I'm not sure if the rbd-mirror
> will
> >> > sync image
> >> > continuously or not. If not, I will inform clients not to write data
> in
> >> it.
> >> >
> >> > Thanks. Regards
> >> > ___
> >> > ceph-users mailing list -- ceph-users@ceph.io
> >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >>
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
>
>
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 15.2.3 Crush Map Viewer problem.

2020-06-04 Thread Lenz Grimmer
Hi Marco,

thank you. It seems as if the REST API output matches the output of
"ceph osd tree", but the tree view in the dashboard somehow fails to
display all nodes. We will investigate this.

I've now submitted your report as a bug on our tracker:

https://tracker.ceph.com/issues/45873

Please make sure to follow this issue in case we need additional
information. Thank you!

Lenz

On 6/2/20 5:03 PM, Marco Pizzolo wrote:

> Hopefully this is the part that was required:
> 
> "tree": {
>       "nodes": [
>         {
>           "id": -1,
>           "name": "default",
>           "type": "root",
>           "type_id": 11,
>           "children": [
>             -9,
>             -7,
>             -5,
>             -3
>           ]
>         },
>         {
>           "id": -3,
>           "name": "prdhcistonode01",
>           "type": "host",
>           "type_id": 1,
>           "pool_weights": {},
>           "children": [
>             15,
>             14,
>             13,
>             12,
>             11,
>             10,
>             9,
>             8,
>             7,
>             6,
>             5,
>             4,
>             3,
>             2,
>             1,
>             0
>           ]
>         },
>         {
>           "id": 0,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.0"
>         },
>         {
>           "id": 1,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.1"
>         },
>         {
>           "id": 2,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.2"
>         },
>         {
>           "id": 3,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.3"
>         },
>         {
>           "id": 4,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.4"
>         },
>         {
>           "id": 5,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.5"
>         },
>         {
>           "id": 6,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.6"
>         },
>         {
>           "id": 7,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.7"
>         },
>         {
>           "id": 8,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name": "osd.8"
>         },
>         {
>           "id": 9,
>           "device_class": "ssd",
>           "type": "osd",
>           "type_id": 0,
>           "crush_weight": 5.8218994140625,
>           "depth": 2,
>           "pool_weights": {},
>           "exists": 1,
>           "status": "up",
>           "reweight": 1,
>           "primary_affinity": 1,
>           "name"

[ceph-users] Re: Cephadm Hangs During OSD Apply

2020-06-04 Thread Sebastian Wagner
Encrypted OSDs should land in the next Octopus release:

https://tracker.ceph.com/issues/44625
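
Regarding the service-spec question quoted below: I can't say for certain how
early 15.2.x behaves, but with a recent Octopus orchestrator something along
these lines should work (the service name is whatever "ceph orch ls osd"
reports; this is only an illustration):

$ ceph orch ls osd --export > osd-spec.yaml   # dump the current OSD spec
$ ceph orch rm <osd-service-name>             # remove the spec entirely, or
$ ceph orch apply -i osd-spec.yaml            # re-apply it after adding "unmanaged: true"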

Am 27.05.20 um 20:31 schrieb m...@silvenga.com:
> I noticed the luks volumes were open, even though luksOpen hung. I killed 
> cryptsetup (once per disk) and ceph-volume continued and eventually created 
> the osd's for the host (yes, this node will be slated for another reinstall 
> when cephadm is stabilized).
> 
> Is there a way to remove an osd service spec with the current tooling? The 
> drives are immediately locked when the node is added to orch.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Octopus 15.2.2 unable to make drives available (reject reason locked)...

2020-06-04 Thread Sebastian Wagner
Hi Marco,

note that encrypted OSDs will land in the next Octopus release.

Regarding the locked state, you could run ceph-volume directly on the
host to understand the issue better. c-v should give you the reasons.
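
For example, something like this on the affected host should print the reject
reasons ceph-volume sees (the device path is only an illustration):

$ cephadm ceph-volume -- inventory
$ cephadm ceph-volume -- inventory /dev/nvme0n1 --format json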

Am 29.05.20 um 03:18 schrieb Marco Pizzolo:
> Rebooting addressed
> 
> On Thu, May 28, 2020 at 4:52 PM Marco Pizzolo 
> wrote:
> 
>> Hello,
>>
>> Hitting an issue with a new 15.2.2 deployment using cephadm.  I am having
>> a problem creating encrypted, 2 osds per device OSDs (they are NVMe).
>>
>> After removing and bootstrapping the cluster again, i am unable to create
>> OSDs as they're locked.  sgdisk, wipefs, zap all fail to leave the drives
>> as available.
>>
>> Any help would be appreciated.
>> Any comments on performance experiences with ceph in containers (cephadm
>> deployed) vs bare metal (ceph-deploy) would be greatly appreciated as well.
>>
>> Thanks,
>> Marco
>>
>> ceph orch device ls
>> HOST PATH  TYPE   SIZE  DEVICE
>>   AVAIL  REJECT REASONS
>> prdhcistonode01  /dev/nvme0n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_2006266528D1  False  *locked*
>> prdhcistonode01  /dev/nvme1n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_2006266534D9  False  *locked*
>> prdhcistonode01  /dev/nvme2n1  ssd953G  INTEL
>> SSDPEKKF010T8_BTHH850215GA1P0E False  *locked*
>> prdhcistonode01  /dev/nvme3n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_200626651473  False  *locked*
>> prdhcistonode01  /dev/nvme4n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_2006266508FB  False * locked*
>> prdhcistonode01  /dev/nvme5n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_20062664E6E8  False  *locked*
>> prdhcistonode01  /dev/nvme6n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_200626653CC0  False * locked*
>> prdhcistonode01  /dev/nvme7n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_1939243B797E  False * locked*
>> prdhcistonode01  /dev/nvme8n1  ssd   11.6T
>>  Micron_9300_MTFDHAL12T8TDR_200626652441  False  *locked*
>>
>>
>> lsblk
>>
>> NAME
>>MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
>> nvme2n1
>> 259:00 953.9G  0 disk
>> ├─nvme2n1p1
>> 259:10   512M  0 part /boot/efi
>> └─nvme2n1p2
>> 259:20 953.4G  0 part /
>> nvme3n1
>> 259:30  11.7T  0 disk
>> └─ceph--5bd47cae--97b3--4cad--b010--215fd982497b-osd--data--e6045acd--a56d--41d2--a016--b8647b9a717a
>>  253:10  11.7T  0 lvm
>> nvme4n1
>> 259:40  11.7T  0 disk
>> └─ceph--bf7dbfb4--afe3--4391--9847--08e461bf6247-osd--data--12faafac--b695--4c30--b6d7--7046d8275d9f
>>  253:00  11.7T  0 lvm
>> nvme0n1
>> 259:50  11.7T  0 disk
>> └─ceph--1a5d8e23--ff7d--44c3--b6d2--de143fed2b7d-osd--block--b6593547--e99a--4add--8edd--5d0fb53254cd
>> 253:20  11.7T  0 lvm
>> nvme5n1
>> 259:60  11.7T  0 disk
>> └─ceph--7d85ff24--79c8--4792--a2c8--bb4908f77ff0-osd--data--fc4e9dbd--920f--41b8--8467--74e9dcbd57ca
>>  253:30  11.7T  0 lvm
>> nvme6n1
>> 259:70  11.7T  0 disk
>> └─ceph--d8c8652a--1cd8--4e10--a333--4ea10f3b5004-osd--data--9a70a549--3cba--4f0d--a13a--8465781a10e9
>>  253:50  11.7T  0 lvm
>> nvme8n1
>> 259:80  11.7T  0 disk
>> └─ceph--e1914f1c--2385--4c0c--9951--d4b9200b7164-osd--data--8876559c--6393--4fbc--821b--7ac74cfb5a54
>>  253:70  11.7T  0 lvm
>> nvme7n1
>> 259:90  11.7T  0 disk
>> └─ceph--3765b53a--75eb--489e--97e1--d6b03bc25532-osd--data--777638e0--a325--401d--a01d--459676871003
>>  253:40  11.7T  0 lvm
>> nvme1n1
>> 259:10   0  11.7T  0 disk
>> └─ceph--2124f206--2b50--41a1--8a3c--d47c1a909a3b-osd--block--88e4f1eb--73f4--4c83--b978--fe7cabc0c3e6
>> 253:60  11.7T  0 lvm
>>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Wido den Hollander



On 6/4/20 9:17 AM, Frank Schilder wrote:
>> Yes and No. This will cause many CRUSHMap updates where a manual update
>> is only a single change.
>>
>> I would do:
>>
>> $ ceph osd getcrushmap -o crushmap
> 
> Well, that's a yes and a no as well.
> 
> If you are experienced and edit crush maps on a regular basis, you can go 
> that way. I would still enclose the change in a norebalance setting. If you 
> are not experienced, you are likely to shoot your cluster. In particular, 
> adding and moving buckets is not fun this way. You need to be careful what 
> IDs you assign, and there are many options to choose from with documentation 
> targeted at experienced cephers.
> 
> CLI commands will prevent a lot of stupid typos, errors and forgotten 
> mandatory lines. I learned that the hard way and decided to use a direct edit 
> only when absolutely necessary. A couple of extra peerings is a low-cost 
> operation compared with trying to find a stupid typo that just killed all 
> pools when angry users stand next to you.
> 
> My recommendation would be to save the original crush map, apply commands and 
> look at changes these commands do. That's a great way to learn how to do it 
> right. And in general, better be safe than sorry.
> 

I think we understand each other :-)

Main thing: Backup your crushmap! Then you can always roll back if
things go wrong.

Wido

> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Wido den Hollander 
> Sent: 04 June 2020 08:50:16
> To: Frank Schilder; Kyriazis, George; ceph-users
> Subject: Re: [ceph-users] Re: Best way to change bucket hierarchy
> 
> On 6/4/20 12:24 AM, Frank Schilder wrote:
>> You can use the command-line without editing the crush map. Look at the 
>> documentation of commands like
>>
>> ceph osd crush add-bucket ...
>> ceph osd crush move ...
>>
>> Before starting this, set "ceph osd set norebalance" and unset after you are 
>> happy with the crush tree. Let everything peer. You should see misplaced 
>> objects and remapped PGs, but no degraded objects or PGs.
>>
>> Do this only when cluster is health_ok, otherwise things can get really 
>> complicated.
>>
> 
> Yes and No. This will cause many CRUSHMap updates where a manual update
> is only a single change.
> 
> I would do:
> 
> $ ceph osd getcrushmap -o crushmap
> $ cp crushmap crushmap.backup
> $ crushtool -d crushmap -o crushmap.txt
> $ vi crushmap.txt (now make your changes)
> $ crushtool -c crushmap.txt -o crushmap.new
> $ crushtool -i crushmap.new --tree (check if all OK)
> $ crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-mappings
> 
> If all is good:
> 
> $ ceph osd setcrushmap -i crushmap.new
> 
> If all goes bad, simply revert to your old crushmap.
> 
> Wido
> 
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>>
>> 
>> From: Kyriazis, George 
>> Sent: 03 June 2020 22:45:11
>> To: ceph-users
>> Subject: [ceph-users] Best way to change bucket hierarchy
>>
>> Helo,
>>
>> I have a live ceph cluster, and I’m in the need of modifying the bucket 
>> hierarchy.  I am currently using the default crush rule (ie. keep each 
>> replica on a different host).  My need is to add a “chassis” level, and keep 
>> replicas on a per-chassis level.
>>
>> From what I read in the documentation, I would have to edit the crush file 
>> manually, however this sounds kinda scary for a live cluster.
>>
>> Are there any “best known methods” to achieve that goal without messing 
>> things up?
>>
>> In my current scenario, I have one host per chassis, and planning on later 
>> adding nodes where there would be >1 hosts per chassis. It looks like “in 
>> theory” there wouldn’t be a need for any data movement after the crush map 
>> changes.  Will reality match theory?  Anything else I need to watch out for?
>>
>> Thank you!
>>
>> George
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm Setup Query

2020-06-04 Thread Sebastian Wagner


Am 26.05.20 um 08:16 schrieb Shivanshi .:
> Hi,
> 
> I am facing an issue on Cephadm cluster setup. Whenever, I try to add
> remote devices as OSDs, command just hangs.
> 
> The steps I have followed :
> 
> sudo ceph orch daemon add osd node1:device
> 
>  
> 
>  1. For the setup I have followed the steps mentioned in :
> 
> https://ralph.blog.imixs.com/2020/04/14/ceph-octopus-running-on-debian-buster/
> 
>  
> 
>  1. To make sure it is not facing ssh errors and  host is reachable I
> have tried the following commands:
> cephadm shell -- ceph config-key get mgr/cephadm/ssh_identity_key > key
> cephadm shell -- ceph cephadm get-ssh-config > config
> ssh -F config -i key root@hostname
> 
>   I am able to connect to the host as root.
> 
>  
> 
>  1. Then I have tried collecting the log information
>  1. Command : sudo cephadm logs --fsid
> e236062e-96ad-11ea-bedb-5254002e4127 --name osd
> Result :
> Traceback (most recent call last):
> File "/usr/sbin/cephadm", line 4282, in 
> r = args.func()
> File "/usr/sbin/cephadm", line 921, in _infer_fsid
> return func()
> File "/usr/sbin/cephadm", line 2689, in command_logs
> (daemon_type, daemon_id) = args.name.split('.', 1)
> ValueError: not enough values to unpack (expected 2, got 1)

cephadm logs expects a name as returned by `cephadm ls | jq '.[].name'`


>  2. Commad : sudo ceph log last cephadm
>  
> 
> Result :
> 
>  
> 
> INFO:cephadm:Verifying port 9100 ...
> 
>  WARNING:cephadm:Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address
> already in use
> 
>  ERROR: TCP Port(s) '9100' required for node-exporter is already in use
> 
>  Traceback (most recent call last):
> 
> File "/usr/share/ceph/mgr/cephadm/module.py", line 1638, in _run_cephadm
> 
> code, '\n'.join(err)))
> 
>  RuntimeError: cephadm exited with an error code: 1,
> stderr:INFO:cephadm:Deploying daemon node-exporter.ceph-mon ...
> 
>  INFO:cephadm:Verifying port 9100 ...
> 
>  WARNING:cephadm:Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address
> already in use
> 
>  ERROR: TCP Port(s) '9100' required for node-exporter is already in use

Looks like a node-exporter is already running on this host. I don't know
where this comes from. Was a node-exporter installed previously?
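
To confirm what is holding the port, something like this should do (the jq
filter is just an illustration):

$ ss -tlnp | grep 9100
$ cephadm ls | jq '.[] | select(.name | startswith("node-exporter"))'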

> 
>  2020-05-15T13:33:46.966159+ mgr.ceph-mgr.dixgvy (mgr.14161) 678 :
> cephadm [WRN] Failed to apply node-exporter spec ServiceSpec(
> 
> {'placement': PlacementSpec(host_pattern='*'), 'service_type':
> 'node-exporter', 'service_id': None, 'unmanaged': False}
> 
> ): cephadm exited with an error code: 1, stderr:INFO:cephadm:Deploying
> daemon node-exporter.ceph-mon ...
> 
>  INFO:cephadm:Verifying port 9100 ...
> 
>  WARNING:cephadm:Cannot bind to IP 0.0.0.0 port 9100: [Errno 98] Address
> already in use
> 
>  ERROR: TCP Port(s) '9100' required for node-exporter is already in use
> 
>  
> 
>  
> 
> But I am not able to infer from these log information. Can you please
> help me with the issue.
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> 

-- 
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg). Geschäftsführer: Felix Imendörffer



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-06-04 Thread Sebastian Wagner
sorry for the late response.

I'm seeing

> Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130

in the logs.

please make sure `ceph mon ok-to-stop vx-rg23-rk65-u43-130`

succeeds.
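
If it does not succeed, it is worth checking the monitor count and quorum
first; ok-to-stop refuses when the remaining monitors could not keep quorum
(for example on a one- or two-monitor cluster):

$ ceph mon stat
$ ceph quorum_status -f json-pretty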





Am 22.05.20 um 19:28 schrieb Gencer W. Genç:
> Hi Sebastian,
> 
> I cannot see my replies in here. So i put attachment as a body here:
> 
> 2020-05-21T18:52:36.813+ 7faf19f20040  0 set uid:gid to 167:167 
> (ceph:ceph)
> 2020-05-21T18:52:36.813+ 7faf19f20040  0 ceph version 15.2.2 
> (0c857e985a29d90501a285f242ea9c008df49eb8) octopus (stable), process 
> ceph-mgr, pid 1
> 2020-05-21T18:52:36.817+ 7faf19f20040  0 pidfile_write: ignore empty 
> --pid-file
> 2020-05-21T18:52:36.853+ 7faf19f20040  1 mgr[py] Loading python module 
> 'alerts'
> 2020-05-21T18:52:36.957+ 7faf19f20040  1 mgr[py] Loading python module 
> 'balancer'
> 2020-05-21T18:52:37.029+ 7faf19f20040  1 mgr[py] Loading python module 
> 'cephadm'
> 2020-05-21T18:52:37.237+ 7faf19f20040  1 mgr[py] Loading python module 
> 'crash'
> 2020-05-21T18:52:37.333+ 7faf19f20040  1 mgr[py] Loading python module 
> 'dashboard'
> 2020-05-21T18:52:37.981+ 7faf19f20040  1 mgr[py] Loading python module 
> 'devicehealth'
> 2020-05-21T18:52:38.045+ 7faf19f20040  1 mgr[py] Loading python module 
> 'diskprediction_local'
> 2020-05-21T18:52:38.221+ 7faf19f20040  1 mgr[py] Loading python module 
> 'influx'
> 2020-05-21T18:52:38.293+ 7faf19f20040  1 mgr[py] Loading python module 
> 'insights'
> 2020-05-21T18:52:38.425+ 7faf19f20040  1 mgr[py] Loading python module 
> 'iostat'
> 2020-05-21T18:52:38.489+ 7faf19f20040  1 mgr[py] Loading python module 
> 'k8sevents'
> 2020-05-21T18:52:39.077+ 7faf19f20040  1 mgr[py] Loading python module 
> 'localpool'
> 2020-05-21T18:52:39.133+ 7faf19f20040  1 mgr[py] Loading python module 
> 'orchestrator'
> 2020-05-21T18:52:39.277+ 7faf19f20040  1 mgr[py] Loading python module 
> 'osd_support'
> 2020-05-21T18:52:39.433+ 7faf19f20040  1 mgr[py] Loading python module 
> 'pg_autoscaler'
> 2020-05-21T18:52:39.545+ 7faf19f20040  1 mgr[py] Loading python module 
> 'progress'
> 2020-05-21T18:52:39.633+ 7faf19f20040  1 mgr[py] Loading python module 
> 'prometheus'
> 2020-05-21T18:52:40.013+ 7faf19f20040  1 mgr[py] Loading python module 
> 'rbd_support'
> 2020-05-21T18:52:40.253+ 7faf19f20040  1 mgr[py] Loading python module 
> 'restful'
> 2020-05-21T18:52:40.553+ 7faf19f20040  1 mgr[py] Loading python module 
> 'rook'
> 2020-05-21T18:52:41.229+ 7faf19f20040  1 mgr[py] Loading python module 
> 'selftest'
> 2020-05-21T18:52:41.285+ 7faf19f20040  1 mgr[py] Loading python module 
> 'status'
> 2020-05-21T18:52:41.357+ 7faf19f20040  1 mgr[py] Loading python module 
> 'telegraf'
> 2020-05-21T18:52:41.421+ 7faf19f20040  1 mgr[py] Loading python module 
> 'telemetry'
> 2020-05-21T18:52:41.581+ 7faf19f20040  1 mgr[py] Loading python module 
> 'test_orchestrator'
> 2020-05-21T18:52:41.937+ 7faf19f20040  1 mgr[py] Loading python module 
> 'volumes'
> 2020-05-21T18:52:42.121+ 7faf19f20040  1 mgr[py] Loading python module 
> 'zabbix'
> 2020-05-21T18:52:42.189+ 7faf06a1a700  0 ms_deliver_dispatch: unhandled 
> message 0x556226c8e6e0 mon_map magic: 0 v1 from mon.1 v2:192.168.0.3:3300/0
> 2020-05-21T18:52:43.557+ 7faf06a1a700  1 mgr handle_mgr_map Activating!
> 2020-05-21T18:52:43.557+ 7faf06a1a700  1 mgr handle_mgr_map I am now 
> activating
> 2020-05-21T18:52:43.665+ 7faed44a7700  0 [balancer DEBUG root] setting 
> log level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.665+ 7faed44a7700  1 mgr load Constructed class from 
> module: balancer
> 2020-05-21T18:52:43.665+ 7faed44a7700  0 [cephadm DEBUG root] setting log 
> level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.689+ 7faed44a7700  1 mgr load Constructed class from 
> module: cephadm
> 2020-05-21T18:52:43.689+ 7faed44a7700  0 [crash DEBUG root] setting log 
> level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.689+ 7faed44a7700  1 mgr load Constructed class from 
> module: crash
> 2020-05-21T18:52:43.693+ 7faed44a7700  0 [dashboard DEBUG root] setting 
> log level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.693+ 7faed44a7700  1 mgr load Constructed class from 
> module: dashboard
> 2020-05-21T18:52:43.693+ 7faed44a7700  0 [devicehealth DEBUG root] 
> setting log level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.693+ 7faed44a7700  1 mgr load Constructed class from 
> module: devicehealth
> 2020-05-21T18:52:43.701+ 7faed44a7700  0 [iostat DEBUG root] setting log 
> level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.701+ 7faed44a7700  1 mgr load Constructed class from 
> module: iostat
> 2020-05-21T18:52:43.709+ 7faed44a7700  0 [orchestrator DEBUG root] 
> setting log level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.709+ 7faed44a7700  1 mgr load Constructe

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
Hi George,

For replicated rules you can simply create a new crush rule with the failure 
domain set to chassis and change any pool's crush rule to this new one. 
If you have EC pools, then the chooseleaf step needs to be edited by hand. I did 
this before as well. (A really unfortunate side effect is that the EC profile 
attached to the pool goes out of sync with the crush map and there is nothing 
one can do about that. This is annoying yet harmless.)
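
For the replicated case that boils down to something like the following
(rule and pool names are placeholders):

$ ceph osd crush rule create-replicated rep-chassis default chassis
$ ceph osd pool set mypool crush_rule rep-chassis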

The intent of doing these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step 
by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set and there is then a 
shorter interruption of ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is also a good 
idea to have a backup just for reference and to compare before and after.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the 
chooseleaf step would also have to change to:

step chooseleaf firstn 0 type chassis

Correct?  Is that the only other change that is needed?  It looks like the rule 
change can happen both inside and outside the “norebalance” setting (again with 
CLI commands), but is it safer to do it inside (ie. while not rebalancing)?

If I keep a backup of the crush rule map (with “ceph osd getcrushmap”), I 
assume I can restore the old map if something goes bad?

Thanks again!

George



> On Jun 3, 2020, at 5:24 PM, Frank Schilder  wrote:
>
> You can use the command-line without editing the crush map. Look at the 
> documentation of commands like
>
> ceph osd crush add-bucket ...
> ceph osd crush move ...
>
> Before starting this, set "ceph osd set norebalance" and unset after you are 
> happy with the crush tree. Let everything peer. You should see misplaced 
> objects and remapped PGs, but no degraded objects or PGs.
>
> Do this only when cluster is health_ok, otherwise things can get really 
> complicated.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Kyriazis, George 
> Sent: 03 June 2020 22:45:11
> To: ceph-users
> Subject: [ceph-users] Best way to change bucket hierarchy
>
> Helo,
>
> I have a live ceph cluster, and I’m in the need of modifying the bucket 
> hierarchy.  I am currently using the default crush rule (ie. keep each 
> replica on a different host).  My need is to add a “chassis” level, and keep 
> replicas on a per-chassis level.
>
> From what I read in the documentation, I would have to edit the crush file 
> manually, however this sounds kinda scary for a live cluster.
>
> Are there any “best known methods” to achieve that goal without messing 
> things up?
>
> In my current scenario, I have one host per chassis, and planning on later 
> adding nodes where there would be >1 hosts per chassis. It looks like “in 
> theory” there wouldn’t be a need for any data movement after the crush map 
> changes.  Will reality match theory?  Anything else I need to watch out for?
>
> Thank you!
>
> George
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
> Yes and No. This will cause many CRUSHMap updates where a manual update
> is only a single change.
>
> I would do:
>
> $ ceph osd getcrushmap -o crushmap

Well, that's a yes and a no as well.

If you are experienced and edit crush maps on a regular basis, you can go that 
way. I would still enclose the change in a norebalance setting. If you are not 
experienced, you are likely to shoot your cluster. In particular, adding and 
moving buckets is not fun this way. You need to be careful what IDs you assign, 
and there are many options to choose from with documentation targeted at 
experienced cephers.

CLI commands will prevent a lot of stupid typos, errors and forgotten mandatory 
lines. I learned that the hard way and decided to use a direct edit only when 
absolutely necessary. A couple of extra peerings is a low-cost operation 
compared with trying to find a stupid typo that just killed all pools when 
angry users stand next to you.

My recommendation would be to save the original crush map, apply commands and 
look at changes these commands do. That's a great way to learn how to do it 
right. And in general, better be safe than sorry.
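
As a concrete sketch of the CLI route for adding a chassis level (bucket and
host names are placeholders):

$ ceph osd getcrushmap -o crushmap.backup
$ ceph osd set norebalance
$ ceph osd crush add-bucket chassis1 chassis
$ ceph osd crush move chassis1 root=default
$ ceph osd crush move host1 chassis=chassis1
$ ceph osd unset norebalance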

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Wido den Hollander 
Sent: 04 June 2020 08:50:16
To: Frank Schilder; Kyriazis, George; ceph-users
Subject: Re: [ceph-users] Re: Best way to change bucket hierarchy

On 6/4/20 12:24 AM, Frank Schilder wrote:
> You can use the command-line without editing the crush map. Look at the 
> documentation of commands like
>
> ceph osd crush add-bucket ...
> ceph osd crush move ...
>
> Before starting this, set "ceph osd set norebalance" and unset after you are 
> happy with the crush tree. Let everything peer. You should see misplaced 
> objects and remapped PGs, but no degraded objects or PGs.
>
> Do this only when cluster is health_ok, otherwise things can get really 
> complicated.
>

Yes and No. This will cause many CRUSHMap updates where a manual update
is only a single change.

I would do:

$ ceph osd getcrushmap -o crushmap
$ cp crushmap crushmap.backup
$ crushtool -d crushmap -o crushmap.txt
$ vi crushmap.txt (now make your changes)
$ crushtool -c crushmap.txt -o crushmap.new
$ crushtool -i crushmap.new --tree (check if all OK)
$ crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-mappings

If all is good:

$ ceph osd setcrushmap -i crushmap.new

If all goes bad, simply revert to your old crushmap.

Wido

> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Kyriazis, George 
> Sent: 03 June 2020 22:45:11
> To: ceph-users
> Subject: [ceph-users] Best way to change bucket hierarchy
>
> Helo,
>
> I have a live ceph cluster, and I’m in the need of modifying the bucket 
> hierarchy.  I am currently using the default crush rule (ie. keep each 
> replica on a different host).  My need is to add a “chassis” level, and keep 
> replicas on a per-chassis level.
>
> From what I read in the documentation, I would have to edit the crush file 
> manually, however this sounds kinda scary for a live cluster.
>
> Are there any “best known methods” to achieve that goal without messing 
> things up?
>
> In my current scenario, I have one host per chassis, and planning on later 
> adding nodes where there would be >1 hosts per chassis. It looks like “in 
> theory” there wouldn’t be a need for any data movement after the crush map 
> changes.  Will reality match theory?  Anything else I need to watch out for?
>
> Thank you!
>
> George
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-06-04 Thread Gencer W . Genç
Hi Sebastian,

No worries about the delay. I just ran that command, however it returns:

$ ceph mon ok-to-stop vx-rg23-rk65-u43-130

Error EBUSY: not enough monitors would be available (vx-rg23-rk65-u43-130-1) 
after stopping mons [vx-rg23-rk65-u43-130]

It seems we have some progress here. With the previous commands I got quorum 
errors. This time it acknowledges the monitor hostname but fails due to not 
enough monitors remaining after stopping it.

Any idea on this step?

Thanks,
Gencer.
On 4.06.2020 13:20:09, Sebastian Wagner  wrote:
sorry for the late response.

I'm seeing

> Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130

in the logs.

please make sure `ceph mon ok-to-stop vx-rg23-rk65-u43-130`

succeeds.





Am 22.05.20 um 19:28 schrieb Gencer W. Genç:
> Hi Sebastian,
>
> I cannot see my replies in here. So i put attachment as a body here:
>
> 2020-05-21T18:52:36.813+ 7faf19f20040 0 set uid:gid to 167:167 (ceph:ceph)
> 2020-05-21T18:52:36.813+ 7faf19f20040 0 ceph version 15.2.2 
> (0c857e985a29d90501a285f242ea9c008df49eb8) octopus (stable), process 
> ceph-mgr, pid 1
> 2020-05-21T18:52:36.817+ 7faf19f20040 0 pidfile_write: ignore empty 
> --pid-file
> 2020-05-21T18:52:36.853+ 7faf19f20040 1 mgr[py] Loading python module 
> 'alerts'
> 2020-05-21T18:52:36.957+ 7faf19f20040 1 mgr[py] Loading python module 
> 'balancer'
> 2020-05-21T18:52:37.029+ 7faf19f20040 1 mgr[py] Loading python module 
> 'cephadm'
> 2020-05-21T18:52:37.237+ 7faf19f20040 1 mgr[py] Loading python module 
> 'crash'
> 2020-05-21T18:52:37.333+ 7faf19f20040 1 mgr[py] Loading python module 
> 'dashboard'
> 2020-05-21T18:52:37.981+ 7faf19f20040 1 mgr[py] Loading python module 
> 'devicehealth'
> 2020-05-21T18:52:38.045+ 7faf19f20040 1 mgr[py] Loading python module 
> 'diskprediction_local'
> 2020-05-21T18:52:38.221+ 7faf19f20040 1 mgr[py] Loading python module 
> 'influx'
> 2020-05-21T18:52:38.293+ 7faf19f20040 1 mgr[py] Loading python module 
> 'insights'
> 2020-05-21T18:52:38.425+ 7faf19f20040 1 mgr[py] Loading python module 
> 'iostat'
> 2020-05-21T18:52:38.489+ 7faf19f20040 1 mgr[py] Loading python module 
> 'k8sevents'
> 2020-05-21T18:52:39.077+ 7faf19f20040 1 mgr[py] Loading python module 
> 'localpool'
> 2020-05-21T18:52:39.133+ 7faf19f20040 1 mgr[py] Loading python module 
> 'orchestrator'
> 2020-05-21T18:52:39.277+ 7faf19f20040 1 mgr[py] Loading python module 
> 'osd_support'
> 2020-05-21T18:52:39.433+ 7faf19f20040 1 mgr[py] Loading python module 
> 'pg_autoscaler'
> 2020-05-21T18:52:39.545+ 7faf19f20040 1 mgr[py] Loading python module 
> 'progress'
> 2020-05-21T18:52:39.633+ 7faf19f20040 1 mgr[py] Loading python module 
> 'prometheus'
> 2020-05-21T18:52:40.013+ 7faf19f20040 1 mgr[py] Loading python module 
> 'rbd_support'
> 2020-05-21T18:52:40.253+ 7faf19f20040 1 mgr[py] Loading python module 
> 'restful'
> 2020-05-21T18:52:40.553+ 7faf19f20040 1 mgr[py] Loading python module 
> 'rook'
> 2020-05-21T18:52:41.229+ 7faf19f20040 1 mgr[py] Loading python module 
> 'selftest'
> 2020-05-21T18:52:41.285+ 7faf19f20040 1 mgr[py] Loading python module 
> 'status'
> 2020-05-21T18:52:41.357+ 7faf19f20040 1 mgr[py] Loading python module 
> 'telegraf'
> 2020-05-21T18:52:41.421+ 7faf19f20040 1 mgr[py] Loading python module 
> 'telemetry'
> 2020-05-21T18:52:41.581+ 7faf19f20040 1 mgr[py] Loading python module 
> 'test_orchestrator'
> 2020-05-21T18:52:41.937+ 7faf19f20040 1 mgr[py] Loading python module 
> 'volumes'
> 2020-05-21T18:52:42.121+ 7faf19f20040 1 mgr[py] Loading python module 
> 'zabbix'
> 2020-05-21T18:52:42.189+ 7faf06a1a700 0 ms_deliver_dispatch: unhandled 
> message 0x556226c8e6e0 mon_map magic: 0 v1 from mon.1 v2:192.168.0.3:3300/0
> 2020-05-21T18:52:43.557+ 7faf06a1a700 1 mgr handle_mgr_map Activating!
> 2020-05-21T18:52:43.557+ 7faf06a1a700 1 mgr handle_mgr_map I am now 
> activating
> 2020-05-21T18:52:43.665+ 7faed44a7700 0 [balancer DEBUG root] setting log 
> level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.665+ 7faed44a7700 1 mgr load Constructed class from 
> module: balancer
> 2020-05-21T18:52:43.665+ 7faed44a7700 0 [cephadm DEBUG root] setting log 
> level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.689+ 7faed44a7700 1 mgr load Constructed class from 
> module: cephadm
> 2020-05-21T18:52:43.689+ 7faed44a7700 0 [crash DEBUG root] setting log 
> level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.689+ 7faed44a7700 1 mgr load Constructed class from 
> module: crash
> 2020-05-21T18:52:43.693+ 7faed44a7700 0 [dashboard DEBUG root] setting 
> log level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.693+ 7faed44a7700 1 mgr load Constructed class from 
> module: dashboard
> 2020-05-21T18:52:43.693+ 7faed44a7700 0 [devicehealth DEBUG root] setting 
> log level based on debug_mgr: WARNING (1/5)
> 2020-05-21T18:52:43.693+ 7faed44a770

[ceph-users] speed up individual backfills

2020-06-04 Thread Thomas Bennett
Hi,

I have 15628 misplaced objects that are currently backfilling as follows:

   1. pgid:14.3ce1  from:osd.1321 to:osd.3313
   2. pgid:14.4dd9 from:osd.1693 to:osd.2980
   3. pgid:14.680b from:osd.362 to:osd.3313

These are remnant backfills from a pg-upmap/rebalance campaign after we've
added 2 new racks worth of osds to our cluster.

Our mon db is bloated, so I want to trim it before continuing with
the next pg-upmap/rebalance campaign.

So, my question is:
Is there any way I can speed up the backfill process on these individual
osds?
Or hints to trace out why these are so slow?
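
For reference, the usual backfill throttles live on the OSDs and can be checked
or adjusted at runtime (the values below are only illustrative); ceph osd perf
can also reveal an individual OSD with unusually high latency:

$ ceph osd perf
$ ceph config set osd osd_max_backfills 2
$ ceph config set osd osd_recovery_sleep_hdd 0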

Regards
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Thomas Gradisnik
We have deployed a small test cluster consisting of three nodes. Each node is 
running a mon/mgr and two osds (Samsung PM983 3,84TB NVMe split into two 
partitions), so six osds in total. We started with Ceph 14.2.7 some weeks ago 
(upgraded to 14.2.9 later) and ran different tests using fio against some rbd 
volumes in order to get an overview what performance we could expect. The 
configuration is unchanged compared to the defaults, we only set several 
debugging options to 0/0.

Yesterday we upgraded the whole cluster following the upgrade guidelines to 
Ceph 15.2.3, which worked without any problems so far. Nevertheless when 
running the same tests as before with Ceph 14.2.9, we are seeing some clear 
degradations in write-performance (beside some performance improvements, which 
shall also be mentioned).

Here the results of concern (each with the relevant fio settings used):

Test "read-latency-max"
(rw=randread, iodepth=64, bs=4k)
read_iops: 32500 -> 87000

Test "write-latency-max"
(rw=randwrite, iodepth=64, bs=4k)
write_iops: 22500 -> 11500

Test "write-throughput-iops-max"
(rw=write, iodepth=64, bs=4k)
write_iops: 7000 -> 14000

Test "usecase1"
(rw=randrw, 
bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/,
 rwmixread=1, rate_process=poisson, iodepth=64)
write_iops: 21000 -> 8500

Test "usecase1-readonly"
(rw=randread, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/, 
rate_process=poisson, iodepth=64)
read_iops: 28000 -> 58000
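
For reference, the "usecase1" parameters above correspond roughly to a fio job
like the one below. The ioengine, client/pool/image names and runtime are
assumptions for illustration; the actual job files may differ.

[global]
ioengine=rbd
clientname=admin
pool=rbdbench
rbdname=fio-test
time_based=1
runtime=300

[usecase1]
rw=randrw
rwmixread=1
bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/
rate_process=poisson
iodepth=64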

The last two tests represent a typical use case on our systems. Therefore we 
are especially concerned by the drop in performance from 21000 w/ops to 8500 
w/ops (about 60%) after upgrading to Ceph 15.2.3. 

We ran all tests several times, the values are averaged over all iterations and 
fairly consistent and reproducible. We even tried wiping the whole cluster, 
downgrading to Ceph 14.2.9 again, setting up a new cluster/pool, running the 
tests and upgrading to Ceph 15.2.3 again. The tests have been performed on one 
of the three cluster nodes using a 50G rbd volume, which had been prefilled 
with random data before each test-run.

Have any changes been introduced with Octopus that could explain the observed 
changes in performance?

What we already tried:

- Disabling rbd cache
- Reverting rbc cache policy to writeback (default in 14.2)
- Setting rbd io scheduler to none
- Deploying a fresh cluster starting with Ceph 15.2.3

Kernel is 5.4.38 … I don't know if some other system specs would be helpful 
besides the already mentioned (since we are talking about a relative change in 
performance after upgrading Ceph without any further changes) - if so, please 
let us know.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: speed up individual backfills

2020-06-04 Thread Thomas Bennett
Hi,

It turns out I was mapping to a problematic OSD. In this case OSD 3313.

After disabling the OSD with systemctl on the host, recovery has picked up
again and mapped the pgs to new osds.

For posterity, I ran smartctl on osd.3313's device and then I noticed:
  5 Reallocated_Sector_Ct   0x0033   092   092   010    Pre-fail  Always
-   *31688*

Lots of reallocated sectors, so the drive was "working" but not usable.

In the end it had nothing to do with Ceph at all.

Regards,


On Thu, Jun 4, 2020 at 1:59 PM Thomas Bennett  wrote:

> Hi,
>
> I have 15628 misplaced objects that are currently backfilling as follows:
>
>1. pgid:14.3ce1  from:osd.1321 to:osd.3313
>2. pgid:14.4dd9 from:osd.1693 to:osd.2980
>3. pgid:14.680b from:osd.362 to:osd.3313
>
> These are remnant backfills from a pg-upmap/rebalance campaign after we've
> added 2 new racks worth of osds to our cluster.
>
> Our mon db is bloated so I'm wanting to trim the mon db before continuing
> the next pg-upmap/rebalance campaign.
>
> So, my question is:
> Is there any way I can speed up the backfill process on these individual
> osds?
> Or hints to trace out why these are so slow?
>
> Regards
>


-- 
Thomas Bennett

Storage Engineer at SARAO
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] changing acces vlan for all the OSDs - potential downtime ?

2020-06-04 Thread Adrian Nicolae

Hi all,

I have a Ceph cluster with a standard setup :

- the public network: MONs and OSDs connected to the same aggregation switch, 
with ports in the same access VLAN


- the private network: OSDs connected to another switch via a second NIC, 
in another access VLAN


I need to change the public VLAN on the first switch and the private 
VLAN on the second switch.


Although it should be a trivial operation (just changing the VLAN on the port 
range in a single command), it means that all the OSDs and MONs will not 
be able to communicate with each other for a few seconds (first on the 
public network, then on the private network). Do you know if this very 
short period of downtime will mess up the cluster somehow? Is there a 
best practice on how to do this safely?
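
A common precaution for a short, planned network interruption (illustrative
only) is to set the noout flag so OSDs are not marked out while briefly
unreachable, perform the switch changes, then check health and unset the flag:

$ ceph osd set noout
$ ceph -s
$ ceph osd unset noout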


 Thank you ,

 Adrian.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Nautilus latest builds for CentOS 8

2020-06-04 Thread Giulio Fidente
On 6/4/20 1:17 AM, Anthony D'Atri wrote:
> cbs.centos.org offers 14.2.7 packages for el8 eg 
> https://cbs.centos.org/koji/buildinfo?buildID=28564 but I don’t know anything 
> about their provenance or nature.
> For sure a downloads.ceph.com package would be desirable.

upstream CentOS Storage SIG and RDO community maintain that repo (and
build) and deps.

we consume it in OpenStack CI; it's not updated as frequently as we'd
like because the upstream Storage SIG is pretty small, but it has been around
for nautilus/centos7, luminous/centos7 and jewel/centos7, so it
should be reliable
-- 
Giulio Fidente
GPG KEY: 08D733BA
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread David Orman
   * bluestore: common/options.cc: disable bluefs_preextend_wal_files  <--
from the 15.2.3 changelog. There was a bug which led to issues on OSD
restart, and I believe this was the attempt at mitigation until a proper
bugfix could be put into place. I suspect this might be the cause of the
symptoms you're seeing.

https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293
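
To confirm the current value on a running cluster, something like this should
work (osd.0 is just an example daemon):

$ ceph config get osd bluefs_preextend_wal_files
$ ceph config show osd.0 | grep bluefs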

On Thu, Jun 4, 2020 at 8:07 AM Thomas Gradisnik  wrote:

> We have deployed a small test cluster consisting of three nodes. Each node
> is running a mon/mgr and two osds (Samsung PM983 3,84TB NVMe split into two
> partitions), so six osds in total. We started with Ceph 14.2.7 some weeks
> ago (upgraded to 14.2.9 later) and ran different tests using fio against
> some rbd volumes in order to get an overview what performance we could
> expect. The configuration is unchanged compared to the defaults, we only
> set several debugging options to 0/0.
>
> Yesterday we upgraded the whole cluster following the upgrade guidelines
> to Ceph 15.2.3, which worked without any problems so far. Nevertheless when
> running the same tests as before with Ceph 14.2.9, we are seeing some clear
> degradations in write-performance (beside some performance improvements,
> which shall also be mentioned).
>
> Here the results of concern (each with the relevant fio settings used):
>
> Test "read-latency-max"
> (rw=randread, iodepth=64, bs=4k)
> read_iops: 32500 -> 87000
>
> Test "write-latency-max"
> (rw=randwrite, iodepth=64, bs=4k)
> write_iops: 22500 -> 11500
>
> Test "write-throughput-iops-max"
> (rw=write, iodepth=64, bs=4k)
> write_iops: 7000 -> 14000
>
> Test "usecase1"
> (rw=randrw,
> bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,4k/50:8k/20:16k/20:32k/5:64k/2:128k/:256k/,
> rwmixread=1, rate_process=poisson, iodepth=64)
> write_iops: 21000 -> 8500
>
> Test "usecase1-readonly"
> (rw=randread, bssplit=4k/40:8k/5:16k/20:32k/5:64k/10:128k/10:256k/,
> rate_process=poisson, iodepth=64)
> read_iops: 28000 -> 58000
>
> The last two tests represent a typical use case on our systems. Therefore
> we are especially concerned by the drop in performance from 21000 w/ops to
> 8500 w/ops (about 60%) after upgrading to Ceph 15.2.3.
>
> We ran all tests several times, the values are averaged over all
> iterations and fairly consistent and reproducible. We even tried wiping the
> whole cluster, downgrading to Ceph 14.2.9 again, setting up a new
> cluster/pool, running the tests and upgrading to Ceph 15.2.3 again. The
> tests have been performed on one of the three cluster nodes using a 50G rbd
> volume, which had been prefilled with random data before each test-run.
>
> Have any changes been introduced with Octopus that could explain the
> observed changes in performance?
>
> What we already tried:
>
> - Disabling rbd cache
> - Reverting rbc cache policy to writeback (default in 14.2)
> - Setting rbd io scheduler to none
> - Deploying a fresh cluster starting with Ceph 15.2.3
>
> Kernel is 5.4.38 … I don't know if some other system specs would be
> helpful besides the already mentioned (since we are talking about a
> relative change in performance after upgrading Ceph without any further
> changes) - if so, please let us know.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Janne Johansson
Den tors 4 juni 2020 kl 16:29 skrev David Orman :

>* bluestore: common/options.cc: disable bluefs_preextend_wal_files  <--
> from 15.2.3 changelogs. There was a bug which lead to issues on OSD
>

Given that preextended WAL files was mentioned as a speed increasing
feature in nautilus 14.2.3 release notes, are nautilus clusters in danger
or just Octopus?



> restart, and I believe this was the attempt at mitigation until a proper
> bugfix could be put into place. I suspect this might be the cause of the
> symptoms you're seeing.
>
> https://tracker.ceph.com/issues/45613
> https://github.com/ceph/ceph/pull/35293
>

-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Stephan
Thanks for your fast reply! We just tried all four possible combinations of 
bluefs_preextend_wal_files and bluefs_buffered_io, but the write IOPS in test 
"usecase1" remain the same. By the way, bluefs_preextend_wal_files was already 
false in 14.2.9 (as in 15.2.3). Any other ideas?

David Orman wrote:
> * bluestore: common/options.cc: disable bluefs_preextend_wal_files  <--
> from 15.2.3 changelogs. There was a bug which lead to issues on OSD
> restart, and I believe this was the attempt at mitigation until a proper
> bugfix could be put into place. I suspect this might be the cause of the
> symptoms you're seeing.
> 
> https://tracker.ceph.com/issues/45613
> https://github.com/ceph/ceph/pull/35293
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] nfs-ganesha mount hangs every day since upgrade to nautilus

2020-06-04 Thread Marc Roos


After having to revert back to ceph-fuse when upgrading to Nautilus, I am 
also seeing that the nfs-ganesha mount stalls/breaks every day. Probably caused 
by:

1 clients failing to respond to capability release
2 clients failing to respond to cache pressure
1 MDSs report slow requests


How to fix this?
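
Some common first diagnostic steps for these warnings, purely as an
illustration (the MDS name is a placeholder and the right cache limit depends
on the workload): identify the offending client sessions and, if the MDS cache
is simply too small, raise its memory limit.

$ ceph health detail
$ ceph daemon mds.<name> session ls        # on the MDS host
$ ceph config set mds mds_cache_memory_limit 8589934592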





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Mark Nelson

Hi Stephan,


We recently ran a set of 3-sample tests looking at 2OSD/NVMe vs 1 
OSD/NVMe RBD performance on Nautilus, Octopus, and Master on some of our 
newer performance nodes with Intel P4510 NVMe drives. Those tests use 
the librbd fio backend.  We also saw similar randread and seq write 
performance increases but did not see a performance regression with 4KB 
random writes like you did.  In fact Octopus was significantly faster 
than Nautilus (but master regressed a little vs octopus).  We expect it 
to be significantly faster too as we improved the way the bluestore 
caches work and it's consistently shown gains for us.  Here are the most 
recent test results:



https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing


Having said that, this is the second report I've gotten regarding 
performance regression in Octopus so there could be something going on 
that we are missing.  If possible, could you run gdbpmp against one of 
your OSDs during the test?  That might help us figure out why it's 
slow.  Otherwise some other things to look at:



1) If this is a large dataset, see if increasing the osd_memory_target 
helps.  onode cache misses really hurt us and can increase latency and 
hurt IOPS.  Now that Adam's column family sharding PR has merged in 
master we have two complementary PRs that both help reduce OSD memory 
consumption for caching onodes.  For now you might see higher 
performance if you can afford to give the OSDs more memory.


2) Check to see if the CPUs are being kept in a high power state.  The 
transition can cause higher latency and perversely the less CPU you use 
the more likely the CPU is to drop into a low power state resulting in 
higher latency and worse performance, especially if it ends up thrashing 
between power states.


3) Lately I haven't seen the kv sync thread acting as a hard bottleneck 
during 4KB random writes, but it still could be if you have a low 
clocked processor (especially in a power saving state).  This is still 
an area to look carefully at if performance is low.


4) the bluefs_buffered_io change was the other thing I suspected but it 
sounds like you've already tested that.  Nevertheless it would be good 
to see if IOs are backing up.  If you can get a wall clock profile with 
gdbpmp you might be able to tell if io_submit is blocking.  iostat or 
collectl can also probably tell you if the device queue is backing up.



Hope this gives some ideas to start out!


Thanks,

Mark


On 6/4/20 10:07 AM, Stephan wrote:

Thanks for your fast reply! We just tried all four possible combinations of 
bluefs_preextend_wal_files and bluefs_buffered_io, but the write-iops in test 
"usecase1" remain the same. By the way  bluefs_preextend_wal_files has been 
false in 14.2.9 (as in 15.2.3). Any other ideas?

David Orman wrote:

* bluestore: common/options.cc: disable bluefs_preextend_wal_files  <--
from 15.2.3 changelogs. There was a bug which lead to issues on OSD
restart, and I believe this was the attempt at mitigation until a proper
bugfix could be put into place. I suspect this might be the cause of the
symptoms you're seeing.

https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Mark Nelson

Oh, one other thing:


Check for background work, especially PG balancer.  In all of my tests 
the balancer was explicitly disabled.  During benchmarks there may be a 
high background workload affecting client IO if it's constantly 
rebalancing the number of PGs in the pool.



Mark


On 6/4/20 11:03 AM, Mark Nelson wrote:

Hi Stephan,


We recently ran a set of 3-sample tests looking at 2OSD/NVMe vs 1 
OSD/NVMe RBD performance on Nautilus, Octopus, and Master on some of 
our newer performance nodes with Intel P4510 NVMe drives. Those tests 
use the librbd fio backend.  We also saw similar randread and seq 
write performance increases but did not see a performance regression 
with 4KB random writes like you did.  In fact Octopus was 
significantly faster than Nautilus (but master regressed a little vs 
octopus).  We expect it to be significantly faster too as we improved 
the way the bluestore caches work and it's consistently shown gains 
for us.  Here are the most recent test results:



https://docs.google.com/spreadsheets/d/1e5eTeHdZnSizoY6AUjH0knb4jTCW7KMU4RoryLX9EHQ/edit?usp=sharing 




Having said that, this is the second report I've gotten regarding 
performance regression in Octopus so there could be something going on 
that we are missing.  If possible, could you run gdbpmp against one of 
your OSDs during the test?  That might help us figure out why it's 
slow.  Otherwise some other things to look at:



1) If this is a large dataset, see if increasing the osd_memory_target 
helps.  onode cache misses really hurt us and can increase latency and 
hurt IOPS.  Now that Adam's column family sharding PR has merged into 
master we have two complementary PRs that both help reduce OSD memory 
consumption for caching onodes.  For now you might see higher 
performance if you can afford to give the OSDs more memory.


2) Check to see if the CPUs are being kept in a high power state. The 
transition can cause higher latency and perversely the less CPU you 
use the more likely the CPU is to drop into a low power state 
resulting in higher latency and worse performance, especially if it 
ends up thrashing between power states.


3) Lately I haven't seen the kv sync thread acting as a hard 
bottleneck during 4KB random writes, but it still could be if you have 
a low-clocked processor (especially in a power-saving state).  This is 
still an area to look carefully at if performance is low.


4) The bluefs_buffered_io change was the other thing I suspected, but 
it sounds like you've already tested that.  Nevertheless, it would be 
good to see if IOs are backing up.  If you can get a wall-clock 
profile with gdbpmp you might be able to tell if io_submit is 
blocking.  iostat or collectl can also probably tell you if the device 
queue is backing up.



Hope this gives some ideas to start out!


Thanks,

Mark


On 6/4/20 10:07 AM, Stephan wrote:
Thanks for your fast reply! We just tried all four possible 
combinations of bluefs_preextend_wal_files and bluefs_buffered_io, 
but the write-iops in test "usecase1" remain the same. By the way, 
bluefs_preextend_wal_files was already false in 14.2.9 (as in 15.2.3). 
Any other ideas?


David Orman wrote:

* bluestore: common/options.cc: disable bluefs_preextend_wal_files  <--
from 15.2.3 changelogs. There was a bug which led to issues on OSD
restart, and I believe this was the attempt at mitigation until a 
proper
bugfix could be put into place. I suspect this might be the cause of 
the

symptoms you're seeing.

https://tracker.ceph.com/issues/45613
https://github.com/ceph/ceph/pull/35293

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Jason Dillaman
On Thu, Jun 4, 2020 at 3:43 AM Zhenshi Zhou  wrote:
>
> My condition is that the primary image being used while rbd-mirror sync.
> I want to get the period between the two times of rbd-mirror transfer the
> increased data.
> I will search those options you provided, thanks a lot :)

When using the original (pre-Octopus) journal-based mirroring, once
the initial sync completes to transfer the bulk of the image data from
a point-in-time dynamic snapshot, any changes post sync will be
replayed continuously from the stream of events written to the journal
on the primary image. The "rbd mirror image status" against the
non-primary image will provide more details about the current state of
the journal replay.

With the Octopus release, we now also support snapshot-based mirroring
where we transfer any image deltas between two mirroring snapshots.
These mirroring snapshots are different from user-created snapshots
and their life-time is managed by RBD mirroring (i.e. they are
automatically pruned when no longer needed). This version of mirroring
probably more closely relates to your line of questioning since the
period of replication is at whatever period you create new mirroring
snapshots (provided your two clusters can keep up).
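
As a small sketch against a hypothetical pool/image (names and the interval 
are placeholders; Octopus syntax):

$ rbd mirror image status mypool/myimage        # replay/sync state, as seen for the non-primary image
$ rbd mirror image enable mypool/myimage snapshot    # switch the image to snapshot-based mirroring
$ rbd mirror image snapshot mypool/myimage           # take a one-off mirror snapshot right now
$ rbd mirror snapshot schedule add --pool mypool --image myimage 15m   # ship deltas every 15 minutes
$ rbd mirror snapshot schedule ls --pool mypool --image myimage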

>
> Eugen Block  于2020年6月4日周四 下午3:28写道:
>
> > The initial sync is a full image sync, the rest is based on the object
> > sets created. There are several options to control the mirroring, for
> > example:
> >
> > rbd_journal_max_concurrent_object_sets
> > rbd_mirror_concurrent_image_syncs
> > rbd_mirror_leader_max_missed_heartbeats
> >
> > and many more. I'm not sure I fully understand what you're asking,
> > maybe you could rephrase your question?
> >
> >
> > Zitat von Zhenshi Zhou :
> >
> > > Hi Eugen,
> > >
> > > Thanks for the reply. If rbd-mirror constantly synchronize changes,
> > > what frequency to replay once? I don't find any options I can config.
> > >
> > > Eugen Block  于2020年6月4日周四 下午2:54写道:
> > >
> > >> Hi,
> > >>
> > >> that's the point of rbd-mirror, to constantly replay changes from the
> > >> primary image to the remote image (if the rbd journal feature is
> > >> enabled).
> > >>
> > >>
> > >> Zitat von Zhenshi Zhou :
> > >>
> > >> > Hi all,
> > >> >
> > >> > I'm gonna deploy a rbd-mirror in order to sync image from clusterA to
> > >> > clusterB.
> > >> > The image will be used while syncing. I'm not sure if the rbd-mirror
> > will
> > >> > sync image
> > >> > continuously or not. If not, I will inform clients not to write data
> > in
> > >> it.
> > >> >
> > >> > Thanks. Regards
> > >> > ___
> > >> > ceph-users mailing list -- ceph-users@ceph.io
> > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>
> > >>
> > >> ___
> > >> ceph-users mailing list -- ceph-users@ceph.io
> > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > >>
> >
> >
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Jason
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Thanks Frank,

Interesting info about the EC profile.  I do have an EC pool, but I noticed the 
following when I dumped the profile:

# ceph osd erasure-code-profile get ec22
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=2
plugin=jerasure
technique=reed_sol_van
w=8
#

Which says that the failure domain of the EC profile is also set to host.  
Looks like I need to change the EC profile, too, but since it is associated with 
the pool, maybe I can’t do that after pool creation?  Or… since the 
property is named “crush-failure-domain”, is it automatically inherited from the 
crush rule, so I don’t have to do anything?

Thanks,

George


On Jun 4, 2020, at 1:51 AM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

Hi George,

for replicated rules you can simply create a new crush rule with the new 
failure domain set to chassis and change any pool's crush rule to this new one. 
If you have EC pools, then the chooseleaf needs to be edited by hand. I did 
this before as well. (A really unfortunate side effect is that the EC profile 
attached to the pool goes out of sync with the crush map and there is nothing 
one can do about that. This is annoying yet harmless.)
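
A minimal sketch of that, assuming placeholder rule/pool names and the hdd 
device class:

$ ceph osd crush rule create-replicated rep_chassis default chassis hdd
$ ceph osd pool set mypool crush_rule rep_chassis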

The intent of doing these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step 
by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set and there is then a 
shorter interrupt to ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is a good idea 
to have a backup also just for reference and to compare before and after.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the 
chooseleaf step would also have to change to:

   step chooseleaf firstn 0 type chassis

Correct?  Is that the only other change that is needed?  It looks like the rule 
change can happen both inside and outside the “norebalance” setting (again with 
CLI commands), but is it safer to do it inside (ie. while not rebalancing)?

If I keep a backup of the crush rule map (with “ceph osd getcrushmap”), I 
assume I can restore the old map if something goes bad?

Thanks again!

George



On Jun 3, 2020, at 5:24 PM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

You can use the command-line without editing the crush map. Look at the 
documentation of commands like

ceph osd crush add-bucket ...
ceph osd crush move ...

Before starting this, set "ceph osd set norebalance" and unset after you are 
happy with the crush tree. Let everything peer. You should see misplaced 
objects and remapped PGs, but no degraded objects or PGs.
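
For instance, a minimal sketch to introduce a chassis level (bucket and host 
names are placeholders):

$ ceph osd set norebalance
$ ceph osd crush add-bucket chassis1 chassis
$ ceph osd crush move chassis1 root=default
$ ceph osd crush move node01 chassis=chassis1
$ ceph osd crush tree          # verify the new hierarchy before letting data move
$ ceph osd unset norebalance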

Do this only when the cluster is health_ok, otherwise things can get really 
complicated.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 03 June 2020 22:45:11
To: ceph-users
Subject: [ceph-users] Best way to change bucket hierarchy

Hello,

I have a live ceph cluster, and I’m in need of modifying the bucket 
hierarchy.  I am currently using the default crush rule (i.e. keep each replica 
on a different host).  My need is to add a “chassis” level, and keep replicas 
on a per-chassis level.

From what I read in the documentation, I would have to edit the crush map 
manually; however, this sounds kinda scary for a live cluster.

Are there any “best known methods” to achieve that goal without messing things 
up?

In my current scenario, I have one host per chassis, and am planning on later 
adding nodes where there would be more than one host per chassis. It looks like “in 
theory” there wouldn’t be a need for any data movement after the crush map 
changes.  Will reality match theory?  Anything else I need to watch out for?

Thank you!

George

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to 
ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Yes, that makes total sense.

Thanks,

George


> On Jun 4, 2020, at 2:17 AM, Frank Schilder  wrote:
> 
>> Yes and No. This will cause many CRUSHMap updates where a manual update
>> is only a single change.
>> 
>> I would do:
>> 
>> $ ceph osd getcrushmap -o crushmap
> 
> Well, that's a yes and a no as well.
> 
> If you are experienced and edit crush maps on a regular basis, you can go 
> that way. I would still enclose the change in a norebalance setting. If you 
> are not experienced, you are likely to shoot your cluster. In particular, 
> adding and moving buckets is not fun this way. You need to be careful what 
> IDs you assign, and there are many options to choose from with documentation 
> targeted at experienced cephers.
> 
> CLI commands will prevent a lot of stupid typos, errors and forgotten 
> mandatory lines. I learned that the hard way and decided to use a direct edit 
> only when absolutely necessary. A couple of extra peerings is a low-cost 
> operation compared with trying to find a stupid typo that just killed all 
> pools when angry users stand next to you.
> 
> My recommendation would be to save the original crush map, apply the commands and 
> look at the changes they make. That's a great way to learn how to do it 
> right. And in general, better safe than sorry.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> 
> 
> From: Wido den Hollander 
> Sent: 04 June 2020 08:50:16
> To: Frank Schilder; Kyriazis, George; ceph-users
> Subject: Re: [ceph-users] Re: Best way to change bucket hierarchy
> 
> On 6/4/20 12:24 AM, Frank Schilder wrote:
>> You can use the command-line without editing the crush map. Look at the 
>> documentation of commands like
>> 
>> ceph osd crush add-bucket ...
>> ceph osd crush move ...
>> 
>> Before starting this, set "ceph osd set norebalance" and unset after you are 
>> happy with the crush tree. Let everything peer. You should see misplaced 
>> objects and remapped PGs, but no degraded objects or PGs.
>> 
>> Do this only when the cluster is health_ok, otherwise things can get really 
>> complicated.
>> 
> 
> Yes and No. This will cause many CRUSHMap updates where a manual update
> is only a single change.
> 
> I would do:
> 
> $ ceph osd getcrushmap -o crushmap
> $ cp crushmap crushmap.backup
> $ crushtool -d crushmap -o crushmap.txt
> $ vi crushmap.txt (now make your changes)
> $ crushtool -c crushmap.txt -o crushmap.new
> $ crushtool -i crushmap.new --tree (check if all OK)
> $ crushtool -i crushmap.new --test --rule 0 --num-rep 3 --show-mappings
> 
> If all is good:
> 
> $ ceph osd setcrushmap -i crushmap.new
> 
> If all goes bad, simply revert to your old crushmap.
> 
> Wido
> 
>> Best regards,
>> =
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
>> 
>> 
>> From: Kyriazis, George 
>> Sent: 03 June 2020 22:45:11
>> To: ceph-users
>> Subject: [ceph-users] Best way to change bucket hierarchy
>> 
>> Hello,
>> 
>> I have a live ceph cluster, and I’m in need of modifying the bucket 
>> hierarchy.  I am currently using the default crush rule (i.e. keep each 
>> replica on a different host).  My need is to add a “chassis” level, and keep 
>> replicas on a per-chassis level.
>> 
>> From what I read in the documentation, I would have to edit the crush map 
>> manually; however, this sounds kinda scary for a live cluster.
>> 
>> Are there any “best known methods” to achieve that goal without messing 
>> things up?
>> 
>> In my current scenario, I have one host per chassis, and am planning on later 
>> adding nodes where there would be more than one host per chassis. It looks like “in 
>> theory” there wouldn’t be a need for any data movement after the crush map 
>> changes.  Will reality match theory?  Anything else I need to watch out for?
>> 
>> Thank you!
>> 
>> George
>> 
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] diskprediction_local fails with python3-sklearn 0.22.2

2020-06-04 Thread Eric Dold
Hello

The mgr module diskprediction_local fails under Ubuntu 20.04 (focal) with
python3-sklearn version 0.22.2.
The Ceph version is 15.2.3.

When the module is enabled I get the following error:

  File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 112, in
serve
self.predict_all_devices()
  File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 279, in
predict_all_devices
result = self._predict_life_expentancy(devInfo['devid'])
  File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 222, in
_predict_life_expentancy
predicted_result = obj_predictor.predict(predict_datas)
  File "/usr/share/ceph/mgr/diskprediction_local/predictor.py", line 457,
in predict
pred = clf.predict(ordered_data)
  File "/usr/lib/python3/dist-packages/sklearn/svm/_base.py", line 585, in
predict
if self.break_ties and self.decision_function_shape == 'ovo':
AttributeError: 'SVC' object has no attribute 'break_ties'

Best Regards
Eric
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] bad balacing (octopus)

2020-06-04 Thread Ml Ml
Hello,
Any idea why it's so badly balanced?

e.g.: osd.52 (82%) vs osd.34 (29%)

I did run "/usr/bin/ceph osd reweight-by-utilization " by cron for
some time, since i was low on space for some time and that helped a
bit.
What should i do next?

Here is some info:

root@ceph01:~# ceph -s
  cluster:
id: 5436dd5d-83d4-4dc8-a93b-60ab5db145df
health: HEALTH_OK

  services:
mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 4h)
mgr: ceph01(active, since 4h), standbys: ceph03, ceph02
osd: 55 osds: 55 up (since 4h), 54 in (since 4h); 10 remapped pgs

  data:
pools:   3 pools, 2049 pgs
objects: 8.17M objects, 29 TiB
usage:   89 TiB used, 45 TiB / 133 TiB avail
pgs: 27740/24497208 objects misplaced (0.113%)
 2034 active+clean
 9active+remapped+backfilling
 5active+clean+scrubbing+deep
 1active+remapped+backfill_wait

  io:
recovery: 172 MiB/s, 45 objects/s

root@ceph01:~# ceph balancer status
{
"active": true,
"last_optimize_duration": "0:00:00.017034",
"last_optimize_started": "Thu Jun  4 15:26:28 2020",
"mode": "upmap",
"optimize_result": "Unable to find further optimization, or
pool(s) pg_num is decreasing, or distribution is already perfect",
"plans": []
}
root@ceph01:~# ceph osd df tree
ID   CLASS  WEIGHT REWEIGHT  SIZE RAW USE  DATA OMAP
META AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
 -1 111.70750 -  133 TiB   89 TiB   88 TiB  4.9 GiB
240 GiB   44 TiB  66.79  1.00-  root default
 -2  25.44400 -   25 TiB   19 TiB   19 TiB  1.0 GiB
49 GiB  6.1 TiB  75.99  1.14-  host ceph01
  0hdd2.3   1.0  2.7 TiB  1.9 TiB  1.8 TiB  107 MiB
5.2 GiB  835 GiB  69.45  1.04  130  up  osd.0
  1hdd2.2   1.0  2.7 TiB  1.8 TiB  1.8 TiB   79 MiB
5.4 GiB  920 GiB  66.36  0.99  124  up  osd.1
  4hdd2.67029   1.0  2.7 TiB  2.1 TiB  2.1 TiB  127 MiB
5.6 GiB  620 GiB  77.31  1.16  144  up  osd.4
  8hdd2.3   1.0  2.7 TiB  1.9 TiB  1.9 TiB  111 MiB
5.3 GiB  755 GiB  72.38  1.08  135  up  osd.8
 11hdd1.71660   1.0  1.7 TiB  1.3 TiB  1.3 TiB   56 MiB
2.6 GiB  418 GiB  76.22  1.14   92  up  osd.11
 12hdd2.67029   1.0  2.7 TiB  2.1 TiB  2.1 TiB  122 MiB
5.3 GiB  625 GiB  77.14  1.16  144  up  osd.12
 14hdd2.67029   1.0  2.7 TiB  2.0 TiB  2.0 TiB  114 MiB
5.6 GiB  642 GiB  76.53  1.15  144  up  osd.14
 18hdd2.7   1.0  2.7 TiB  2.3 TiB  2.3 TiB  143 MiB
6.0 GiB  411 GiB  84.95  1.27  158  up  osd.18
 22hdd1.71660   1.0  1.7 TiB  1.3 TiB  1.3 TiB   59 MiB
2.4 GiB  388 GiB  77.94  1.17   94  up  osd.22
 30hdd1.7   0.95001  1.7 TiB  1.4 TiB  1.4 TiB   71 MiB
2.9 GiB  324 GiB  81.58  1.22   96  up  osd.30
 32hdd0.5 0  0 B  0 B  0 B  0 B
  0 B  0 B  0 00  up  osd.32
 33hdd1.7   0.95001  1.6 TiB  1.3 TiB  1.3 TiB   54 MiB
2.4 GiB  323 GiB  80.55  1.21   91  up  osd.33
 -3  24.52742 -   26 TiB   19 TiB   19 TiB  1.1 GiB
54 GiB  6.8 TiB  74.10  1.11-  host ceph02
  2hdd1.0   1.0  1.7 TiB  827 GiB  825 GiB   29 MiB
2.2 GiB  931 GiB  47.05  0.70   56  up  osd.2
  3hdd2.8   0.95001  2.7 TiB  2.2 TiB  2.2 TiB  123 MiB
6.1 GiB  454 GiB  83.38  1.25  154  up  osd.3
  7hdd2.67029   1.0  2.7 TiB  2.1 TiB  2.1 TiB  152 MiB
5.8 GiB  540 GiB  80.25  1.20  150  up  osd.7
  9hdd2.67029   1.0  2.7 TiB  2.1 TiB  2.1 TiB  172 MiB
6.0 GiB  553 GiB  79.79  1.19  149  up  osd.9
 13hdd1.7   1.0  2.4 TiB  1.4 TiB  1.4 TiB   70 MiB
4.6 GiB  979 GiB  59.91  0.90  100  up  osd.13
 16hdd2.8   0.95001  2.7 TiB  2.2 TiB  2.2 TiB  140 MiB
6.0 GiB  479 GiB  82.46  1.23  153  up  osd.16
 19hdd1.3   1.0  1.7 TiB  1.1 TiB  1.1 TiB   41 MiB
2.8 GiB  616 GiB  64.96  0.97   78  up  osd.19
 23hdd2.0   1.0  2.7 TiB  1.6 TiB  1.6 TiB   96 MiB
5.3 GiB  1.1 TiB  59.58  0.89  111  up  osd.23
 24hdd1.71660   1.0  1.7 TiB  1.4 TiB  1.4 TiB   52 MiB
3.3 GiB  334 GiB  81.02  1.21   97  up  osd.24
 28hdd2.7   1.0  2.7 TiB  2.2 TiB  2.2 TiB  143 MiB
6.2 GiB  440 GiB  83.90  1.26  155  up  osd.28
 31hdd2.67029   1.0  2.7 TiB  2.2 TiB  2.2 TiB  116 MiB
6.0 GiB  523 GiB  80.87  1.21  149  up  osd.31
 -4  20.40346 -   25 TiB   16 TiB   16 TiB  932 MiB
46 GiB  8.2 TiB  66.48  1.00-  host ceph03
  5hdd1.71660   1.0  1.7 TiB  1.4 TiB  1.4 TiB   61 MiB
2.8 GiB  363 GiB  79.37  1.19   95  up  osd.5
  6h

[ceph-users] log_channel(cluster) log [ERR] : Error -2 reading object

2020-06-04 Thread Frank Schilder
Hi all,

I found these messages today:

2020-06-04 17:07:57.471 7fa0aa16e700 -1 log_channel(cluster) log [ERR] : Error 
-2 reading object 14:e4c5ebb6:::1000203c59b.0002:head
2020-06-04 17:08:04.236 7fa0aa16e700 -1 log_channel(cluster) log [ERR] : Error 
-2 reading object 14:e4c9a1a1:::1000203ad7f.:head

in one of our OSD logs. The disk is healthy according to smartctl. Should I 
worry about that?

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Hmm,

So I tried all that, and I got almost all of my PGs being remapped.  Crush map 
looks correct.  Is that normal?

Thanks,

George


On Jun 4, 2020, at 2:33 PM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

Hi George,

you don't need to worry about that too much. The EC profile contains two types 
of information, one part about the actual EC encoding and another part about 
crush parameters. Unfortunately, actually. Part of this information is mutable 
after pool creation while the rest is not. Mutable here means outside of the 
profile. You can change the failure domain in the crush map without issues, but 
the profile won't reflect that change. That's an inconsistency we currently 
have to live with and it would have been better to separate mutable data (like 
failure domain) from immutable data (like k and m) or provide a meaningful 
interface to maintain consistency of mutable information.

In short, don't believe everything the EC profile tells you. Some information 
might be out of date, like the failure domain or the device class (basically 
everything starting with crush-). If you remember that, you are out of trouble. 
Always dump the crush rule of an EC pool explicitly to see the true parameters 
in action.

Having said that, to change the failure domain for an EC pool, change the crush 
rule for the EC profile - I did this too and it works just fine. The crush rule 
has by default the same name as the pool. I'm afraid, here you will have to do 
a manual edit of the crush rule as Wido explained. There is no other way - at 
least currently not.

You can ask in this list for confirmation that your change is doing what you 
want.

Do not try to touch an EC profile; they are read-only anyway. The crush 
parameters are only used at pool creation and never looked at again. You can 
override these by editing the crush rule as explained above.

Best regards and good luck,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 04 June 2020 20:56:38
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

Interesting info about the EC profile.  I do have an EC pool, but I noticed the 
following when I dumped the profile:

# ceph osd erasure-code-profile get ec22
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=2
plugin=jerasure
technique=reed_sol_van
w=8
#

Which says that the failure domain of the EC profile is also set to host.  
Looks like I need to change the EC profile, too, but since it is associated with 
the pool, maybe I can’t do that after pool creation?  Or… since the 
property is named “crush-failure-domain”, is it automatically inherited from the 
crush rule, so I don’t have to do anything?

Thanks,

George


On Jun 4, 2020, at 1:51 AM, Frank Schilder 
mailto:fr...@dtu.dk>> wrote:

Hi George,

for replicated rules you can simply create a new crush rule with the new 
failure domain set to chassis and change any pool's crush rule to this new one. 
If you have EC pools, then the chooseleaf needs to be edited by hand. I did 
this before as well. (A really unfortunate side effect is that the EC profile 
attached to the pool goes out of sync with the crush map and there is nothing 
one can do about that. This is annoying yet harmless.)

The intent of doing these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step 
by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set and there is then a 
shorter interrupt to ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is a good idea 
to have a backup also just for reference and to compare before and after.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the 
chooseleaf step would also have to change to:

  step chooseleaf firstn 0 type chassis

Correct?  Is that the only other change that is needed?  It looks like the rule 
change can happen both inside and outside the “norebalance” setting (again with 
CLI commands), but is it safer to do it inside (ie. while not rebalancing)?

If I keep a backup of the crush rule map (with “ceph osd getcrushmap”), I 
assume I can restore the old map if something goes bad?

Thanks again!

George



On Jun 3, 2020, at 5:24 PM, Frank Schilder 
mailto:fr...@dtu.dk>> wrote:

You can use th

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
Hi George,

you don't need to worry about that too much. The EC profile contains two types 
of information, one part about the actual EC encoding and another part about 
crush parameters. Unfortunately, actually. Part of this information is mutable 
after pool creation while the rest is not. Mutable here means outside of the 
profile. You can change the failure domain in the crush map without issues, but 
the profile won't reflect that change. That's an inconsistency we currently 
have to live with and it would have been better to separate mutable data (like 
failure domain) from immutable data (like k and m) or provide a meaningful 
interface to maintain consistency of mutable information.

In short, don't believe everything the EC profile tells you. Some information 
might be out of date, like the failure domain or the device class (basically 
everything starting with crush-). If you remember that, you are out of trouble. 
Always dump the crush rule of an EC pool explicitly to see the true parameters 
in action.
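
For example (pool name is a placeholder):

$ ceph osd pool get ecpool crush_rule    # which rule the pool really uses
$ ceph osd crush rule dump ecpool        # the effective failure domain is in the chooseleaf step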

Having said that, to change the failure domain for an EC pool, change the crush 
rule for the EC profile - I did this too and it works just fine. The crush rule 
has by default the same name as the pool. I'm afraid, here you will have to do 
a manual edit of the crush rule as Wido explained. There is no other way - at 
least currently not.

You can ask in this list for confirmation that your change is doing what you 
want.

Do not try to touch an EC profile; they are read-only anyway. The crush 
parameters are only used at pool creation and never looked at again. You can 
override these by editing the crush rule as explained above.
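
As an illustration only (the rule id, name and hdd class are assumptions; the 
essential change is the chooseleaf line), a decompiled rule for a k=2/m=2 pool 
might end up looking like this after the edit:

rule ecpool {
        id 2
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default class hdd
        step chooseleaf indep 0 type chassis    # was "type host" before
        step emit
}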

Best regards and good luck,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
Sent: 04 June 2020 20:56:38
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

Interesting info about the EC profile.  I do have an EC pool, but I noticed the 
following when I dumped the profile:

# ceph osd erasure-code-profile get ec22
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=2
plugin=jerasure
technique=reed_sol_van
w=8
#

Which says that the failure domain of the EC profile is also set to host.  
Looks like I need to change the EC profile, too, but since it is associated with 
the pool, maybe I can’t do that after pool creation?  Or… since the 
property is named “crush-failure-domain”, is it automatically inherited from the 
crush rule, so I don’t have to do anything?

Thanks,

George


On Jun 4, 2020, at 1:51 AM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

Hi George,

for replicated rules you can simply create a new crush rule with the new 
failure domain set to chassis and change any pool's crush rule to this new one. 
If you have EC pools, then the chooseleaf needs to be edited by hand. I did 
this before as well. (A really unfortunate side effect is that the EC profile 
attached to the pool goes out of sync with the crush map and there is nothing 
one can do about that. This is annoying yet harmless.)

The intent of doing these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step 
by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set and there is then a 
shorter interrupt to ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is a good idea 
to have a backup also just for reference and to compare before and after.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the 
chooseleaf step would also have to change to:

   step chooseleaf firstn 0 type chassis

Correct?  Is that the only other change that is needed?  It looks like the rule 
change can happen both inside and outside the “norebalance” setting (again with 
CLI commands), but is it safer to do it inside (ie. while not rebalancing)?

If I keep a backup of the crush rule map (with “ceph osd getcrushmap”), I 
assume I can restore the old map if something goes bad?

Thanks again!

George



On Jun 3, 2020, at 5:24 PM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

You can use the command-line without editing the crush map. Look at the 
documentation of commands like

ceph osd crush add-bucket ...
ceph osd crush move ...

Before starting this, set "ceph osd set norebalance" and unset after you are 
happy with the crush tree. Let everything peer. You should see misplaced 
objects and remapped PG

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
I understand that it’s difficult to debug remotely. :-)

In my current scenario I have 5 machines (1 host per chassis), but planning on 
adding some additional chassis with 4 hosts per chassis in the near future.  
Currently I am going through the first stage of adding “stub” chassis for the 5 
hosts/chassis that I have, basically reparenting each host to its own chassis, 
as shown below:

ID  CLASS WEIGHTTYPE NAMESTATUS REWEIGHT PRI-AFF
 -1   203.72598 root default
 -540.01700 chassis chassis-hsw1
 -940.01700 host vis-hsw-01
  3   hdd  10.91299 osd.3up  1.0 1.0
  6   hdd  14.55199 osd.6up  1.0 1.0
 10   hdd  14.55199 osd.10   up  1.0 1.0
 -640.01700 chassis chassis-hsw2
-1340.01700 host vis-hsw-02
  0   hdd  10.91299 osd.0up  1.0 1.0
  7   hdd  14.55199 osd.7up  1.0 1.0
 11   hdd  14.55199 osd.11   up  1.0 1.0
 -740.01700 chassis chassis-hsw3
-1140.01700 host vis-hsw-03
  4   hdd  10.91299 osd.4up  1.0 1.0
  8   hdd  14.55199 osd.8up  1.0 1.0
 12   hdd  14.55199 osd.12   up  1.0 1.0
 -840.01700 chassis chassis-hsw4
 -340.01700 host vis-hsw-04
  5   hdd  10.91299 osd.5up  1.0 1.0
  9   hdd  14.55199 osd.9up  1.0 1.0
 13   hdd  14.55199 osd.13   up  1.0 1.0
-1743.65799 chassis chassis-hsw5
-1543.65799 host vis-hsw-05
  1   hdd  14.55299 osd.1up  1.0 1.0
  2   hdd  14.55299 osd.2up  1.0 1.0
 14   hdd  14.55299 osd.14   up  1.0 1.0

There is no additional constraint that is being added, so ideally there would 
be no data movement.  However, I can imagine that the CRUSH algorithm could 
hash the PGs into different OSDs now because there is a new thing to consider 
(namely the chassis).  Does it do that?

Thanks,

George


On Jun 4, 2020, at 6:22 PM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

It's hard to tell without knowing what the diff is, but from your description I 
take it that you changed the failure domain for every(?) pool from host to 
chassis. I don't know what a chassis is in your architecture, but if each 
chassis contains several host buckets, then yes, I would expect almost every PG 
to be affected.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 05 June 2020 00:28:43
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Hmm,

So I tried all that, and I got almost all of my PGs being remapped.  Crush map 
looks correct.  Is that normal?

Thanks,

George


On Jun 4, 2020, at 2:33 PM, Frank Schilder 
mailto:fr...@dtu.dk>> wrote:

Hi George,

you don't need to worry about that too much. The EC profile contains two types 
of information, one part about the actual EC encoding and another part about 
crush parameters. Unfortunately, actually. Part of this information is mutable 
after pool creation while the rest is not. Mutable here means outside of the 
profile. You can change the failure domain in the crush map without issues, but 
the profile won't reflect that change. That's an inconsistency we currently 
have to live with and it would have been better to separate mutable data (like 
failure domain) from immutable data (like k and m) or provide a meaningful 
interface to maintain consistency of mutable information.

In short, don't believe everything the EC profile tells you. Some information 
might be out of date, like the failure domain or the device class (basically 
everything starting with crush-). If you remember that, you are out of trouble. 
Always dump the crush rule of an EC pool explicitly to see the true parameters 
in action.

Having said that, to change the failure domain for an EC pool, change the crush 
rule for the EC profile - I did this too and it works just fine. The crush rule 
has by default the same name as the pool. I'm afraid, here you will have to do 
a manual edit of the crush rule as Wido explained. There is no other way - at 
least currently not.

You can ask in this list for confirmation that your change is doing what you 
want.

Do not try to touch an EC profile; they are read-only anyway. The crush 
parameters are only used at pool creation and never looked at again. You can 
override these by editing the crush rule as explained above.

Best regards and good luck,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

__

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Zhenshi Zhou
Thank you for the clarification. That's very clear.

Jason Dillaman  于2020年6月5日周五 上午12:46写道:

> On Thu, Jun 4, 2020 at 3:43 AM Zhenshi Zhou  wrote:
> >
> > My condition is that the primary image being used while rbd-mirror sync.
> > I want to get the period between the two times of rbd-mirror transfer the
> > increased data.
> > I will search those options you provided, thanks a lot :)
>
> When using the original (pre-Octopus) journal-based mirroring, once
> the initial sync completes to transfer the bulk of the image data from
> a point-in-time dynamic snapshot, any changes post sync will be
> replayed continuously from the stream of events written to the journal
> on the primary image. The "rbd mirror image status" against the
> non-primary image will provide more details about the current state of
> the journal replay.
>
> With the Octopus release, we now also support snapshot-based mirroring
> where we transfer any image deltas between two mirroring snapshots.
> These mirroring snapshots are different from user-created snapshots
> and their life-time is managed by RBD mirroring (i.e. they are
> automatically pruned when no longer needed). This version of mirroring
> probably more closely relates to your line of questioning since the
> period of replication is at whatever period you create new mirroring
> snapshots (provided your two clusters can keep up).
>
> >
> > Eugen Block  于2020年6月4日周四 下午3:28写道:
> >
> > > The initial sync is a full image sync, the rest is based on the object
> > > sets created. There are several options to control the mirroring, for
> > > example:
> > >
> > > rbd_journal_max_concurrent_object_sets
> > > rbd_mirror_concurrent_image_syncs
> > > rbd_mirror_leader_max_missed_heartbeats
> > >
> > > and many more. I'm not sure I fully understand what you're asking,
> > > maybe you could rephrase your question?
> > >
> > >
> > > Zitat von Zhenshi Zhou :
> > >
> > > > Hi Eugen,
> > > >
> > > > Thanks for the reply. If rbd-mirror constantly synchronize changes,
> > > > what frequency to replay once? I don't find any options I can config.
> > > >
> > > > Eugen Block  于2020年6月4日周四 下午2:54写道:
> > > >
> > > >> Hi,
> > > >>
> > > >> that's the point of rbd-mirror, to constantly replay changes from
> the
> > > >> primary image to the remote image (if the rbd journal feature is
> > > >> enabled).
> > > >>
> > > >>
> > > >> Zitat von Zhenshi Zhou :
> > > >>
> > > >> > Hi all,
> > > >> >
> > > >> > I'm gonna deploy a rbd-mirror in order to sync image from
> clusterA to
> > > >> > clusterB.
> > > >> > The image will be used while syncing. I'm not sure if the
> rbd-mirror
> > > will
> > > >> > sync image
> > > >> > continuously or not. If not, I will inform clients not to write
> data
> > > in
> > > >> it.
> > > >> >
> > > >> > Thanks. Regards
> > > >> > ___
> > > >> > ceph-users mailing list -- ceph-users@ceph.io
> > > >> > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >>
> > > >>
> > > >> ___
> > > >> ceph-users mailing list -- ceph-users@ceph.io
> > > >> To unsubscribe send an email to ceph-users-le...@ceph.io
> > > >>
> > >
> > >
> > >
> > >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> --
> Jason
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
It's hard to tell without knowing what the diff is, but from your description I 
take it that you changed the failure domain for every(?) pool from host to 
chassis. I don't know what a chassis is in your architecture, but if each 
chassis contains several host buckets, then yes, I would expect almost every PG 
to be affected.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
Sent: 05 June 2020 00:28:43
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Hmm,

So I tried all that, and I got almost all of my PGs being remapped.  Crush map 
looks correct.  Is that normal?

Thanks,

George


On Jun 4, 2020, at 2:33 PM, Frank Schilder mailto:fr...@dtu.dk>> 
wrote:

Hi George,

you don't need to worry about that too much. The EC profile contains two types 
of information, one part about the actual EC encoding and another part about 
crush parameters. Unfortunately, actually. Part of this information is mutable 
after pool creation while the rest is not. Mutable here means outside of the 
profile. You can change the failure domain in the crush map without issues, but 
the profile won't reflect that change. That's an inconsistency we currently 
have to live with and it would have been better to separate mutable data (like 
failure domain) from immutable data (like k and m) or provide a meaningful 
interface to maintain consistency of mutable information.

In short, don't believe everything the EC profile tells you. Some information 
might be out of date, like the failure domain or the device class (basically 
everything starting with crush-). If you remember that, you are out of trouble. 
Always dump the crush rule of an EC pool explicitly to see the true parameters 
in action.

Having said that, to change the failure domain for an EC pool, change the crush 
rule for the EC profile - I did this too and it works just fine. The crush rule 
has by default the same name as the pool. I'm afraid, here you will have to do 
a manual edit of the crush rule as Wido explained. There is no other way - at 
least currently not.

You can ask in this list for confirmation that your change is doing what you 
want.

Do not try to touch an EC profile; they are read-only anyway. The crush 
parameters are only used at pool creation and never looked at again. You can 
override these by editing the crush rule as explained above.

Best regards and good luck,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 04 June 2020 20:56:38
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

Interesting info about the EC profile.  I do have an EC pool, but I noticed the 
following when I dumped the profile:

# ceph osd erasure-code-profile get ec22
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=2
m=2
plugin=jerasure
technique=reed_sol_van
w=8
#

Which says that the failure domain of the EC profile is also set to host.  
Looks like I need to change the EC profile, too, but since it is associated with 
the pool, maybe I can’t do that after pool creation?  Or… since the 
property is named “crush-failure-domain”, is it automatically inherited from the 
crush rule, so I don’t have to do anything?

Thanks,

George


On Jun 4, 2020, at 1:51 AM, Frank Schilder 
mailto:fr...@dtu.dk>> wrote:

Hi George,

for replicated rules you can simply create a new crush rule with the new 
failure domain set to chassis and change any pool's crush rule to this new one. 
If you have EC pools, then the chooseleaf needs to be edited by hand. I did 
this before as well. (A really unfortunate side effect is that the EC profile 
attached to the pool goes out of sync with the crush map and there is nothing 
one can do about that. This is annoying yet harmless.)

The intent of doing these changes while norebalance is set is

- to avoid unnecessary data movement due to successive changes happening step 
by step and
- to make sure peering is successful before starting to move data.

I believe OSDs peer a bit faster with norebalance set and there is then a 
shorter interrupt to ongoing I/O (no I/O happens to a PG during peering).

Yes, if you save the old crush map, you can undo everything. It is a good idea 
to have a backup also just for reference and to compare before and after.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Kyriazis, George 
mailto:george.kyria...@intel.com>>
Sent: 04 June 2020 00:58:20
To: Frank Schilder
Cc: ceph-users
Subject: Re: Best way to change bucket hierarchy

Thanks Frank,

I don’t have too much experience editing crush rules, but I assume the 
choose

[ceph-users] Re: changing acces vlan for all the OSDs - potential downtime ?

2020-06-04 Thread Konstantin Shalygin

On 6/4/20 4:26 PM, Adrian Nicolae wrote:

Hi all,

I have a Ceph cluster with a standard setup :

- the public network: MONs and OSDs connected to the same agg switch 
with ports in the same access vlan


- the private network: OSDs connected to another switch with a second 
eth in another access vlan


I need to change the public vlan on the first switch and the private 
vlan on the second switch.


Although it should be a trivial operation (just changing the vlan on a range 
of ports in a single command), it means that all the OSDs and MONs will 
not be able to communicate with each other for a few seconds (first 
on the public network, then on the private network).  Do you know if 
this very short period of downtime will mess up the cluster somehow? 
Is there a best practice on how to do this safely? 


`ceph osd set noout` is enough for this, I think.
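
A small sketch of that flow:

$ ceph osd set noout      # OSDs won't be marked out while heartbeats briefly fail
# ...reconfigure the vlans on both switches, wait for "ceph -s" to settle...
$ ceph osd unset noout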



k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io