[ceph-users] Re: Worst thing that can happen if I have size= 2
> - three servers as recommended by proxmox (with 10gb ethernet and so on)
> - size=3 and min_size=2 recommended by Ceph

You forgot the ceph recommendation* to provide sufficient fail-over capacity in case a failure domain or disk fails. The recommendation would be to have 4 hosts with 25% capacity left free for fail-over and another 10% for handling imbalance. With very few disks I would increase the buffer for imbalance.

* It's actually not a recommendation, it's a requirement for non-experimental clusters.

Everything else has been answered already in great detail.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Mario Giammarco
Sent: 05 February 2021 21:10:33
To: Eneko Lacunza
Cc: Ceph Users
Subject: [ceph-users] Re: Worst thing that can happen if I have size= 2

On Thu 4 Feb 2021 at 12:19, Eneko Lacunza wrote:

> Hi all,
>
> On 4/2/21 at 11:56, Frank Schilder wrote:
> >> - three servers
> >> - three monitors
> >> - 6 osd (two per server)
> >> - size=3 and min_size=2
> >
> > This is a set-up that I would not run at all. The first reason is that ceph lives on the law of large numbers and 6 is a small number; hence your OSDs fill up unevenly.
> >
> > What comes to my mind is a hyper-converged server with 6+ disks in a RAID10 array, possibly with a good controller with battery-powered or other non-volatile cache. Ceph will never beat that performance. Put in some extra disks as hot-spares and you have close to self-healing storage.
> >
> > Such a small ceph cluster will inherit all the baddies of ceph (performance, maintenance) without giving any of the goodies (scale-out, self-healing, proper distributed raid protection). Ceph needs size to become well-performing and to pay off the maintenance and architectural effort.
>
> It's funny that we have multiple clusters similar to this, and we and our customers couldn't be happier. Just use an HCI solution (like for example Proxmox VE, but there are others) to manage everything.
>
> Maybe the weakest thing in that configuration is having 2 OSDs per node; osd nearfull must be tuned accordingly so that no OSD goes beyond about 0.45, so that in case of failure of one disk the other OSD in the node has enough space for healing replication.

I reply to both: in fact I am using Proxmox VE and I am following all the guidelines for an HA hyperconverged setup:
- three servers as recommended by Proxmox (with 10gb ethernet and so on)
- size=3 and min_size=2 as recommended by Ceph

It is not that one morning I woke up and put some random hardware together; I followed the guidelines. The result should be:
- if a disk (or more) breaks, work goes on
- if a server breaks, the VMs on that server start on another server and work goes on.

The actual result is: one disk breaks, ceph fills the other one in the same server, reaches 90% and EVERYTHING stops, including all VMs, and the customer has lost unsaved data and cannot run the VMs it needs to continue working. Not very "HA" as hoped.

Size=3 means 3x the hdd cost. Now I must double it again: 6x. The customer will not buy other disks. So I ask (again): apart from the known fact that with size=2 I risk a second disk breaking before ceph has rebuilt the second copy of the data, are there other risks? I repeat: I know perfectly well that size=3 is "better" and I followed the guidelines, but what can happen with size=2 and min_size=1? The only thing I can imagine is that if I power down the switches I get a split brain, but in that case monitor quorum is not reached, so ceph should stop writing and I do not risk inconsistent data. Are there other things to consider?

Thanks,
Mario

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
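Frank's sizing rule of thumb earlier in the thread (4 hosts, 25% capacity left free for fail-over, another ~10% for imbalance, size=3 replication) can be turned into rough arithmetic. A minimal sketch; the per-host disk capacity below is a made-up example number, not from the thread:

```shell
# Hypothetical 4-host cluster with 10 TB of raw disk per host.
hosts=4
raw_per_host_tb=10
raw_tb=$((hosts * raw_per_host_tb))                     # 40 TB raw

failover_tb=$((raw_tb * 25 / 100))                      # 10 TB kept free for fail-over
imbalance_tb=$((raw_tb * 10 / 100))                     # 4 TB buffer for uneven OSD fill
usable_raw_tb=$((raw_tb - failover_tb - imbalance_tb))  # 26 TB safely usable raw

# With size=3 replication, net client capacity is a third of the usable raw.
net_tb=$((usable_raw_tb / 3))
echo "net capacity: ${net_tb} TB"                       # prints: net capacity: 8 TB
```

The point of the exercise: of 40 TB raw you can plan on roughly 8 TB of client data, which is why skipping the fail-over reserve is so tempting and so dangerous.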
[ceph-users] Re: Worst thing that can happen if I have size= 2
> How do you achieve that? 2 hours?

That's a long story. The short one is: by taking a wrong path for troubleshooting. I should have stayed with my check-list instead. This is the whole point of the redundancy remark I made: 1 admin mistake doesn't hurt, and you are less likely to panic if one happens. For too long on that day, I thought I had lost the whole cluster.

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Konstantin Shalygin
Sent: 06 February 2021 12:04:12
To: Frank Schilder
Cc: Alexander E. Patrakov; Mario Giammarco; ceph-users
Subject: Re: [ceph-users] Re: Worst thing that can happen if I have size= 2

How do you achieve that? 2 hours? Installing a new drive for the db is 10 min of DC engineer hand work (if the drive is HHHL and the server needs to be powered off). Then, after the server boots, your mon is already up. After you provision the new drive: update fstab, stop the mon, rm the old mon store, mount the new one, mon mkfs, start. Even if this is not covered by a script, it is max 5 minutes to reach quorum.

Thanks,
k

Sent from my iPhone

> On 5 Feb 2021, at 12:03, Frank Schilder wrote:
>
> I learned this the hard way when upgrading our MON data disks. We have 3 MONs and I needed to migrate each MON store to new storage. Of course I managed to install the new disks in one and wipe the MON store on another MON. 2 hours downtime.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
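Konstantin's re-provisioning steps (update fstab, stop, rm old mon store, mount new, mon mkfs, start) could be sketched roughly as follows. This is only an illustrative outline: the device name, mount point, keyring path and systemd unit name are assumptions, not from the thread, so adapt everything to your own deployment:

```shell
# Hypothetical sketch of moving a mon store to a new disk (assumed paths/devices).
MON_ID=$(hostname -s)
MON_DIR=/var/lib/ceph/mon/ceph-$MON_ID

systemctl stop ceph-mon@"$MON_ID"          # stop the daemon first
mv "$MON_DIR" "$MON_DIR.old"               # keep the old store until quorum is back
mkdir "$MON_DIR"
mount /dev/nvme0n1 "$MON_DIR"              # new drive; also add it to /etc/fstab

# Rebuild an empty mon store from the current monmap and mon keyring
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring
ceph-mon --mkfs -i "$MON_ID" --monmap /tmp/monmap --keyring /tmp/mon.keyring
chown -R ceph:ceph "$MON_DIR"

systemctl start ceph-mon@"$MON_ID"         # mon rejoins and syncs from its peers
```

The `mv` instead of `rm` in the second step is the cheap insurance Frank's story argues for: one admin mistake should not cost you the store.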
[ceph-users] ceph-volume bluestore _read_fsid unparsable uuid
Hi Dave and everyone else affected,

I'm responding to a thread you opened on an issue with lvm OSD creation:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/YYH3VANVV22WGM3CNL4TN4TTL63FCEVD/
https://tracker.ceph.com/issues/43868

Most important question: is there a workaround?

My observations: I'm running into the exact same issue on mimic 13.2.10. The strange thing is that some OSDs get created and others fail. I can't see a pattern here. I have 1 host where every create worked out and another where half failed.

The important lines in the log are probably:

stderr: 2021-02-06 13:48:27.477 7f46756b4b80 -1 bluestore(/var/lib/ceph/osd/ceph-342/) _read_fsid unparsable uuid
stderr: 2021-02-06 13:48:27.477 7f46756b4b80 -1 bdev(0x561db199c700 /var/lib/ceph/osd/ceph-342//block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr
stderr: 2021-02-06 13:48:27.477 7f46756b4b80 -1 bluestore(/var/lib/ceph/osd/ceph-342/) mkfs failed, (11) Resource temporarily unavailable
stderr: 2021-02-06 13:48:27.477 7f46756b4b80 -1 OSD::mkfs: ObjectStore::mkfs failed with error (11) Resource temporarily unavailable
stderr: 2021-02-06 13:48:27.477 7f46756b4b80 -1 ** ERROR: error creating empty object store in /var/lib/ceph/osd/ceph-342/: (11) Resource temporarily unavailable

I really need to get a decent number of disks up very soon. Any help is appreciated. I can provide more output if that helps.

Best regards and a good weekend!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: ceph-volume bluestore _read_fsid unparsable uuid
I just noticed one difference between the two servers:

"Broken" server:

# lvm vgs
  Failed to set up async io, using sync io.
  VG  #PV #LV #SN Attr VSize VFree
  [listing follows]

"Good" server:

# lvm vgs
  VG  #PV #LV #SN Attr VSize VFree
  [listing follows]

Could this play a role here?

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Frank Schilder
Sent: 06 February 2021 14:31:38
To: ceph-users@ceph.io
Cc: Dave Hall
Subject: [ceph-users] ceph-volume bluestore _read_fsid unparsable uuid

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: ceph-volume bluestore _read_fsid unparsable uuid
OK, found it. The second line in the error messages actually gives it away:

stderr: 2021-02-06 13:48:27.477 7f46756b4b80 -1 bdev(0x561db199c700 /var/lib/ceph/osd/ceph-342//block) _aio_start io_setup(2) failed with EAGAIN; try increasing /proc/sys/fs/aio-max-nr

On my system, the default is rather small:

# sysctl fs.aio-max-nr
fs.aio-max-nr = 65536

Seemingly not a problem for ceph-disk OSDs:

# sysctl fs.aio-nr
fs.aio-nr = 32768

However, LVM OSDs seem to be quite hungry. Increasing the value with

sysctl -w fs.aio-max-nr=1048576

solved it for me. I should have read the error message more carefully.

Best regards and a nice weekend.
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Frank Schilder
Sent: 06 February 2021 14:37:15
To: ceph-users@ceph.io
Cc: Dave Hall
Subject: [ceph-users] Re: ceph-volume bluestore _read_fsid unparsable uuid

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
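Frank's diagnosis and fix above can be combined into a quick check-and-persist sequence. The 1048576 value is the one from the thread; the sysctl.d file name is an assumed convention, not something the thread specifies:

```shell
# How many async I/O contexts are in use vs. allowed right now
sysctl fs.aio-nr fs.aio-max-nr

# Raise the limit at runtime (needs root; value from the post above)
sysctl -w fs.aio-max-nr=1048576

# Persist across reboots; the file name here is an assumed convention
echo 'fs.aio-max-nr = 1048576' > /etc/sysctl.d/90-ceph-aio.conf
sysctl --system                            # re-apply all sysctl config files
```

Comparing `fs.aio-nr` against `fs.aio-max-nr` before creating OSDs is the quickest way to tell whether you are about to hit the same EAGAIN.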
[ceph-users] Re: db_devices doesn't show up in exported osd service spec
Add dev to comment.

With 15.2.8, when applying the OSD service spec, db_devices is gone. Here is the service spec file.

==
service_type: osd
service_id: osd-spec
placement:
  hosts:
    - ceph-osd-1
spec:
  objectstore: bluestore
  data_devices:
    rotational: 1
  db_devices:
    rotational: 0
==

Here is the logging from mon. The message with "Tony" was added by me in mgr to confirm. The audit from mon shows db_devices is gone. Is there anything in mon that filters it out based on host info? How can I trace it?

==
audit 2021-02-07T00:45:38.106171+ mgr.ceph-control-1.nxjnzz (mgr.24142551) 4020 : audit [DBG] from='client.24184218 -' entity='client.admin' cmd=[{"prefix": "orch apply osd", "target": ["mon-mgr", ""]}]: dispatch

cephadm 2021-02-07T00:45:38.108546+ mgr.ceph-control-1.nxjnzz (mgr.24142551) 4021 : cephadm [INF] Marking host: ceph-osd-1 for OSDSpec preview refresh.

cephadm 2021-02-07T00:45:38.108798+ mgr.ceph-control-1.nxjnzz (mgr.24142551) 4022 : cephadm [INF] Saving service osd.osd-spec spec with placement ceph-osd-1

cephadm 2021-02-07T00:45:38.108893+ mgr.ceph-control-1.nxjnzz (mgr.24142551) 4023 : cephadm [INF] Tony: spec: placement=PlacementSpec(hosts=[HostPlacementSpec(hostname='ceph-osd-1', network='', name='')]), service_id='osd-spec', service_type='osd', data_devices=DeviceSelection(rotational=1, all=False), db_devices=DeviceSelection(rotational=0, all=False), osd_id_claims={}, unmanaged=False, filter_logic='AND', preview_only=False)>

audit 2021-02-07T00:45:38.109782+ mon.ceph-control-3 (mon.2) 25 : audit [INF] from='mgr.24142551 10.6.50.30:0/2838166251' entity='mgr.ceph-control-1.nxjnzz' cmd=[{"prefix":"config-key set","key":"mgr/cephadm/spec.osd.osd-spec","val":"{\"created\": \"2021-02-07T00:45:38.108810\", \"spec\": {\"placement\": {\"hosts\": [\"ceph-osd-1\"]}, \"service_id\": \"osd-spec\", \"service_name\": \"osd.osd-spec\", \"service_type\": \"osd\", \"spec\": {\"data_devices\": {\"rotational\": 1}, \"filter_logic\": \"AND\", \"objectstore\": \"bluestore\"}}}"}]: dispatch

audit 2021-02-07T00:45:38.110133+ mon.ceph-control-1 (mon.0) 107 : audit [INF] from='mgr.24142551 ' entity='mgr.ceph-control-1.nxjnzz' cmd=[{"prefix":"config-key set","key":"mgr/cephadm/spec.osd.osd-spec","val":"{\"created\": \"2021-02-07T00:45:38.108810\", \"spec\": {\"placement\": {\"hosts\": [\"ceph-osd-1\"]}, \"service_id\": \"osd-spec\", \"service_name\": \"osd.osd-spec\", \"service_type\": \"osd\", \"spec\": {\"data_devices\": {\"rotational\": 1}, \"filter_logic\": \"AND\", \"objectstore\": \"bluestore\"}}}"}]: dispatch

audit 2021-02-07T00:45:38.152756+ mon.ceph-control-1 (mon.0) 108 : audit [INF] from='mgr.24142551 ' entity='mgr.ceph-control-1.nxjnzz' cmd='[{"prefix":"config-key set","key":"mgr/cephadm/spec.osd.osd-spec","val":"{\"created\": \"2021-02-07T00:45:38.108810\", \"spec\": {\"placement\": {\"hosts\": [\"ceph-osd-1\"]}, \"service_id\": \"osd-spec\", \"service_name\": \"osd.osd-spec\", \"service_type\": \"osd\", \"spec\": {\"data_devices\": {\"rotational\": 1}, \"filter_logic\": \"AND\", \"objectstore\": \"bluestore\"}}}"}]': finished
==

Thanks!
Tony

> -----Original Message-----
> From: Jens Hyllegaard (Soft Design A/S)
> Sent: Thursday, February 4, 2021 6:31 AM
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: db_devices doesn't show up in exported osd service spec
>
> Hi.
>
> I have the same situation. Running 15.2.8 I created a specification that looked just like it, with rotational in the data and non-rotational in the db.
>
> The first use applied fine. Afterwards it only uses the hdd, and not the ssd.
> Also, is there a way to remove an unused osd service? I managed to create osd.all-available-devices when I tried to stop the auto-creation of OSDs, using: ceph orch apply osd --all-available-devices --unmanaged=true
>
> I created the original OSD using the web interface.
>
> Regards
>
> Jens

> -----Original Message-----
> From: Eugen Block
> Sent: 3 February 2021 11:40
> To: Tony Liu
> Cc: ceph-users@ceph.io
> Subject: [ceph-users] Re: db_devices doesn't show up in exported osd service spec
>
> How do you manage the db_sizes of your SSDs? Is that managed automatically by ceph-volume? You could try to add another config and see what it does; maybe try to add block_db_size?
>
> Zitat von Tony Liu :
>
> > All mon, mgr, crash and osd daemons are upgraded to 15.2.8. That actually fixed another issue (no devices listed after adding a host), but this issue remains.
> > ```
> > # cat osd-spec.yaml
> > service_type: osd
> > service_id: osd-spec
> > placement:
> >   host_pattern: ceph-osd-[1-3]
> > data_devices:
> >   rotational: 1
> > db_devices:
> >   rotational: 0
> >
> > # ceph orch apply osd -i osd-spec.yaml
> > Scheduled osd.osd-spec update...
> >
> > # ceph orch
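For readers hitting the same issue, one way to see what spec cephadm actually stored is to export it back out and diff it against the file that was applied (this is the "exported osd service spec" from the thread subject; exact output shape may vary by point release):

```shell
# Dump the stored OSD spec and compare it with what was submitted
ceph orch ls osd --export > stored-spec.yaml
diff osd-spec.yaml stored-spec.yaml    # a vanished db_devices section shows up here
```

If `db_devices` is present in the submitted file but absent from the export, the spec was mangled on the mgr/mon side rather than merely mis-applied to the disks.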