Note that instead of including the step to use the UUID in the OSD creation, like this [1], I opted to separate it out in those instructions. That was to keep the commands simple and to give people an idea of how to fix their OSDs if they created them using the device name instead of the UUID. It would be simpler to just create the OSD using the partuuid in the first place. One thing not mentioned in my previous response: if you would like your OSDs to be encrypted at rest, add --dmcrypt to the ceph-volume command (it is included in the example below).
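A quick way to tell whether an existing OSD still references the raw device name rather than a stable by-partuuid path is to look at its block.db symlink. This is a minimal check, assuming the default /var/lib/ceph/osd/* layout used in the scripts below:

# List where each OSD's block.db symlink currently points. A target like
# /dev/nvme0n1p2 can change across reboots when there are multiple NVMe
# devices; a target under /dev/disk/by-partuuid/ is stable.
for db in /var/lib/ceph/osd/*/block.db; do
  echo "$db -> $(readlink $db)"
done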
[1]
# Create the OSD
echo "/dev/sdb /dev/nvme0n1p2
/dev/sdc /dev/nvme0n1p3" | while read hdd db; do
  uuid=$(ls -l /dev/disk/by-partuuid/ | awk '/'$(basename $db)'$/ {print $9}')
  ceph-volume lvm create --bluestore --dmcrypt --data $hdd --block.db /dev/disk/by-partuuid/$uuid
done

On Fri, May 11, 2018 at 12:46 PM David Turner <drakonst...@gmail.com> wrote:

> This thread is off in left field and needs to be brought back to how
> things work.
>
> While multiple OSDs can use the same device for block/wal partitions, they
> each need their own partition. osd.0 could use nvme0n1p1, osd.2/nvme0n1p2,
> etc. You cannot use the same partition for each osd. Ceph-volume will not
> create the db/wal partitions for you; you need to manually create the
> partitions to be used by the OSD. There is no need to put a filesystem on
> top of the partition for the wal/db. That is wasted overhead that will
> slow things down.
>
> Back to the original email.
>
> > Or do I need to use osd-db=/dev/nvme0n1p2 for data=/dev/sdb,
> > osd-db=/dev/nvme0n1p3 for data=/dev/sdc, and so on?
> This is what you need to do, but as said above, you need to create the
> partitions for --block-db yourself. You talked about having a 10GB
> partition for this, but the general recommendation for block-db partitions
> is 10GB per 1TB of OSD. If your OSD is a 4TB disk you should be looking
> closer to a 40GB block.db partition. If your block.db partition is too
> small, then once it fills up it will spill over onto the data volume and
> slow things down.
>
> > And just to make sure - if I specify "--osd-db", I don't need
> > to set "--osd-wal" as well, since the WAL will end up on the
> > DB partition automatically, correct?
> This is correct. The wal will automatically be placed on the db if not
> otherwise specified.
>
> I don't use ceph-deploy, but the process for creating the OSDs should be
> something like this. After the OSDs are created it is a good idea to make
> sure that the OSD is not looking for the db partition by the
> /dev/nvme0n1p2 device name, as that can change on reboots if you have
> multiple nvme devices.
>
> # Make sure the disks are clean and ready to use as an OSD
> for hdd in /dev/sd{b..c}; do
>   ceph-volume lvm zap $hdd --destroy
> done
>
> # Create the nvme db partitions (assuming 10G size for a 1TB OSD)
> for partition in {2..3}; do
>   sgdisk -n $partition:0:+10G -c $partition:'ceph db' /dev/nvme0n1
> done
>
> # Create the OSD
> echo "/dev/sdb /dev/nvme0n1p2
> /dev/sdc /dev/nvme0n1p3" | while read hdd db; do
>   ceph-volume lvm create --bluestore --data $hdd --block.db $db
> done
>
> # Fix the OSDs to look for the block.db partition by UUID instead of its
> # device name.
> for db in /var/lib/ceph/osd/*/block.db; do
>   dev=$(readlink $db | grep -Eo 'nvme[[:digit:]]+n[[:digit:]]+p[[:digit:]]+' || echo false)
>   if [[ "$dev" != false ]]; then
>     uuid=$(ls -l /dev/disk/by-partuuid/ | awk '/'${dev}'$/ {print $9}')
>     ln -sf /dev/disk/by-partuuid/$uuid $db
>   fi
> done
> systemctl restart ceph-osd.target
>
> On Fri, May 11, 2018 at 10:59 AM João Paulo Sacchetto Ribeiro Bastos
> <joaopaulos...@gmail.com> wrote:
>
>> Actually, if you go to https://ceph.com/community/new-luminous-bluestore/
>> you will see that DB/WAL work on an XFS partition, while the data itself
>> goes on a raw block.
>>
>> Also, I told you the wrong command in the last mail. When I said --osd-db
>> it should be --block-db.
>>
>> On Fri, May 11, 2018 at 11:51 AM Oliver Schulz
>> <oliver.sch...@tu-dortmund.de> wrote:
>>
>>> Hi,
>>>
>>> thanks for the advice!
>>> I'm a bit confused now, though. ;-)
>>> I thought DB and WAL were supposed to go on raw block
>>> devices, not file systems?
>>>
>>> Cheers,
>>>
>>> Oliver
>>>
>>> On 11.05.2018 16:01, João Paulo Sacchetto Ribeiro Bastos wrote:
>>> > Hello Oliver,
>>> >
>>> > As far as I know yet, you can use the same DB device for about 4 or 5
>>> > OSDs, just need to be aware of the free space. I'm also developing a
>>> > bluestore cluster, and our DB and WAL will be in the same SSD of about
>>> > 480GB serving 4 OSD HDDs of 4 TB each. About the sizes, it's just a
>>> > feeling, because I couldn't find any clear rule yet about how to
>>> > measure the requirements.
>>> >
>>> > * The only concern that took me some time to realize is that you should
>>> > create an XFS partition if using ceph-deploy, because if you don't it
>>> > will simply give you a RuntimeError that doesn't give any hint about
>>> > what's going on.
>>> >
>>> > So, answering your question, you could do something like:
>>> > $ ceph-deploy osd create --bluestore --data=/dev/sdb --block-db /dev/nvme0n1p1 $HOSTNAME
>>> > $ ceph-deploy osd create --bluestore --data=/dev/sdc --block-db /dev/nvme0n1p1 $HOSTNAME
>>> >
>>> > On Fri, May 11, 2018 at 10:35 AM Oliver Schulz
>>> > <oliver.sch...@tu-dortmund.de> wrote:
>>> >
>>> >     Dear Ceph Experts,
>>> >
>>> >     I'm trying to set up some new OSD storage nodes, now with
>>> >     bluestore (our existing nodes still use filestore). I'm
>>> >     a bit unclear on how to specify WAL/DB devices: Can
>>> >     several OSDs share one WAL/DB partition? So, can I do
>>> >
>>> >     ceph-deploy osd create --bluestore --osd-db=/dev/nvme0n1p2
>>> >     --data=/dev/sdb HOSTNAME
>>> >
>>> >     ceph-deploy osd create --bluestore --osd-db=/dev/nvme0n1p2
>>> >     --data=/dev/sdc HOSTNAME
>>> >
>>> >     ...
>>> >
>>> >     Or do I need to use osd-db=/dev/nvme0n1p2 for data=/dev/sdb,
>>> >     osd-db=/dev/nvme0n1p3 for data=/dev/sdc, and so on?
>>> >
>>> >     And just to make sure - if I specify "--osd-db", I don't need
>>> >     to set "--osd-wal" as well, since the WAL will end up on the
>>> >     DB partition automatically, correct?
>>> >
>>> >     Thanks for any hints,
>>> >
>>> >     Oliver
>>> >
>>> > --
>>> > João Paulo Sacchetto Ribeiro Bastos
>>> > +55 31 99279-7092
>>
>> --
>> João Paulo Bastos
>> DevOps Engineer at Mav Tecnologia
>> Belo Horizonte - Brazil
>> +55 31 99279-7092
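Pulling the thread above together, here is a minimal consolidated sketch: create the DB partitions sized per the 10GB-per-1TB rule of thumb, then create each OSD directly against its /dev/disk/by-partuuid/ path so the block.db symlink never needs fixing after a reboot. The device names (/dev/sdb, /dev/sdc, /dev/nvme0n1), partition numbers, and the 40G size (for 4TB data disks) are assumptions for illustration only, and blkid is used here as an alternative to the ls/awk partuuid lookup shown earlier.

# Create ~40G DB partitions on the shared NVMe
# (10GB per 1TB of OSD, assuming 4TB data disks)
for partition in {2..3}; do
  sgdisk -n $partition:0:+40G -c $partition:'ceph db' /dev/nvme0n1
done

# Create each OSD against the stable by-partuuid path from the start
# (add --dmcrypt for encryption at rest, as noted at the top)
echo "/dev/sdb /dev/nvme0n1p2
/dev/sdc /dev/nvme0n1p3" | while read hdd db; do
  uuid=$(blkid -s PARTUUID -o value $db)
  ceph-volume lvm create --bluestore --data $hdd --block.db /dev/disk/by-partuuid/$uuid
done

# Verify where each OSD's block.db points
readlink /var/lib/ceph/osd/*/block.db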
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com