[ceph-users] Ceph cluster upgrade
Hi list, Given a single node Ceph cluster (lab), I started out with the following CRUSH rule: > # rules > rule replicated_ruleset { > ruleset 0 > type replicated > min_size 1 > max_size 10 > step take default > step choose firstn 0 type osd > step emit > } Meanwhile, the cluster has grown (production) and additional hosts (and OSDs, obviously) were added. Ensuring redundancy between hosts, I would like to alter the rule as follows: > # rules > rule replicated_ruleset { > ruleset 0 > type replicated > min_size 1 > max_size 10 > step take default > step chooseleaf firstn 0 type host > step emit > } Is this the way to go? I would like as little performance degradation as possible while rebalancing. Please advise if I need to take certain preparations into account. Thanks in advance! Best regards, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
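To keep such a change reviewable before injecting it, the usual workflow is to edit a decompiled copy of the CRUSH map; a minimal sketch, with the file names below being mere placeholders:

# Export and decompile the current CRUSH map
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Edit crushmap.txt: change "step choose firstn 0 type osd"
# into "step chooseleaf firstn 0 type host"

# Recompile and, once satisfied, inject the new map
crushtool -c crushmap.txt -o crushmap-new.bin
ceph osd setcrushmap -i crushmap-new.bin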
Re: [ceph-users] Ceph cluster upgrade
Hi Micha, Thank you very much for your prompt response. In an earlier process, I already ran: > $ ceph tell osd.* injectargs '--osd-max-backfills 1' > $ ceph tell osd.* injectargs '--osd-recovery-op-priority 1' > $ ceph tell osd.* injectargs '--osd-client-op-priority 63' > $ ceph tell osd.* injectargs '--osd-recovery-max-active 1' And yes, creating a separate ruleset makes sense. But, does the proposed ruleset itself make sense as well? Regards, Kees On 06-07-16 15:36, Micha Krause wrote: > Set these in your ceph.conf beforehand: > > osd recovery op priority = 1 > osd max backfills = 1 > > I would also suggest creating a new crush rule, instead of modifying > your existing one. > > This enables you to change the rule on a per pool basis: > > ceph osd pool set <pool> crush_ruleset <rulenum> > > Then start with your smallest pool, and see how it goes. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
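For reference, a minimal sketch of creating such a separate rule and switching one pool over to it; the rule name, pool name and rule number below are placeholders, and on Hammer the pool setting is called crush_ruleset:

# Create a host-level replicated rule under the default root
ceph osd crush rule create-simple replicated_host default host firstn

# Look up the number assigned to the new rule
ceph osd crush rule dump

# Point a single (small) pool at the new rule, e.g. rule number 1
ceph osd pool set cinder-volumes crush_ruleset 1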
Re: [ceph-users] Ceph cluster upgrade
Thank you very much, I'll start testing the logic prior to implementation. K. On 06-07-16 19:20, Bob R wrote: > See http://dachary.org/?p=3189 for some simple instructions on testing > your crush rule logic. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
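For completeness, crushtool can also simulate mappings offline against a compiled map (such as the crushmap-new.bin from the earlier sketch), so the new rule can be verified before it goes live; rule number and replica count below are assumptions:

# Show which OSDs each PG would map to under rule 1 with 3 replicas
crushtool --test -i crushmap-new.bin --rule 1 --num-rep 3 --show-mappings

# Or summarise how evenly the rule spreads data over the OSDs
crushtool --test -i crushmap-new.bin --rule 1 --num-rep 3 --show-utilization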
Re: [ceph-users] (no subject)
Hi Gaurav, Unfortunately I'm not completely sure about your setup, but I guess it makes sense to configure Cinder and Glance to use RBD for a backend. It seems to me, you're trying to store VM images directly on an OSD filesystem. Please refer to http://docs.ceph.com/docs/master/rbd/rbd-openstack/ for details. Regards, Kees On 06-07-16 23:03, Gaurav Goyal wrote: > > I am installing ceph hammer and integrating it with openstack Liberty > for the first time. > > My local disk has only 500 GB but i need to create 600 GB VM. SO i > have created a soft link to ceph filesystem as > > lrwxrwxrwx 1 root root 34 Jul 6 13:02 instances -> > /var/lib/ceph/osd/ceph-0/instances [root@OSKVM1 nova]# pwd > /var/lib/nova [root@OSKVM1 nova]# > > now when i am trying to create an instance it is giving the following > error as checked from nova-compute.log > I need your help to fix this issue. > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
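As a hedged example of what the Glance side of that configuration could look like on Liberty (pool and user names are assumptions and should match your own keyrings):

# /etc/glance/glance-api.conf
[DEFAULT]
show_image_direct_url = True

[glance_store]
stores = rbd
default_store = rbd
rbd_store_pool = glance-images
rbd_store_user = glance
rbd_store_ceph_conf = /etc/ceph/ceph.conf
rbd_store_chunk_size = 8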
Re: [ceph-users] (no subject)
Hi Gaurav, The following snippets should suffice (for Cinder, at least): > [DEFAULT] > enabled_backends=rbd > > [rbd] > volume_driver = cinder.volume.drivers.rbd.RBDDriver > rbd_pool = cinder-volumes > rbd_ceph_conf = /etc/ceph/ceph.conf > rbd_flatten_volume_from_snapshot = false > rbd_max_clone_depth = 5 > rbd_store_chunk_size = 4 > rados_connect_timeout = -1 > rbd_user = cinder > rbd_secret_uuid = REDACTED > > backup_driver = cinder.backup.drivers.ceph > backup_ceph_conf = /etc/ceph/ceph.conf > backup_ceph_user = cinder-backup > backup_ceph_chunk_size = 134217728 > backup_ceph_pool = backups > backup_ceph_stripe_unit = 0 > backup_ceph_stripe_count = 0 > restore_discard_excess_bytes = true Obviously you'd alter the directives according to your configuration and/or wishes. And no, creating RBD volumes by hand is not needed. Cinder will do this for you. K. On 08-07-16 04:14, Gaurav Goyal wrote: > Yeah i didnt find additional section for [ceph] in my cinder.conf > file. Should i create that manually? > As i didnt find [ceph] section so i modified same parameters in > [DEFAULT] section. > I will change that as per your suggestion. > > Moreoevr checking some other links i got to know that, i must > configure following additional parameters > should i do that and install tgtadm package? > rootwrap_config = /etc/cinder/rootwrap.conf > api_paste_confg = /etc/cinder/api-paste.ini > iscsi_helper = tgtadm > volume_name_template = volume-%s > volume_group = cinder-volumes > Do i need to execute following commands? > "pvcreate /dev/rbd1" & > "vgcreate cinder-volumes /dev/rbd1" ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ceph cluster upgrade
Thank you everyone, I just tested and verified the ruleset and applied it to some pools. Worked like a charm! K. On 06-07-16 19:20, Bob R wrote: > See http://dachary.org/?p=3189 for some simple instructions on testing > your crush rule logic. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] (no subject)
Hi Gaurav, Have you distributed your Ceph authentication keys to your compute nodes? And, do they have the correct permissions in terms of Ceph? K. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
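A minimal sketch of getting the key material onto a compute node, loosely following the Ceph/OpenStack documentation; the host name and paths are placeholders:

# Install the keyring for Cinder/libvirt on the compute node
ceph auth get-or-create client.cinder | ssh compute1 sudo tee /etc/ceph/ceph.client.cinder.keyring

# Ship the bare key as well; it is needed later to define the libvirt secret
ceph auth get-key client.cinder | ssh compute1 tee /tmp/client.cinder.key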
Re: [ceph-users] (no subject)
Hi, I'd recommend generating a UUID and using it for all your compute nodes. This way, you can keep your configuration in libvirt constant. Regards, Kees On 08-07-16 16:15, Gaurav Goyal wrote: > > For below section, should i generate separate UUID for both compte hosts? > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
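A sketch of defining that one shared secret in libvirt on every compute node; the UUID below is only an example value, reuse the same one everywhere and keep it in sync with rbd_secret_uuid:

uuidgen   # run once, reuse the result on all compute nodes

cat > secret.xml <<EOF
<secret ephemeral='no' private='no'>
  <uuid>457eb676-33da-42ec-9a8c-9293d545c337</uuid>
  <usage type='ceph'>
    <name>client.cinder secret</name>
  </usage>
</secret>
EOF

virsh secret-define --file secret.xml
virsh secret-set-value --secret 457eb676-33da-42ec-9a8c-9293d545c337 \
  --base64 $(cat /tmp/client.cinder.key)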
Re: [ceph-users] (no subject)
Hi, I think there's still something misconfigured: > Invalid: 400 Bad Request: Unknown scheme 'file' found in URI (HTTP 400) It seems the RBD backend is not used as expected. Have you configured both Cinder _and_ Glance to use Ceph? Regards, Kees On 08-07-16 17:33, Gaurav Goyal wrote: > > I regenerated the UUID as per your suggestion. > Now i have same UUID in host1 and host2. > I could create volumes and attach them to existing VMs. > > I could create new glance images. > > But still finding the same error while instance launch via GUI. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
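Since the failing action is an instance launch, the Nova side is worth checking too; a hedged sketch of the relevant nova.conf bits on Liberty, where the pool name and secret UUID are assumptions:

# /etc/nova/nova.conf
[libvirt]
images_type = rbd
images_rbd_pool = cinder-vms
images_rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = cinder
rbd_secret_uuid = 457eb676-33da-42ec-9a8c-9293d545c337
disk_cachemodes = "network=writeback"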
Re: [ceph-users] (no subject)
Glad to hear it works now! Good luck with your setup. Regards, Kees On 11-07-16 17:29, Gaurav Goyal wrote: > Hello it worked for me after removing the following parameter from > /etc/nova/nova.conf file ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Fwd: Re: (no subject)
Sorry, should have posted this to the list. Forwarded Message Subject:Re: [ceph-users] (no subject) Date: Tue, 12 Jul 2016 08:30:49 +0200 From: Kees Meijs To: Gaurav Goyal Hi Gaurav, It might seem a little far fetched, but I'd use the qemu-img(1) tool to convert the qcow2 image file to a Ceph backed volume. First of all, create a volume of appropriate size in Cinder. The volume will be sparse. Then, figure out the identifier and use rados(8) to find the exact name of the volume in Ceph. Finally, use qemu-img(1) and point to the volume you just found out about. Cheers, Kees On 11-07-16 18:07, Gaurav Goyal wrote: > Thanks! > > I need to create a VM having qcow2 image file as 6.7 GB but raw image > as 600GB which is too big. > Is there a way that i need not to convert qcow2 file to raw and it > works well with rbd? ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
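A rough sketch of that procedure; volume name, size and IDs are placeholders, the :id= suffix assumes the cinder cephx user, and rbd ls works here just as well as the rados approach mentioned above (on newer clients the flag may be --name instead of --display-name):

# 1. Create an empty, sparse volume of the final size in Cinder
cinder create --display-name imported-volume 600

# 2. Find the exact RBD image name backing that Cinder volume
rbd -p cinder-volumes ls | grep <cinder volume id>

# 3. Convert the qcow2 file straight into the existing RBD image
#    (-n skips target creation because the volume already exists)
qemu-img convert -n -f qcow2 -O raw image.qcow2 \
  rbd:cinder-volumes/volume-<cinder volume id>:id=cinder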
Re: [ceph-users] Fwd: Re: (no subject)
Hi Fran, Fortunately, qemu-img(1) is able to directly utilise RBD (supporting sparse block devices)! Please refer to http://docs.ceph.com/docs/hammer/rbd/qemu-rbd/ for examples. Cheers, Kees On 13-07-16 09:18, Fran Barrera wrote: > Can you explain how you do this procedure? I have the same problem > with the large images and snapshots. > > This is what I do: > > # qemu-img convert -f qcow2 -O raw image.qcow2 image.img > # openstack image create image.img > > But the image.img is too large. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Fwd: Re: (no subject)
Hi, If the qemu-img is able to handle RBD in a clever way (and I assume it does) it is able to sparsely write the image to the Ceph pool. But, it is an assumption! Maybe someone else could shed some light on this? Or even better: read the source, the RBD handler specifically. And last but not least, create an empty test image in qcow2 sparse format of e.g. 10G and store it on Ceph. In other words: just test it and you'll know for sure. Cheers, Kees On 13-07-16 09:31, Fran Barrera wrote: > Yes, but is the same problem isn't? The image will be too large > because the format is raw. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
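A quick way to run that test and measure the actual allocation, assuming a throw-away image name and the cinder cephx user:

# Create an empty 10G qcow2 image and push it to Ceph as raw
qemu-img create -f qcow2 test.qcow2 10G
qemu-img convert -f qcow2 -O raw test.qcow2 rbd:cinder-volumes/sparse-test:id=cinder

# Sum the extents that were really written; an empty image should stay near zero
rbd diff cinder-volumes/sparse-test | awk '{ sum += $2 } END { print sum/1024/1024 " MB allocated" }'

# Clean up afterwards
rbd rm cinder-volumes/sparse-test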
Re: [ceph-users] SSD Journal
Hi, This is an OSD box running Hammer on Ubuntu 14.04 LTS with additional systems administration tools: > $ df -h | grep -v /var/lib/ceph/osd > Filesystem Size Used Avail Use% Mounted on > udev5,9G 4,0K 5,9G 1% /dev > tmpfs 1,2G 892K 1,2G 1% /run > /dev/dm-1 203G 2,1G 200G 2% / > none4,0K 0 4,0K 0% /sys/fs/cgroup > none5,0M 0 5,0M 0% /run/lock > none5,9G 0 5,9G 0% /run/shm > none100M 0 100M 0% /run/user > /dev/dm-1 203G 2,1G 200G 2% /home As you can see, less than 10G is actually used. Regards, Kees On 13-07-16 11:51, Ashley Merrick wrote: > May sound a random question, but what size would you recommend for the > SATA-DOM, obviously I know standard OS space requirements, but will CEPH > required much on the root OS of a OSD only node apart from standard logs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Physical maintenance
Hi Cephers, There's some physical maintenance I need to perform on an OSD node. Very likely the maintenance is going to take a while since it involves replacing components, so I would like to be well prepared. Unfortunately it is not an option to add another OSD node or rebalance at this time, so I'm planning to operate in degraded state during the maintenance. If at all possible, I would like to shut down the OSD node cleanly and prevent slow (or even blocking) requests on Ceph clients. Just setting the noout flag and shutting down the OSDs on the given node is not enough, it seems. In fact clients do not act that well in this case: connections time out and for a while I/O seems to stall. Any thoughts on this, anyone? For example, is it a sensible idea and are writes still possible? Let's assume there are OSDs on the to-be-maintained host which are primary for some PGs. Thanks in advance! Cheers, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Physical maintenance
Thanks! So to sum up, I'd best: * set the noout flag * stop the OSDs one by one * shut down the physical node * yank the OSD drives to prevent ceph-disk(8) from automatically activating at boot time * do my maintenance * start the physical node * reseat and activate the OSD drives one by one * unset the noout flag On 13-07-16 14:39, Jan Schermer wrote: > If you stop the OSDs cleanly then that should cause no disruption to clients. > Starting the OSD back up is another story, expect slow request for a while > there and unless you have lots of very fast CPUs on the OSD node, start them > one-by-one and not all at once. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
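A command-level sketch of those steps, assuming Hammer on Ubuntu 14.04 with upstart and with OSD ids 2 and 3 as placeholders:

ceph osd set noout

# Stop the OSDs on the node, one by one
stop ceph-osd id=2
stop ceph-osd id=3

# Power down, do the maintenance, boot the node again, then
# start the OSDs one by one and let recovery settle in between
start ceph-osd id=2
start ceph-osd id=3

ceph osd unset noout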
Re: [ceph-users] Physical maintenance
Hi, Thanks guys, this worked like a charm. Activating the OSDs wasn't necessary: it seemed udev(7) helped me with that. Cheers, Kees On 13-07-16 14:47, Kees Meijs wrote: > So to sum up, I'd best: > > * set the noout flag > * stop the OSDs one by one > * shut down the physical node > * yank the OSD drives to prevent ceph-disk(8) from automatically > activating at boot time > * do my maintenance > * start the physical node > * reseat and activate the OSD drives one by one > * unset the noout flag > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] What HBA to choose? To expand or not to expand?
Hi list, It's probably something to discuss over coffee in Ede tomorrow but I'll ask anyway: which HBA is best suited for Ceph nowadays? In an earlier thread I read some comments about some "dumb" HBAs running in IT mode but still being able to use cache on the HBA. Does it make sense? Or is this dangerous, similar to RAID solutions* without a BBU? (On a side note, we're planning on not using SAS expanders anymore but to "wire" each individual disk, e.g. using one SFF8087 per four disks, minimising the risk of bus congestion and/or lock-ups.) Anyway, in short I'm curious about opinions on brand, type and configuration of HBA to choose. Cheers, Kees *: apologies for cursing. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] What HBA to choose? To expand or not to expand?
Hi Jake, On 19-09-17 15:14, Jake Young wrote: > Ideally you actually want fewer disks per server and more servers. > This has been covered extensively in this mailing list. Rule of thumb > is that each server should have 10% or less of the capacity of your > cluster. That's very true, but let's focus on the HBA. > I didn't do extensive research to decide on this HBA, it's simply what > my server vendor offered. There are probably better, faster, cheaper > HBAs out there. A lot of people complain about LSI HBAs, but I am > comfortable with them. The configuration our vendor offered is based on an LSI/Avago 9300-8i with 8 drives connected individually using SFF8087 on a backplane (i.e. not an expander). Or, 24 drives using three HBAs (6xSFF8087 in total) when using a 4U SuperMicro chassis with 24 drive bays. But, what are the LSI complaints about? Or, are the complaints generic to HBAs and/or cryptic CLI tools and not LSI specific? > There is a management tool called storcli that can fully configure the > HBA in one or two command lines. There's a command that configures > all attached disks as individual RAID0 disk groups. That command gets > run by salt when I provision a new osd server. The thread I read was about Areca in JBOD but still able to utilise the cache, if I'm not mistaken. I'm not sure anymore if there was something mentioned about BBU. > > What many other people are doing is using the least expensive JBOD HBA > or the on board SAS controller in JBOD mode and then using SSD > journals. Save the money you would have spent on the fancy HBA for > fast, high endurance SSDs. Thanks! And obviously I'm very interested in other comments as well. Regards, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Deep scrubbing causes severe I/O stalling
Hi Cephers, Using Ceph 0.94.9-1trusty we noticed severe I/O stalling during deep scrubbing (vanilla parameters used in regards to scrubbing). I'm aware this has been discussed before, but I'd like to share the parameters we're going to evaluate: * osd_scrub_begin_hour 1 * osd_scrub_end_hour 7 * osd_scrub_min_interval 259200 * osd_scrub_max_interval 1814400 * osd_scrub_chunk_max 5 * osd_scrub_sleep .1 * osd_deep_scrub_interval 1814400 * osd_deep_scrub_stride 1048576 Anyway, thoughts on the matter or specific parameter advice is more than welcome. Cheers, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
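For reference, the same parameters expressed as a ceph.conf snippet plus a hedged way to apply a few of them to running OSDs without a restart; note that interval changes only influence newly scheduled scrubs:

# /etc/ceph/ceph.conf
[osd]
osd scrub begin hour = 1
osd scrub end hour = 7
osd scrub min interval = 259200
osd scrub max interval = 1814400
osd scrub chunk max = 5
osd scrub sleep = 0.1
osd deep scrub interval = 1814400
osd deep scrub stride = 1048576

# Apply on the fly as well
ceph tell osd.* injectargs '--osd_scrub_sleep 0.1 --osd_scrub_chunk_max 5'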
Re: [ceph-users] Deep scrubbing causes severe I/O stalling
Hi, On 28-10-16 12:06, w...@42on.com wrote: > I don't like this personally. Your cluster should be capable of doing > a deep scrub at any moment. If not it will also not be able to handle > a node failure during peak times. Valid point and I totally agree. Unfortunately, the current load doesn't give me much of a choice I'm afraid. Tweaking and extending the cluster hardware (e.g. more and faster spinners) makes more sense but we're not there yet. Maybe the new parameters help us towards that "always capable" state. Let's hope for the best and see what'll happen. ;-) If it works out, I could (and will) remove the time constraints. > * osd_scrub_sleep .1 > > You can try to bump that even more. Thank you for pointing that out. I'm unsure about the osd_scrub_sleep parameter behaviour (documentation is scarce). Could you please shed a little light on this? Cheers, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Deep scrubbing causes severe I/O stalling
Hi, Interesting... We're currently running deadline. In other posts I read about noop for SSDs instead of CFQ. Since we're using spinners with SSD journals, does it make sense to mix schedulers? E.g. CFQ for spinners _and_ noop for the SSDs? K. On 28-10-16 14:43, Wido den Hollander wrote: > Make sure you use the CFQ disk scheduler for your disks though. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
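A sketch of how such a mixed setup could be applied, assuming sda holds OSD data on a spinner and sdg is an SSD journal device (device names are placeholders):

# Set per-device schedulers at runtime
echo cfq  > /sys/block/sda/queue/scheduler
echo noop > /sys/block/sdg/queue/scheduler

# Or persist the split via a udev rule keyed on the rotational flag,
# e.g. in /etc/udev/rules.d/60-io-scheduler.rules
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", ATTR{queue/scheduler}="cfq"
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="0", ATTR{queue/scheduler}="noop"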
Re: [ceph-users] Deep scrubbing causes severe I/O stalling
Hi, As promised, our findings so far: * For the time being, the new scrubbing parameters work well. * Using CFQ for spinners and NOOP for SSDs seems to spread load over the storage cluster a little better than deadline does. However, overall latency seems (just a feeling, no numbers there) a little higher. Cheers, Kees On 28-10-16 15:37, Kees Meijs wrote: > > Interesting... We're currently running deadline. In other posts I read > about noop for SSDs instead of CFQ. > > Since we're using spinners with SSD journals, does it make sense to > mix schedulers? E.g. CFQ for spinners _and_ noop for the SSDs? > ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Stalling IO with cache tier
Hi list, Our current Ceph production cluster is struggling with performance issues, so we decided to add a fully flash based cache tier (now running with spinners and journals on separate SSDs). We ordered SSDs (Intel), disk trays and read http://docs.ceph.com/docs/hammer/rados/operations/cache-tiering/ carefully. Afterwards a new pool was created in a separate root, assigned a ruleset matching the flash-only OSDs. Since adding and removing the cache tier could be done transparently, we decided to get going in order to save time and improve performance as soon as possible: > # ceph osd tier add cinder-volumes cache > pool 'cache' is now (or already was) a tier of 'cinder-volumes' > # ceph osd tier cache-mode cache writeback > set cache-mode for pool 'cache' to writeback > # ceph osd tier set-overlay cinder-volumes cache > overlay for 'cinder-volumes' is now (or already was) 'cache' > # ceph osd pool set cache hit_set_type bloom > set pool 6 hit_set_type to bloom > # ceph osd pool set cache hit_set_count 1 > set pool 6 hit_set_count to 1 > # ceph osd pool set cache hit_set_period 3600 > set pool 6 hit_set_period to 3600 > # ceph osd pool set cache target_max_bytes 257698037760 > set pool 6 target_max_bytes to 257698037760 > # ceph osd pool set cache cache_target_full_ratio 0.8 > set pool 6 cache_target_full_ratio to 0.8 Yes, full flash cache here we go! Or, is it? After a few minutes, all hell broke loose and it seemed all IO on our cluster was stalling and no objects were to be found in the new cache pool called cache. Luckily we were able to remove the cache tier in a few moments again, restoring storage services. The storage cluster backs both Cinder and Glance services with OpenStack. Could someone please give some pointers on how to debug this? Log files seem a little "voidy" on the matter, I'm afraid. Thanks in advance! It would be great if we could implement the cache tier again in the near future, improving performance. Cheers, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
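For reference, the hurried removal mentioned above normally boils down to the following sequence from the cache-tiering documentation, using the pool names as above:

# Stop taking new writes into the cache and flush what is there
ceph osd tier cache-mode cache forward
rados -p cache cache-flush-evict-all

# Detach the tier from the base pool
ceph osd tier remove-overlay cinder-volumes
ceph osd tier remove cinder-volumes cache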
Re: [ceph-users] Stalling IO with cache tier
Hi, In addition, some log was generated by KVM processes: > qemu: terminating on signal 15 from pid 2827 > osdc/ObjectCacher.cc: In function 'ObjectCacher::~ObjectCacher()' > thread 7f265a77da80 time 2016-11-23 17:26:24.237542 > osdc/ObjectCacher.cc: 551: FAILED assert(i->empty()) > ceph version 0.94.8 (838cd35201e4fe1339e16d987cc33e873524af90) > 1: (()+0x15b8ab) [0x7f2649afc8ab] > 2: (()+0x38cfdd) [0x7f2649d2dfdd] > 3: (()+0x57406) [0x7f26499f8406] > 4: (()+0x7e3cd) [0x7f2649a1f3cd] > 5: (rbd_close()+0x9) [0x7f26499dd529] > 6: (()+0x2c12) [0x7f264bf70c12] > 7: (bdrv_close()+0x80) [0x565063a61b90] > 8: (bdrv_unref()+0x97) [0x565063a61e27] > 9: (bdrv_close()+0x155) [0x565063a61c65] > 10: (bdrv_close_all()+0x3c) [0x565063a61d5c] > 11: (main()+0x418f) [0x5650637bde2f] > 12: (__libc_start_main()+0xf5) [0x7f2655297f45] > 13: (()+0xfbaa1) [0x5650637c1aa1] > NOTE: a copy of the executable, or `objdump -rdS ` is > needed to interpret this. > terminate called after throwing an instance of 'ceph::FailedAssertion' Hope it helps. Cheers, Kees On 24-11-16 13:06, Kees Meijs wrote: > Could someone please give some pointers in how to debug this? Log files > seem a little "voidy" on the matter, I'm afraid. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Stalling IO with cache tier
Hi Burkhard, A testing pool makes absolute sense, thank you. About the complete setup, the documentation states: > The cache tiering agent can flush or evict objects based upon the > total number of bytes *or* the total number of objects. To specify a > maximum number of bytes, execute the following: > And: > If you specify both limits, the cache tiering agent will begin > flushing or evicting when either threshold is triggered. > I *did *configure target_max_bytes so I presume (yes, that is an assumption) we should be good. Tests will confirm or deny. Regards, Kees On 24-11-16 15:05, Burkhard Linke wrote: > Just my 2ct: > > A cache tier needs a complete setup, e.g. the target_max_objects > setting is missing. Try to set all cache related settings to a sane > value. > > You might also want to create a simple backend pool first and test the > cache tier with that pool. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
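A hedged example of the remaining knobs that complete such a setup; the values are arbitrary placeholders, not recommendations:

ceph osd pool set cache target_max_objects 1000000
ceph osd pool set cache cache_target_dirty_ratio 0.4
ceph osd pool set cache cache_min_flush_age 600
ceph osd pool set cache cache_min_evict_age 1800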
Re: [ceph-users] Stalling IO with cache tier
Hi Nick, All Ceph pools have very restrictive permissions for each OpenStack service, indeed. Besides creating the cache pool and enabling it, no additional parameters or configuration was done. Do I understand correctly that access permissions (e.g. cephx keys) are needed for the cache tier as well? If yes, it would make sense to add this to the documentation. Cheers, Kees On 24-11-16 15:12, Nick Fisk wrote: > I think I remember seeing other people with this problem before, isn't there > something you have to do in Openstack to make sure it > has the correct keys to access the new cache pool? Or something like that. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Stalling IO with cache tier
Hi Nick, Oh... In retrospect it makes sense in a way, but it does not as well. ;-) To clarify: it makes sense since the cache is "just a pool" but it does not since "it is an overlay and just a cache in between". Anyway, something that should be well documented and warned for, if you ask me. Cheers, Kees On 24-11-16 15:29, Nick Fisk wrote: > Yes, if your keys in use in Openstack only grant permission to the base pool, > then it will not be able to access the cache pool when > enabled. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Stalling IO with cache tier
Hi, Just checked permissions: > # ceph auth get client.cinder > exported keyring for client.cinder > [client.cinder] > key = REDACTED > caps mon = "allow r" > caps osd = "allow class-read object_prefix rbd_children, allow rwx > pool=cinder-volumes, allow rwx pool=cinder-vms, allow rx > pool=glance-images" I presume I should add *allow rwx pool=cache* in our case? Thanks again, Kees On 24-11-16 15:55, Kees Meijs wrote: > Oh... In retrospect it makes sense in a way, but it does not as well. ;-) > > To clarify: it makes sense since the cache is "just a pool" but it does > not since "it is an overlay and just a cache in between". > > Anyway, something that should be well documented and warned for, if you > ask me. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
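Assuming that is indeed the fix, note that ceph auth caps replaces the full capability set, so the whole list has to be repeated with the cache pool appended, roughly like this:

ceph auth caps client.cinder mon 'allow r' \
  osd 'allow class-read object_prefix rbd_children, allow rwx pool=cinder-volumes, allow rwx pool=cinder-vms, allow rx pool=glance-images, allow rwx pool=cache'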
[ceph-users] CoW clone performance
Hi list, We're using CoW clones (using OpenStack via Glance and Cinder) to store virtual machine images. For example: > # rbd info cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2 > rbd image 'volume-a09bd74b-f100-4043-a422-5e6be20d26b2': > size 25600 MB in 3200 objects > order 23 (8192 kB objects) > block_name_prefix: rbd_data.c569832b851bc > format: 2 > features: layering, striping > flags: > parent: glance-images/37a54104-fe3c-4e2a-a94b-da0f3776e1ac@snap > overlap: 4096 MB > stripe unit: 8192 kB > stripe count: 1 It seems our storage cluster writes a lot, even when the virtualization cluster isn't loaded at all, and in general there seem to be more writes than reads, which is quite odd and unexpected. In addition, performance is not as good as we would like. Can someone please share their thoughts on this matter, for example on flattening (or not flattening) the volumes? Thanks in advance! Cheers, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
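If flattening turns out to be the route taken, the clone relationship can be inspected and broken per volume; a sketch using the image names from the rbd info output above:

# List all clones still depending on the Glance snapshot
rbd children glance-images/37a54104-fe3c-4e2a-a94b-da0f3776e1ac@snap

# Copy the parent data into the clone and detach it
rbd flatten cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2

# Afterwards, rbd info should no longer show a parent line
rbd info cinder-volumes/volume-a09bd74b-f100-4043-a422-5e6be20d26b2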
Re: [ceph-users] 2x replication: A BIG warning
Hi Wido, Valid point. At this moment, we're using a cache pool with size = 2 and would like to "upgrade" to size = 3. Again, you're absolutely right... ;-) Anyway, any things to consider or could we just: 1. Run "ceph osd pool set cache size 3". 2. Wait for rebalancing to complete. 3. Run "ceph osd pool set cache min_size 2". Thanks! Regards, Kees On 07-12-16 09:08, Wido den Hollander wrote: > As a Ceph consultant I get numerous calls throughout the year to help people > with getting their broken Ceph clusters back online. > > The causes of downtime vary vastly, but one of the biggest causes is that > people use replication 2x. size = 2, min_size = 1. > > In 2016 the amount of cases I have where data was lost due to these settings > grew exponentially. > > Usually a disk failed, recovery kicks in and while recovery is happening a > second disk fails. Causing PGs to become incomplete. > > There have been to many times where I had to use xfs_repair on broken disks > and use ceph-objectstore-tool to export/import PGs. > > I really don't like these cases, mainly because they can be prevented easily > by using size = 3 and min_size = 2 for all pools. > > With size = 2 you go into the danger zone as soon as a single disk/daemon > fails. With size = 3 you always have two additional copies left thus keeping > your data safe(r). > > If you are running CephFS, at least consider running the 'metadata' pool with > size = 3 to keep the MDS happy. > > Please, let this be a big warning to everybody who is running with size = 2. > The downtime and problems caused by missing objects/replicas are usually big > and it takes days to recover from those. But very often data is lost and/or > corrupted which causes even more problems. > > I can't stress this enough. Running with size = 2 in production is a SERIOUS > hazard and should not be done imho. > > To anyone out there running with size = 2, please reconsider this! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] 2x replication: A BIG warning
Hi Wido, Since it's a Friday night, I decided to just go for it. ;-) It took a while to rebalance the cache tier but all went well. Thanks again for your valuable advice! Best regards, enjoy your weekend, Kees On 07-12-16 14:58, Wido den Hollander wrote: >> Anyway, any things to consider or could we just: >> >> 1. Run "ceph osd pool set cache size 3". >> 2. Wait for rebalancing to complete. >> 3. Run "ceph osd pool set cache min_size 2". >> > Indeed! It is a simple as that. > > Your cache pool can also contain very valuable data you do not want to loose. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Upgrading from Hammer
Hi guys, In the past few months, I've read some posts about upgrading from Hammer. Maybe I've missed something, but I didn't really read anything about QEMU/KVM behaviour in this context. At the moment, we're using: > $ qemu-system-x86_64 --version > QEMU emulator version 2.3.0 (Debian 1:2.3+dfsg-5ubuntu9.4~cloud2), > Copyright (c) 2003-2008 Fabrice Bellard The Ubuntu package (originating from Canonical's cloud archive) is utilising: * librados2 - 0.94.8-0ubuntu0.15.10.1~cloud0 * librbd1 - 0.94.8-0ubuntu0.15.10.1~cloud0 I'm very curious if there's someone out there using a similar version with a Ceph cluster on Jewel. Anything to take into account? Thanks in advance! Best regards, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrading from Hammer
Hi Wido, At the moment, we're running Ubuntu 14.04 LTS using the Ubuntu Cloud Archive. To be precise again, it's QEMU/KVM 2.3+dfsg-5ubuntu9.4~cloud2 linked to Ceph 0.94.8-0ubuntu0.15.10.1~cloud0. So yes, it's all about running a newer QEMU/KVM on a not so new version of Ubuntu. Question is, are we able to run against a Ceph cluster running Jewel instead of Hammer. Or, do we need to upgrade our OpenStack installation first? Regards, Kees On 13-12-16 09:26, Wido den Hollander wrote: > Why? The Ubuntu Cloud Archive is there to provide you a newer Qemu on a older > Ubuntu system. > > If you run Qemu under Ubuntu 16.04 and use the DEB packages directly from > Ceph you should be fine. > > Recent Qemu and recent Ceph :) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrading from Hammer
Hi Wido, Thanks again! Good to hear, it saves us a lot of upgrade trouble in advance. If I'm not mistaken, we haven't done anything with CRUSH tunables. Any pointers on how to make sure we really didn't? Regards, Kees On 20-12-16 10:14, Wido den Hollander wrote: > No, you don't. A Hammer/Jewel client can talk to a Hammer/Jewel cluster. One > thing, don't change any CRUSH tunables if the cluster runs Jewel and the > client is still on Hammer. > > The librados/librbd version is what matters. If you upgrade the cluster to > Jewel and leave the client on Hammer it works. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
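A quick way to verify this is to ask the cluster for the active tunables and compare them against a known profile:

# Show the currently active CRUSH tunables
ceph osd crush show-tunables

# The "profile" field (and the individual choose_* values) can be
# compared against the profiles documented for your release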
Re: [ceph-users] Unbalanced OSD's
Hi Ashley, We experience (using Hammer) a similar issue. Not that I have a perfect solution to share, but I felt like mentioning a "me too". ;-) On a side note: we configured correct weight per drive as well. Regards, Kees On 29-12-16 11:54, Ashley Merrick wrote: > > Hello, > > > > I currently have 5 servers within my CEPH Cluster > > > > 2 x (10 * 8TB Disks) > > 3 x (10 * 4TB Disks) > > > > Currently seeing a larger difference in OSD use across the two > separate server types, as well as within the server itself. > > > > For example on one 4TB server I have an OSD at 64% and one at 84%, > where on the 8TB servers the OSD range from 49% to 64%, where the > highest used OSD’s are on the 4TB. > > > > Each drive has a weight set correctly for the drive size and each > server has the correct weight set, below is my crush map. Apart from > running the command to adjust the re-weight is there anything I am > doing wrong or should change for better spread of data, not looking > for near perfect but where the 8TB drives are sitting at 64% max and > 4TB are sitting at 80%’s causes a big imbalance. > > > > # begin crush map > > tunable choose_local_tries 0 > > tunable choose_local_fallback_tries 0 > > tunable choose_total_tries 50 > > tunable chooseleaf_descend_once 1 > > tunable chooseleaf_vary_r 1 > > tunable straw_calc_version 1 > > tunable allowed_bucket_algs 54 > > > > # buckets > > host sn1 { > > id -2 # do not change unnecessarily > > # weight 72.800 > > alg straw2 > > hash 0 # rjenkins1 > > item osd.0 weight 7.280 > > item osd.1 weight 7.280 > > item osd.3 weight 7.280 > > item osd.4 weight 7.280 > > item osd.2 weight 7.280 > > item osd.5 weight 7.280 > > item osd.6 weight 7.280 > > item osd.7 weight 7.280 > > item osd.8 weight 7.280 > > item osd.9 weight 7.280 > > } > > host sn3 { > > id -6 # do not change unnecessarily > > # weight 72.800 > > alg straw2 > > hash 0 # rjenkins1 > > item osd.10 weight 7.280 > > item osd.11 weight 7.280 > > item osd.12 weight 7.280 > > item osd.13 weight 7.280 > > item osd.14 weight 7.280 > > item osd.15 weight 7.280 > > item osd.16 weight 7.280 > > item osd.17 weight 7.280 > > item osd.18 weight 7.280 > > item osd.19 weight 7.280 > > } > > host sn4 { > > id -7 # do not change unnecessarily > > # weight 36.060 > > alg straw2 > > hash 0 # rjenkins1 > > item osd.20 weight 3.640 > > item osd.21 weight 3.640 > > item osd.22 weight 3.640 > > item osd.23 weight 3.640 > > item osd.24 weight 3.640 > > item osd.25 weight 3.640 > > item osd.26 weight 3.640 > > item osd.27 weight 3.640 > > item osd.28 weight 3.640 > > item osd.29 weight 3.300 > > } > > host sn5 { > > id -8 # do not change unnecessarily > > # weight 36.060 > > alg straw2 > > hash 0 # rjenkins1 > > item osd.30 weight 3.640 > > item osd.31 weight 3.640 > > item osd.32 weight 3.640 > > item osd.33 weight 3.640 > > item osd.34 weight 3.640 > > item osd.35 weight 3.640 > > item osd.36 weight 3.640 > > item osd.37 weight 3.640 > > item osd.38 weight 3.640 > > item osd.39 weight 3.640 > > } > > host sn6 { > > id -9 # do not change unnecessarily > > # weight 36.060 > > alg straw2 > > hash 0 # rjenkins1 > > item osd.40 weight 3.640 > > item osd.41 weight 3.640 > > item osd.42 weight 3.640 > > item osd.43 weight 3.640 > > item osd.44 weight 3.640 > > item osd.45 weight 3.640 > > item osd.46 weight 3.640 > > item osd.47 weight 3.640 > > item osd.48 weight 3.640 > > item osd.49 weight 3.640 > > } > > root default { > > id -1 # do not change unnecessarily > > # weight 253.780 > > alg straw2 > > hash 0 # rjenkins1 > >
item sn1 weight 72.800 > > item sn3 weight 72.800 > > item sn4 weight 36.060 > > item sn5 weight 36.060 > > item sn6 weight 36.060 > > } > > > > # rules > > rule replicated_ruleset { > > ruleset 0 > > type replicated > > min_size 1 > > max_size 10 > > step take default > > step chooseleaf firstn 0 type host > > step emit > > } > > > > Thanks, > > Ashley > > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Unbalanced OSD's
Thanks, I'll try a manual reweight at first. Have a happy new year's eve (yes, I know it's a day early)! Regards, Kees On 30-12-16 11:17, Wido den Hollander wrote: > For this reason you can do a OSD reweight by running the 'ceph osd > reweight-by-utilization' command or do it manually with 'ceph osd reweight X > 0-1' ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
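A hedged sketch of both approaches; the OSD id, weight and threshold are example values only:

# Check per-OSD utilisation first
ceph osd df

# Manually nudge a single over-full OSD down a bit
ceph osd reweight 29 0.85

# Or let Ceph reweight every OSD above 110% of the average utilisation
ceph osd reweight-by-utilization 110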
[ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Hi Cephers, For the last months (well... years actually) we were quite happy using Hammer. So far, there was no immediate cause for an upgrade. However, having seen Luminous providing support for BlueStore, it seemed like a good idea to perform some upgrade steps. Doing baby steps, I wanted to upgrade from Hammer to Infernalis first since all ownerships should be changed because of using an unprivileged user (good stuff!) instead of root. So far, I've upgraded all monitors from Hammer (0.94.10) to Infernalis (9.2.1). All seemed well resulting in HEALTH_OK. Then, I tried upgrading one OSD server using the following procedure: 1. Alter APT sources to utilise Infernalis instead of Hammer. 2. Update and upgrade the packages. 3. Since I didn't want any rebalancing going on, I ran "ceph osd set noout" as well. 4. Stop an OSD, then chown ceph:ceph -R /var/lib/ceph/osd/ceph-X, start the OSD and so on. Maybe I acted too quickly (ehrm... didn't wait long enough) but at some point it seemed not all ownership was changed during the process. Meanwhile we were still HEALTH_OK so I didn't really worry and fixed left-overs using find /var/lib/ceph -not -user ceph -exec chown ceph:ceph '{}' ';' It seemed to work well and two days passed without any issues. But then... Deep scrubbing happened: > health HEALTH_ERR > 1 pgs inconsistent > 2 scrub errors So far, I figured out the two scrubbing errors apply to the same OSD, being osd.0. The log at the OSD shows: > 2018-08-17 15:25:36.810866 7fa3c9e09700 0 log_channel(cluster) log > [INF] : 3.72 deep-scrub starts > 2018-08-17 15:25:37.221562 7fa3c7604700 -1 log_channel(cluster) log > [ERR] : 3.72 soid -5/0072/temp_3.72_0_16187756_3476/head: failed > to pick suitable auth object > 2018-08-17 15:25:37.221566 7fa3c7604700 -1 log_channel(cluster) log > [ERR] : 3.72 soid -5/0072/temp_3.72_0_16195026_251/head: failed to > pick suitable auth object > 2018-08-17 15:46:36.257994 7fa3c7604700 -1 log_channel(cluster) log > [ERR] : 3.72 deep-scrub 2 errors The situation seems similar to http://tracker.ceph.com/issues/13862 but so far I'm unable to repair the placement group. Meanwhile I'm forcing deep scrubbing for all placement groups applicable to osd.0, hopefully resulting in just PG 3.72 having errors. Awaiting deep scrubbing to finish, it seemed like a good idea to ask you guys for help. What's the best approach at this point? > ceph health detail > HEALTH_ERR 1 pgs inconsistent; 2 scrub errors > pg 3.72 is active+clean+inconsistent, acting [0,33,39] > 2 scrub errors OSDs 33 and 39 are untouched (still running 0.94.10) and seem fine without errors. Thanks in advance for any comments or thoughts. Regards and enjoy your weekend! Kees -- https://nefos.nl/contact Nefos IT bv Ambachtsweg 25 (industrienummer 4217) 5627 BZ Eindhoven Nederland KvK 66494931 /Present on Monday, Tuesday, Wednesday and Friday/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Hi David, Thank you for pointing out the option. On http://docs.ceph.com/docs/infernalis/release-notes/ one can read: * Ceph daemons now run as user and group ceph by default. The ceph user has a static UID assigned by Fedora and Debian (also used by derivative distributions like RHEL/CentOS and Ubuntu). On SUSE the ceph user will currently get a dynamically assigned UID when the user is created. If your systems already have a ceph user, upgrading the package will cause problems. We suggest you first remove or rename the existing ‘ceph’ user before upgrading. When upgrading, administrators have two options: 1. Add the following line to ceph.conf on all hosts: setuser match path = /var/lib/ceph/$type/$cluster-$id This will make the Ceph daemons run as root (i.e., not drop privileges and switch to user ceph) if the daemon’s data directory is still owned by root. Newly deployed daemons will be created with data owned by user ceph and will run with reduced privileges, but upgraded daemons will continue to run as root. 2. Fix the data ownership during the upgrade. This is the preferred option, but is more work. The process for each host would be to: 1. Upgrade the ceph package. This creates the ceph user and group. For example: ceph-deploy install --stable infernalis HOST 2. Stop the daemon(s).: service ceph stop # fedora, centos, rhel, debian stop ceph-all # ubuntu 3. Fix the ownership: chown -R ceph:ceph /var/lib/ceph 4. Restart the daemon(s).: start ceph-all# ubuntu systemctl start ceph.target # debian, centos, fedora, rhel Since it seemed more elegant to me, I chose the second option and followed the steps. To be continued... Over night, some more placement groups seem to be inconsistent. I'll post my findings later on. Regards, Kees On 17-08-18 17:21, David Turner wrote: In your baby step upgrade you should avoid the 2 non-LTS releases of Infernalis and Kraken. You should go from Hammer to Jewel to Luminous. The general rule of doing the upgrade to put all of your OSDs to be owned by ceph was to not change the ownership as part of the upgrade. There is a [1] config option that tells Ceph to override the user the daemons run as so that you can separate these 2 operations from each other simplifying each maintenance task. It will set the user to whatever the user is for each daemon's folder. [1] setuser match path = /var/lib/ceph/$type/$cluster-$id ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Hi again, After listing all placement groups the problematic OSD (osd.0) is part of, I forced a deep scrub for all those PGs. A few hours later (and some other deep scrubbing as well) the result seems to be: HEALTH_ERR 8 pgs inconsistent; 14 scrub errors pg 3.6c is active+clean+inconsistent, acting [14,2,38] pg 3.32 is active+clean+inconsistent, acting [0,11,33] pg 3.13 is active+clean+inconsistent, acting [8,34,9] pg 3.30 is active+clean+inconsistent, acting [14,35,26] pg 3.31 is active+clean+inconsistent, acting [44,35,26] pg 3.7d is active+clean+inconsistent, acting [46,37,35] pg 3.70 is active+clean+inconsistent, acting [0,36,11] pg 3.72 is active+clean+inconsistent, acting [0,33,39] 14 scrub errors OSDs (in order) 0, 8, 14 and 46 all reside on the same server, obviously being the one upgraded to Infernalis. It makes sense that I acted too quickly on one OSD (fixing the ownerships while it was perhaps still running), maybe two, but certainly not on all of them. Although it's very likely it wouldn't make a difference, I'll try a ceph pg repair for each PG. To be continued again! Regards, Kees On 18-08-18 10:52, Kees Meijs wrote: To be continued... Over night, some more placement groups seem to be inconsistent. I'll post my findings later on. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
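For the record, a one-liner that could do that listing and scrubbing in one go, assuming ceph pg ls-by-osd is available on this release and that the PG id is the first column of its output:

ceph pg ls-by-osd 0 | awk 'NR > 1 { print $1 }' | while read pg; do ceph pg deep-scrub "$pg"; done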
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Good morning, And... the results: 2018-08-18 17:45:08.927387 7fa3cbe0d700 0 log_channel(cluster) log [INF] : 3.32 repair starts 2018-08-18 17:45:12.350343 7fa3c9608700 -1 log_channel(cluster) log [ERR] : 3.32 soid -5/0032/temp_3.32_0_16187756_293/head: failed to pick suitable auth object 2018-08-18 18:07:43.908310 7fa3c9608700 -1 log_channel(cluster) log [ERR] : 3.32 repair 1 errors, 0 fixed 2018-08-18 18:27:48.141634 7fa3c8606700 0 log_channel(cluster) log [INF] : 3.70 repair starts 2018-08-18 18:27:49.073504 7fa3c8606700 -1 log_channel(cluster) log [ERR] : 3.70 soid -5/0070/temp_3.70_0_16187756_4006/head: failed to pick suitable auth object 2018-08-18 18:51:57.393099 7fa3cae0b700 -1 log_channel(cluster) log [ERR] : 3.70 repair 1 errors, 0 fixed 2018-08-18 19:21:20.456610 7fa3c7604700 0 log_channel(cluster) log [INF] : 3.72 repair starts 2018-08-18 19:21:21.303999 7fa3c9e09700 -1 log_channel(cluster) log [ERR] : 3.72 soid -5/0072/temp_3.72_0_16187756_3476/head: failed to pick suitable auth object 2018-08-18 19:21:21.304051 7fa3c9e09700 -1 log_channel(cluster) log [ERR] : 3.72 soid -5/0072/temp_3.72_0_16187756_5344/head: failed to pick suitable auth object 2018-08-18 19:21:21.304077 7fa3c9e09700 -1 log_channel(cluster) log [ERR] : 3.72 soid -5/0072/temp_3.72_0_16195026_251/head: failed to pick suitable auth object 2018-08-18 19:48:00.016879 7fa3c9e09700 -1 log_channel(cluster) log [ERR] : 3.72 repair 3 errors, 0 fixed 2018-08-18 17:45:08.807173 7f047f9a2700 0 log_channel(cluster) log [INF] : 3.13 repair starts 2018-08-18 17:45:10.669835 7f04821a7700 -1 log_channel(cluster) log [ERR] : 3.13 soid -5/0013/temp_3.13_0_16175425_287/head: failed to pick suitable auth object 2018-08-18 18:05:28.966015 7f04795c7700 0 -- 10.128.4.3:6816/5641 >> 10.128.4.4:6800/3454 pipe(0x564161026000 sd=59 :46182 s=2 pgs=11994 cs=31 l=0 c=0x56415b4fc2c0).fault with nothing to send, going to standby 2018-08-18 18:09:46.667875 7f047f9a2700 -1 log_channel(cluster) log [ERR] : 3.13 repair 1 errors, 0 fixed 2018-08-18 17:45:00.099722 7f1e4f857700 0 log_channel(cluster) log [INF] : 3.6c repair starts 2018-08-18 17:45:01.982007 7f1e4f857700 -1 log_channel(cluster) log [ERR] : 3.6c soid -5/006c/temp_3.6c_0_16187760_5765/head: failed to pick suitable auth object 2018-08-18 17:45:01.982042 7f1e4f857700 -1 log_channel(cluster) log [ERR] : 3.6c soid -5/006c/temp_3.6c_0_16187760_796/head: failed to pick suitable auth object 2018-08-18 18:07:33.490940 7f1e4f857700 -1 log_channel(cluster) log [ERR] : 3.6c repair 2 errors, 0 fixed 2018-08-18 18:29:24.339018 7f1e4d052700 0 log_channel(cluster) log [INF] : 3.30 repair starts 2018-08-18 18:29:25.689341 7f1e4f857700 -1 log_channel(cluster) log [ERR] : 3.30 soid -5/0030/temp_3.30_0_16187760_3742/head: failed to pick suitable auth object 2018-08-18 18:29:25.689346 7f1e4f857700 -1 log_channel(cluster) log [ERR] : 3.30 soid -5/0030/temp_3.30_0_16187760_3948/head: failed to pick suitable auth object 2018-08-18 18:54:59.123152 7f1e4f857700 -1 log_channel(cluster) log [ERR] : 3.30 repair 2 errors, 0 fixed 2018-08-18 18:05:27.421858 7efc52942700 0 log_channel(cluster) log [INF] : 3.7d repair starts 2018-08-18 18:05:29.511779 7efc5013d700 -1 log_channel(cluster) log [ERR] : 3.7d soid -5/007d/temp_3.7d_0_16204674_4402/head: failed to pick suitable auth object 2018-08-18 18:29:23.159691 7efc52942700 -1 log_channel(cluster) log [ERR] : 3.7d repair 1 errors, 0 fixed I'll investigate further. 
Regards, Kees On 18-08-18 17:43, Kees Meijs wrote: Although it's very likely it wouldn't make a difference, I'll try a ceph pg repair for each PG. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Hi again, Over night some other PGs seem inconsistent as well after being deep scrubbed. All affected OSDs log similar errors, such as: > log [ERR] : 3.13 soid -5/0013/temp_3.13_0_16175425_287/head: > failed to pick suitable auth object Since there's temp in the name and we're running a 3-replica cluster, I'm thinking of just reboiling the compromised OSDs. Any thoughts on this, can I do this safely? Current status: > 12 active+clean+inconsistent Nota bene: it cannot be that file ownership is the real culprit here. Like I mentioned earlier in this thread it might be the case for one or maybe two OSDs but definitely not all. Regards, Kees On 19-08-18 08:55, Kees Meijs wrote: > I'll investigate further. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Ehrm, that should of course be rebuilding. (I.e. removing the OSD, reformat, re-add.) On 20-08-18 11:51, Kees Meijs wrote: > Since there's temp in the name and we're running a 3-replica cluster, > I'm thinking of just reboiling the compromised OSDs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Hi David, Thanks for your advice. My end goal is BlueStore so to upgrade to Jewel and then Luminous would be ideal. Currently all monitors are (successfully) running Infernalis, one OSD node is running Infernalis and all other OSD nodes have Hammer. I'll try freeing up one Infernalis OSD at first and see what'll happen. If it goes well I'll just (for now) give up all OSDs on the given node. If it works, I'll end up with Hammer OSDs only and Infernalis monitors. To be continued again! Regards, Kees On 20-08-18 12:04, David Turner wrote: > My suggestion would be to remove the osds and let the cluster recover > from all of the other copies. I would deploy the node back to Hammer > instead of Infernalis. Either that or remove these osds, let the > cluster backfill, and then upgrade to Jewel, and then luminous, and > maybe mimic if you're planning on making it to the newest LTS before > adding the node back in. That way you could add them back in as > bluestore (on either luminous or mimic) if that's a part of your plan. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
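For reference, removing one of those OSDs and letting the remaining copies re-replicate could look roughly like this; the id 12 is a placeholder and upstart syntax is assumed:

ceph osd out 12
# wait for the resulting backfill to finish (watch "ceph -s")

stop ceph-osd id=12
ceph osd crush remove osd.12
ceph auth del osd.12
ceph osd rm 12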
[ceph-users] Ensure Hammer client compatibility
Good afternoon Cephers, While I'm fixing our upgrade-semi-broken cluster (see thread Upgrade to Infernalis: failed to pick suitable auth object) I'm wondering about ensuring client compatibility. My end goal is BlueStore (i.e. running Luminous) and unfortunately I'm obliged to offer Hammer client compatibility. Any pointers on how to ensure this configuration-wise? Thanks! Regards, Kees -- https://nefos.nl/contact Nefos IT bv Ambachtsweg 25 (industrienummer 4217) 5627 BZ Eindhoven Nederland KvK 66494931 /Present on Monday, Tuesday, Wednesday and Friday/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
Bad news: I've got a PG stuck in down+peering now. Please advise. K. On 20-08-18 12:12, Kees Meijs wrote: > Thanks for your advice. My end goal is BlueStore so to upgrade to Jewel > and then Luminous would be ideal. > > Currently all monitors are (successfully) running Infernalis, one OSD > node is running Infernalis and all other OSD nodes have Hammer. > > I'll try freeing up one Infernalis OSD at first and see what'll happen. > If it goes well I'll just (for now) give up all OSDs on the given node. > If it works, I'll end up with Hammer OSDs only and Infernalis monitors. > > To be continued again! ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: failed to pick suitable auth object
The given PG is back online, phew... Meanwhile, some OSDs still on Hammer seem to crash with errors like: > 2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In > function 'void ReplicatedPG::scan_range(int, int, > PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700 > time 2018-08-20 13:06:33.709922 > osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0) Restarting the OSDs seems to work. K. On 20-08-18 13:14, Kees Meijs wrote: > Bad news: I've got a PG stuck in down+peering now. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time
Hi again, I'm starting to feel really unlucky here... At the moment, the situation is "sort of okay": 1387 active+clean 11 active+clean+inconsistent 7 active+recovery_wait+degraded 1 active+recovery_wait+undersized+degraded+remapped 1 active+undersized+degraded+remapped+wait_backfill 1 active+undersized+degraded+remapped+inconsistent+backfilling To ensure nothing is in the way, I disabled both scrubbing and deep scrubbing for the time being. However, random OSDs (still on Hammer) constantly crash giving the error as mentioned earlier (osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0)). It felt like they started crashing when hitting the PG currently backfilling, so I set the nobackfill flag. For now, the crashing seems to have stopped. However, the cluster seems slow at the moment when trying to access the given PG via KVM/QEMU (RBD). Recap: * All monitors run Infernalis. * One OSD node runs Infernalis. * All other OSD nodes run Hammer. * One OSD on Infernalis is set to "out" and is stopped. This OSD seemed to contain one inconsistent PG. * Backfilling started. * After hours and hours of backfilling, OSDs started to crash. Other than restarting the "out" and stopped OSD for the time being (haven't tried that yet) I'm quite lost. Hopefully someone has some pointers for me. Regards, Kees On 20-08-18 13:23, Kees Meijs wrote: The given PG is back online, phew... Meanwhile, some OSDs still on Hammer seem to crash with errors alike: 2018-08-20 13:06:33.819569 7f8962b2f700 -1 osd/ReplicatedPG.cc: In function 'void ReplicatedPG::scan_range(int, int, PG::BackfillInterval*, ThreadPool::TPHandle&)' thread 7f8962b2f700 time 2018-08-20 13:06:33.709922 osd/ReplicatedPG.cc: 10115: FAILED assert(r >= 0) Restarting the OSDs seems to work. K. On 20-08-18 13:14, Kees Meijs wrote: Bad news: I've got a PG stuck in down+peering now. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time
Hi there, A few hours ago I started the given OSD again and gave it weight 1.0. Backfilling started and more PGs became active+clean. After a while the same crashing behaviour started to act up so I stopped the backfilling. Running with noout,nobackfill,norebalance,noscrub,nodeep-scrub flags now but at least the cluster seems stable (fingers crossed...) Possible plan of attack: 1. Stop all Infernalis OSDs. 2. Remove Ceph Infernalis packages from OSD node. 3. Install Hammer packages. 4. Start the OSDs (or maybe the package installation does this already.) Effectively this is an OSD downgrade. Is this supported or does Ceph "upgrade" data structures on disk as well? Recap: this would imply going from Infernalis back to Hammer. Any thoughts are more than welcome (maybe a completely different approach makes sense...) Meanwhile, I'll try to catch some sleep. Thanks, thanks! Best regards, Kees On 20-08-18 21:46, Kees Meijs wrote: Other than restarting the "out" and stopped OSD for the time being (haven't tried that yet) I'm quite lost. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ensure Hammer client compatibility
Hi Lincoln, We're looking at (now existing) RBD support using KVM/QEMU, so this is an upgrade path. Regards, Kees On 20-08-18 16:37, Lincoln Bryant wrote: What interfaces do your Hammer clients need? If you're looking at CephFS, we have had reasonable success moving our older clients (EL6) to NFS Ganesha with the Ceph FSAL. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time
Hello David, Thank you and I'm terribly sorry; I was unaware I was starting new threads. Off the top of my head I'd say "yes it'll fit", but obviously I'll make sure first. Regards, Kees On 21-08-18 16:34, David Turner wrote: Ceph does not support downgrading OSDs. When you removed the single OSD, it was probably trying to move data onto the other OSDs in the node with Infernalis OSDs. I would recommend stopping every OSD in that node and marking them out so the cluster will rebalance without them. Assuming your cluster is able to get healthy after that, we'll see where things are. Also, please stop opening so many email threads about this same issue. It makes tracking this in the archives impossible. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time
Hi list, A little update: meanwhile we added a new node consisting of Hammer OSDs to ensure sufficient cluster capacity. The upgraded node with Infernalis OSDs is completely removed from the CRUSH map and the OSDs removed (obviously we didn't wipe the disks yet). At the moment we're still running using flags noout,nobackfill,noscrub,nodeep-scrub. Although now only Hammer OSDs reside, we still experience OSD crashes on backfilling so we're unable to achieve HEALTH_OK state. Using debug 20 level we're (mostly my coworker Willem Jan is) figuring out why the crashes happen exactly. Hopefully we'll figure it out. To be continued... Regards, Kees ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] All SSD cluster performance
Hi Maxime, Given your remark below, what kind of SATA SSD do you recommend for OSD usage? Thanks! Regards, Kees On 15-01-17 21:33, Maxime Guyot wrote: > I don’t have firsthand experience with the S3520, as Christian pointed out > their endurance doesn’t make them suitable for OSDs in most cases. I can only > advise you to keep a close eye on the SMART status of the SSDs. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] Shrink cache target_max_bytes
Hi Cephers,

Long story short: I'd like to shrink our cache pool a little. Is it safe to just alter the cache's target_max_bytes and wait for objects to get evicted? Anything to take into account?

Thanks!

Regards, Kees
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Shrink cache target_max_bytes
Hi Cephers,

Although I might be stating an obvious fact: altering the parameter works as advertised. The only issue I encountered was that lowering the parameter too much at once results in some slow requests, because the cache pool is "full".

So in short: it works when lowering the parameter bit by bit.

Regards, Kees
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
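For completeness, a sketch of stepping the cache size down gradually as described above; the pool name and byte values are merely placeholders:

> $ ceph osd pool set hot-storage target_max_bytes 900000000000
> $ watch ceph df
> $ ceph osd pool set hot-storage target_max_bytes 800000000000

Lower the value in small steps, let the tiering agent flush and evict before the next step, and keep an eye on slow request warnings in ceph -w.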
Re: [ceph-users] Upgrade to Infernalis: OSDs crash all the time
Hi list,

Between crashes we were able to allow the cluster to backfill as much as possible (all monitors Infernalis, OSDs being Hammer again). Leftover PGs wouldn't backfill until we removed files such as:

8.0M -rw-r--r-- 1 root root 8.0M Aug 24 23:56 temp\u3.bd\u0\u16175417\u2718__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 28 05:51 temp\u3.bd\u0\u16175417\u3992__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 30 03:40 temp\u3.bd\u0\u16175417\u4521__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Aug 31 03:46 temp\u3.bd\u0\u16175417\u4817__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Sep 5 19:44 temp\u3.bd\u0\u16175417\u6252__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Sep 6 14:44 temp\u3.bd\u0\u16175417\u6593__head_00BD__fffb
8.0M -rw-r--r-- 1 root root 8.0M Sep 7 10:21 temp\u3.bd\u0\u16175417\u6870__head_00BD__fffb

Restarting the given OSD didn't seem necessary; backfilling started to work and at some point enough replicas were available for each PG. Finally deep scrubbing repaired the inconsistent PGs automagically and we arrived at HEALTH_OK again!

Case closed: up to Jewel. For everyone involved: a big, big and even bigger thank you for all pointers and support!

Regards, Kees

On 10-09-18 16:43, Kees Meijs wrote:
> A little update: meanwhile we added a new node consisting of Hammer OSDs to ensure sufficient cluster capacity. The upgraded node with Infernalis OSDs is completely removed from the CRUSH map and the OSDs removed (obviously we didn't wipe the disks yet).
>
> At the moment we're still running using flags noout,nobackfill,noscrub,nodeep-scrub. Although now only Hammer OSDs reside, we still experience OSD crashes on backfilling so we're unable to achieve HEALTH_OK state.
>
> Using debug 20 level we're (mostly my coworker Willem Jan is) figuring out why the crashes happen exactly. Hopefully we'll figure it out. To be continued...
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
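As a side note, the deep scrubs and repairs don't have to wait for the schedule; they can be triggered per PG. A rough sketch, with 3.bd serving only as an example PG id:

> $ ceph health detail
> $ ceph pg deep-scrub 3.bd
> $ ceph pg repair 3.bd

ceph health detail lists which PGs are inconsistent, deep-scrub re-checks the replicas, and repair asks the primary to fix the mismatch.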
Re: [ceph-users] Ensure Hammer client compatibility
Hi list,

Having finished our adventures with Infernalis, we're now finally running Jewel (10.2.11) on all Ceph nodes. Woohoo!

However, there are still KVM production boxes with block-rbd.so being linked against librados 0.94.10, which is Hammer. Current relevant status parts:

> health HEALTH_WARN
>        crush map has legacy tunables (require bobtail, min is firefly)
>        no legacy OSD present but 'sortbitwise' flag is not set

Obviously we would like to go to HEALTH_OK again without the warnings mentioned, while maintaining Hammer client support. Running ceph osd set require_jewel_osds seemed harmless in terms of client compatibility, so that's done already. However, what about sortbitwise and tunables?

Thanks, Kees

On 21-08-18 03:47, Kees Meijs wrote:
> We're looking at (now existing) RBD support using KVM/QEMU, so this is an upgrade path.
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Ensure Hammer client compatibility
Hi again,

I just read (and reread, and read again) the chapter of the Ceph Cookbook on upgrades and http://docs.ceph.com/docs/jewel/rados/operations/crush-map/#tunables, and figured there's a way back if needed.

The sortbitwise flag is set (re-peering was almost instant) and the tunables are set to "hammer". There's a lot of data shuffling going on now, so fingers crossed.

Cheers, Kees

On 12-11-18 09:14, Kees Meijs wrote:
> However, what about sortbitwise and tunables?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
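For reference, the two changes mentioned boil down to the following (a minimal sketch; the tunables change triggers data movement, as noted):

> $ ceph osd set sortbitwise
> $ ceph osd crush tunables hammer

Progress of the resulting rebalance can be followed with ceph -s or ceph -w.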
Re: [ceph-users] Huge latency spikes
Hi Alex, What kind of clients do you use? Is it KVM (QEMU) using NBD driver, kernel, or...? Regards, Kees On 17-11-18 20:17, Alex Litvak wrote: > Hello everyone, > > I am trying to troubleshoot cluster exhibiting huge spikes of latency. > I cannot quite catch it because it happens during the light activity > and randomly affects one osd node out of 3 in the pool. > > This is a file store. > I see some osds exhibit applied latency of 400 ms, 1 minute load > average shuts to 60. Client commit latency with queue shoots to 300ms > and journal latency (return write ack for client) (journal on Intel > DC-S3710 SSD) shoots on 40 ms > > op_w_process_latency showed 250 ms and client read-modify-write > operation readable/applied latency jumped to 1.25 s on one of the OSDs > > I rescheduled the scrubbing and deep scrubbing and was watching ceph > -w activity so it is definitely not related. > > At the same time node shows 98 % cpu idle no significant changes in > memory utilization, no errors on network with bandwidth utilization > between 20 - 50 Mbit on client and back end networks > > OSD node has 12 OSDs (2TB rust) 2 partitioned SSD journal disks, 32 GB > RAM, dial 6 core / 12 thread CPUs > > This is perhaps the most relevant part of ceph config > > debug lockdep = 0/0 > debug context = 0/0 > debug crush = 0/0 > debug buffer = 0/0 > debug timer = 0/0 > debug journaler = 0/0 > debug osd = 0/0 > debug optracker = 0/0 > debug objclass = 0/0 > debug filestore = 0/0 > debug journal = 0/0 > debug ms = 0/0 > debug monc = 0/0 > debug tp = 0/0 > debug auth = 0/0 > debug finisher = 0/0 > debug heartbeatmap = 0/0 > debug perfcounter = 0/0 > debug asok = 0/0 > debug throttle = 0/0 > > [osd] > journal_dio = true > journal_aio = true > osd_journal = /var/lib/ceph/osd/$cluster-$id-journal/journal > osd_journal_size = 2048 ; journal size, in megabytes > osd crush update on start = false > osd mount options xfs = > "rw,noatime,inode64,logbsize=256k,delaylog,allocsize=4M" > osd_op_threads = 5 > osd_disk_threads = 4 > osd_pool_default_size = 2 > osd_pool_default_min_size = 1 > osd_pool_default_pg_num = 512 > osd_pool_default_pgp_num = 512 > osd_crush_chooseleaf_type = 1 > ; osd pool_default_crush_rule = 1 > ; new options 04.12.2015 > filestore_op_threads = 4 > osd_op_num_threads_per_shard = 1 > osd_op_num_shards = 25 > filestore_fd_cache_size = 64 > filestore_fd_cache_shards = 32 > filestore_fiemap = false > ; Reduce impact of scrub (needs cfq on osds) > osd_disk_thread_ioprio_class = "idle" > osd_disk_thread_ioprio_priority = 7 > osd_deep_scrub_interval = 1211600 > osd_scrub_begin_hour = 19 > osd_scrub_end_hour = 4 > osd_scrub_sleep = 0.1 > [client] > rbd_cache = true > rbd_cache_size = 67108864 > rbd_cache_max_dirty = 50331648 > rbd_cache_target_dirty = 33554432 > rbd_cache_max_dirty_age = 2 > rbd_cache_writethrough_until_flush = true > > OSD logs and system log at that time show nothing interesting. > > Any clue of what to look for in order to diagnose the load / latency > spikes would be really appreciated. > > Thank you > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
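Regardless of the client type, it may help to narrow down which OSDs are actually showing the latency; a sketch, where osd.12 is only a placeholder id and the daemon command has to be run on the node hosting that OSD:

> $ ceph osd perf
> $ ceph daemon osd.12 perf dump

The first command lists per-OSD commit/apply latencies, the second dumps the internal counters (including the op_w_process_latency mentioned above).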
[ceph-users] Altering crush-failure-domain
Hi Cephers, Documentation on http://docs.ceph.com/docs/master/rados/operations/erasure-code/ states: > Choosing the right profile is important because it cannot be modified > after the pool is created: a new pool with a different profile needs > to be created and all objects from the previous pool moved to the new. Right, that makes sense. However, is it possible to "migrate" crush-failure-domain from osd to host, to rack and so on without copying pools? Regards, Kees -- https://nefos.nl/contact Nefos IT bv Ambachtsweg 25 (industrienummer 4217) 5627 BZ Eindhoven Nederland KvK 66494931 /Aanwezig op maandag, dinsdag, woensdag en vrijdag/ ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Altering crush-failure-domain
Thanks guys. Regards, Kees On 04-03-19 22:18, Smith, Eric wrote: > This will cause data migration. > > -Original Message- > From: ceph-users On Behalf Of Paul > Emmerich > Sent: Monday, March 4, 2019 2:32 PM > To: Kees Meijs > Cc: Ceph Users > Subject: Re: [ceph-users] Altering crush-failure-domain > > Yes, these parts of the profile are just used to create a crush rule. > You can change the crush rule like any other crush rule. > > > Paul > > -- > Paul Emmerich > > Looking for help with your Ceph cluster? Contact us at https://croit.io > > croit GmbH > Freseniusstr. 31h > 81247 München > www.croit.io > Tel: +49 89 1896585 90 > > On Mon, Mar 4, 2019 at 8:13 PM Kees Meijs wrote: >> Hi Cephers, >> >> Documentation on >> http://docs.ceph.com/docs/master/rados/operations/erasure-code/ states: >> >> Choosing the right profile is important because it cannot be modified after >> the pool is created: a new pool with a different profile needs to be created >> and all objects from the previous pool moved to the new. >> >> >> Right, that makes sense. However, is it possible to "migrate" >> crush-failure-domain from osd to host, to rack and so on without copying >> pools? >> >> Regards, >> Kees >> >> -- >> https://nefos.nl/contact >> >> Nefos IT bv >> Ambachtsweg 25 (industrienummer 4217) >> 5627 BZ Eindhoven >> Nederland >> >> KvK 66494931 >> >> Aanwezig op maandag, dinsdag, woensdag en vrijdag >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
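For anyone finding this thread later, a rough sketch of what Paul describes: create a new rule with the desired failure domain and point the pool at it. The profile, rule and pool names are placeholders, and the exact syntax depends on the release (pre-Luminous uses ruleset-failure-domain in the profile and crush_ruleset with a rule number on the pool):

> $ ceph osd erasure-code-profile set ec-profile-host k=4 m=2 crush-failure-domain=host
> $ ceph osd crush rule create-erasure ec-rule-host ec-profile-host
> $ ceph osd pool set mypool crush_rule ec-rule-host

As Eric points out, expect data migration once the pool switches to the new rule.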
Re: [ceph-users] Random slow requests without any load
Hi,

We experienced similar issues. Our cluster-internal network (completely separated) now has NOTRACK (no connection state tracking) iptables rules. In full:

> # iptables-save
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
> *filter
> :FORWARD DROP [0:0]
> :OUTPUT ACCEPT [0:0]
> :INPUT ACCEPT [0:0]
> COMMIT
> # Completed on Wed Jul 17 14:57:38 2019
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:57:38 2019
> *raw
> :OUTPUT ACCEPT [0:0]
> :PREROUTING ACCEPT [0:0]
> -A OUTPUT -j NOTRACK
> -A PREROUTING -j NOTRACK
> COMMIT
> # Completed on Wed Jul 17 14:57:38 2019

Ceph uses IPv4 in our case, but to be complete:

> # ip6tables-save
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
> *filter
> :OUTPUT ACCEPT [0:0]
> :INPUT ACCEPT [0:0]
> :FORWARD DROP [0:0]
> COMMIT
> # Completed on Wed Jul 17 14:58:20 2019
> # Generated by xtables-save v1.8.2 on Wed Jul 17 14:58:20 2019
> *raw
> :OUTPUT ACCEPT [0:0]
> :PREROUTING ACCEPT [0:0]
> -A OUTPUT -j NOTRACK
> -A PREROUTING -j NOTRACK
> COMMIT
> # Completed on Wed Jul 17 14:58:20 2019

With this configuration the connection state tables can never fill up, so connections are no longer dropped as a side effect.

Cheers, Kees

On 17-07-2019 11:27, Maximilien Cuony wrote:
> Just a quick update about this if somebody else gets the same issue:
>
> The problem was with the firewall. Port ranges and established connections are allowed, but for some reason the tracking of connections is lost, leading to a strange state where one machine refuses data (RSTs are replied) and the sender never gets the RST packet (even with 'related' packets allowed).
>
> There was a similar post on this list in February ("Ceph and TCP States") where losing connections in conntrack created issues, but the fix, net.netfilter.nf_conntrack_tcp_be_liberal=1, did not improve that particular case.
>
> As a workaround, we installed lighter rules for the firewall (allowing all packets from machines inside the cluster by default) and that "fixed" the issue :)
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
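Before going the NOTRACK route, it may be worth checking whether the conntrack table really is the bottleneck; a quick sketch:

> $ sysctl net.netfilter.nf_conntrack_count
> $ sysctl net.netfilter.nf_conntrack_max
> $ dmesg | grep -i conntrack

A count close to the maximum, or "table full, dropping packet" messages in the kernel log, point in that direction.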
[ceph-users] Blacklisting during boot storm
Hi list,

Yesterday afternoon we experienced a compute node outage in our OpenStack (obviously Ceph backed) cluster. We tried to (re)start compute instances again as fast as possible, resulting in some KVM/RBD clients getting blacklisted.

The problem was spotted very quickly so we could remove the blacklist entries at once, while the cluster was coping fine with the boot storm.

Question: what can we do to prevent the blacklisting? Or, does it make sense to completely disable the mechanism (doesn't feel right), or maybe configure it differently?

Thanks!

Regards, Kees

--
https://nefos.nl/contact

Nefos IT bv
Ambachtsweg 25 (industrienummer 4217)
5627 BZ Eindhoven
Nederland

KvK 66494931

/Aanwezig op maandag, dinsdag, woensdag en vrijdag/
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
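For reference, the blacklist entries mentioned can be inspected and removed by hand; a small sketch, where the address and nonce are placeholders taken from the listing:

> $ ceph osd blacklist ls
> $ ceph osd blacklist rm 10.0.0.21:0/3418812301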
Re: [ceph-users] Blacklisting during boot storm
Hi Paul, Okay, thanks for clarifying. If we see the phenomenon again, we'll just leave it be. K. On 03-08-2019 14:33, Paul Emmerich wrote: > The usual reason for blacklisting RBD clients is breaking an exclusive > lock because the previous owner seemed to have crashed. > Blacklisting the old owner is necessary in case you had a network > partition and not a crash. Note that this is entirely normal and no > reason to worry. ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com