[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
I cannot confirm that a larger memory target will solve the problem completely. In my case the OSDs have a 14GB memory target and I still had a huge user IO impact while snaptrimming (many slow ops the whole time). Since I set bluefs_buffered_io=true it seems to work without issue.

In my cluster I don't use rgw. But I don't see why different types of access to the cluster should affect the way the kernel manages its memory. In my experience, the kernel mostly begins to swap because of NUMA effects and/or memory fragmentation.

Manuel

On Thu, 6 Aug 2020 15:06:49 -0500 Mark Nelson wrote:
> a 2GB memory target will absolutely starve the OSDs of memory for rocksdb block cache which probably explains why you are hitting the disk for reads and a shared page cache is helping so much. It's definitely more memory efficient to have a page cache scheme rather than having more cache for each OSD, but for NVMe drives you can end up having more contention and overhead. For older systems with slower devices and lower amounts of memory the page cache is probably a win. FWIW with a 4GB+ memory target I suspect you would see far fewer cache miss reads (but obviously you can't do that on your nodes).
>
> Mark
>
> On 8/6/20 1:47 PM, Vladimir Prokofev wrote:
> > In my case I only have 16GB RAM per node with 5 OSD on each of them, so I actually have to tune osd_memory_target=2147483648 because with the default value of 4GB my osd processes tend to get killed by OOM. That is what I was looking into before the correct solution. I disabled osd_memory_target limitation essentially setting it to default 4GB - it helped in a sense that workload on the block.db device significantly dropped, but overall pattern was not the same - for example there still were no merges on the block.db device. It all came back to the usual pattern with bluefs_buffered_io=true. osd_memory_target limitation was implemented somewhere around the 10 -> 12 release upgrade I think, before the memory auto scaling feature for bluestore was introduced - that's when my osds started to get OOM. They worked fine before that.
> >
> > On Thu, 6 Aug 2020 at 20:28, Mark Nelson wrote:
> >
> >> Yeah, there are cases where enabling it will improve performance as rocksdb can then use the page cache as a (potentially large) secondary cache beyond the block cache and avoid hitting the underlying devices for reads. Do you have a lot of spare memory for page cache on your OSD nodes? You may be able to improve the situation with bluefs_buffered_io=false by increasing the osd_memory_target which should give the rocksdb block cache more memory to work with directly. One downside is that we currently double cache onodes in both the rocksdb cache and bluestore onode cache which hurts us when memory limited. We have some experimental work that might help in this area by better balancing bluestore onode and rocksdb block caches but it needs to be rebased after Adam's column family sharding work.
> >>
> >> The reason we had to disable bluefs_buffered_io again was that we had users with certain RGW workloads where the kernel started swapping large amounts of memory on the OSD nodes despite seemingly having free memory available. This caused huge latency spikes and IO slowdowns (even stalls). We never noticed it in our QA test suites and it doesn't appear to happen with RBD workloads as far as I can tell, but when it does happen it's really painful.
> >>
> >> Mark
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
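For anyone landing on this thread, a short sketch of the two knobs being discussed; osd.0 and the 4 GiB value are only placeholders, not recommendations:

# inspect what a running OSD currently uses, via its admin socket
ceph daemon osd.0 config get bluefs_buffered_io
ceph daemon osd.0 config get osd_memory_target

# either re-enable buffered bluefs IO (the pre-14.2.10 behaviour) ...
ceph config set osd bluefs_buffered_io true

# ... or leave it off and give the rocksdb block cache more room instead,
# e.g. 4 GiB per OSD - only if the node actually has the RAM to spare
ceph config set osd osd_memory_target 4294967296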
[ceph-users] Re: Can you block gmail.com or so!!!
While you are thinking about the mailing list configuration, can you consider that it is very DMARC-unfriendly, which is why I have to use an email address from an ISP domain that does not publish DMARC. If I post from my normal email accounts:

* We publish SPF, DKIM & DMARC policies that request rejection of emails purportedly from our domain that fail both SPF & DKIM. We also request DMARC forensic reports.
* I post to the list, and the list "forwards" the email to everyone with my email as the sender, and modifies the subject by prepending [ceph-users]
* Modifying the subject invalidates my DKIM signature
* Many receiving domains check DMARC, and see that I fail SPF by trying to send from an unauthorised relay (i.e. the mailing list server) and that I fail DKIM as the signature is now invalid due to the subject change
* All those domains reject my message, some sending me bounce messages
* All of the domains send me daily reject reports so I can see that many are being rejected
* Some send me a forensic report for each bounced message (I have this enabled after one of our domains was used as the sender address for a mass-spamming toolkit)
* So for each message I post I can receive 50-100 blowback messages, and know that most people haven't seen my posts!

Forwarding a message with the original sender, as well as modifying the message, is a no-no. It's already a problem, and will continue to grow as a problem as spam mitigations increase.

Hope that helps explain the issue.

Regards, Chris

On 06/08/2020 20:14, David Galloway wrote:

Oh, interesting. You appear to be correct. I'm running each of the mailing lists' services in their own containers so the private IP makes sense.

I just commented on a FR for Hyperkitty to disable posting via Web UI: https://gitlab.com/mailman/hyperkitty/-/issues/264

Aside from that, I can confirm my new SPF filter has already blocked one spam e-mail from getting through so that's good.

Thanks for the tip.

On 8/6/20 2:34 PM, Tony Lill wrote:

I looked at the received-from headers, and it looks to me like these messages are being fed into the list from the web interface. The first received-from is from mailman web and a private IP.

On 8/6/20 2:09 PM, David Galloway wrote:

Hi all,

As previously mentioned, blocking the gmail domain isn't a feasible solution since the vast majority of @gmail.com subscribers (about 500 in total) are likely legitimate Ceph users.

A mailing list member recommended some additional SPF checking a couple weeks ago which I just implemented today. I think what's actually happening is a bot will subscribe using a gmail address and then "clicks" the confirmation link. They then spam from a different domain pretending to be coming from gmail.com but it's not. The new config I put in place should block that.

Hopefully this should cut down on the spam. I took over the Ceph mailing lists last year and it's been a never-ending cat and mouse game of spam filters/services, configuration changes, etc. I'm still learning how to be a mail admin so your patience and understanding is appreciated.
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
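To see what policy a given sending domain actually publishes, a quick check could look like this; example.com and the report addresses are placeholders, not the list's real configuration:

dig +short TXT _dmarc.example.com
# a strict policy of the kind described above looks roughly like:
# "v=DMARC1; p=reject; rua=mailto:dmarc-reports@example.com; ruf=mailto:dmarc-forensic@example.com; fo=1"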
[ceph-users] Re: Can you block gmail.com or so!!!
Thanks Chris for your details.

Notice, though, that all/most mailing lists, especially those using mailman, have this issue, which is caused by the DMARC standard neglecting to consider how most mailing lists work(ed at the time). That is, prepending the original subject line with the list's name in brackets. So this isn't an issue with this mailing list per se.

More info (and how to circumvent this problem in the future) can be found here: https://wiki.list.org/DEV/DMARC

Anyway, thanks David for your hard work!

-Ursprüngliche Nachricht-
Von: Chris Palmer
Gesendet: Freitag, 7. August 2020 10:25
An: ceph-users@ceph.io
Betreff: [ceph-users] Re: Can you block gmail.com or so!!!

[.. full quote of the previous message snipped ..]
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] How can I use a bucket policy with a subuser?
Hi all.

I have a cluster running version 14.2.10. First, I create the user hoannv:

radosgw-admin user create --uid=hoannv --display-name=hoannv

Then I create the subuser hoannv:subuser1 with the command:

radosgw-admin subuser create --uid=hoannv --subuser=subuser1 --key-type=swift --gen-secret --access=full

The hoannv user creates the bucket test1. The hoannv:subuser1 subuser also has full access permission to the test1 bucket.

I want the subuser to have no permission by default, so that I can then use a bucket policy to grant access to the subuser. How can I do this?

Thanks.
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
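In case it is useful, a rough sketch of what a bucket policy for a subuser could look like; the principal string for a subuser used below is an assumption that should be verified against the radosgw bucket policy documentation for your release, and s3cmd is just one client that can apply a policy:

# sketch: grant the subuser read access to the bucket via a bucket policy;
# the "user/hoannv:subuser1" principal form is an assumption - verify it
cat > policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": ["arn:aws:iam:::user/hoannv:subuser1"]},
    "Action": ["s3:GetObject", "s3:ListBucket"],
    "Resource": ["arn:aws:s3:::test1", "arn:aws:s3:::test1/*"]
  }]
}
EOF

# apply it with an S3 client authenticated with the bucket owner's S3 keys
s3cmd setpolicy policy.json s3://test1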
[ceph-users] OSDs flapping since upgrade to 14.2.10
Hi list,

since our upgrade 14.2.9 -> 14.2.10 we observe flapping OSDs:

* The mons claim every few minutes:
2020-08-07 09:49:09.783648 osd.243 (osd.243) 246 : cluster [WRN] Monitor daemon marked osd.243 down, but it is still running
2020-08-07 10:04:40.753704 osd.243 (osd.243) 248 : cluster [WRN] Monitor daemon marked osd.243 down, but it is still running
2020-08-07 10:07:21.187945 osd.253 (osd.253) 469 : cluster [WRN] Monitor daemon marked osd.253 down, but it is still running
2020-08-07 10:04:35.440547 mon.cephmon01 (mon.0) 390132 : cluster [DBG] osd.243 reported failed by osd.33
2020-08-07 10:04:35.508412 mon.cephmon01 (mon.0) 390133 : cluster [DBG] osd.243 reported failed by osd.187
2020-08-07 10:04:35.508529 mon.cephmon01 (mon.0) 390134 : cluster [INF] osd.243 failed (root=default,datacenter=of,row=row-of-02,host=cephosd16) (2 reporters from different host after 44.000150 >= grace 25.935545)
2020-08-07 10:04:35.695171 mon.cephmon01 (mon.0) 390135 : cluster [DBG] osd.243 reported failed by osd.203
2020-08-07 10:04:35.771704 mon.cephmon01 (mon.0) 390136 : cluster [DBG] osd.243 reported failed by osd.163
2020-08-07 10:04:41.588530 mon.cephmon01 (mon.0) 390148 : cluster [INF] osd.243 [v2:10.198.10.16:6882/6611,v1:10.198.10.16:6885/6611] boot
2020-08-07 10:04:40.753704 osd.243 (osd.243) 248 : cluster [WRN] Monitor daemon marked osd.243 down, but it is still running
2020-08-07 10:04:40.753712 osd.243 (osd.243) 249 : cluster [DBG] map e2683535 wrongly marked me down at e2683534

osd.33 says:
2020-08-07 10:04:35.437 7fcaaa4f3700 -1 osd.33 2683533 heartbeat_check: no reply from 10.198.10.16:6802 osd.243 since back 2020-08-07 10:03:51.223911 front 2020-08-07 10:03:51.224322 (oldest deadline 2020-08-07 10:04:35.322704)

osd.243 says:
2020-08-07 10:03:55.065 7f0d33911700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0d13acb700' had timed out after 15
2020-08-07 10:03:55.065 7f0d34112700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0d13acb700' had timed out after 15
[.. ~3000(!) Lines ..]
2020-08-07 10:04:33.644 7f0d33110700 1 heartbeat_map is_healthy 'OSD::osd_op_tp thread 0x7f0d13acb700' had timed out after 15
2020-08-07 10:04:33.688 7f0d13acb700 0 bluestore(/var/lib/ceph/osd/ceph-243) log_latency_fn slow operation observed for upper_bound, latency = 20.9013s, after = omap_iterator(cid = 19.58a_head, oid = #19:51a21a27::: .dir.default.223091333.1.3:head#)
2020-08-07 10:04:33.688 7f0d13acb700 1 heartbeat_map reset_timeout 'OSD::osd_op_tp thread 0x7f0d13acb700' had timed out after 15
2020-08-07 10:04:40.748 7f0d2279b700 0 log_channel(cluster) log [WRN] : Monitor daemon marked osd.243 down, but it is still running
2020-08-07 10:04:40.748 7f0d2279b700 0 log_channel(cluster) log [DBG] : map e2683535 wrongly marked me down at e2683534

* as a consequence, old deep-scrubs did not finish, because they would be interrupted -> 'pgs not deep-scrubbed in time'

For the latter, I increased the op-thread-timeout back to the pre-12(!).2.11 value of 30.

I'm not sure if we really have a problem, but it does not look healthy. Any ideas, thoughts?

regards,
Ingo

-- Ingo Reimann Teamleiter Technik [ https://www.dunkel.de/ ] Dunkel GmbH Philipp-Reis-Straße 2 65795 Hattersheim Fon: +49 6190 889-100 Fax: +49 6190 889-399 eMail: supp...@dunkel.de https://www.Dunkel.de/ Amtsgericht Frankfurt/Main HRB: 37971 Geschäftsführer: Axel Dunkel Ust-ID: DE 811622001
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] RGW Garbage Collection (GC) does not make progress
Hi,

On a Nautilus 14.2.8 cluster I'm seeing a large amount of GC data, and the GC on the RGW does not seem to make progress. The .rgw.gc pool contains 39GB of data spread out over 32 objects.

In the logs we do see references to the RGW GC doing work, and it says it is removing objects. Those objects however still exist and only their 'refcount' attribute is updated.

2020-08-07 10:28:01.946 7fbd79f9a7c0 5 garbage collection: RGWGC::process removing .rgw.buckets.ec-v2:default.1834866551.1__multipart_fedora-data/datastreamStore/XX-YYY-5/5c/f3/info%3Afedora%2FCH-001514-5%3A36%2FORIGINAL%2FORIGINAL.0.2~yKGz1-SLXINhZvm3cQMBWgx9BJVoH5j.1
2020-08-07 10:28:01.946 7fbd79f9a7c0 5 garbage collection: RGWGC::process removing .rgw.buckets.ec-v2:default.1834866551.1__shadow_fedora-data/datastreamStore/XXX-YYY-5/5c/f3/info%3Afedora%2FCH-001514-5%3A36%2FORIGINAL%2FORIGINAL.0.2~yKGz1-SLXINhZvm3cQMBWgx9BJVoH5j.1_1

These objects still exist; 'rados stat' shows me: mtime 2020-08-07 12:28:44.00, size 4194304

Has anybody seen this before, and does anyone have clues on what this could be?

Wido
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
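In case it helps with debugging, the GC queue itself can be inspected and kicked by hand with the standard radosgw-admin subcommands:

radosgw-admin gc list                  # entries already due for processing
radosgw-admin gc list --include-all    # everything queued, including not-yet-expired entries
radosgw-admin gc process               # run a GC pass manually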
[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
Sure.

              total        used        free      shared  buff/cache   available
Mem:      394582604   355494400     5282932        1784    33805272    29187220
Swap:       1047548     1047548           0

On the node there are 24 14TB OSDs with a 14G configured memory target.

Manuel

On Fri, 7 Aug 2020 11:24:53 +0200 Stefan Kooman wrote:
> Can you share the amount of buffer cache available on your storage nodes?
>
> We run the OSDs with osd_memory_target=11G and 22 GB of buffer cache available. And with the buffer on (Mimic 13.2.8).
>
> Thanks,
>
> Gr. Stefan
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs flapping since upgrade to 14.2.10
Hi,

Maybe this helps. You can increase the osd_op_tp thread timeouts in ceph.conf to something similar to:

[osd]
osd_op_thread_suicide_timeout = 900
osd_op_thread_timeout = 300
osd_recovery_thread_timeout = 300

Regards

-Mensaje original-
De: Ingo Reimann
Enviado el: viernes, 7 de agosto de 2020 12:08
Para: ceph-users
Asunto: [ceph-users] OSDs flapping since upgrade to 14.2.10

[.. full quote of the original message snipped ..]
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs flapping since upgrade to 14.2.10
Hi Stefan, Hi Manuel,

thanks for your quick advice.

In fact, since I set "ceph config set osd bluefs_buffered_io true", the problems disappeared. We have lots of RAM in our osd hosts, so buffering is ok. I'll track this issue down further after the weekend!

best regards,
Ingo

- Ursprüngliche Mail -
Von: "Stefan Kooman"
An: "ceph-users"
Gesendet: Freitag, 7. August 2020 12:24:08
Betreff: [ceph-users] Re: OSDs flapping since upgrade to 14.2.10

On 2020-08-07 12:07, Ingo Reimann wrote:
> i`m am not sure, if we really have a problem, but it does not look healthy.

It might be related to the change that is mentioned in another thread: "block.db/block.wal device performance dropped after upgrade to 14.2.10"

TL;DR: bluefs_buffered_io has been changed to "false" in 14.2.10. It doesn't use buffer cache in that case, and in certain workloads (i.e. snap trimming) this seem to have a big impact, even for environments that have large osd_memory_target.

I would change that back to "true" ("ceph config set osd bluefs_buffered_io true" should do the trick). Not sure if the OSDs need a restart afterwards, as the config change seem to be effectve immediately for running daemons.

Gr. Stefan
-- | BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl

-- Ingo Reimann Teamleiter Technik [ https://www.dunkel.de/ ] Dunkel GmbH Philipp-Reis-Straße 2 65795 Hattersheim Fon: +49 6190 889-100 Fax: +49 6190 889-399 eMail: supp...@dunkel.de https://www.Dunkel.de/ Amtsgericht Frankfurt/Main HRB: 37971 Geschäftsführer: Axel Dunkel Ust-ID: DE 811622001
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
It's quite possible that the issue is really about rocksdb living on top of bluefs with bluefs_buffered_io and rgw causing a ton of OMAP traffic. rgw is the only case so far where the issue has shown up, but it was significant enough that we didn't feel like we could leave bluefs_buffered_io enabled. In your case with a 14GB target per OSD, do you still see significantly increased disk reads with bluefs_buffered_io=false?

Mark

On 8/7/20 2:27 AM, Manuel Lausch wrote:

[.. full quote of the earlier messages in this thread snipped ..]
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs flapping since upgrade to 14.2.10
Hi Ingo, If you are able and have lots of available memory, could you also try setting it to false but increasing the osd_memory_target size? I'd like to understand a little bit deeper what's going on here. Ultimately I don't want our only line of defense against slow snap trimming to be having page cache available! Mark On 8/7/20 6:51 AM, Ingo Reimann wrote: Hi Stefan, Hi Manuel, thanks for your quick advices. In fact, since i set "ceph config set osd bluefs_buffered_io true", the problems disappeared. We have lots of RAM in our osd hosts, so buffering is ok. I`ll trak this issue down further after the weekend! best regards, Ingo - Ursprüngliche Mail - Von: "Stefan Kooman" An: "ceph-users" Gesendet: Freitag, 7. August 2020 12:24:08 Betreff: [ceph-users] Re: OSDs flapping since upgrade to 14.2.10 On 2020-08-07 12:07, Ingo Reimann wrote: i`m am not sure, if we really have a problem, but it does not look healthy. It might be related to the change that is mentioned in another thread: "block.db/block.wal device performance dropped after upgrade to 14.2.10" TL;DR: bluefs_buffered_io has been changed to "false" in 14.2.10. It doesn't use buffer cache in that case, and in certain workloads (i.e. snap trimming) this seem to have a big impact, even for environments that have large osd_memory_target. I would change that back to "true" ("ceph config set osd bluefs_buffered_io true" should do the trick). Not sure if the OSDs need a restart afterwards, as the config change seem to be effectve immediately for running daemons. Gr. Stefan ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
On 2020-08-07 09:27, Manuel Lausch wrote: > I cannot confirm that more memory target will solve the problem > completly. In my case the OSDs have 14GB memory target and I did have > huge user IO impact while snaptrim (many slow ops the whole time). Since > I set bluefs_bufferd_io=true it seems to work without issue. > In my cluster I don't use rgw. But I don't see why > different types of access the cluster do affect the form the kernel > manages its memory. My experience why the kernel begins to swap are > mostly numa related and/or memory fragmentation. Can you share the amount of buffer cache available on your storage nodes? We run the OSDs with osd_memory_target=11G and 22 GB of buffer cache available. And with the buffer on (Mimic 13.2.8). Thanks, Gr. Stefan -- | BIT BV https://www.bit.nl/Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs flapping since upgrade to 14.2.10
Hi Mark,

I'll check that after the weekend!

Ingo

- Ursprüngliche Mail -
Von: "Mark Nelson"
An: "ceph-users"
Gesendet: Freitag, 7. August 2020 15:15:08
Betreff: [ceph-users] Re: OSDs flapping since upgrade to 14.2.10

[.. full quote of the previous message snipped ..]

-- Ingo Reimann Teamleiter Technik [ https://www.dunkel.de/ ] Dunkel GmbH Philipp-Reis-Straße 2 65795 Hattersheim Fon: +49 6190 889-100 Fax: +49 6190 889-399 eMail: supp...@dunkel.de https://www.Dunkel.de/ Amtsgericht Frankfurt/Main HRB: 37971 Geschäftsführer: Axel Dunkel Ust-ID: DE 811622001
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
Thinking about this a little more, one thing that I remember when I was writing the priority cache manager is that in some cases I saw strange behavior with the rocksdb block cache when compaction was performed. It appeared that the entire contents of the cache could be invalidated. I guess that would only make sense if it was trimming old entries from the cache instead of entries associated with (now deleted) sst files or perhaps waiting to delete all SST files until the end of the compaction cycle thus forcing old entries out of the cache and then invalidating the whole works. In any event, I wonder if having the secondary page cache is enough on your clusters to sort of get around all of this by still having the SST files associated with the previously heavily used blocks in page cache kicking around until compaction completes. Maybe the combination of snap trimming or other background work along with compaction is just totally thrashing the rocksdb block cache. For folks that feel comfortable watching IO hitting your DB devices, can you see if you have increased bursts of reads to the DB device after a compaction event has occurred? They look like this in the OSD logs: 2020-08-04T17:15:56.603+ 7fb0cf60d700 4 rocksdb: (Original Log Time 2020/08/04-17:15:56.603585) EVENT_LOG_v1 {"time_micros": 1596561356603574, "job": 5, "event": "compaction_finished", "compaction_time_micros": 744532, "compaction_time_cpu_micros": 607655, "output_level": 1, "num_output_files": 2, "total_output_size": 84712923, "num_input_records": 1714260, "num_output_records": 658541, "num_subcompactions": 1, "output_compression": "NoCompression", "num_single_delete_mismatches": 0, "num_single_delete_fallthrough": 0, "lsm_state": [0, 2, 0, 0, 0, 0, 0]} You can also run this tool to get a nicely formatted list of them, though I don't have it reporting timestamps, just the time offset from the start of the log so looking at the OSD logs directly would be easier to match up timestamps. https://github.com/ceph/cbt/blob/master/tools/ceph_rocksdb_log_parser.py Mark On 8/6/20 8:07 AM, Vladimir Prokofev wrote: Maneul, thank you for your input. This is actually huge, and the problem is exactly that. On a side note I will add, that I observed lower memory utilisation on OSD nodes since the update, and a big throughput on block.db devices(up to 100+MB/s) that was not there before, so logically that meant that some operations that were performed in memory before, now were executed directly on block device. Was digging through possible causes, but your time-saving message arrived earlier. Thank you! чт, 6 авг. 2020 г. в 14:56, Manuel Lausch : Hi, I found the reasen of this behavior change. With 14.2.10 the default value of "bluefs_buffered_io" was changed from true to false. https://tracker.ceph.com/issues/44818 configureing this to true my problems seems to be solved. Regards Manuel On Wed, 5 Aug 2020 13:30:45 +0200 Manuel Lausch wrote: Hello Vladimir, I just tested this with a single node testcluster with 60 HDDs (3 of them with bluestore without separate wal and db). With the 14.2.10, I see on the bluestore OSDs a lot of read IOPs while snaptrimming. With 14.2.9 this was not an issue. I wonder if this would explain the huge amount of slowops on my big testcluster (44 Nodes 1056 OSDs) while snaptrimming. I cannot test a downgrade there, because there are no packages of older releases for CentOS 8 available. 
Regards Manuel ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
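For anyone who wants to try Mark's suggestion, a rough way to pull the compaction events out of an OSD log and watch the DB device at the same time; osd.243 and the log path are placeholders for your own deployment:

# the compaction events look like the EVENT_LOG_v1 line quoted above
grep '"event": "compaction_finished"' /var/log/ceph/ceph-osd.243.log

# in a second terminal, watch the block.db device for read bursts
iostat -x 1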
[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
Hi Mark,

The read IOPs in "normal" operation were around 1 with bluefs_buffered_io=false, and now with true they are around 2. So this seems slightly higher, but far away from any problem.

While snapshot trimming the difference is enormous:
with false: around 200
with true: around 10

Scrubbing read IOPs do not appear to be affected. They are around 100 IOPs.

I'm using librados to access my objects, so I don't know if this would be any different with rgw.

Manuel

On Fri, 7 Aug 2020 08:08:40 -0500 Mark Nelson wrote:
> It's quite possible that the issue is really about rocksdb living on top of bluefs with bluefs_buffered_io and rgw causing a ton of OMAP traffic. rgw is the only case so far where the issue has shown up, but it was significant enough that we didn't feel like we could leave bluefs_buffered_io enabled. In your case with a 14GB target per OSD, do you still see significantly increased disk reads with bluefs_buffered_io=false?
>
> Mark
>
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: OSDs flapping since upgrade to 14.2.10
On 2020-08-07 12:07, Ingo Reimann wrote:
> i`m am not sure, if we really have a problem, but it does not look healthy.

It might be related to the change that is mentioned in another thread: "block.db/block.wal device performance dropped after upgrade to 14.2.10"

TL;DR: bluefs_buffered_io has been changed to "false" in 14.2.10. It doesn't use the buffer cache in that case, and in certain workloads (i.e. snap trimming) this seems to have a big impact, even for environments that have a large osd_memory_target.

I would change that back to "true" ("ceph config set osd bluefs_buffered_io true" should do the trick). Not sure if the OSDs need a restart afterwards, as the config change seems to be effective immediately for running daemons.

Gr. Stefan
-- | BIT BV https://www.bit.nl/ Kamer van Koophandel 09090351 | GPG: 0xD14839C6 +31 318 648 688 / i...@bit.nl
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
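A quick way to apply the change and then check what a running OSD reports; osd.0 is a placeholder, and whether bluefs actually honours the new value without a restart is exactly the open question here:

ceph config set osd bluefs_buffered_io true
ceph daemon osd.0 config get bluefs_buffered_io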
[ceph-users] Re: block.db/block.wal device performance dropped after upgrade to 14.2.10
That is super interesting regarding scrubbing. I would have expected that to be affected as well. Any chance you can check and see if there is any correlation between rocksdb compaction events, snap trimming, and increased disk reads? Also (sorry if you already answered this) do we know for sure that it's hitting the block.db/block.wal device? I suspect it is, just wanted to verify.

Mark

On 8/7/20 9:04 AM, Manuel Lausch wrote:

[.. full quote of the previous message snipped ..]
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
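One way to confirm which physical device the reads are hitting would be to look up the devices an OSD was built on and watch them, e.g. for osd.243 from the earlier logs; a sketch only, adjust to your own setup:

# the metadata dump includes the bluestore/bluefs device names for this OSD
ceph osd metadata 243 | grep -i dev

# then watch the block.db device's read columns while a snaptrim runs
iostat -x 1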
[ceph-users] Re: Nautilus slow using "ceph tell osd.* bench"
I have set it to 0.0 and let it re-balance. Then I set it back and let it re-balance again. I have a fairly small cluster, and while it is in production, it is not getting much use because of the pandemic, so it is a good time to do some of these things. Because of that I have been re-balancing the OSDs in groups of 3. Finished 2 nodes; with a little luck I will finish all 5 by tomorrow. Will post at completion. Off the top of my head I think it may have to do with "ceph osd crush set-all-straw-buckets-to-straw2" since that is supposed to cause a re-balance. ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
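For reference, a small sketch of the straw2 check and conversion mentioned above; the grep is only a quick way to eyeball the bucket algorithms in the crush dump:

# show the algorithm currently used by each crush bucket
ceph osd crush dump | grep '"alg"'

# convert all remaining straw buckets to straw2 (can trigger some data movement)
ceph osd crush set-all-straw-buckets-to-straw2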
[ceph-users] ceph rbd iscsi gwcli Non-existent images
Hi,

I would appreciate any help/hints to solve this issue: iscsi (gwcli) cannot see the images anymore. This configuration worked fine for many months. What changed is that ceph is "nearly full". I am in the process of cleaning it up (by deleting objects from one of the pools) and I do see reads and writes on the cluster as well as image info, so I am not sure what gwcli does not like (targetcli ls is not working either - it just froze).

Below some info:

ceph version
ceph version 13.2.2 (02899bfda814146b021136e9d8e80eba494e1126) mimic (stable)

gwcli --version
gwcli - 2.7

ceph osd dump | grep ratio
full_ratio 0.96
backfillfull_ratio 0.92
nearfull_ratio 0.9

[root@osd02 ~]# rbd -p rbd info rep01
rbd image 'rep01':
size 7 TiB in 1835008 objects
order 22 (4 MiB objects)
id: 15b366b8b4567
block_name_prefix: rbd_data.15b366b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Thu Nov 1 15:57:52 2018

[root@osd02 ~]# rbd -p rbd info vmware01
rbd image 'vmware01':
size 6 TiB in 1572864 objects
order 22 (4 MiB objects)
id: 16d3f6b8b4567
block_name_prefix: rbd_data.16d3f6b8b4567
format: 2
features: layering, exclusive-lock, object-map, fast-diff, deep-flatten
op_features:
flags:
create_timestamp: Thu Nov 29 13:56:28 2018

[root@osd02 ~]# ceph df
GLOBAL:
    SIZE       AVAIL      RAW USED    %RAW USED
    33 TiB     7.5 TiB    25 TiB      77.16
POOLS:
    NAME               ID    USED       %USED    MAX AVAIL    OBJECTS
    cephfs_metadata    22    173 MiB    0.01     1.4 TiB      469
    cephfs_data        23    1.7 TiB    69.78    775 GiB      486232
    rbd                24    11 TiB     93.74    775 GiB      2974077

[root@osd02 ~]# ceph health detail
HEALTH_ERR 2 nearfull osd(s); 2 pool(s) nearfull; Module 'prometheus' has failed: IOError("Port 9283 not free on '10.10.35.20'",)
OSD_NEARFULL 2 nearfull osd(s)
    osd.12 is near full
    osd.17 is near full
POOL_NEARFULL 2 pool(s) nearfull
    pool 'cephfs_data' is nearfull
    pool 'rbd' is nearfull

gwcli /iscsi-target...nner-21faa413> info
Client Iqn .. iqn.1998-01.com.vmware:banner-21faa413
Ip Address ..
Alias ..
Logged In ..
Auth
- chap .. cephuser/PASSWORD
Group Name ..
Luns
- rbd.rep01 .. lun_id=0
- rbd.vmware01 .. lun_id=1

osd02 journal: client update failed on iqn.1998-01.com.vmware:banner-21faa413 : Non-existent images ['rbd.vmware01'] requested for iqn.1998-01.com.vmware:banner-21faa413
Aug 7 14:15:39 osd02 journal: 127.0.0.1 - - [07/Aug/2020 14:15:39] "PUT /api/_clientlun/iqn.1998-01.com.vmware:banner-21faa413 HTTP/1.1" 500 -
Aug 7 14:15:39 osd02 journal: _clientlun change on 127.0.0.1 failed with 500
Aug 7 14:15:39 osd02 journal: 127.0.0.1 - - [07/Aug/2020 14:15:39] "DELETE /api/clientlun/iqn.1998-01.com.vmware:banner-21faa413 HTTP/1.1" 500 -
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
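Not a fix for the underlying space problem, but while the cleanup runs it may help to keep an eye on usage and, cautiously, give yourself a little headroom on the nearfull warning; the 0.92 value below is only an example:

# watch overall and per-OSD usage while deleting objects
ceph df
ceph osd df

# stop-gap only - it does not create space, it just raises the warning threshold
ceph osd set-nearfull-ratio 0.92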
[ceph-users] EntityAddress format in ceph osd blacklist commands
Hi,

I want to understand the format for `ceph osd blacklist` commands. The documentation just says it's the address, but I am not sure if it can just be the host IP address or anything else. What does ":0/3710147553" represent in the following output?

$ ceph osd blacklist ls
listed 1 entries
127.0.0.1:0/3710147553 2018-03-19 11:32:24.716146

$ ceph osd blacklist rm 127.0.0.1:0/3710147553
un-blacklisting 127.0.0.1:0/3710147553

Regards, Shridhar
___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
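For what it's worth, the listed entries are printed as IP:port/nonce, where the nonce is what lets Ceph distinguish several client instances coming from the same IP. The basic command forms look like this; the address and the 3600-second expiry are only illustrative values:

# list current blacklist entries (shown as IP:port/nonce plus expiry time)
ceph osd blacklist ls

# add an entry for one hour
ceph osd blacklist add 127.0.0.1:0/3710147553 3600

# remove it again
ceph osd blacklist rm 127.0.0.1:0/3710147553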