Re: [ceph-users] CephFS deletion performance
On Sat, Sep 14, 2019 at 8:57 PM Hector Martin wrote:
>
> On 13/09/2019 16.25, Hector Martin wrote:
> > Is this expected for CephFS? I know data deletions are asynchronous,
> > but not being able to delete metadata/directories without an undue
> > impact on the whole filesystem performance is somewhat problematic.
>
> I think I'm getting a feeling for who the culprit is here. I just
> noticed that listing directories in a snapshot that were subsequently
> deleted *also* performs horribly, and kills cluster performance too.
>
> We just had a partial outage due to this; a snapshot+rsync triggered
> while a round of deletions was happening, and as far as I can tell,
> when it caught up to newly deleted files, MDS performance tanked as it
> repeatedly had to open stray dirs under the hood. In fact, the
> inode/dentry metrics (opened/closed) skyrocketed during that period,
> from the normal ~1Kops from multiple parallel rsyncs to ~15Kops.
>
> As I mentioned in a prior message to the list, we have ~570k stray
> files due to snapshots. It makes sense that deleting a directory/file
> means moving it to a stray directory (each holding ~57k files
> already), and accessing a deleted file via a snapshot means accessing
> the stray directory. Am I right in thinking that these operations are
> at least O(n) in the number of strays, and may in fact iterate over or
> otherwise touch every single file in the stray directories? (This
> would explain the sudden 15Kops spike in inode/dentry activity.) It
> seems that with such bloated stray dirs, anything that touches them
> behind the scenes makes the MDS completely hiccup and grind away,
> affecting performance for all other clients.
>
> I guess at this point we'll have to drastically cut down the time span
> for which we keep CephFS snapshots. Maybe I'll move the snapshot
> history keeping to the backup target; at least then it won't affect
> production data. But since we plan on using the other cluster for
> production too eventually, that would mean we need to use multi-FS in
> order to isolate the workloads...

When a snapshotted directory is deleted, the MDS moves the directory into
a stray directory. You have ~57k entries per stray directory; each time
the MDS has a cache miss for a stray, it needs to load a stray dirfrag.
This is very inefficient, because a stray dirfrag contains lots of items,
most of which are useless for the lookup at hand.

> --
> Hector Martin (hec...@marcansoft.com)
> Public Key: https://mrcn.st/pub
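For anyone hitting the same thing: the stray build-up can be watched from the active MDS's perf counters. A minimal sketch, assuming shell access to the MDS host; substitute your daemon name for mds.<name>, and note that counter names can differ slightly between releases:

  ceph daemon mds.<name> perf dump | grep -i strays    # num_strays, num_strays_delayed, ...
  ceph daemon mds.<name> perf dump | grep -i pq_       # purge_queue activity, if present

Watching these while deletions or snapshot rsyncs run should show whether stray handling correlates with the performance dips described above.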
Re: [ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes
Quoting Paul Emmerich (paul.emmer...@croit.io):

> Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> NFS. There's an experimental relaxed mode called LazyIO but I'm not
> sure if it's applicable here.

Out of curiosity, how would CephFS being more POSIX compliant cause this
much delay in this situation? I'd understand if it took up to a second or
maybe two, but almost fifteen minutes, and then suddenly /all/ servers
recover at the same time?

Would this situation exist because we have so many open file handles per
server? Or could it also appear in a simpler "two servers share a CephFS"
setup?

I'm so curious to find out what /causes/ this. "Closer to POSIX
semantics" doesn't cut it for me in this case. Not with the symptoms
we're seeing.

> You can debug this by dumping slow requests from the MDS servers via
> the admin socket

As far as I understood, there's not much to see on the MDS servers when
this issue pops up. E.g. no slow ops logged during this event.

Regards,
-Sndr.
--
| I think i want a job cleaning mirrors...
| It's just something i can really see myself doing...
| 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2
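For reference, the admin-socket dump Paul refers to would look roughly like this; a sketch assuming a local admin socket on the active MDS host, with mds.<name> standing in for the actual daemon name:

  ceph daemon mds.<name> ops                  # requests currently being processed by the MDS
  ceph daemon mds.<name> dump_historic_ops    # recent requests, including their per-stage timing

Even when no "slow requests" warning is logged, the historic ops often show which request type and which client the time is being spent on.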
Re: [ceph-users] OSDs keep crashing after cluster reboot
Hi together,

it seems the issue described by Ansgar was reported and closed here as being fixed for newly created pools in post-Luminous releases: https://tracker.ceph.com/issues/41336

However, it is unclear to me:
- How to find out whether an EC CephFS created in Luminous is actually affected, before actually testing the "shutdown all" procedure and thus ending up with dying OSDs.
- If affected, how to fix it without purging the pool completely (which is not so easily done if there is 0.5 PB inside that can't be restored without a long downtime).

If this is an acknowledged issue, it should probably also be mentioned in the upgrade notes from pre-Mimic to Mimic and newer before more people lose data.

In our case, we have such a CephFS on an EC pool created with Luminous, and are right now running Mimic 13.2.6, but have never tried a full shutdown. We need to try that on Friday, though... (cooling system maintenance).

"ceph osd dump" contains:

pool 1 'cephfs_metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 128 pgp_num 128 last_change 40903 flags hashpspool stripe_width 0 compression_algorithm snappy compression_mode aggressive application cephfs
pool 2 'cephfs_data' erasure size 6 min_size 5 crush_rule 2 object_hash rjenkins pg_num 4096 pgp_num 4096 last_change 40953 flags hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 16384 compression_algorithm snappy compression_mode aggressive application cephfs

and the EC profile is:

# ceph osd erasure-code-profile get cephfs_data
crush-device-class=hdd
crush-failure-domain=host
crush-root=default
jerasure-per-chunk-alignment=false
k=4
m=2
plugin=jerasure
technique=reed_sol_van
w=8

Neither contains the stripe_unit explicitly, so I wonder how to find out whether it is (in)valid. Checking the xattr ceph.file.layout.stripe_unit of some "old" files on the FS reveals 4194304 in my case.

Any help appreciated.

Cheers and all the best,
Oliver

On 09.08.19 at 08:54, Ansgar Jazdzewski wrote:

We got our OSDs back. Since we removed the EC pool (cephfs.data), we had to figure out how to remove its PGs from the offline OSDs, and here is how we did it.

Remove CephFS, remove cache layer, remove pools:

#ceph mds fail 0
#ceph fs rm cephfs --yes-i-really-mean-it
#ceph osd tier remove-overlay cephfs.data
there is now (or already was) no overlay for 'cephfs.data'
#ceph osd tier remove cephfs.data cephfs.cache
pool 'cephfs.cache' is now (or already was) not a tier of 'cephfs.data'
#ceph tell mon.\* injectargs '--mon-allow-pool-delete=true'
#ceph osd pool delete cephfs.cache cephfs.cache --yes-i-really-really-mean-it
pool 'cephfs.cache' removed
#ceph osd pool delete cephfs.data cephfs.data --yes-i-really-really-mean-it
pool 'cephfs.data' removed
#ceph osd pool delete cephfs.metadata cephfs.metadata --yes-i-really-really-mean-it
pool 'cephfs.metadata' removed

Remove placement groups of pool 23 (cephfs.data) from all offline OSDs:

DATAPATH=/var/lib/ceph/osd/ceph-${OSD}
a=`ceph-objectstore-tool --data-path ${DATAPATH} --op list-pgs | grep "^23\."`
for i in $a; do
  echo "INFO: removing ${i} from OSD ${OSD}"
  ceph-objectstore-tool --data-path ${DATAPATH} --pgid ${i} --op remove --force
done

Since we have now removed our CephFS, we still do not know whether we could have solved it without data loss by upgrading to Nautilus.

Have a nice weekend,
Ansgar
On Wed, 7 Aug 2019 at 17:03, Ansgar Jazdzewski wrote:

Another update: we took the more destructive route and removed the CephFS pools (luckily we only had test data in the filesystem). Our hope was that during the startup process the OSDs would delete the no-longer-needed PGs, but this is NOT the case. So we still have the same issue; the only difference is that the PGs no longer belong to a pool.

-360> 2019-08-07 14:52:32.655 7fb14db8de00 5 osd.44 pg_epoch: 196586 pg[23.f8s0(unlocked)] enter Initial
-360> 2019-08-07 14:52:32.659 7fb14db8de00 -1 /build/ceph-13.2.6/src/osd/ECUtil.h: In function 'ECUtil::stripe_info_t::stripe_info_t(uint64_t, uint64_t)' thread 7fb14db8de00 time 2019-08-07 14:52:32.660169
/build/ceph-13.2.6/src/osd/ECUtil.h: 34: FAILED assert(stripe_width % stripe_size == 0)

We can now take one route and try to delete the PGs by hand in the OSD (BlueStore); how can this be done? Or we try to upgrade to Nautilus and hope for the best.

Any help or hints are welcome. Have a nice one,
Ansgar

On Wed, 7 Aug 2019 at 11:32, Ansgar Jazdzewski wrote:

Hi,

as a follow-up:

* a full log of one OSD failing to start: https://pastebin.com/T8UQ2rZ6
* our EC pool creation in the first place: https://pastebin.com/20cC06Jn
* ceph osd dump and ceph osd erasure-code-profile get cephfs: https://pastebin.com/TRLPaWcH

As we try to dig more into it, it looks like a bug.
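For anyone wanting to inspect an existing Luminous-created EC CephFS before risking a full shutdown (Oliver's question above): the commands below are only a rough sanity check built from the values already quoted in this thread, not the exact computation behind the FAILED assert in ECUtil.h; the mount path and pool name are placeholders to adapt.

  getfattr -n ceph.file.layout /mnt/cephfs/some/old/file   # shows stripe_unit, stripe_count, object_size, pool
  getfattr -n ceph.dir.layout  /mnt/cephfs/some/dir        # only present if a layout is set on that directory
  ceph osd dump | grep "'cephfs_data'"                     # note the pool's stripe_width
  ceph osd erasure-code-profile get cephfs_data            # note k (number of data chunks)

With the numbers from this thread, stripe_width 16384 over k=4 gives 4096 bytes per chunk, and the observed ceph.file.layout.stripe_unit of 4194304 is a whole multiple of that, so nothing obviously inconsistent stands out; whether that is enough to rule the bug out is exactly what remains unclear.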
Re: [ceph-users] cephfs: apache locks up after parallel reloads on multiple nodes
On Tue, Sep 17, 2019 at 8:12 AM Sander Smeenk wrote:
>
> Quoting Paul Emmerich (paul.emmer...@croit.io):
>
> > Yeah, CephFS is much closer to POSIX semantics for a filesystem than
> > NFS. There's an experimental relaxed mode called LazyIO but I'm not
> > sure if it's applicable here.
>
> Out of curiosity, how would CephFS being more POSIX compliant cause
> this much delay in this situation? I'd understand if it took up to a
> second or maybe two, but almost fifteen minutes, and then suddenly
> /all/ servers recover at the same time?
>
> Would this situation exist because we have so many open file handles
> per server? Or could it also appear in a simpler "two servers share a
> CephFS" setup?
>
> I'm so curious to find out what /causes/ this.
> "Closer to POSIX semantics" doesn't cut it for me in this case.
> Not with the symptoms we're seeing.

Yeah, this sounds weird. 15 minutes is one or two timers, but I can't
think of anything that should be related here.

I'd look and see what syscalls the apache daemons are making and how
long they're taking; in particular, what's different between the first
server and the rest. If they're doing a lot of the same syscalls but
just much slower on the follow-on servers, that probably indicates
they're all hammering the CephFS cluster with conflicting updates
(especially if they're writes!) that NFS simply ignored and collapsed.
If it's just one syscall that takes minutes to complete, check the MDS
admin socket for ops_in_flight.
-Greg

> > You can debug this by dumping slow requests from the MDS servers via
> > the admin socket
>
> As far as I understood, there's not much to see on the MDS servers
> when this issue pops up. E.g. no slow ops logged during this event.
>
> Regards,
> -Sndr.
> --
> | I think i want a job cleaning mirrors...
> | It's just something i can really see myself doing...
> | 4096R/20CC6CD2 - 6D40 1A20 B9AA 87D4 84C7 FBD6 F3A9 9442 20CC 6CD2
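A minimal sketch of how to gather what Greg suggests; the PID and daemon name are placeholders, and the strace invocation is just one way to time syscalls per worker:

  # on a web server that is hanging: attach to one apache worker and time its syscalls
  strace -f -T -p <apache-worker-pid>

  # on the active MDS host, while the hang is ongoing
  ceph daemon mds.<name> dump_ops_in_flight

Comparing the strace timings between the first server (which reloads fine) and the follow-on servers should show whether it is many slow syscalls or a single one that blocks for minutes.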
Re: [ceph-users] Nautilus : ceph dashboard ssl not working
Hi Muthu,

On 16.09.19 11:30, nokia ceph wrote:

Hi Team,

In Ceph 14.2.2, the ceph dashboard does not have set-ssl-certificate. We are trying to enable the ceph dashboard, and while using the SSL certificate and key, it is not working.

cn5.chn5au1c1.cdn ~# ceph dashboard set-ssl-certificate -i dashboard.crt
no valid command found; 10 closest matches:
dashboard set-grafana-update-dashboards
dashboard reset-prometheus-api-host
dashboard reset-ganesha-clusters-rados-pool-namespace
dashboard set-grafana-api-username
dashboard get-audit-api-log-payload
dashboard get-grafana-api-password
dashboard get-grafana-api-username
dashboard set-rgw-api-access-key
dashboard reset-rgw-api-host
dashboard set-prometheus-api-host
Error EINVAL: invalid command

cn5.chn5au1c1.cdn ~# ceph -v
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus (stable)

How do we set the crt and key in this case?

ceph config-key dump | grep dashboard/[crt,key]

Try this:

ceph config-key set mgr mgr/dashboard/crt -i ssl.crt
ceph config-key set mgr mgr/dashboard/key -i ssl.key

Regards,
Michel
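Whichever form of the config-key command your release accepts, it usually helps to verify that the certificate and key blobs actually landed and then bounce the dashboard module so it reloads them; a small sketch:

  ceph config-key dump | grep dashboard    # the crt/key entries should show up here
  ceph mgr module disable dashboard
  ceph mgr module enable dashboard
  ceph mgr services                        # prints the dashboard URL once it is serving again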
[ceph-users] eu.ceph.com mirror out of sync?
Dear Cephalopodians,

I realized just now that https://eu.ceph.com/rpm-nautilus/el7/x86_64/ still only holds releases up to 14.2.2, and nothing is to be seen of 14.2.3 or 14.2.4, while the main repository at https://download.ceph.com/rpm-nautilus/el7/x86_64/ looks as expected.

Is this issue with the eu.ceph.com mirror already known?

Cheers,
Oliver
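A quick way to compare what the two repositories currently serve; just a sketch against the directory listings, and the package-name pattern is an assumption about the listing format:

  for repo in https://download.ceph.com https://eu.ceph.com; do
    echo "== $repo"
    curl -s "$repo/rpm-nautilus/el7/x86_64/" | grep -o 'ceph-14\.2\.[0-9]*-[0-9]*' | sort -uV | tail -3
  done

If the eu.ceph.com output stops at 14.2.2 while download.ceph.com shows newer builds, the mirror really is lagging rather than the local yum cache being stale.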