[ceph-users] Re: Strange performance drop and low oss performance
> > > For object gateway, the performance figure was obtained with `swift-bench -t 64`, which uses 64 concurrent threads. Will the radosgw and http overhead really be so significant (94.5MB/s down to 26MB/s for cluster1) when multiple threads are used? Thanks in advance!

Can't say what it "must" be, but if I log in to one of my rgw's (we have several, loadbalanced) and run ceph benchmarks against spindrive pools (i.e. talking to ceph directly), I get something like 200MB/s. If I run a write test on the same host, but talking s3-over-http against itself, I get something like 100MB/s, so the overhead in my case seems to be 100% (or 50%, however you calculate it).

There has to be some kind of penalty for doing protocol translations, if for nothing else because the object store client does checksums, asks rgw to store the data, rgw checksums the part(s), asks ceph to store them, ceph sends an ack, rgw sends an ack to the client with the checksum, and the client compares before moving to the next part. This will be far slower than plain writes to ceph (the two innermost ops), and can in part be offset by using large IO, parallel streams, multiple rgw backends and so on.

--
May the most significant bit of your life be positive.
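A minimal sketch of how that raw-vs-gateway comparison can be reproduced from one of the rgw hosts; the pool name is an assumption, and the S3/Swift side is simply the swift-bench invocation already quoted above:

  # raw RADOS write throughput from the gateway host, bypassing radosgw
  rados bench -p default.rgw.buckets.data 30 write -t 64

  # then run the quoted `swift-bench -t 64` against the local radosgw endpoint
  # and compare the two numbers; the gap is the http/protocol-translation cost

Keeping the thread count identical on both sides makes the 200MB/s vs 100MB/s style comparison meaningful.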
[ceph-users] Re: osd_memory_target ignored
Dear Stefan,

thanks for your help. I opened these:

https://tracker.ceph.com/issues/44010
https://tracker.ceph.com/issues/44011

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

From: Stefan Kooman
Sent: 05 February 2020 10:29
To: Frank Schilder
Cc: ceph-users
Subject: Re: [ceph-users] Re: osd_memory_target ignored

Quoting Frank Schilder (fr...@dtu.dk):
> Dear Stefan,
>
> is it possible that there is a mistake in the documentation or a bug? Out of curiosity, I restarted one of these OSDs and the memory usage starts going up:
>
> ceph 881203 15.4 4.0 6201580 5344764 ? Sl 09:18 6:38 /usr/bin/ceph-osd --cluster ceph -f -i 243 --setuser ceph --setgroup disk
>
> The documentation of osd_memory_target says "Can update at runtime: true", but it seems that a restart is required to activate the setting, so it can *not* be updated at runtime (meaning it does not take effect without a restart).

Ah, that might be. If the documentation states it can be updated at runtime, it's a bug (in either the code or the documentation).

> In addition to that, I would like to have different default memory targets set for different device classes. Unfortunately, there seem not to be separate memory_target_[device class] default options. Is there a good way to set different targets while avoiding bloating "ceph config dump" unnecessarily?

I'm afraid not. You might want to file a tracker issue with an enhancement request.

Gr. Stefan

--
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / i...@bit.nl
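For reference, a sketch of the usual way one would try to apply this without a restart (OSD id 243 taken from the ps output above; whether the running daemon then actually honours the new value is exactly what the tracker issues are about):

  # set the new target in the config database
  ceph config set osd osd_memory_target 4294967296

  # push it to a running daemon and verify what that daemon now sees
  ceph tell osd.243 injectargs '--osd_memory_target 4294967296'
  ceph daemon osd.243 config get osd_memory_target   # run on the OSD's host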
[ceph-users] Re: Write i/o in CephFS metadata pool
> On 4 Feb 2020, at 16:14, Samy Ascha wrote:
>
>> On 2 Feb 2020, at 12:45, Patrick Donnelly wrote:
>>
>> On Wed, Jan 29, 2020 at 1:25 AM Samy Ascha wrote:
>>>
>>> Hi!
>>>
>>> I've been running CephFS for a while now and ever since setting it up, I've seen unexpectedly large write i/o on the CephFS metadata pool.
>>>
>>> The filesystem is otherwise stable and I'm seeing no usage issues.
>>>
>>> I'm in a read-intensive environment, from the clients' perspective, and throughput for the metadata pool is consistently larger than that of the data pool.
>>>
>>> For example:
>>>
>>> # ceph osd pool stats
>>> pool cephfs_data id 1
>>>   client io 7.6 MiB/s rd, 19 KiB/s wr, 404 op/s rd, 1 op/s wr
>>>
>>> pool cephfs_metadata id 2
>>>   client io 338 KiB/s rd, 43 MiB/s wr, 84 op/s rd, 26 op/s wr
>>>
>>> I realise, of course, that this is a momentary display of statistics, but I see this unbalanced r/w activity consistently when monitoring it live.
>>>
>>> I would like some insight into what may be causing this large imbalance in r/w, especially since I'm in a read-intensive (web hosting) environment.
>>
>> The MDS is still writing its journal and updating the "open file table". The MDS needs to record certain information about the state of its cache and the state issued to clients, even if the clients aren't changing anything. (This is workload dependent but will be most obvious when clients are opening files _not_ already in cache.)
>>
>> --
>> Patrick Donnelly, Ph.D.
>> He / Him / His
>> Senior Software Engineer
>> Red Hat Sunnyvale, CA
>> GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
>
> Hi Patrick,
>
> Thanks for this extra information.
>
> I should be able to confirm this by checking network traffic flowing from the MDSes to the OSDs, and comparing it to what's coming in from the CephFS clients.
>
> I'll report back when I have more information on that. I'm a little caught up in other stuff right now, but I wanted to acknowledge your message.
>
> Samy

Hi!

I've confirmed that the write IO to the metadata pool is coming from the active MDSes.

I'm experiencing very poor write performance on clients and I would like to see if there's anything I can do to optimise the performance.

Right now, I'm specifically focussing on speeding up this use case:

In a CephFS-mounted dir:

$ time unzip -q wordpress-seo.12.9.1.zip

real    0m47.596s
user    0m0.218s
sys     0m0.157s

On an RBD mount:

$ time unzip -q wordpress-seo.12.9.1.zip

real    0m0.176s
user    0m0.131s
sys     0m0.045s

The difference is just too big. I'm having real trouble finding a good reference against which to check my setup for bad configuration etc.

I have network bandwidth, RAM and CPU to spare, but I'm unsure how to put it to work to help my case.

Thanks a lot,
Samy
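A small sketch of how the MDS-side write activity can be confirmed from the daemons themselves, without packet captures (the daemon name is a placeholder, and the perf counter section names are given from memory, so treat them as assumptions):

  # per-pool client IO, as in the pool stats above
  ceph osd pool stats cephfs_metadata

  # on the active MDS host: journal and OSD-client counters; run twice and diff
  ceph daemon mds.<name> perf dump mds_log
  ceph daemon mds.<name> perf dump objecter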
[ceph-users] Re: Write i/o in CephFS metadata pool
> Hi!
>
> I've confirmed that the write IO to the metadata pool is coming from the active MDSes.
>
> I'm experiencing very poor write performance on clients and I would like to see if there's anything I can do to optimise the performance.
>
> Right now, I'm specifically focussing on speeding up this use case:
>
> In a CephFS-mounted dir:
>
> $ time unzip -q wordpress-seo.12.9.1.zip
>
> real    0m47.596s
> user    0m0.218s
> sys     0m0.157s
>
> On an RBD mount:
>
> $ time unzip -q wordpress-seo.12.9.1.zip
>
> real    0m0.176s
> user    0m0.131s
> sys     0m0.045s
>
> The difference is just too big. I'm having real trouble finding a good reference against which to check my setup for bad configuration etc.
>
> I have network bandwidth, RAM and CPU to spare, but I'm unsure how to put it to work to help my case.

Are there a lot of directories to be created from that zip file? I think it boils down to the directory operations that need to be performed synchronously. See

https://fosdem.org/2020/schedule/event/sds_ceph_async_directory_ops/
https://fosdem.org/2020/schedule/event/sds_ceph_async_directory_ops/attachments/slides/3962/export/events/attachments/sds_ceph_async_directory_ops/slides/3962/async_dirops_cephfs.pdf
https://video.fosdem.org/2020/H.1308/sds_ceph_async_directory_ops.webm

Gr. Stefan

--
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / i...@bit.nl
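A quick way to put a number on that, sketched here under the assumption that every file and directory create in the archive is a synchronous MDS round trip:

  # rough count of entries the archive will create (minus a few header/footer lines)
  unzip -l wordpress-seo.12.9.1.zip | wc -l

Dividing the 47 s measured above by that entry count gives the effective per-create latency; if it lands in the region of a network round trip to the MDS, the slowdown is the synchronous directory operations described in the talk rather than a misconfiguration.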
[ceph-users] Stuck with an unavailable iscsi gateway
Hello

I can't find a way to resolve my problem. I lost an iscsi gateway in a pool of 4 gateways; there are 3 left. I can't delete the lost gateway from the host and I can't change the owner of the resources owned by the lost gateway.

As a result, I have resources which are inaccessible from clients and I can't reconfigure them because of the lost gateway. Please tell me there is a way to remove a lost gateway and that I won't be stuck forever.

If I do

delete compute04.adm.local

it answers

Failed : Gateway deletion failed, gateway(s) unavailable:compute04.adm.local(UNKNOWN state)

I saw a reference to my problem in the thread "Error in add new ISCSI gateway", but unfortunately no answer seems to be available.

Thanks for any help
[ceph-users] Re: Understanding Bluestore performance characteristics
Hi Stefan,

> Do you mean more info than:

Yes, there's more... I don't remember exactly; I think some information ends up included in the OSD perf counters and some information is dumped into the OSD log, and maybe there's even a 'ceph daemon' command to trigger it...

There are 4 options that enable various parts of it:

#rocksdb_perf = true
#rocksdb_collect_compaction_stats = true
#rocksdb_collect_extended_stats = true
#rocksdb_collect_memory_stats = true
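A sketch of where that output would then show up, with osd.0 as a placeholder (whether these particular options can be flipped at runtime, and the exact counter section layout, are assumptions):

  # enable the collectors
  ceph config set osd rocksdb_perf true
  ceph config set osd rocksdb_collect_compaction_stats true

  # the rocksdb counters are visible through the admin socket
  ceph daemon osd.0 perf dump rocksdb

The compaction and memory statistics end up in the OSD log, as described above.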
[ceph-users] Need info about ceph bluestore autorepair
Hello,

if I have a pool with replica 3, what happens when one replica is corrupted? I suppose ceph detects the bad replica using checksums and replaces it with a good one.

If I have a pool with replica 2, what happens?

Thanks,
Mario
[ceph-users] Re: Stuck with an unavailable iscsi gateway
Originally, the idea of a gateway just permanently disappearing out-of-the-blue was never a concern. However, since this seems to be a recurring issue, the latest version of ceph-iscsi includes support for force-deleting a permanently dead iSCSI gateway [1]. I don't think that fix is in an official release yet, but it's available as a dev build here [2].

On Thu, Feb 6, 2020 at 6:45 AM wrote:
>
> Hello
>
> I can't find a way to resolve my problem. I lost an iscsi gateway in a pool of 4 gateways; there are 3 left. I can't delete the lost gateway from the host and I can't change the owner of the resources owned by the lost gateway.
>
> As a result, I have resources which are inaccessible from clients and I can't reconfigure them because of the lost gateway. Please tell me there is a way to remove a lost gateway and that I won't be stuck forever.
>
> If I do
>
> delete compute04.adm.local
>
> it answers
>
>    Failed : Gateway deletion failed, gateway(s) unavailable:compute04.adm.local(UNKNOWN state)
>
> I saw a reference to my problem in the thread "Error in add new ISCSI gateway", but unfortunately no answer seems to be available.
>
> Thanks for any help

[1] https://github.com/ceph/ceph-iscsi/pull/156
[2] https://2.chacra.ceph.com/r/ceph-iscsi/master/945fc555a0434cd0b9f5dbcb0ebaadcde8989d0a/centos/7/flavors/default/

--
Jason
[ceph-users] Re: Need info about ceph bluestore autorepair
On Thu, 6 Feb 2020 at 15:06, Mario Giammarco wrote:

> Hello,
> if I have a pool with replica 3 what happens when one replica is corrupted?

The PG on which this happens will turn from active+clean to active+inconsistent.

> I suppose ceph detects the bad replica using checksums and replaces it with a good one

There is an "osd fix on error = true/false" setting (whose name I can't remember right off the bat now) which controls this. If false, you need to "ceph pg repair" it; then it happens as you describe.

> If I have a pool with replica 2 what happens?

Same. Except with repl=2, you run a higher chance of surprises* on the remaining replica while the first one is bad until it gets repaired.

*) i.e. data loss, tears and less sleep for ceph admins

--
May the most significant bit of your life be positive.
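For reference, a sketch of the manual flow when scrub flags an inconsistency and auto-repair is off (the PG id 1.2f is a placeholder):

  ceph health detail                                       # lists the inconsistent PG(s)
  rados list-inconsistent-obj 1.2f --format=json-pretty    # shows which object/replica is bad
  ceph pg repair 1.2f                                      # rewrite the bad replica from a good copy

With replica 3 the repair has two other copies plus checksums to draw on; with replica 2 you are depending on the single remaining copy staying healthy until the repair completes, which is the risk described above.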
[ceph-users] RBD cephx read-only key
I'm trying to set up a cephx key to mount RBD images read-only. I have the following two keys:

[client.rbd]
        key = xxx
        caps mgr = "profile rbd"
        caps mon = "profile rbd"
        caps osd = "profile rbd pool=rbd_vm"

[client.rbd-ro]
        key = xxx
        caps mgr = "profile rbd-read-only"
        caps mon = "profile rbd"
        caps osd = "profile rbd-read-only pool=rbd_vm"

The following works:

# rbd map --pool rbd_vm andras_test --name client.rbd
/dev/rbd0

and so does this:

# rbd map --pool rbd_vm andras_test --name client.rbd --read-only
/dev/rbd0

but using the rbd-ro key doesn't work:

# rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
rbd: sysfs write failed
In some cases useful info is found in syslog - try "dmesg | tail".
rbd: map failed: (1) Operation not permitted

The logs only have the following:

[1281776.788709] libceph: mon4 10.128.150.14:6789 session established
[1281776.801747] libceph: client88900164 fsid d7b33135-0940-4e48-8aa6-1d2026597c2f

The back end is mimic 13.2.8, the kernel is the CentOS kernel 3.10.0-957.27.2.el7.x86_64.

Any ideas what I'm doing wrong here?

Andras
[ceph-users] Re: RBD cephx read-only key
On Thu, Feb 6, 2020 at 11:20 AM Andras Pataki wrote:
>
> I'm trying to set up a cephx key to mount RBD images read-only. I have the following two keys:
>
> [client.rbd]
>         key = xxx
>         caps mgr = "profile rbd"
>         caps mon = "profile rbd"
>         caps osd = "profile rbd pool=rbd_vm"
>
> [client.rbd-ro]
>         key = xxx
>         caps mgr = "profile rbd-read-only"
>         caps mon = "profile rbd"
>         caps osd = "profile rbd-read-only pool=rbd_vm"
>
> The following works:
>
> # rbd map --pool rbd_vm andras_test --name client.rbd
> /dev/rbd0
>
> and so does this:
>
> # rbd map --pool rbd_vm andras_test --name client.rbd --read-only
> /dev/rbd0
>
> but using the rbd-ro key doesn't work:
>
> # rbd map --pool rbd_vm andras_test --name client.rbd-ro --read-only
> rbd: sysfs write failed
> In some cases useful info is found in syslog - try "dmesg | tail".
> rbd: map failed: (1) Operation not permitted
>
> The logs only have the following:
>
> [1281776.788709] libceph: mon4 10.128.150.14:6789 session established
> [1281776.801747] libceph: client88900164 fsid d7b33135-0940-4e48-8aa6-1d2026597c2f
>
> The back end is mimic 13.2.8, the kernel is the CentOS kernel 3.10.0-957.27.2.el7.x86_64.
>
> Any ideas what I'm doing wrong here?

You need kernel v5.5 or later to map an RBD image via krbd using read-only caps [1]. Prior to this patch, krbd would be in a quasi-read-only state internally.

> Andras

[1] https://tracker.ceph.com/issues/42667

--
Jason
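Two things that can be checked in the meantime, sketched here with placeholders (whether the --read-only flag is available in this particular rbd-nbd version is an assumption):

  # confirm what the restricted key is actually allowed to do
  ceph auth get client.rbd-ro

  # possible workaround on a pre-5.5 kernel: map through librbd (rbd-nbd) instead of krbd
  rbd-nbd map rbd_vm/andras_test --id rbd-ro --read-only

Since rbd-nbd goes through librbd rather than the kernel client, it should not be affected by the krbd behaviour fixed in [1].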
[ceph-users] Re: RBD cephx read-only key
Ah, that makes sense. Thanks for the quick reply!

Andras

On 2/6/20 11:24 AM, Jason Dillaman wrote:
> You need kernel v5.5 or later to map an RBD image via krbd using read-only caps [1]. Prior to this patch, krbd would be in a quasi-read-only state internally.
>
> [1] https://tracker.ceph.com/issues/42667
[ceph-users] Different memory usage on OSD nodes after update to Nautilus
Dear all

In the middle of January I updated my ceph cluster from Luminous to Nautilus.

Attached you can see the memory metrics collected on one OSD node (I see the very same behavior on all OSD hosts), graphed via Ganglia. This is a CentOS 7 node, with 64 GB of RAM, hosting 10 OSDs.

Before the update there were about 20 GB of FreeMem. Now FreeMem is basically 0, but I see 20 GB of Buffers.

I guess this triggered some swapping, probably because I forgot to set vm.swappiness to 0 (it was set to 60, the default value).

I was wondering if this is the expected behavior.

PS: Besides updating ceph, I also updated all the other packages (yum update), so I am not sure that this different memory usage is because of the ceph update. For the record, in this update the kernel went from 3.10.0-1062.1.2 to 3.10.0-1062.9.1.

Thanks, Massimo
[ceph-users] Re: Different memory usage on OSD nodes after update to Nautilus
Thanks for your feedback.

The Ganglia graphs are available here:
https://cernbox.cern.ch/index.php/s/0xBDVwNkRqcoGdF

Replying to the other questions:

- Free Memory in ganglia is derived from "MemFree" in /proc/meminfo
- Memory Buffers in ganglia is derived from "Buffers" in /proc/meminfo
- On this host, the OSDs are 6TB. On other hosts we have 10TB OSDs
- "osd memory target" is set to ~ 4.5 GB (actually, while debugging this issue, I have just lowered the value to 3.2 GB)
- "ceph tell osd.x heap stats" basically always reports 0 (or a very low value) for "Bytes in page heap freelist", and a heap release doesn't change the memory usage
- I can agree that swap is antiquated, but so far it was simply not used and didn't cause any problems. At any rate I am now going to remove the swap (or set the swappiness to 0).

Thanks again!

Cheers, Massimo

On Thu, Feb 6, 2020 at 6:28 PM Anthony D'Atri wrote:

> Attachments are usually filtered by mailing lists. Yours did not come through. A URL to Skitch or some other hosting works better.
>
> Your kernel version sounds like RHEL / CentOS? I can say that memory accounting definitely did change between upstream 3.19 and 4.9.
>
> osd04-cephstorage1-gsc:~ # head /proc/meminfo
> MemTotal:       197524684 kB
> MemFree:         80388504 kB
> MemAvailable:    86055708 kB
> Buffers:           633768 kB
> Cached:           4705408 kB
> SwapCached:             0 kB
>
> Specifically, node_memory_Active as reported by node_exporter changes dramatically, and MemAvailable is the more meaningful metric. What is your "FreeMem" metric actually derived from?
>
> 64GB for 10 OSDs might be on the light side; how large are those OSDs?
>
> For sure swap is antiquated. If your systems have any swap provisioned at all, you're doing it wrong. I've had good results setting it to 1.
>
> Do `ceph daemon osd.xx heap stats`, see if your OSD processes have much unused memory that has not been released to the OS. If they do, "heap release" can be useful.
>
> > On Feb 6, 2020, at 9:08 AM, Massimo Sgaravatto <massimo.sgarava...@gmail.com> wrote:
> >
> > Dear all
> >
> > In the mid of January I updated my ceph cluster from Luminous to Nautilus.
> >
> > Attached you can see the memory metrics collected on one OSD node (I see the very same behavior on all OSD hosts) graphed via Ganglia. This is a CentOS 7 node, with 64 GB of RAM, hosting 10 OSDs.
> >
> > So before the update there were about 20 GB of FreeMem. Now FreeMem is basically 0, but I see 20 GB of Buffers.
> >
> > I guess this triggered some swapping, probably because I forgot to set vm.swappiness to 0 (it was set to 60, the default value).
> >
> > I was wondering if this is the expected behavior.
> >
> > PS: Actually besides updating ceph, I also updated all the other packages (yum update), so I am not sure that this different memory usage is because of the ceph update. For the record, in this update the kernel was updated from 3.10.0-1062.1.2 to 3.10.0-1062.9.1.
> >
> > Thanks, Massimo
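A small sketch pulling the checks from this thread together (osd.0 is a placeholder):

  # MemAvailable, not MemFree, is the number worth graphing on these kernels
  grep -E 'MemTotal|MemFree|MemAvailable|Buffers' /proc/meminfo

  # per-OSD heap accounting and the configured memory target, as suggested above
  ceph tell osd.0 heap stats
  ceph tell osd.0 heap release
  ceph config get osd.0 osd_memory_target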
[ceph-users] Re: Ubuntu 18.04.4 Ceph 12.2.12
For Ubuntu 18.04 LTS, the latest ceph package is 12.2.12-0ubuntu0.18.04.4 and can be found in the bionic-updates pocket [0]. There is an active SRU (stable release update) to move to the new 12.2.13 point release. You can follow its progress on Launchpad [1].

I should note that Ubuntu 18.04 LTS also supports the mimic and nautilus releases through the Ubuntu Cloud Archive PPAs. You can find details on which LTS supports which ceph releases here [2].

Please open a Launchpad bug if you are having problems installing from Ubuntu-sourced packaging.

[0] https://packages.ubuntu.com/bionic-updates/ceph
[1] https://bugs.launchpad.net/ubuntu/+source/ceph/+bug/1861793
[2] https://ubuntu.com/ceph

On Mon, Feb 3, 2020 at 4:07 PM Atherion wrote:

> So now that 12.2.13 has been released, I will have a mixed environment if I use the Ubuntu 18.04 repo's 12.2.12.
>
> I also found there is a docker container, https://hub.docker.com/r/ceph/daemon — I could potentially just use the container to run the version I need. Wondering if anyone has done this in production?
>
> Managing the ubuntu repos for ceph has not been easy, to say the least :( Found this ticket but it looks dead: https://tracker.ceph.com/issues/24326
>
> ‐‐‐ Original Message ‐‐‐
> On Friday, January 24, 2020 1:12 PM, Anthony D'Atri wrote:
>
> > I applied those packages for the same reason on a staging cluster and so far so good.
> >
> >> On Jan 24, 2020, at 9:15 AM, Atherion wrote:
> >>
> >> Hi Ceph Community.
> >> We currently have a luminous cluster running and some machines still on Ubuntu 14.04.
> >> We are looking to upgrade these machines to 18.04, but the only upgrade path for luminous with the ceph repo is through 16.04.
> >> It is doable to get to Mimic, but then we have to upgrade all those machines to 16.04 and then again to 18.04 once we are on Mimic; it is becoming a huge time sink.
> >>
> >> I did notice the Ubuntu repos have added 12.2.12 in the 18.04.4 release. Is this a reliable build we can use?
> >> https://ubuntu.pkgs.org/18.04/ubuntu-proposed-main-amd64/ceph_12.2.12-0ubuntu0.18.04.4_amd64.deb.html
> >> If so then we can go straight to 18.04.4 and not waste so much time.
> >>
> >> Best
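For anyone following along, a sketch of how the Cloud Archive route looks on an 18.04 node (which UCA release maps to which ceph release should be confirmed against [2]; "train" carrying nautilus here is an assumption):

  sudo add-apt-repository cloud-archive:train
  sudo apt update
  apt policy ceph          # confirm which pocket/PPA wins before upgrading

This avoids the double OS upgrade through 16.04 described in the quoted thread.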
[ceph-users] Re: mds lost very frequently
Hi,

After setting:

ceph config set mds mds_recall_max_caps 1 (5000 before change)

and

ceph config set mds mds_recall_max_decay_rate 1.0 (2.5 before change)

and then:

ceph tell 'mds.*' injectargs '--mds_recall_max_caps 1'
ceph tell 'mds.*' injectargs '--mds_recall_max_decay_rate 1.0'

our up:active MDS stopped responding and the standby-replay stepped in ... and hit an assert (same as in this thread):

2020-02-06 16:42:16.712 7ff76a528700  1 heartbeat_map reset_timeout 'MDSRank' had timed out after 15
2020-02-06 16:42:17.616 7ff76ff1b700  0 mds.beacon.mds2 MDS is no longer laggy
2020-02-06 16:42:20.348 7ff76d716700 -1 /build/ceph-13.2.8/src/mds/Locker.cc: In function 'void Locker::file_recover(ScatterLock*)' thread 7ff76d716700 time 2020-02-06 16:42:20.351124
/build/ceph-13.2.8/src/mds/Locker.cc: 5307: FAILED assert(lock->get_state() == LOCK_PRE_SCAN)

ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14e) [0x7ff7759939de]
 2: (()+0x287b67) [0x7ff775993b67]
 3: (()+0x28a9ea) [0x5585eb2b79ea]
 4: (MDCache::start_files_to_recover()+0xbb) [0x5585eb1f897b]
 5: (MDSRank::active_start()+0x135) [0x5585eb146be5]
 6: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x4e5) [0x5585eb151ea5]
 7: (MDSDaemon::handle_mds_map(MMDSMap*)+0xca8) [0x5585eb134608]
 8: (MDSDaemon::handle_core_message(Message*)+0x6c) [0x5585eb138bbc]
 9: (MDSDaemon::ms_dispatch(Message*)+0xbb) [0x5585eb13929b]
 10: (DispatchQueue::entry()+0xb92) [0x7ff775a56e52]
 11: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff775af3e2d]
 12: (()+0x76db) [0x7ff7752846db]
 13: (clone()+0x3f) [0x7ff77446a88f]

2020-02-06 16:42:20.348 7ff76d716700 -1 *** Caught signal (Aborted) ** in thread 7ff76d716700 thread_name:ms_dispatch

ceph version 13.2.8 (5579a94fafbc1f9cc913a0f5d362953a5d9c3ae0) mimic (stable)
 1: (()+0x12890) [0x7ff77528f890]
 2: (gsignal()+0xc7) [0x7ff774387e97]
 3: (abort()+0x141) [0x7ff774389801]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x256) [0x7ff775993ae6]
 5: (()+0x287b67) [0x7ff775993b67]
 6: (()+0x28a9ea) [0x5585eb2b79ea]
 7: (MDCache::start_files_to_recover()+0xbb) [0x5585eb1f897b]
 8: (MDSRank::active_start()+0x135) [0x5585eb146be5]
 9: (MDSRankDispatcher::handle_mds_map(MMDSMap*, MDSMap*)+0x4e5) [0x5585eb151ea5]
 10: (MDSDaemon::handle_mds_map(MMDSMap*)+0xca8) [0x5585eb134608]
 11: (MDSDaemon::handle_core_message(Message*)+0x6c) [0x5585eb138bbc]
 12: (MDSDaemon::ms_dispatch(Message*)+0xbb) [0x5585eb13929b]
 13: (DispatchQueue::entry()+0xb92) [0x7ff775a56e52]
 14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7ff775af3e2d]
 15: (()+0x76db) [0x7ff7752846db]
 16: (clone()+0x3f) [0x7ff77446a88f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

Quoting Yan, Zheng (uker...@gmail.com):
> Please try below patch if you can compile ceph from source. If you can't compile ceph or the issue still happens, please set debug_mds = 10 for standby mds (change debug_mds to 0 after mds becomes active).
> Regards
> Yan, Zheng
>
> diff --git a/src/mds/MDSRank.cc b/src/mds/MDSRank.cc
> index 1e8b024b8a..d1150578f1 100644
> --- a/src/mds/MDSRank.cc
> +++ b/src/mds/MDSRank.cc
> @@ -1454,8 +1454,8 @@ void MDSRank::rejoin_done()
>  void MDSRank::clientreplay_start()
>  {
>    dout(1) << "clientreplay_start" << dendl;
> -  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>    mdcache->start_files_to_recover();
> +  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>    queue_one_replay();
>  }
>
> @@ -1487,8 +1487,8 @@ void MDSRank::active_start()
>
>    mdcache->clean_open_file_lists();
>    mdcache->export_remaining_imported_caps();
> -  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>    mdcache->start_files_to_recover();
> +  finish_contexts(g_ceph_context, waiting_for_replay);  // kick waiters
>
>    mdcache->reissue_all_caps();
>    mdcache->activate_stray_manager();

AFAICT this patch has never been tested and never committed. Do you still think this might fix the issue? Any hints on how we might reproduce this issue (failing the active MDS and hitting this specific recovery scenario)?

We will happily apply this patch and do testing to check if it really fixes the issue.

Gr. Stefan

P.S. For my understanding: the MDS should never stop responding by setting these parameters, right?

--
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                    +31 318 648 688 / i...@bit.nl
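Side note: a quick sketch for checking what the running daemons actually picked up before the failover (daemon names are placeholders):

  ceph daemon mds.mds1 config get mds_recall_max_caps
  ceph daemon mds.mds1 config get mds_recall_max_decay_rate
  ceph daemon mds.mds1 session ls     # per-client caps, to watch the recall take effect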
[ceph-users] Re: slow using ISCSI - Help-me
On 02/05/2020 07:03 AM, Gesiel Galvão Bernardes wrote:
> On Sun, 2 Feb 2020 at 00:37, Gesiel Galvão Bernardes <gesiel.bernar...@gmail.com> wrote:
>
> Hi,
>
> Just now was it possible to continue this. Below is the information required. Thanks in advance.

Hey, sorry for the late reply. I just got back from PTO.

> esxcli storage nmp device list -d naa.6001405ba48e0b99e4c418ca13506c8e
> naa.6001405ba48e0b99e4c418ca13506c8e
>    Device Display Name: LIO-ORG iSCSI Disk (naa.6001405ba48e0b99e4c418ca13506c8e)
>    Storage Array Type: VMW_SATP_ALUA
>    Storage Array Type Device Config: {implicit_support=on; explicit_support=off; explicit_allow=on; alua_followover=on; action_OnRetryErrors=on; {TPG_id=1,TPG_state=ANO}}
>    Path Selection Policy: VMW_PSP_MRU
>    Path Selection Policy Device Config: Current Path=vmhba68:C0:T0:L0
>    Path Selection Policy Device Custom Config:
>    Working Paths: vmhba68:C0:T0:L0
>    Is USB: false
>
> Failed: H:0x0 D:0x2 P:0x0 Valid sense data: 0x2 0x4 0xa. Act:FAILOVER

Are you sure you are using tcmu-runner 1.4? Is that the actual daemon version running? Did you by any chance install the 1.4 rpm, but you/it did not restart the daemon? The error code above is returned in 1.3 and earlier.

You are probably hitting a combo of 2 issues. We had only listed ESX 6.5 in the docs you probably saw, and in 6.7 the value of action_OnRetryErrors defaulted to on instead of off. You should set this back to off.

You should also upgrade to the current version of tcmu-runner, 1.5.x. It should fix the issue you are hitting, so that non-IO commands like inquiry, RTPG, etc. are executed while failing over/back and you would not hit the problem where path initialization and path-testing IO is failed, causing the path to be marked as failed.
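A sketch of how the installed-vs-running tcmu-runner question can be settled on each gateway node (package and systemd unit names are assumed to be the stock ones):

  rpm -q tcmu-runner               # what is installed
  systemctl status tcmu-runner     # how long the daemon has actually been running
  systemctl restart tcmu-runner    # pick up a newly installed version

If the daemon predates the rpm upgrade, the old code is still serving IO, which would explain still seeing the pre-1.4 sense codes described above.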
[ceph-users] Benefits of high RAM on a metadata server?
Hi, we are planning out a Ceph storage cluster and are choosing between 64GB, 128GB, or even 256GB on the metadata servers. We are considering having 2 metadata servers overall.

Does going to high levels of RAM possibly yield any performance benefits? Is there a size beyond which there are just diminishing returns vs cost?

The expected use case would be a cluster where there might be 10-20 concurrent users working on individual datasets of 5TB in size. I expect there would be lots of reads of the 5TB datasets, matched with the creation of hundreds to thousands of smaller files during processing of the images.

Thanks!
-Matt

--
Matt Larson, PhD
Madison, WI 53705 U.S.A.
[ceph-users] Re: Benefits of high RAM on a metadata server?
Hi,

I am running 3 MDS servers (1 active and 2 backups, and I recommend that), each with 128 GB of RAM (the clients are running ML analysis), and I have about 20 million inodes loaded in RAM. It's working fine except for some warnings I get: "client X is failing to respond to cache pressure."

Besides that there are no complaints, but I think you would need the 256GB of RAM, especially if the datasets will increase... just my 2 cents.

Will you have SSDs?

On Fri, Feb 7, 2020 at 12:02 AM Matt Larson wrote:

> Hi, we are planning out a Ceph storage cluster and are choosing between 64GB, 128GB, or even 256GB on the metadata servers. We are considering having 2 metadata servers overall.
>
> Does going to high levels of RAM possibly yield any performance benefits? Is there a size beyond which there are just diminishing returns vs cost?
>
> The expected use case would be a cluster where there might be 10-20 concurrent users working on individual datasets of 5TB in size. I expect there would be lots of reads of the 5TB datasets, matched with the creation of hundreds to thousands of smaller files during processing of the images.
>
> Thanks!
> -Matt
>
> --
> Matt Larson, PhD
> Madison, WI 53705 U.S.A.
[ceph-users] Re: Benefits of high RAM on a metadata server?
On 2/6/20 11:01 PM, Matt Larson wrote:
> Hi, we are planning out a Ceph storage cluster and are choosing between 64GB, 128GB, or even 256GB on the metadata servers. We are considering having 2 metadata servers overall.
>
> Does going to high levels of RAM possibly yield any performance benefits? Is there a size beyond which there are just diminishing returns vs cost?

The MDS will try to cache as many inodes as you allow it to. So neither the number of users nor the total number of bytes matters; it's the number of inodes, thus: files and directories. The more you have of those, the more memory it requires.

A lot of small files? A lot of memory!

Wido

> The expected use case would be a cluster where there might be 10-20 concurrent users working on individual datasets of 5TB in size. I expect there would be lots of reads of the 5TB datasets, matched with the creation of hundreds to thousands of smaller files during processing of the images.
>
> Thanks!
> -Matt
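To make that concrete, a sketch of how the RAM choice maps onto configuration (the daemon name is a placeholder; note the MDS process RSS sits somewhat above the configured cache limit, so the limit should not be set to the full amount of RAM in the box):

  # e.g. give the MDS cache roughly 96 GiB on a 128 GB machine
  ceph config set mds mds_cache_memory_limit 103079215104

  # see how many inodes/dentries/caps that currently holds
  ceph daemon mds.<name> cache status

More RAM then translates directly into a larger mds_cache_memory_limit and more cached inodes, which is where the benefit for many small files comes from.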
[ceph-users] Re: Benefits of high RAM on a metadata server?
Hi Bogdan,

Are the "client failing to respond" messages indicating that you actually exceed the 128 GB of RAM on your MDS hosts?

The MDS servers are not planned to have SSD drives. The storage servers would have HDs and 1 NVMe SSD drive that could hold metadata volumes.

On Thu, Feb 6, 2020 at 4:11 PM Bogdan Adrian Velica wrote:
>
> Hi,
> I am running 3 MDS servers (1 active and 2 backups, and I recommend that), each with 128 GB of RAM (the clients are running ML analysis), and I have about 20 million inodes loaded in RAM. It's working fine except for some warnings I get: "client X is failing to respond to cache pressure."
> Besides that there are no complaints, but I think you would need the 256GB of RAM, especially if the datasets will increase... just my 2 cents.
>
> Will you have SSDs?
>
> On Fri, Feb 7, 2020 at 12:02 AM Matt Larson wrote:
>>
>> Hi, we are planning out a Ceph storage cluster and are choosing between 64GB, 128GB, or even 256GB on the metadata servers. We are considering having 2 metadata servers overall.
>>
>> Does going to high levels of RAM possibly yield any performance benefits? Is there a size beyond which there are just diminishing returns vs cost?
>>
>> The expected use case would be a cluster where there might be 10-20 concurrent users working on individual datasets of 5TB in size. I expect there would be lots of reads of the 5TB datasets, matched with the creation of hundreds to thousands of smaller files during processing of the images.
>>
>> Thanks!
>> -Matt
>>
>> --
>> Matt Larson, PhD
>> Madison, WI 53705 U.S.A.

--
Matt Larson, PhD
Madison, WI 53705 U.S.A.