Re: [ceph-users] Debugging 'slow requests' ...
Thanks for your feedback ! I increased debug_ms to 1/5.

This is another slow request (full output from 'ceph daemon osd.5 dump_historic_ops' for this event is attached):

{
    "description": "osd_op(client.171725953.0:404377591 8.9b 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head [set-alloc-hint object_size 4194304 write_size 4194304,write 1413120~122880] snapc 0=[] ondisk+write+known_if_redirected e1205833)",
    "initiated_at": "2019-02-09 07:29:34.404655",
    "age": 387.914193,
    "duration": 340.224154,
    "type_data": {
        "flag_point": "commit sent; apply or cleanup",
        "client_info": {
            "client": "client.171725953",
            "client_addr": "192.168.61.66:0/4056439540",
            "tid": 404377591
        },
        "events": [
            {
                "time": "2019-02-09 07:29:34.404655",
                "event": "initiated"
            },
            {
                "time": "2019-02-09 07:29:34.416752",
                "event": "op_applied"
            },
            {
                "time": "2019-02-09 07:29:34.417200",
                "event": "sub_op_commit_rec from 14"
            },
            {
                "time": "2019-02-09 07:35:14.628724",
                "event": "sub_op_commit_rec from 29"
            },

osd.5 has IP 192.168.222.202
osd.14 has IP 192.168.222.203
osd.29 has IP 192.168.222.204

Grepping for that client id in the osd.5 log, as far as I can understand (please correct me if my debugging is completely wrong), the requests to 14 and 29 are sent at 07:29:34:

2019-02-09 07:29:34.415808 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.203:6811/158495 -- osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head v 1205833'4767322) v2 -- 0x56307bb61e00 con 0
2019-02-09 07:29:34.415868 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.204:6804/4159520 -- osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head v 1205833'4767322) v2 -- 0x56308ca42400 con 0
2019-02-09 07:29:34.417132 7f5fcf60f700 1 -- 192.168.222.202:6816/157436 <== osd.14 192.168.222.203:6811/158495 11242 osd_repop_reply(client.171725953.0:404377591 8.9b e1205833/1205735) v2 111+0+0 (634943494 0 0) 0x563090642780 con 0x56308bbd

The answer from 14 arrives immediately:

2019-02-09 07:29:34.417132 7f5fcf60f700 1 -- 192.168.222.202:6816/157436 <== osd.14 192.168.222.203:6811/158495 11242 osd_repop_reply(client.171725953.0:404377591 8.9b e1205833/1205735) v2 111+0+0 (634943494 0 0) 0x563090642780 con 0x56308bbd

while the one from 29 arrives only at 07:35:

2019-02-09 07:35:14.628614 7f5fcee0e700 1 -- 192.168.222.202:6816/157436 <== osd.29 192.168.222.204:6804/4159520 1952 osd_repop_reply(client.171725953.0:404377591 8.9b e1205833/1205735) v2 111+0+0 (3804866849 0 0) 0x56308f3f2a00 con 0x56308bf61000

In the osd.29 log the request only arrives at 07:35 (and osd.29 promptly replies):

2019-02-09 07:35:14.627462 7f99972cc700 1 -- 192.168.222.204:6804/4159520 <== osd.5 192.168.222.202:6816/157436 2527 osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735) v2 1050+0+123635 (1225076790 0 171428115) 0x5610f5128a00 con 0x5610fc5bf000
2019-02-09 07:35:14.628343 7f998d6d4700 1 -- 192.168.222.204:6804/4159520 --> 192.168.222.202:6816/157436 -- osd_repop_reply(client.171725953.0:404377591 8.9b e1205833/1205735 ondisk, result = 0) v2 -- 0x5610f4a51180 con 0

Network problems ?

Full logs for the 3 relevant OSDs (just for that time period) are at:
https://drive.google.com/drive/folders/1TG5MomMJsqVbsuFosvYokNptLufxOnPY?usp=sharing

Thanks again !

Cheers, Massimo

On Fri, Feb 8, 2019 at 11:50 PM Brad Hubbard wrote:
> Try capturing another log with debug_ms turned up. 1 or 5 should be Ok
> to start with.
>
> On Fri, Feb 8, 2019 at 8:37 PM Massimo Sgaravatto wrote:
> >
> > Our Luminous ceph cluster has been working without problems for a while,
> > but in the last days we have been suffering from continuous slow requests.
> >
> > We have indeed done some changes in the infrastructure recently:
> >
> > - Moved OSD nodes to a new switch
> > - Increased pg nums for a pool, to have about ~100 PGs/OSD (also because
> >   we have to install new OSDs in the cluster). The output of 'ceph osd df'
> >   is attached.
> >
> > The problem could also be due to some 'bad' client, but in the log I don't
> > see a clear "correlation" with specific clients or images for such blocked
> > requests.
> >
> > I also tried to update to the latest Luminous release and the latest
> > CentOS 7, but this didn't help.
> >
> > Attached you can find the
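On the "Network problems ?" question above, a quick sanity check between the OSD hosts could look roughly like the following; the interface name and the jumbo-frame payload size are assumptions to adapt to the actual setup:

# per OSD host: look for drops/errors on the cluster-network interface
ip -s link show dev eth0
ethtool -S eth0 | grep -iE 'drop|err|disc'

# from osd.5's host: verify the path MTU towards osd.29's host
# (-s 8972 assumes a 9000-byte MTU; use -s 1472 for a standard 1500 MTU)
ping -M do -s 8972 -c 10 192.168.222.204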
Re: [ceph-users] Questions about using existing HW for PoC cluster
Hi

On 27 January 2019 at 18:20:24 MEZ, Will Dennis wrote:
>Been reading "Learning Ceph - Second Edition"
>(https://learning.oreilly.com/library/view/learning-ceph-/9781787127913/8f98bac7-44d4-45dc-b672-447d162ea604.xhtml)
>and in Ch. 4 I read this:
>
>"We've noted that Ceph OSDs built with the new BlueStore back end do
>not require journals. One might reason that additional cost savings can
>be had by not having to deploy journal devices, and this can be quite
>true. However, BlueStore does still benefit from provisioning certain
>data components on faster storage, especially when OSDs are deployed on
>relatively slow HDDs. Today's investment in fast FileStore journal
>devices for HDD OSDs is not wasted when migrating to BlueStore. When
>repaving OSDs as BlueStore devices the former journal devices can be
>readily repurposed for BlueStore's RocksDB and WAL data. When using
>SSD-based OSDs, this BlueStore accessory data can reasonably be
>colocated with the OSD data store. For even better performance they can
>employ faster yet NVMe or other technologies for WAL and RocksDB. This
>approach is not unknown for traditional FileStore journals as well,
>though it is not inexpensive. Ceph clusters that are fortunate to
>exploit SSDs as primary OSD drives usually do not require discrete
>journal devices, though use cases that require every last bit of
>performance may justify NVMe journals. SSD clusters with NVMe journals
>are as uncommon as they are expensive, but they are not unknown."
>
>So can I get by with using a single SATA SSD (size?) per server for
>RocksDB / WAL if I'm using Bluestore?

IIRC there is a rule of thumb that the size of the DB partition should be
4% of the OSD size, i.e. a 4 TB OSD should have at least a 160 GB DB
partition.

Hth
- Mehmet

>
>> - Is putting the journal on a partition of the SATA drives a real I/O
>killer? (this is how my Proxmox boxes are set up)
>> - If YES to the above, then is a SATA SSD acceptable for journal
>device, or should I definitely consider PCIe SSD? (I'd have to limit to
>one per server, which I know isn't optimal, but price prevents
>otherwise...)
>___
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
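As a rough sketch of that 4% rule of thumb (the 4 TB figure is just the example from above, not a sizing recommendation for any particular hardware):

osd_size_gb=4000                          # a 4 TB OSD
db_size_gb=$(( osd_size_gb * 4 / 100 ))   # 4% rule of thumb
echo "${db_size_gb} GB"                   # -> 160 GB for the RocksDB/WAL partition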
Re: [ceph-users] Failed to load ceph-mgr modules: telemetry
Hi All,

Any further suggestions? Should I just ignore the error "Failed to load
ceph-mgr modules: telemetry", or is this the root cause of the missing
realtime I/O readings in the Dashboard?

Thanks

On Fri, Feb 8, 2019 at 3:59 PM Ashley Merrick wrote:
> Message went away, but obviously still don't get the stats showing in the
> dashboard (I am guessing this isn't a known bug currently?) and that they
> should be working.
> Everything works fine apart from the dashboard not showing the live I/O
> stats.
>
> Nothing is mentioned in the mgr logs at the default log level, just the
> message that showed under Cluster Logs.
>
> ,Ashley
>
> On Fri, Feb 8, 2019 at 3:57 PM Wido den Hollander wrote:
>>
>> On 2/8/19 8:13 AM, Ashley Merrick wrote:
>> > I have had issues on a mimic cluster (latest release) where the
>> > dashboard does not display any read or write ops under the pool's
>> > section on the main dashboard page.
>> >
>> > I have just noticed during restarting the mgr service the following
>> > shows under "Cluster Logs", nothing else, just the following: "Failed to
>> > load ceph-mgr modules: telemetry"
>> >
>> > Would this be causing the issue, and if so is there anything I can do to
>> > find the issue?
>> >
>> Like you already did, the Mgr logs. Did the message go away or did you
>> need to disable the module?
>>
>> Wido
>>
>> > I have looked through the mgr logs but nothing matches the time
>> > stamp shown in the Cluster Logs under the dashboard home page.
>> >
>> > ___
>> > ceph-users mailing list
>> > ceph-users@lists.ceph.com
>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>> >
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
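For what it's worth, a couple of things that could be checked; these are standard mgr/admin-socket commands, but whether they reveal anything about the telemetry failure on Mimic is an assumption:

ceph mgr module ls                                        # see which modules are enabled
ceph mgr module disable telemetry                         # if unused, disabling it may silence the message
ceph daemon mgr.$(hostname -s) config set debug_mgr 20    # raise mgr logging temporarily, then watch the mgr log
# (the mgr id is assumed to be the short hostname; adjust to your deployment)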
[ceph-users] OSD fails to start (fsck error, unable to read osd superblock)
Hi there,

Running 12.2.11-1xenial on a machine with 6 SSD OSDs using BlueStore.

Today we had two disks fail out of the controller, and after a reboot they
both seemed to come back fine, but ceph-osd was only able to start on one of
them. The other one gets this:

2019-02-08 18:53:00.703376 7f64f948ce00 -1 bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x95104dfc, expected 0xb9e3e26d, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-02-08 18:53:00.703406 7f64f948ce00 -1 osd.3 0 OSD::init() : unable to read osd superblock

Note that there are no actual IO errors being shown by the controller in
dmesg, and that the disk is readable. The metadata FS is mounted and looks
normal.

I tried running "ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-3
--deep 1" and that gets many instances of:

2019-02-08 19:00:31.783815 7fa35bd0df80 -1 bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x95104dfc, expected 0xb9e3e26d, device location [0x4000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#
2019-02-08 19:00:31.783866 7fa35bd0df80 -1 bluestore(/var/lib/ceph/osd/ceph-3) fsck error: #-1:7b3f43c4:::osd_superblock:0# error during read: (5) Input/output error

...which is the same error. Due to a host being down for unrelated reasons,
this is preventing some PGs from activating, keeping one pool inaccessible.
There is no critical data in it, but I'm more interested in solving the
issue for reliability.

Any advice? What does a bad crc indicate in this context? Should I send this
to the bug tracker instead?
--
Ruben Rodriguez | Chief Technology Officer, Free Software Foundation
GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
https://fsf.org | https://gnu.org
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
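A couple of device-level sanity checks that might also be worth running here (/dev/sdX is a placeholder for the affected OSD's block device):

dmesg -T | grep -iE 'sdX|ata|i/o error'                        # any controller/kernel-level errors
smartctl -a /dev/sdX | grep -iE 'reallocat|pending|crc|error'  # drive-reported health counters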
[ceph-users] Upgrade Luminous to mimic on Ubuntu 18.04
Hello m8s,

I'm curious how we should do an upgrade of our Ceph cluster on Ubuntu
16/18.04, as (at least on our 18.04 nodes) we only have 12.2.7 (or .8?).

For an upgrade to Mimic we should first update to the latest Luminous
version, actually 12.2.11 (IIRC). Which is not possible on 18.04.

Is there an update path from 12.2.7/8 to the actual Mimic release, or
better, to the upcoming Nautilus?

Any advice?

- Mehmet
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade Luminous to mimic on Ubuntu 18.04
Regarding available versions, are you using the Ubuntu repos or the Ceph
18.04 repo?

Updates will always be slower to reach you if you're waiting for them to hit
the Ubuntu repo vs adding Ceph's own.

On Sun, 10 Feb 2019 at 12:19 AM, wrote:
> Hello m8s,
>
> I'm curious how we should do an upgrade of our Ceph cluster on Ubuntu
> 16/18.04, as (at least on our 18.04 nodes) we only have 12.2.7 (or .8?).
>
> For an upgrade to Mimic we should first update to the latest Luminous
> version, actually 12.2.11 (IIRC). Which is not possible on 18.04.
>
> Is there an update path from 12.2.7/8 to the actual Mimic release, or
> better, to the upcoming Nautilus?
>
> Any advice?
>
> - Mehmet
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
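For reference, pointing a node at Ceph's own repo would look roughly like the sketch below; the release name in the URL is an assumption, and it is worth checking first whether that release actually publishes bionic (18.04) packages:

wget -q -O- https://download.ceph.com/keys/release.asc | sudo apt-key add -
echo "deb https://download.ceph.com/debian-luminous/ bionic main" | sudo tee /etc/apt/sources.list.d/ceph.list
sudo apt-get update
apt-cache policy ceph-osd    # confirm which version is now offered, and from which repo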
Re: [ceph-users] Upgrade Luminous to mimic on Ubuntu 18.04
Hello Ashley,

Thank you for this fast response.

I can't prove this yet, but I am already using Ceph's own repo for Ubuntu
18.04, and 12.2.7/8 is the latest available there...

- Mehmet

On 9 February 2019 at 17:21:32 MEZ, Ashley Merrick wrote:
>Regarding available versions, are you using the Ubuntu repos or the Ceph
>18.04 repo?
>
>Updates will always be slower to reach you if you're waiting for them to
>hit the Ubuntu repo vs adding Ceph's own.
>
>On Sun, 10 Feb 2019 at 12:19 AM, wrote:
>
>> Hello m8s,
>>
>> I'm curious how we should do an upgrade of our Ceph cluster on Ubuntu
>> 16/18.04, as (at least on our 18.04 nodes) we only have 12.2.7 (or .8?).
>>
>> For an upgrade to Mimic we should first update to the latest Luminous
>> version, actually 12.2.11 (IIRC). Which is not possible on 18.04.
>>
>> Is there an update path from 12.2.7/8 to the actual Mimic release, or
>> better, to the upcoming Nautilus?
>>
>> Any advice?
>>
>> - Mehmet
>> ___
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Upgrade Luminous to mimic on Ubuntu 18.04
What does the output of apt-get update look like on one of the nodes? You
can just list the lines that mention Ceph.

Thanks

On Sun, 10 Feb 2019 at 12:28 AM, wrote:
> Hello Ashley,
>
> Thank you for this fast response.
>
> I can't prove this yet, but I am already using Ceph's own repo for Ubuntu
> 18.04, and 12.2.7/8 is the latest available there...
>
> - Mehmet
>
> On 9 February 2019 at 17:21:32 MEZ, Ashley Merrick <
> singap...@amerrick.co.uk> wrote:
> >Regarding available versions, are you using the Ubuntu repos or the Ceph
> >18.04 repo?
> >
> >Updates will always be slower to reach you if you're waiting for them to
> >hit the Ubuntu repo vs adding Ceph's own.
> >
> >On Sun, 10 Feb 2019 at 12:19 AM, wrote:
> >
> >> Hello m8s,
> >>
> >> I'm curious how we should do an upgrade of our Ceph cluster on Ubuntu
> >> 16/18.04, as (at least on our 18.04 nodes) we only have 12.2.7 (or .8?).
> >>
> >> For an upgrade to Mimic we should first update to the latest Luminous
> >> version, actually 12.2.11 (IIRC). Which is not possible on 18.04.
> >>
> >> Is there an update path from 12.2.7/8 to the actual Mimic release, or
> >> better, to the upcoming Nautilus?
> >>
> >> Any advice?
> >>
> >> - Mehmet
> >> ___
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >>
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
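Something along these lines would show it (generic commands, nothing cluster-specific assumed):

grep -r ceph /etc/apt/sources.list /etc/apt/sources.list.d/   # which repos are configured
sudo apt-get update 2>&1 | grep -i ceph                       # the Ceph-related lines from the update run
apt-cache policy ceph-osd                                     # candidate version and the repo it comes from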
[ceph-users] faster switch to another mds
Hi!

I have a Ceph cluster with 3 nodes running mon/mgr/mds servers. I rebooted
one node and see this in the client log:

Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 socket closed (con state OPEN)
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon2 10.5.105.40:6789 session lost, hunting for new mon
Feb 09 20:29:14 ceph-nfs1 kernel: libceph: mon0 10.5.105.34:6789 session established
Feb 09 20:29:22 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state OPEN)
Feb 09 20:29:23 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:24 ceph-nfs1 kernel: libceph: mds0 10.5.105.40:6800 socket closed (con state CONNECTING)
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect start
Feb 09 20:29:53 ceph-nfs1 kernel: ceph: mds0 reconnect success
Feb 09 20:30:05 ceph-nfs1 kernel: ceph: mds0 recovery completed

As I understand it, the following has happened:

1. The client detects that the link with the mon server is broken and
   quickly switches to another mon (in less than 1 second).
2. The client detects that the link with the mds server is broken, tries to
   reconnect 3 times (unsuccessfully), waits, and reconnects to the same mds
   after about 30 seconds of downtime.

I have 2 questions:
1. Why?
2. How can I reduce the switching time to another mds?
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
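The timeouts involved can at least be inspected with something like the following; the option names are standard Ceph settings, <id> is a placeholder, and whether tuning them is advisable for this cluster is left open:

ceph daemon mds.<id> config show | grep -E 'mds_beacon|mds_reconnect|mds_session'
ceph daemon mon.<id> config show | grep mds_beacon_grace   # grace period before a mon marks an mds laggy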
Re: [ceph-users] RDMA/RoCE enablement failed with (113) No route to host
Hi Roman,

We recently discussed your tests and a simple idea came to my mind - can you
repeat your tests targeting latency instead of max throughput? I mean just
use iodepth=1. What is the latency, and on what hardware?

Well, I have been playing with the Ceph RDMA implementation for quite a
while and it has unsolved problems, so I would say the status is not
"completely broken", but rather "you can run it at your own risk and smile":

1. On disconnect of a previously active (high write load) connection there
   is a race that can lead to an osd (or any receiver) crash:
   https://github.com/ceph/ceph/pull/25447

2. Recent QLogic hardware (qedr drivers) does not support
   IBV_EVENT_QP_LAST_WQE_REACHED, which is used in the Ceph RDMA
   implementation; the pull request from 1. also targets this incompatibility.

3. Under high write load and many connections there is a chance that an osd
   can run out of receive WRs, and the RDMA connection (QP) on the sender
   side will get IBV_WC_RETRY_EXC_ERR, thus be disconnected. This is a
   fundamental design problem, which has to be fixed on the protocol level
   (e.g. propagate backpressure to senders).

4. Unfortunately neither RDMA nor any other zero-latency network can bring
   significant value, because the bottleneck is not the network; please
   consider this for further reading regarding transport performance in
   Ceph: https://www.spinics.net/lists/ceph-devel/msg43555.html

The problems described above have quite a big impact on overall transport
performance.

--
Roman

--
With best regards,
Vitaliy Filippov
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
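To make that latency request concrete, a minimal iodepth=1 run could look like the following; the pool, image, and client names are placeholders, and it assumes fio was built with the rbd ioengine:

fio --name=lat-4k-qd1 --ioengine=rbd --clientname=admin --pool=rbd --rbdname=testimg \
    --rw=randwrite --bs=4k --iodepth=1 --numjobs=1 \
    --runtime=60 --time_based --group_reporting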
Re: [ceph-users] OSD fails to start (fsck error, unable to read osd superblock)
On Sun, Feb 10, 2019 at 1:56 AM Ruben Rodriguez wrote:
>
> Hi there,
>
> Running 12.2.11-1xenial on a machine with 6 SSD OSDs using BlueStore.
>
> Today we had two disks fail out of the controller, and after a reboot
> they both seemed to come back fine, but ceph-osd was only able to start
> on one of them. The other one gets this:
>
> 2019-02-08 18:53:00.703376 7f64f948ce00 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0x95104dfc, expected 0xb9e3e26d, device
> location [0x4000~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
> 2019-02-08 18:53:00.703406 7f64f948ce00 -1 osd.3 0 OSD::init() : unable
> to read osd superblock
>
> Note that there are no actual IO errors being shown by the controller in
> dmesg, and that the disk is readable. The metadata FS is mounted and
> looks normal.
>
> I tried running "ceph-bluestore-tool repair --path
> /var/lib/ceph/osd/ceph-3 --deep 1" and that gets many instances of:

Running this with debug_bluestore=30 might give more information on the
nature of the IO error.

>
> 2019-02-08 19:00:31.783815 7fa35bd0df80 -1
> bluestore(/var/lib/ceph/osd/ceph-3) _verify_csum bad crc32c/0x1000
> checksum at blob offset 0x0, got 0x95104dfc, expected 0xb9e3e26d, device
> location [0x4000~1000], logical extent 0x0~1000, object
> #-1:7b3f43c4:::osd_superblock:0#
> 2019-02-08 19:00:31.783866 7fa35bd0df80 -1
> bluestore(/var/lib/ceph/osd/ceph-3) fsck error:
> #-1:7b3f43c4:::osd_superblock:0# error during read: (5) Input/output error
>
> ...which is the same error. Due to a host being down for unrelated
> reasons, this is preventing some PGs from activating, keeping one pool
> inaccessible. There is no critical data in it, but I'm more interested
> in solving the issue for reliability.
>
> Any advice? What does a bad crc indicate in this context? Should I send
> this to the bug tracker instead?
> --
> Ruben Rodriguez | Chief Technology Officer, Free Software Foundation
> GPG Key: 05EF 1D2F FE61 747D 1FC8 27C3 7FAC 7D26 472F 4409
> https://fsf.org | https://gnu.org
> ___
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Cheers,
Brad
___
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
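A possible way to capture that, reusing the command from the thread; the logging flags are an assumption, so please check ceph-bluestore-tool --help / its man page for the exact spelling on 12.2.11:

ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-3 --deep 1 \
    --log-file /var/tmp/bluestore-repair-osd3.log --log-level 30
# alternatively, generic debug options can usually be passed via the environment,
# e.g. CEPH_ARGS="--debug_bluestore 30" in front of the command above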
Re: [ceph-users] Debugging 'slow requests' ...
The log ends at

$ zcat ceph-osd.5.log.gz |tail -2
2019-02-09 07:37:00.022534 7f5fce60d700 1 -- 192.168.61.202:6816/157436 >> - conn(0x56308edcf000 :6816 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=296 -

The last two messages are outbound to 192.168.222.204 and there are no
further messages between these two hosts (other than osd_pings) in the log.

$ zcat ceph-osd.5.log.gz |gawk '!/osd_ping/&&/192.168.222.202/&&/192.168.222.204/&&/07:29:3/'|tail -5
2019-02-09 07:29:34.267744 7f5fcee0e700 1 -- 192.168.222.202:6816/157436 <== osd.29 192.168.222.204:6804/4159520 1946 rep_scrubmap(8.2bc e1205735 from shard 29) v2 40+0+1492 (3695125937 0 2050362985) 0x563090674d80 con 0x56308bf61000
2019-02-09 07:29:34.375223 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.204:6804/4159520 -- replica_scrub(pg: 8.2bc,from:0'0,to:0'0,epoch:1205833/1205735,start:8:3d4e6145:::rbd_data.35f46d19abe4ed.77a4:0,end:8:3d4e6916:::rbd_data.a6dc2425de9600.0006249c:0,chunky:1,deep:0,version:9,allow_preemption:1,priority=5) v9 -- 0x56308bdf2000 con 0
2019-02-09 07:29:34.378535 7f5fcee0e700 1 -- 192.168.222.202:6816/157436 <== osd.29 192.168.222.204:6804/4159520 1947 rep_scrubmap(8.2bc e1205735 from shard 29) v2 40+0+1494 (3695125937 0 865217733) 0x563092d90900 con 0x56308bf61000
2019-02-09 07:29:34.415868 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.204:6804/4159520 -- osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head v 1205833'4767322) v2 -- 0x56308ca42400 con 0
2019-02-09 07:29:34.486296 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.204:6804/4159520 -- replica_scrub(pg: 8.2bc,from:0'0,to:0'0,epoch:1205833/1205735,start:8:3d4e6916:::rbd_data.a6dc2425de9600.0006249c:0,end:8:3d4e7434:::rbd_data.47c1b437840214.0003c594:0,chunky:1,deep:0,version:9,allow_preemption:1,priority=5) v9 -- 0x56308e565340 con 0

I'd be taking a good, hard look at the network, yes.

On Sat, Feb 9, 2019 at 6:33 PM Massimo Sgaravatto wrote:
>
> Thanks for your feedback !
>
> I increased debug_ms to 1/5.
>
> This is another slow request (full output from 'ceph daemon osd.5
> dump_historic_ops' for this event is attached):
>
> {
>     "description": "osd_op(client.171725953.0:404377591 8.9b 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head [set-alloc-hint object_size 4194304 write_size 4194304,write 1413120~122880] snapc 0=[] ondisk+write+known_if_redirected e1205833)",
>     "initiated_at": "2019-02-09 07:29:34.404655",
>     "age": 387.914193,
>     "duration": 340.224154,
>     "type_data": {
>         "flag_point": "commit sent; apply or cleanup",
>         "client_info": {
>             "client": "client.171725953",
>             "client_addr": "192.168.61.66:0/4056439540",
>             "tid": 404377591
>         },
>         "events": [
>             {
>                 "time": "2019-02-09 07:29:34.404655",
>                 "event": "initiated"
>             },
>             {
>                 "time": "2019-02-09 07:29:34.416752",
>                 "event": "op_applied"
>             },
>             {
>                 "time": "2019-02-09 07:29:34.417200",
>                 "event": "sub_op_commit_rec from 14"
>             },
>             {
>                 "time": "2019-02-09 07:35:14.628724",
>                 "event": "sub_op_commit_rec from 29"
>             },
>
> osd.5 has IP 192.168.222.202
> osd.14 has IP 192.168.222.203
> osd.29 has IP 192.168.222.204
>
> Grepping for that client id in the osd.5 log, as far as I can understand
> (please correct me if my debugging is completely wrong), the requests to 14
> and 29 are sent at 07:29:34:
>
> 2019-02-09 07:29:34.415808 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.203:6811/158495 -- osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head v 1205833'4767322) v2 -- 0x56307bb61e00 con 0
> 2019-02-09 07:29:34.415868 7f5faf4b4700 1 -- 192.168.222.202:6816/157436 --> 192.168.222.204:6804/4159520 -- osd_repop(client.171725953.0:404377591 8.9b e1205833/1205735 8:d90adab6:::rbd_data.c47f3c390c8495.0001934a:head v 1205833'4767322) v2 -- 0x56308ca42400 con 0
> 2019-02-09 07:29:34.417132 7f5fcf60f700 1 -- 192.168.222.202:6816/157436 <== osd.14 192.168.222.203:6811/158495 11242 osd_repop_reply(client.171725953.0:404377591 8.9b e1205833/1205735) v2 111+0+0 (634943494 0 0) 0x563090642780 con 0x56308bbd
>
> The answer from 14 arrives immediately:
>
> 2019-02-09 07:29:34.417132 7f5fcf60f700 1 -- 192.168.222.202:6816/15743