Re: [ceph-users] ceph osd pg-upmap-items not working
On Thu, 4 Apr 2019 at 13:32, Dan van der Ster wrote:
>
> There are several more fixes queued up for v12.2.12:
>
> 16b7cc1bf9 osd/OSDMap: add log for better debugging
> 3d2945dd6e osd/OSDMap: calc_pg_upmaps - restrict optimization to origin pools only
> ab2dbc2089 osd/OSDMap: drop local pool filter in calc_pg_upmaps
> 119d8cb2a1 crush: fix upmap overkill
> 0729a78877 osd/OSDMap: using std::vector::reserve to reduce memory reallocation
> f4f66e4f0a osd/OSDMap: more improvements to upmap
> 7bebc4cd28 osd/OSDMap: be more aggressive when trying to balance
> 1763a879e3 osd/OSDMap: potential access violation fix
> 8b3114ea62 osd/OSDMap: don't mapping all pgs each time in calc_pg_upmaps
>
> I haven't personally tried the newest of those yet because the balancer is working pretty well in our environment.
> One thing we definitely need to improve, though, is the OSD failure / upmap interplay. We currently lose all related upmaps when an OSD is out -- this means that even though we re-use an OSD id, we still need the balancer to work for a while to restore the perfect balancing.
>
> If you have simple reproducers for your issues, please do create a tracker.

Upgraded to v13.2.x, and it's still the same.

https://tracker.ceph.com/issues/39136

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
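For anyone hitting the same thing, the basic upmap workflow looks roughly like this (the pg id and osd ids below are placeholders, and the balancer commands assume the mgr balancer module is enabled):

# upmap needs luminous-or-newer clients
ceph osd set-require-min-compat-client luminous
# show any existing pg_upmap_items entries in the osdmap
ceph osd dump | grep upmap
# remap a single PG's replica from one OSD to another by hand
ceph osd pg-upmap-items 7.1a 12 25
# or let the balancer manage the upmap entries automatically
ceph balancer mode upmap
ceph balancer on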
Re: [ceph-users] RGW: Reshard index of non-master zones in multi-site
On Mon, 8 Apr 2019 at 05:01, Matt Benjamin wrote:
>
> Hi Christian,
>
> Dynamic bucket-index sharding for multi-site setups is being worked on, and will land in the N release cycle.

What about removing orphaned shards on the master? Are the existing tools able to handle that?

On the secondaries it is no problem to proxy_pass all requests to the master while all rgw pools are destroyed and recreated. I would have thought, however, that manually removing the known orphaned indexes would be safe, side-stepping the annoying job of having to force-degrade a working service.

--
Iain Buclaw
*(p < e ? p++ : p) = (c & 0x0f) + '0';
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
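For reference, recent Luminous/Mimic point releases ship a radosgw-admin helper for stale bucket index instances; a rough sketch (whether it is present, and whether it is safe to run in a multi-site setup, depends on the exact release and should be treated as an assumption to verify first):

# list bucket index instances that no longer belong to any live bucket
radosgw-admin reshard stale-instances list
# remove them -- the docs caution against doing this on multi-site deployments
radosgw-admin reshard stale-instances rm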
Re: [ceph-users] Latency spikes in OSD's based on bluestore
Hi Anthony,

Thanks for answering.

>> Which SSD model and firmware are you using? Which HBA?

Well, from what I can see it happens on basically all our SSDs, which unfortunately vary a bit. In the example I posted, the particular disk was:

SSD SATA 6.0 Gb/s /0/100/1/0/0.8.0 /dev/sdg (disk) 800GB INTEL SSDSC2BX80, on a PERC H730P Mini (firmware 25.5.3.0005)

All our SSDs are configured as pass-through, so I wouldn't think the controller is involved too much.

>> Compaction may well be a factor as well, but I've experienced
>> hardware/firmware issues as well so I had to ask.

Well, my guess is that it is the compaction, and that there may be ways of tuning this. I'm just curious about "how to do it", and whether those "spikes" we see are normal operation and nothing to worry about, or whether one should actually take some sort of action.

Thanks for answering!

Best Regards,
Patrik Martinsson, Sweden
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
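One way to check whether RocksDB compaction lines up with the latency spikes is to watch the compaction counters on the admin socket; a minimal sketch (osd.0 is just an example id, and the manual "compact" admin-socket command may not exist on every release, so treat that as an assumption):

# cumulative rocksdb compaction counters/time for one OSD
ceph daemon osd.0 perf dump | grep -i compact
# optionally trigger a manual compaction and see if the same latency pattern appears
ceph daemon osd.0 compact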
Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore
Which OS are you using? With CentOS we find that the heap is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`). As a workaround we run this hourly: ceph tell mon.* heap release ceph tell osd.* heap release ceph tell mds.* heap release -- Dan On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet wrote: > > Hi, > > on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the > osd_memory_target : > > daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd > ceph3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42 > /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph > ceph3991 12.9 11.2 6342812 5485356 ? Ssl mars29 1443:41 > /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph > ceph4361 16.9 11.8 6718432 5783584 ? Ssl mars29 1889:41 > /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph > ceph4731 19.7 12.2 6949584 5982040 ? Ssl mars29 2198:47 > /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph > ceph5073 16.7 11.6 6639568 5701368 ? Ssl mars29 1866:05 > /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph > ceph5417 14.6 11.2 6386764 5519944 ? Ssl mars29 1634:30 > /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph > ceph5760 16.9 12.0 6806448 5879624 ? Ssl mars29 1882:42 > /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph > ceph6105 16.0 11.6 6576336 5694556 ? Ssl mars29 1782:52 > /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph > > daevel-ob@ssdr712h:~$ free -m > totalusedfree shared buff/cache > available > Mem: 47771 452101643 17 917 > 43556 > Swap: 0 0 0 > > # ceph daemon osd.147 config show | grep memory_target > "osd_memory_target": "4294967296", > > > And there is no recovery / backfilling, the cluster is fine : > >$ ceph status > cluster: >id: de035250-323d-4cf6-8c4b-cf0faf6296b1 >health: HEALTH_OK > > services: >mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel >mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel >osd: 120 osds: 116 up, 116 in > > data: >pools: 20 pools, 12736 pgs >objects: 15.29M objects, 31.1TiB >usage: 101TiB used, 75.3TiB / 177TiB avail >pgs: 12732 active+clean > 4 active+clean+scrubbing+deep > > io: >client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr > > >On an other host, in the same pool, I see also high memory usage : > >daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd >ceph6287 6.6 10.6 6027388 5190032 ? Ssl mars21 1511:07 > /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph >ceph6759 7.3 11.2 6299140 5484412 ? Ssl mars21 1665:22 > /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph >ceph7114 7.0 11.7 6576168 5756236 ? Ssl mars21 1612:09 > /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph >ceph7467 7.4 11.1 6244668 5430512 ? Ssl mars21 1704:06 > /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph >ceph7821 7.7 11.1 6309456 5469376 ? Ssl mars21 1754:35 > /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph >ceph8174 6.9 11.6 6545224 5705412 ? Ssl mars21 1590:31 > /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph >ceph8746 6.6 11.1 6290004 5477204 ? Ssl mars21 1511:11 > /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph >ceph9100 7.7 11.6 6552080 5713560 ? 
Ssl mars21 1757:22 > /usr/bin/ceph-osd -f --cluster ceph --id 138 --setuser ceph --setgroup ceph > >But ! On a similar host, in a different pool, the problem is less visible : > >daevel-ob@ssdr712i:~$ ps auxw | grep ceph-osd >ceph3617 2.8 9.9 5660308 4847444 ? Ssl mars29 313:05 > /usr/bin/ceph-osd -f --cluster ceph --id 151 --setuser ceph --setgroup ceph >ceph3958 2.3 9.8 5661936 4834320 ? Ssl mars29 256:55 > /usr/bin/ceph-osd -f --cluster ceph --id 152 --setuser ceph --setgroup ceph >ceph4299 2.3 9.8 5620616 4807248 ? Ssl mars29 266:26 > /usr/bin/ceph-osd -f --cluster ceph --id 153 --setuser ceph --setgroup ceph >ceph4643 2.3 9.6 5527724 4713572 ? Ssl mars29 262:50 > /usr/bin/ceph-osd -f --cluster ceph --id 154 --setuser ceph --setgroup ceph >ceph5016 2.2 9.7 5597504 4783412 ? Ssl mars29 248:37 > /usr/bin/ceph-osd -f --cluster ceph --id 155 --setuser ceph --setgroup
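If Dan's hourly heap-release workaround helps, a simple cron entry is enough to schedule it (the file name below is just an example; the globs are quoted so the shell passes them through to ceph tell):

# /etc/cron.d/ceph-heap-release
0 * * * * root ceph tell 'mon.*' heap release; ceph tell 'osd.*' heap release; ceph tell 'mds.*' heap release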
[ceph-users] radosgw cloud sync aws s3 auth failed
Hi all,

I have been testing the cloud sync module in radosgw. The ceph version is 13.2.5, git commit id cbff874f9007f1869bfd3821b7e33b2a6ffd4988. When syncing to an AWS S3 endpoint I got an HTTP 400 error, so I switched the endpoint to the http:// protocol and used the tcpick tool to dump the traffic. It looks like this:

PUT /wuxi01 HTTP/1.1
Host: s3.cn-north-1.amazonaws.com.cn
Accept: */*
Authorization: AWS AKIAUQ2G7NKZFVDQ76FZ:7ThaXKa3axR7Egf1tkwZc/YNRm4=
Date: Mon, 08 Apr 2019 10:04:37 +
Content-Length: 0

HTTP/1.1 400 Bad Request
x-amz-request-id: 65803EFC370CF11A
x-amz-id-2: py6N1QJw+pd91mvL0XpQhiwIVOiWIUprAX8PwAuSVOx3vrqat/Ka+xIVW3D1zC0+tJSLQyr4qC4=
x-amz-region: cn-north-1
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Mon, 08 Apr 2019 10:04:37 GMT
Connection: close
Server: AmazonS3

144
InvalidRequest  The authorization mechanism you have provided is not supported. Please use AWS4-HMAC-SHA256.  65803EFC370CF11A  py6N1QJw+pd91mvL0XpQhiwIVOiWIUprAX8PwAuSVOx3vrqat/Ka+xIVW3D1zC0+tJSLQyr4qC4=
0

It looks like the client is using the old (v2) auth method instead of AWS4-HMAC-SHA256. How can I enable the AWS4-HMAC-SHA256 auth method?
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
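For context, configuring the cloud sync tier looks roughly like this; the zone name is a placeholder and the tier-config keys are taken from the Mimic cloud sync module documentation, so treat them as assumptions to verify against your own setup:

radosgw-admin zone modify --rgw-zone=cloud-sync-zone --tier-type=cloud \
    --tier-config=connection.endpoint=http://s3.cn-north-1.amazonaws.com.cn,connection.access_key=<key>,connection.secret=<secret>
radosgw-admin period update --commit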
Re: [ceph-users] NFS-Ganesha Mounts as a Read-Only Filesystem
Possibly the client doesn't like the server returning SecType = "none";

Maybe try SecType = "sys"?

Leon L. Robinson

> On 6 Apr 2019, at 12:06, > wrote:
>
> Hi all,
>
> I have recently set up a Ceph cluster and, on request, am using CephFS (MDS version: ceph version 13.2.5 (cbff874f9007f1869bfd3821b7e33b2a6ffd4988) mimic (stable)) as a backend for NFS-Ganesha. I have successfully tested a direct mount with CephFS to read/write files, however I'm perplexed as to why the NFS mount is read-only despite setting the RW flags.
>
> [root@mon02 mnt]# touch cephfs/test.txt
> touch: cannot touch 'cephfs/test.txt': Read-only file system
>
> Configuration of Ganesha is below:
>
> NFS_CORE_PARAM
> {
>     Enable_NLM = false;
>     Enable_RQUOTA = false;
>     Protocols = 4;
> }
>
> NFSv4
> {
>     Delegations = true;
>     RecoveryBackend = rados_ng;
>     Minor_Versions = 1,2;
> }
>
> CACHEINODE {
>     Dir_Chunk = 0;
>     NParts = 1;
>     Cache_Size = 1;
> }
>
> EXPORT
> {
>     Export_ID = 15;
>     Path = "/";
>     Pseudo = "/cephfs/";
>     Access_Type = RW;
>     NFS_Protocols = "4";
>     Squash = No_Root_Squash;
>     Transport_Protocols = TCP;
>     SecType = "none";
>     Attr_Expiration_Time = 0;
>     Delegations = R;
>
>     FSAL {
>         Name = CEPH;
>         User_Id = "ganesha";
>         Filesystem = "cephfs";
>         Secret_Access_Key = "";
>     }
> }
>
> Provided mount parameters:
>
> mount -t nfs -o nfsvers=4.1,proto=tcp,rw,noatime,sync 172.16.32.15:/ /mnt/cephfs
>
> I have tried stripping much of the config and altering mount options, but so far I have been completely unable to decipher the cause. It also seems I'm not the only one who has been caught by this:
>
> https://www.spinics.net/lists/ceph-devel/msg41201.html
>
> Thanks in advance,
>
> Thomas
> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
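If Leon's suggestion is the fix, the only change needed is the SecType line in the export above; roughly (everything else left as in the original config):

EXPORT
{
    ...
    SecType = "sys";
    ...
}

Ganesha needs to be restarted (or the exports reloaded) after the change for it to take effect.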
[ceph-users] how to judge the results? - rados bench comparison
Hi there,

i'm new to ceph and just got my first cluster running. Now i'd like to know whether the performance we get is what one should expect. Is there a website with benchmark results somewhere that I could use to compare against our hardware and our results?

These are the results:

rados bench, single threaded:

# rados bench 10 write --rbd-cache=false -t 1
Object size:            4194304
Bandwidth (MB/sec):     53.7186
Stddev Bandwidth:       3.86437
Max bandwidth (MB/sec): 60
Min bandwidth (MB/sec): 48
Average IOPS:           13
Stddev IOPS:            0.966092
Average Latency(s):     0.0744599
Stddev Latency(s):      0.00911778

Nearly maxing out one (idle) client with 28 threads:

# rados bench 10 write --rbd-cache=false -t 28
Bandwidth (MB/sec):     850.451
Stddev Bandwidth:       40.6699
Max bandwidth (MB/sec): 904
Min bandwidth (MB/sec): 748
Average IOPS:           212
Stddev IOPS:            10.1675
Average Latency(s):     0.131309
Stddev Latency(s):      0.0318489

Four concurrent benchmarks on four clients, each with 24 threads:

Bandwidth (MB/sec):     396   376   381   389
Stddev Bandwidth:       30    25    22    22
Max bandwidth (MB/sec): 440   420   416   428
Min bandwidth (MB/sec): 352   348   344   364
Average IOPS:           99    94    95    97
Stddev IOPS:            7.5   6.3   5.6   5.6
Average Latency(s):     0.24  0.25  0.25  0.24
Stddev Latency(s):      0.12  0.15  0.15  0.14

Summing up:

write mode: ~1500 MB/sec bandwidth, ~385 IOPS, ~0.25 s latency
rand mode:  ~3500 MB/sec bandwidth, ~920 IOPS, ~0.154 s latency

Maybe someone could judge our numbers. I am actually very satisfied with the values.

The (mostly idle) cluster is built from these components:

* 10Gb frontend network, bonding two connections to mon, mds and osd nodes
** no bonding to clients
* 25Gb backend network, bonding two connections to osd nodes

cluster:
* 3x mon, 2x Intel(R) Xeon(R) Bronze 3104 CPU @ 1.70GHz, 64GB RAM
* 3x mds, 1x Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz, 128GB RAM
* 7x OSD node, 2x Intel(R) Xeon(R) Silver 4112 CPU @ 2.60GHz, 96GB RAM
** 4x 6TB SAS HDD HGST HUS726T6TAL5204 (5x on two nodes, max. 6x per chassis for later growth)
** 2x 800GB SAS SSD WDC WUSTM3280ASS200 => SW-RAID1 => LVM, ~116 GiB per OSD for DB and WAL

erasure coded pool (made for CephFS):
* plugin=clay k=5 m=2 d=6 crush-failure-domain=host

Thanks and best regards
Lars
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
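For the rand-mode numbers in the summary, the read benchmark needs objects left over from a write run; a minimal sequence looks like this (the pool name "testpool" and the 60 s runtime are just examples):

# write phase, keeping the objects for the read test
rados bench -p testpool 60 write -t 28 --no-cleanup
# random-read phase
rados bench -p testpool 60 rand -t 28
# remove the benchmark objects afterwards
rados -p testpool cleanup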
Re: [ceph-users] Ceph Replication not working
The log appears to be missing all the librbd log messages. The process seems to stop at attempting to open the image from the remote cluster: 2019-04-05 12:07:29.992323 7f0f3bfff700 20 rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image Assuming you are using the default log file naming settings, the log should be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, looking at your cluster naming makes me think that since your primary cluster is named "ceph" on the DR-site side, have you changed your "/etc/default/ceph" file to rename the local cluster from "ceph" to "cephdr" so that the "rbd-mirror" daemon connects to the correct local cluster? On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana wrote: > > Hi Jason, > > 12.2.11 is the version. > > Attached is the complete log file. > > We removed the pool to make sure there's no image left on DR site and > recreated an empty pool. > > Thanks, > -Vikas > > -Original Message- > From: Jason Dillaman > Sent: Friday, April 5, 2019 2:24 PM > To: Vikas Rana > Cc: ceph-users > Subject: Re: [ceph-users] Ceph Replication not working > > What is the version of rbd-mirror daemon and your OSDs? It looks it found two > replicated images and got stuck on the "wait_for_deletion" > step. Since I suspect those images haven't been deleted, it should have > immediately proceeded to the next step of the image replay state machine. Are > there any additional log messages after 2019-04-05 12:07:29.981203? > > On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana wrote: > > > > Hi there, > > > > We are trying to setup a rbd-mirror replication and after the setup, > > everything looks good but images are not replicating. > > > > > > > > Can some please please help? > > > > > > > > Thanks, > > > > -Vikas > > > > > > > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs > > > > Mode: pool > > > > Peers: > > > > UUID NAME CLIENT > > > > bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod > > > > > > > > root@local:/etc/ceph# rbd mirror pool info nfs > > > > Mode: pool > > > > Peers: > > > > UUID NAME CLIENT > > > > 612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr > > > > > > > > > > > > root@local:/etc/ceph# rbd info nfs/test01 > > > > rbd image 'test01': > > > > size 102400 kB in 25 objects > > > > order 22 (4096 kB objects) > > > > block_name_prefix: rbd_data.11cd3c238e1f29 > > > > format: 2 > > > > features: layering, exclusive-lock, object-map, fast-diff, > > deep-flatten, journaling > > > > flags: > > > > journal: 11cd3c238e1f29 > > > > mirroring state: enabled > > > > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7 > > > > mirroring primary: true > > > > > > > > > > > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status nfs > > --verbose > > > > health: OK > > > > images: 0 total > > > > > > > > root@remote:/var/log/ceph# rbd info nfs/test01 > > > > rbd: error opening image test01: (2) No such file or directory > > > > > > > > > > > > root@remote:/var/log/ceph# ceph -s --cluster cephdr > > > > cluster: > > > > id: ade49174-1f84-4c3c-a93c-b293c3655c93 > > > > health: HEALTH_WARN > > > > noout,nodeep-scrub flag(s) set > > > > > > > > services: > > > > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a > > > > mgr:nidcdvtier1a(active), standbys: nidcdvtier2a > > > > osd:12 osds: 12 up, 12 in > > > > flags noout,nodeep-scrub > > > > rbd-mirror: 1 daemon active > > > > > > > > data: > > > > pools: 5 pools, 640 pgs > > > > objects: 1.32M objects, 5.03TiB > > > > usage: 10.1TiB 
used, 262TiB / 272TiB avail > > > > pgs: 640 active+clean > > > > > > > > io: > > > > client: 170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr > > > > > > > > > > > > 2019-04-05 12:07:29.720742 7f0fa5e284c0 0 ceph version 12.2.11 > > (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), process > > rbd-mirror, pid 3921391 > > > > 2019-04-05 12:07:29.721752 7f0fa5e284c0 0 pidfile_write: ignore empty > > --pid-file > > > > 2019-04-05 12:07:29.726580 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: > > 0x560200d29bb0 ServiceDaemon: > > > > 2019-04-05 12:07:29.732654 7f0fa5e284c0 20 rbd::mirror::ServiceDaemon: > > 0x560200d29bb0 init: > > > > 2019-04-05 12:07:29.734920 7f0fa5e284c0 1 mgrc > > service_daemon_register rbd-mirror.admin metadata > > {arch=x86_64,ceph_version=ceph version 12.2.11 > > (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous > > (stable),cpu=Intel(R) Xeon(R) CPU E5-2690 v2 @ > > 3.00GHz,distro=ubuntu,distro_description=Ubuntu 14.04.5 > > LTS,distro_version=14.04,hostname=nidcdvtier3a,instance_id=464360,kern > > el_description=#93 SMP Sat Jun 17 04:01:23 EDT > > 2017,kernel_version=3.19.0-85-vtier,mem_swap_kb=6710578
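On Debian/Ubuntu the cluster-name override Jason refers to is a single variable (on RHEL/CentOS the equivalent file is /etc/sysconfig/ceph); a sketch, assuming the DR-side cluster really is meant to be called cephdr:

# /etc/default/ceph on the DR host running rbd-mirror
CLUSTER=cephdr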
Re: [ceph-users] Ceph Replication not working
Hi Jason, On Prod side, we have cluster ceph and on DR side we renamed to cephdr Accordingly, we renamed the ceph.conf to cephdr.conf on DR side. This setup used to work and one day we tried to promote the DR to verify the replication and since then it's been a nightmare. The resync didn’t work and then we eventually gave up and deleted the pool on DR side to start afresh. We deleted and recreated the peer relationship also. Is there any debugging we can do on Prod or DR side to see where its stopping or waiting while "send_open_image"? Rbd-mirror is running as "rbd-mirror --cluster=cephdr" Thanks, -Vikas -Original Message- From: Jason Dillaman Sent: Monday, April 8, 2019 9:30 AM To: Vikas Rana Cc: ceph-users Subject: Re: [ceph-users] Ceph Replication not working The log appears to be missing all the librbd log messages. The process seems to stop at attempting to open the image from the remote cluster: 2019-04-05 12:07:29.992323 7f0f3bfff700 20 rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image Assuming you are using the default log file naming settings, the log should be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, looking at your cluster naming makes me think that since your primary cluster is named "ceph" on the DR-site side, have you changed your "/etc/default/ceph" file to rename the local cluster from "ceph" to "cephdr" so that the "rbd-mirror" daemon connects to the correct local cluster? On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana wrote: > > Hi Jason, > > 12.2.11 is the version. > > Attached is the complete log file. > > We removed the pool to make sure there's no image left on DR site and > recreated an empty pool. > > Thanks, > -Vikas > > -Original Message- > From: Jason Dillaman > Sent: Friday, April 5, 2019 2:24 PM > To: Vikas Rana > Cc: ceph-users > Subject: Re: [ceph-users] Ceph Replication not working > > What is the version of rbd-mirror daemon and your OSDs? It looks it found two > replicated images and got stuck on the "wait_for_deletion" > step. Since I suspect those images haven't been deleted, it should have > immediately proceeded to the next step of the image replay state machine. Are > there any additional log messages after 2019-04-05 12:07:29.981203? > > On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana wrote: > > > > Hi there, > > > > We are trying to setup a rbd-mirror replication and after the setup, > > everything looks good but images are not replicating. > > > > > > > > Can some please please help? 
> > > > > > > > Thanks, > > > > -Vikas > > > > > > > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs > > > > Mode: pool > > > > Peers: > > > > UUID NAME CLIENT > > > > bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod > > > > > > > > root@local:/etc/ceph# rbd mirror pool info nfs > > > > Mode: pool > > > > Peers: > > > > UUID NAME CLIENT > > > > 612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr > > > > > > > > > > > > root@local:/etc/ceph# rbd info nfs/test01 > > > > rbd image 'test01': > > > > size 102400 kB in 25 objects > > > > order 22 (4096 kB objects) > > > > block_name_prefix: rbd_data.11cd3c238e1f29 > > > > format: 2 > > > > features: layering, exclusive-lock, object-map, fast-diff, > > deep-flatten, journaling > > > > flags: > > > > journal: 11cd3c238e1f29 > > > > mirroring state: enabled > > > > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7 > > > > mirroring primary: true > > > > > > > > > > > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status > > nfs --verbose > > > > health: OK > > > > images: 0 total > > > > > > > > root@remote:/var/log/ceph# rbd info nfs/test01 > > > > rbd: error opening image test01: (2) No such file or directory > > > > > > > > > > > > root@remote:/var/log/ceph# ceph -s --cluster cephdr > > > > cluster: > > > > id: ade49174-1f84-4c3c-a93c-b293c3655c93 > > > > health: HEALTH_WARN > > > > noout,nodeep-scrub flag(s) set > > > > > > > > services: > > > > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a > > > > mgr:nidcdvtier1a(active), standbys: nidcdvtier2a > > > > osd:12 osds: 12 up, 12 in > > > > flags noout,nodeep-scrub > > > > rbd-mirror: 1 daemon active > > > > > > > > data: > > > > pools: 5 pools, 640 pgs > > > > objects: 1.32M objects, 5.03TiB > > > > usage: 10.1TiB used, 262TiB / 272TiB avail > > > > pgs: 640 active+clean > > > > > > > > io: > > > > client: 170B/s rd, 0B/s wr, 0op/s rd, 0op/s wr > > > > > > > > > > > > 2019-04-05 12:07:29.720742 7f0fa5e284c0 0 ceph version 12.2.11 > > (26dc3775efc7bb286a1d6d66faee0ba30ea23eee) luminous (stable), > > process rbd-mirror, pid 3921391 > > > > 2019-04-05
Re: [ceph-users] Ceph Replication not working
On Mon, Apr 8, 2019 at 9:47 AM Vikas Rana wrote: > > Hi Jason, > > On Prod side, we have cluster ceph and on DR side we renamed to cephdr > > Accordingly, we renamed the ceph.conf to cephdr.conf on DR side. > > This setup used to work and one day we tried to promote the DR to verify the > replication and since then it's been a nightmare. > The resync didn’t work and then we eventually gave up and deleted the pool on > DR side to start afresh. > > We deleted and recreated the peer relationship also. > > Is there any debugging we can do on Prod or DR side to see where its stopping > or waiting while "send_open_image"? You need to add "debug rbd = 20" to both your ceph.conf and cephdr.conf (if you haven't already) and you would need to provide the log associated w/ the production cluster connection (see below). Also, please use pastebin or similar service to avoid mailing the logs to the list. > Rbd-mirror is running as "rbd-mirror --cluster=cephdr" > > > Thanks, > -Vikas > > -Original Message- > From: Jason Dillaman > Sent: Monday, April 8, 2019 9:30 AM > To: Vikas Rana > Cc: ceph-users > Subject: Re: [ceph-users] Ceph Replication not working > > The log appears to be missing all the librbd log messages. The process seems > to stop at attempting to open the image from the remote cluster: > > 2019-04-05 12:07:29.992323 7f0f3bfff700 20 > rbd::mirror::image_replayer::OpenImageRequest: 0x7f0f28018a20 send_open_image > > Assuming you are using the default log file naming settings, the log should > be located at "/var/log/ceph/ceph-client.mirrorprod.log". Of course, looking > at your cluster naming makes me think that since your primary cluster is > named "ceph" on the DR-site side, have you changed your "/etc/default/ceph" > file to rename the local cluster from "ceph" > to "cephdr" so that the "rbd-mirror" daemon connects to the correct local > cluster? > > > On Fri, Apr 5, 2019 at 3:28 PM Vikas Rana wrote: > > > > Hi Jason, > > > > 12.2.11 is the version. > > > > Attached is the complete log file. > > > > We removed the pool to make sure there's no image left on DR site and > > recreated an empty pool. > > > > Thanks, > > -Vikas > > > > -Original Message- > > From: Jason Dillaman > > Sent: Friday, April 5, 2019 2:24 PM > > To: Vikas Rana > > Cc: ceph-users > > Subject: Re: [ceph-users] Ceph Replication not working > > > > What is the version of rbd-mirror daemon and your OSDs? It looks it found > > two replicated images and got stuck on the "wait_for_deletion" > > step. Since I suspect those images haven't been deleted, it should have > > immediately proceeded to the next step of the image replay state machine. > > Are there any additional log messages after 2019-04-05 12:07:29.981203? > > > > On Fri, Apr 5, 2019 at 1:56 PM Vikas Rana wrote: > > > > > > Hi there, > > > > > > We are trying to setup a rbd-mirror replication and after the setup, > > > everything looks good but images are not replicating. > > > > > > > > > > > > Can some please please help? 
> > > > > > > > > > > > Thanks, > > > > > > -Vikas > > > > > > > > > > > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool info nfs > > > > > > Mode: pool > > > > > > Peers: > > > > > > UUID NAME CLIENT > > > > > > bcd54bc5-cd08-435f-a79a-357bce55011d ceph client.mirrorprod > > > > > > > > > > > > root@local:/etc/ceph# rbd mirror pool info nfs > > > > > > Mode: pool > > > > > > Peers: > > > > > > UUID NAME CLIENT > > > > > > 612151cf-f70d-49d0-94e2-a7b850a53e4f cephdr client.mirrordr > > > > > > > > > > > > > > > > > > root@local:/etc/ceph# rbd info nfs/test01 > > > > > > rbd image 'test01': > > > > > > size 102400 kB in 25 objects > > > > > > order 22 (4096 kB objects) > > > > > > block_name_prefix: rbd_data.11cd3c238e1f29 > > > > > > format: 2 > > > > > > features: layering, exclusive-lock, object-map, fast-diff, > > > deep-flatten, journaling > > > > > > flags: > > > > > > journal: 11cd3c238e1f29 > > > > > > mirroring state: enabled > > > > > > mirroring global id: 06fbfe68-b7e4-4d3a-93b2-cd18c569f7f7 > > > > > > mirroring primary: true > > > > > > > > > > > > > > > > > > root@remote:/var/log/ceph# rbd --cluster cephdr mirror pool status > > > nfs --verbose > > > > > > health: OK > > > > > > images: 0 total > > > > > > > > > > > > root@remote:/var/log/ceph# rbd info nfs/test01 > > > > > > rbd: error opening image test01: (2) No such file or directory > > > > > > > > > > > > > > > > > > root@remote:/var/log/ceph# ceph -s --cluster cephdr > > > > > > cluster: > > > > > > id: ade49174-1f84-4c3c-a93c-b293c3655c93 > > > > > > health: HEALTH_WARN > > > > > > noout,nodeep-scrub flag(s) set > > > > > > > > > > > > services: > > > > > > mon:3 daemons, quorum nidcdvtier1a,nidcdvtier2a,nidcdvtier3a > > > > > > mgr:
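Following Jason's suggestion, the debug setting is just a config snippet on the DR host running rbd-mirror; a sketch, assuming the client names from the peer configuration above (mirrordr for the local DR cluster, mirrorprod for the connection back to production):

# in cephdr.conf and ceph.conf on the DR host
[client.mirrordr]
    debug rbd = 20
[client.mirrorprod]
    debug rbd = 20

Restart the rbd-mirror daemon afterwards and collect both ceph-client.mirrordr and ceph-client.mirrorprod logs from /var/log/ceph/.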
Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard
It's definitely ceph-mgr that is struggling here. It uses 100% of a CPU for several tens of seconds and reports the following in its log a few times before anything gets displayed:

Traceback (most recent call last):
  File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in dashboard_exception_handler
    return handler(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, in __call__
    return self.callable(*self.args, **self.kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 649, in inner
    ret = func(*args, **kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 842, in wrapper
    return func(*vpath, **params)
  File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in list
    return self._rbd_list(pool_name)
  File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in _rbd_list
    status, value = self._rbd_pool_list(pool)
  File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper
    return rvc.run(fn, args, kwargs)
  File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run
    raise ViewCacheNoDataException()
ViewCacheNoDataException: ViewCache: unable to retrieve data

- On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire w...@sol1.com.au wrote:

> Hi Lenz,
>
> Thanks for responding. I suspected that the number of rbd images might have had something to do with it, so I cleaned up old disposable VM images I am no longer using, taking the list down from ~30 to 16: 2 in the EC pool on hdds and the rest on the replicated ssd pool. They vary in size from 50GB to 200GB. I don't have the # of objects per rbd on hand right now, but maybe this is a factor as well, particularly with 'du'. This doesn't appear to have made a difference in the time and number of attempts required to list them in the dashboard.
>
> I suspect it might be a case of 'du on all images is always going to take longer than the current dashboard timeout', in which case the behaviour of the dashboard might need to change to account for this -- maybe fetch and list the images in parallel and asynchronously, or something. As it stands it means the dashboard isn't really usable for managing existing images, which is a shame because having that ability makes ceph accessible to our clients who are considering it and begins affording some level of self-service for them -- one of the reasons we've been really excited for Mimic's release, actually. I really hope I've just done something wrong :)
>
> I'll try to isolate which process the delay is coming from tonight, as well as collecting other useful metrics when I'm back on that network tonight.
>
> Thanks,
> Wes
>
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
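To confirm the "du on all images" theory outside the dashboard, timing the equivalent CLI calls is a quick check (the pool name "rbd" is just an example):

time rbd du --pool rbd
time rbd ls --long --pool rbd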
Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard
Hi Wes, I just filed a bug ticket in the Ceph tracker about this: http://tracker.ceph.com/issues/39140 Will work on a solution ASAP. Thanks, Ricardo Dias On 08/04/19 15:41, Wes Cilldhaire wrote: > It's definitely ceph-mgr that is struggling here. It uses 100% of a cpu for > for several tens of seconds and reports the followinf in its log a few times > before anything gets displayed > > Traceback (most recent call last): > File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in > dashboard_exception_handler > return handler(*args, **kwargs) > File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, > in __call__ > return self.callable(*self.args, **self.kwargs) > File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 649, > in inner > ret = func(*args, **kwargs) > File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line 842, > in wrapper > return func(*vpath, **params) > File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in > wrapper > return f(*args, **kwargs) > File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in > wrapper > return f(*args, **kwargs) > File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in > list > return self._rbd_list(pool_name) > File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in > _rbd_list > status, value = self._rbd_pool_list(pool) > File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper > return rvc.run(fn, args, kwargs) > File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run > raise ViewCacheNoDataException() > ViewCacheNoDataException: ViewCache: unable to retrieve data > > - On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire w...@sol1.com.au wrote: > >> Hi Lenz, >> >> Thanks for responding. I suspected that the number of rbd images might have >> had >> something to do with it so I cleaned up old disposable VM images I am no >> longer >> using, taking the list down from ~30 to 16, 2 in the EC pool on hdds and the >> rest on the replicated ssd pool. They vary in size from 50GB to 200GB, I >> don't >> have the # of objects per rbd on hand right now but maybe this is a factor as >> well, particularly with 'du'. This doesn't appear to have made a difference >> in >> the time and number of attempts required to list them in the dashboard. >> >> I suspect it might be a case of 'du on all images is always going to take >> longer >> than the current dashboard timeout', in which case the behaviour of the >> dashboard might possibly need to change to account for this, maybe fetch and >> listt the images in parallel and asynchronously or something. As it stand it >> means the dashboard isn't really usable for managing existing images, which >> is >> a shame because having that ability makes ceph accessible to our clients who >> are considering it and begins affording some level of self-service for them - >> one of the reasons we've been really excited for Mimic's release actually. I >> really hope I've just done something wrong :) >> >> I'll try to isolate which process the delay is coming from tonight as well as >> collecting other useful metrics when I'm back on that network tonight. 
>> >> Thanks, >> Wes >> >> > (null) > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > -- Ricardo Dias Senior Software Engineer - Storage Team SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg) signature.asc Description: OpenPGP digital signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Unable to list rbd block > images in nautilus dashboard
Thank you - On 9 Apr, 2019, at 12:50 AM, Ricardo Dias rd...@suse.com wrote: > Hi Wes, > > I just filed a bug ticket in the Ceph tracker about this: > > http://tracker.ceph.com/issues/39140 > > Will work on a solution ASAP. > > Thanks, > Ricardo Dias > > On 08/04/19 15:41, Wes Cilldhaire wrote: >> It's definitely ceph-mgr that is struggling here. It uses 100% of a cpu for >> for >> several tens of seconds and reports the followinf in its log a few times >> before >> anything gets displayed >> >> Traceback (most recent call last): >> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 88, in >> dashboard_exception_handler >> return handler(*args, **kwargs) >> File "/usr/lib64/python2.7/site-packages/cherrypy/_cpdispatch.py", line 54, >> in >> __call__ >> return self.callable(*self.args, **self.kwargs) >> File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line >> 649, in >> inner >> ret = func(*args, **kwargs) >> File "/usr/local/share/ceph/mgr/dashboard/controllers/__init__.py", line >> 842, in >> wrapper >> return func(*vpath, **params) >> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in >> wrapper >> return f(*args, **kwargs) >> File "/usr/local/share/ceph/mgr/dashboard/services/exception.py", line 44, in >> wrapper >> return f(*args, **kwargs) >> File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 270, in >> list >> return self._rbd_list(pool_name) >> File "/usr/local/share/ceph/mgr/dashboard/controllers/rbd.py", line 261, in >> _rbd_list >> status, value = self._rbd_pool_list(pool) >> File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 244, in wrapper >> return rvc.run(fn, args, kwargs) >> File "/usr/local/share/ceph/mgr/dashboard/tools.py", line 232, in run >> raise ViewCacheNoDataException() >> ViewCacheNoDataException: ViewCache: unable to retrieve data >> >> - On 5 Apr, 2019, at 5:06 PM, Wes Cilldhaire w...@sol1.com.au wrote: >> >>> Hi Lenz, >>> >>> Thanks for responding. I suspected that the number of rbd images might have >>> had >>> something to do with it so I cleaned up old disposable VM images I am no >>> longer >>> using, taking the list down from ~30 to 16, 2 in the EC pool on hdds and the >>> rest on the replicated ssd pool. They vary in size from 50GB to 200GB, I >>> don't >>> have the # of objects per rbd on hand right now but maybe this is a factor >>> as >>> well, particularly with 'du'. This doesn't appear to have made a difference >>> in >>> the time and number of attempts required to list them in the dashboard. >>> >>> I suspect it might be a case of 'du on all images is always going to take >>> longer >>> than the current dashboard timeout', in which case the behaviour of the >>> dashboard might possibly need to change to account for this, maybe fetch and >>> listt the images in parallel and asynchronously or something. As it stand it >>> means the dashboard isn't really usable for managing existing images, which >>> is >>> a shame because having that ability makes ceph accessible to our clients who >>> are considering it and begins affording some level of self-service for them >>> - >>> one of the reasons we've been really excited for Mimic's release actually. I >>> really hope I've just done something wrong :) >>> >>> I'll try to isolate which process the delay is coming from tonight as well >>> as >>> collecting other useful metrics when I'm back on that network tonight. 
>>> >>> Thanks, >>> Wes >>> >>> >> (null) >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> > > -- > Ricardo Dias > Senior Software Engineer - Storage Team > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, > HRB 21284 > (AG Nürnberg) > > > ___ > ceph-users mailing list > ceph-users@lists.ceph.com > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com (null) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
[ceph-users] DevConf US CFP Ends Today + Planning
Hey everyone, The CFP for DevConf US [1] ends today! I have submitted for us to have a Ceph Foundation booth, BOF space and two presentations myself which you can find on our CFP coordination pad [2]. I'll update here if our booth is accepted and a call for help. If you're planning on attending and want to help with Ceph's presence, please email me directly so I can make sure you're part of any communication. Looking forward to potentially meeting more people in the community! [1] - https://devconf.info/us/2019 [2] - https://pad.ceph.com/p/cfp-coordination -- Mike Perez (thingee) ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] radosgw cloud sync aws s3 auth failed
On Mon, Apr 08, 2019 at 06:38:59PM +0800, 黄明友 wrote: > > hi,all > >I had test the cloud sync module in radosgw. ceph verion is >13.2.5 , git commit id is >cbff874f9007f1869bfd3821b7e33b2a6ffd4988; Reading src/rgw/rgw_rest_client.cc shows that it only generates v2 signatures for the sync module :-( AWS China regions are some of the v4-only regions. I don't know of any current work to tackle this, but there is v4 signature generation code already in the codebase, would just need to be wired up in src/rgw/rgw_rest_client.cc. -- Robin Hugh Johnson Gentoo Linux: Dev, Infra Lead, Foundation Treasurer E-Mail : robb...@gentoo.org GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85 GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136 signature.asc Description: PGP signature ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] bluefs-bdev-expand experience
Hi Yuri, both issues from Round 2 relate to unsupported expansion for main device. In fact it doesn't work and silently bypasses the operation in you case. Please try with a different device... Also I've just submitted a PR for mimic to indicate the bypass, will backport to Luminous once mimic patch is approved. See https://github.com/ceph/ceph/pull/27447 Thanks, Igor On 4/5/2019 4:07 PM, Yury Shevchuk wrote: On Fri, Apr 05, 2019 at 02:42:53PM +0300, Igor Fedotov wrote: wrt Round 1 - an ability to expand block(main) device has been added to Nautilus, see: https://github.com/ceph/ceph/pull/25308 Oh, that's good. But still separate wal&db may be good for studying load on each volume (blktrace) or moving db/wal to another physical disk by means of LVM transparently to ceph. wrt Round 2: - not setting 'size' label looks like a bug although I recall I fixed it... Will double check. - wrong stats output is probably related to the lack of monitor restart - could you please try that and report back if it helps? Or even restart the whole cluster.. (well I understand that's a bad approach for production but just to verify my hypothesis) Mon restart didn't help: node0:~# systemctl restart ceph-mon@0 node1:~# systemctl restart ceph-mon@1 node2:~# systemctl restart ceph-mon@2 node2:~# ceph osd df ID CLASS WEIGHT REWEIGHT SIZEUSE AVAIL %USE VAR PGS 0 hdd 0.22739 1.0 233GiB 9.44GiB 223GiB 4.06 0.12 128 1 hdd 0.22739 1.0 233GiB 9.44GiB 223GiB 4.06 0.12 128 3 hdd 0.227390 0B 0B 0B 00 0 2 hdd 0.22739 1.0 800GiB 409GiB 391GiB 51.18 1.51 128 TOTAL 1.24TiB 428GiB 837GiB 33.84 MIN/MAX VAR: 0.12/1.51 STDDEV: 26.30 Restarting mgrs and then all ceph daemons on all three nodes didn't help either: node2:~# ceph osd df ID CLASS WEIGHT REWEIGHT SIZEUSE AVAIL %USE VAR PGS 0 hdd 0.22739 1.0 233GiB 9.43GiB 223GiB 4.05 0.12 128 1 hdd 0.22739 1.0 233GiB 9.43GiB 223GiB 4.05 0.12 128 3 hdd 0.227390 0B 0B 0B 00 0 2 hdd 0.22739 1.0 800GiB 409GiB 391GiB 51.18 1.51 128 TOTAL 1.24TiB 428GiB 837GiB 33.84 MIN/MAX VAR: 0.12/1.51 STDDEV: 26.30 Maybe we should upgrade to v14.2.0 Nautilus instead of studying old bugs... after all, this is a toy cluster for now. Thank you for responding, -- Yury On 4/5/2019 2:06 PM, Yury Shevchuk wrote: Hello all! We have a toy 3-node Ceph cluster running Luminous 12.2.11 with one bluestore osd per node. We started with pretty small OSDs and would like to be able to expand OSDs whenever needed. We had two issues with the expansion: one turned out user-serviceable while the other probably needs developers' look. I will describe both shortly. Round 1 ~~~ Trying to expand osd.2 by 1TB: # lvextend -L+1T /dev/vg0/osd2 Size of logical volume vg0/osd2 changed from 232.88 GiB (59618 extents) to 1.23 TiB (321762 extents). Logical volume vg0/osd2 successfully resized. # ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2 inferring bluefs devices from bluestore path slot 1 /var/lib/ceph/osd/ceph-2//block 1 : size 0x13a3880 : own 0x[1bf220~25430] Expanding... 1 : can't be expanded. Bypassing... # It didn't work. 
The explanation can be found in ceph/src/os/bluestore/BlueFS.cc at line 310:

// returns true if specified device is under full bluefs control
// and hence can be expanded
bool BlueFS::is_device_expandable(unsigned id)
{
  if (id >= MAX_BDEV || bdev[id] == nullptr) {
    return false;
  }
  switch(id) {
  case BDEV_WAL:
    return true;

  case BDEV_DB:
    // true if DB volume is non-shared
    return bdev[BDEV_SLOW] != nullptr;
  }
  return false;
}

So we have to use separate block.db and block.wal for the OSD to be expandable. Indeed, our OSDs were created without separate block.db and block.wal, like this:

ceph-volume lvm create --bluestore --data /dev/vg0/osd2

Recreating osd.2 with separate block.db and block.wal:

# ceph-volume lvm zap --destroy --osd-id 2
# lvcreate -L1G -n osd2wal vg0
  Logical volume "osd2wal" created.
# lvcreate -L40G -n osd2db vg0
  Logical volume "osd2db" created.
# lvcreate -L400G -n osd2 vg0
  Logical volume "osd2" created.
# ceph-volume lvm create --osd-id 2 --bluestore --data vg0/osd2 --block.db vg0/osd2db --block.wal vg0/osd2wal

Resync takes some time, and then we have an expandable osd.2.

Round 2
~~~~~~~
Trying to expand osd.2 from 400G to 700G:

# lvextend -L+300G /dev/vg0/osd2
  Size of logical volume vg0/osd2 changed from 400.00 GiB (102400 extents) to 700.00 GiB (179200 extents).
  Logical volume vg0/osd2 successfully resized.
# ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-2/
inferring bluefs devices from bluestore
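Related to the missing 'size' label Igor mentions, the label written on the device can be checked directly; a sketch using the paths from this thread:

# labels as seen via the osd directory
ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-2
# or against the block symlink / LV itself
ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-2/block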
Re: [ceph-users] osd_memory_target exceeding on Luminous OSD BlueStore
One of the difficulties with the osd_memory_target work is that we can't tune based on the RSS memory usage of the process. Ultimately it's up to the kernel to decide to reclaim memory and especially with transparent huge pages it's tough to judge what the kernel is going to do even if memory has been unmapped by the process. Instead the autotuner looks at how much memory has been mapped and tries to balance the caches based on that. In addition to Dan's advice, you might also want to enable debug bluestore at level 5 and look for lines containing "target:" and "cache_size:". These will tell you the current target, the mapped memory, unmapped memory, heap size, previous aggregate cache size, and new aggregate cache size. The other line will give you a break down of how much memory was assigned to each of the bluestore caches and how much each case is using. If there is a memory leak, the autotuner can only do so much. At some point it will reduce the caches to fit within cache_min and leave it there. Mark On 4/8/19 5:18 AM, Dan van der Ster wrote: Which OS are you using? With CentOS we find that the heap is not always automatically released. (You can check the heap freelist with `ceph tell osd.0 heap stats`). As a workaround we run this hourly: ceph tell mon.* heap release ceph tell osd.* heap release ceph tell mds.* heap release -- Dan On Sat, Apr 6, 2019 at 1:30 PM Olivier Bonvalet wrote: Hi, on a Luminous 12.2.11 deploiement, my bluestore OSD exceed the osd_memory_target : daevel-ob@ssdr712h:~$ ps auxw | grep ceph-osd ceph3646 17.1 12.0 6828916 5893136 ? Ssl mars29 1903:42 /usr/bin/ceph-osd -f --cluster ceph --id 143 --setuser ceph --setgroup ceph ceph3991 12.9 11.2 6342812 5485356 ? Ssl mars29 1443:41 /usr/bin/ceph-osd -f --cluster ceph --id 144 --setuser ceph --setgroup ceph ceph4361 16.9 11.8 6718432 5783584 ? Ssl mars29 1889:41 /usr/bin/ceph-osd -f --cluster ceph --id 145 --setuser ceph --setgroup ceph ceph4731 19.7 12.2 6949584 5982040 ? Ssl mars29 2198:47 /usr/bin/ceph-osd -f --cluster ceph --id 146 --setuser ceph --setgroup ceph ceph5073 16.7 11.6 6639568 5701368 ? Ssl mars29 1866:05 /usr/bin/ceph-osd -f --cluster ceph --id 147 --setuser ceph --setgroup ceph ceph5417 14.6 11.2 6386764 5519944 ? Ssl mars29 1634:30 /usr/bin/ceph-osd -f --cluster ceph --id 148 --setuser ceph --setgroup ceph ceph5760 16.9 12.0 6806448 5879624 ? Ssl mars29 1882:42 /usr/bin/ceph-osd -f --cluster ceph --id 149 --setuser ceph --setgroup ceph ceph6105 16.0 11.6 6576336 5694556 ? Ssl mars29 1782:52 /usr/bin/ceph-osd -f --cluster ceph --id 150 --setuser ceph --setgroup ceph daevel-ob@ssdr712h:~$ free -m totalusedfree shared buff/cache available Mem: 47771 452101643 17 917 43556 Swap: 0 0 0 # ceph daemon osd.147 config show | grep memory_target "osd_memory_target": "4294967296", And there is no recovery / backfilling, the cluster is fine : $ ceph status cluster: id: de035250-323d-4cf6-8c4b-cf0faf6296b1 health: HEALTH_OK services: mon: 5 daemons, quorum tolriq,tsyne,olkas,lorunde,amphel mgr: tsyne(active), standbys: olkas, tolriq, lorunde, amphel osd: 120 osds: 116 up, 116 in data: pools: 20 pools, 12736 pgs objects: 15.29M objects, 31.1TiB usage: 101TiB used, 75.3TiB / 177TiB avail pgs: 12732 active+clean 4 active+clean+scrubbing+deep io: client: 72.3MiB/s rd, 26.8MiB/s wr, 2.30kop/s rd, 1.29kop/s wr On an other host, in the same pool, I see also high memory usage : daevel-ob@ssdr712g:~$ ps auxw | grep ceph-osd ceph6287 6.6 10.6 6027388 5190032 ? 
Ssl mars21 1511:07 /usr/bin/ceph-osd -f --cluster ceph --id 131 --setuser ceph --setgroup ceph ceph6759 7.3 11.2 6299140 5484412 ? Ssl mars21 1665:22 /usr/bin/ceph-osd -f --cluster ceph --id 132 --setuser ceph --setgroup ceph ceph7114 7.0 11.7 6576168 5756236 ? Ssl mars21 1612:09 /usr/bin/ceph-osd -f --cluster ceph --id 133 --setuser ceph --setgroup ceph ceph7467 7.4 11.1 6244668 5430512 ? Ssl mars21 1704:06 /usr/bin/ceph-osd -f --cluster ceph --id 134 --setuser ceph --setgroup ceph ceph7821 7.7 11.1 6309456 5469376 ? Ssl mars21 1754:35 /usr/bin/ceph-osd -f --cluster ceph --id 135 --setuser ceph --setgroup ceph ceph8174 6.9 11.6 6545224 5705412 ? Ssl mars21 1590:31 /usr/bin/ceph-osd -f --cluster ceph --id 136 --setuser ceph --setgroup ceph ceph8746 6.6 11.1 6290004 5477204 ? Ssl mars21 1511:11 /usr/bin/ceph-osd -f --cluster ceph --id 137 --setuser ceph --setgroup ceph ceph9100 7.7 11.6 6552080 5713560 ? Ssl mars
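To get the autotuner lines Mark describes on a running Luminous OSD, bumping debug_bluestore at runtime is enough; a sketch using osd.147 from this thread (remember to turn the level back down afterwards):

ceph tell osd.147 injectargs '--debug_bluestore 5/5'
grep -E 'target:|cache_size:' /var/log/ceph/ceph-osd.147.log
ceph tell osd.147 injectargs '--debug_bluestore 1/5'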
Re: [ceph-users] PGs stuck in created state
Hello Simon, Another idea is to increase choose_total_tries. Hth Mehmet Am 7. März 2019 09:56:17 MEZ schrieb Martin Verges : >Hello, > >try restarting every osd if possible. >Upgrade to a recent ceph version. > >-- >Martin Verges >Managing director > >Mobile: +49 174 9335695 >E-Mail: martin.ver...@croit.io >Chat: https://t.me/MartinVerges > >croit GmbH, Freseniusstr. 31h, 81247 Munich >CEO: Martin Verges - VAT-ID: DE310638492 >Com. register: Amtsgericht Munich HRB 231263 > >Web: https://croit.io >YouTube: https://goo.gl/PGE1Bx > > >Am Do., 7. März 2019 um 08:39 Uhr schrieb simon falicon < >simonfali...@gmail.com>: > >> Hello Ceph Users, >> >> I have an issue with my ceph cluster, after one serious fail in four >SSD >> (electricaly dead) I have lost PGs (and replicats) and I have 14 Pgs >stuck. >> >> So for correct it I have try to force create this PGs (with same IDs) >but >> now the Pgs stuck in creating state -_-" : >> >> ~# ceph -s >> health HEALTH_ERR >> 14 pgs are stuck inactive for more than 300 seconds >> >> >> ceph pg dump | grep creating >> >> dumped all in format plain >> 9.300000000creating2019-02-25 >09:32:12.3339790'00:0[20,26]20[20,11]200'0 >2019-02-25 09:32:12.3339790'02019-02-25 09:32:12.333979 >> 3.900000000creating2019-02-25 >09:32:11.2954510'00:0[16,39]16[17,6]170'0 >2019-02-25 09:32:11.2954510'02019-02-25 09:32:11.295451 >> ... >> >> I have try to create new PG dosent existe before and it work, but for >this >> PG stuck in creating state. >> >> In my monitor logs I have this message: >> >> 2019-02-25 11:02:46.904897 7f5a371ed700 0 mon.controller1@1(peon) e7 >handle_command mon_command({"prefix": "pg force_create_pg", "pgid": >"4.20e"} v 0) v1 >> 2019-02-25 11:02:46.904938 7f5a371ed700 0 log_channel(audit) log >[INF] : from='client.? 172.31.101.107:0/3101034432' >entity='client.admin' cmd=[{"prefix": "pg force_create_pg", "pgid": >"4.20e"}]: dispatch >> >> When I check map I have: >> >> ~# ceph pg map 4.20e >> osdmap e428069 pg 4.20e (4.20e) -> up [27,37,36] acting [13,17] >> >> I have restart OSD 27,37,36,13 and 17 but no effect. (one by one) >> >> I have see this issue http://tracker.ceph.com/issues/18298 but I run >on >> ceph 10.2.11. >> >> So could you help me please ? >> >> Many thanks by advance, >> Sfalicon. >> >> ___ >> ceph-users mailing list >> ceph-users@lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com >> ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
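For reference, choose_total_tries is a CRUSH tunable and is changed by editing a decompiled crushmap; a minimal sketch (the value 100 is only an example, and the current value in your map may differ from the default 50):

ceph osd getcrushmap -o crush.bin
crushtool -d crush.bin -o crush.txt
# edit crush.txt: "tunable choose_total_tries 50" -> "tunable choose_total_tries 100"
crushtool -c crush.txt -o crush.new
ceph osd setcrushmap -i crush.new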
[ceph-users] Inconsistent PGs caused by omap_digest mismatch
We have two separate RGW clusters running Luminous (12.2.8) that have started seeing an increase in PGs going active+clean+inconsistent with the reason being caused by an omap_digest mismatch. Both clusters are using FileStore and the inconsistent PGs are happening on the .rgw.buckets.index pool which was moved from HDDs to SSDs within the last few months. We've been repairing them by first making sure the odd omap_digest is not the primary by setting the primary-affinity to 0 if needed, doing the repair, and then setting the primary-affinity back to 1. For example PG 7.3 went inconsistent earlier today: # rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | .errors, .shards' [ "omap_digest_mismatch" ] [ { "osd": 504, "primary": true, "errors": [], "size": 0, "omap_digest": "0x4c10ee76", "data_digest": "0x" }, { "osd": 525, "primary": false, "errors": [], "size": 0, "omap_digest": "0x26a1241b", "data_digest": "0x" }, { "osd": 556, "primary": false, "errors": [], "size": 0, "omap_digest": "0x26a1241b", "data_digest": "0x" } ] Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would set the primary-affinity to 0 with: # ceph osd primary-affinity osd.504 0 Do the repair: # ceph pg repair 7.3 And then once the repair is complete we would set the primary-affinity back to 1 on osd.504: # ceph osd primary-affinity osd.504 1 There doesn't appear to be any correlation between the OSDs which would point to a hardware issue, and since it's happening on two different clusters I'm wondering if there's a race condition that has been fixed in a later version? Also, what exactly is the omap digest? From what I can tell it appears to be some kind of checksum for the omap data. Is that correct? Thanks, Bryan ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
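As an aside, the full set of inconsistent PGs in the pool can be pulled in one go, which makes it easier to spot patterns across OSDs (pool name taken from this thread):

rados list-inconsistent-pg .rgw.buckets.index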
Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch
On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell wrote: > > We have two separate RGW clusters running Luminous (12.2.8) that have started > seeing an increase in PGs going active+clean+inconsistent with the reason > being caused by an omap_digest mismatch. Both clusters are using FileStore > and the inconsistent PGs are happening on the .rgw.buckets.index pool which > was moved from HDDs to SSDs within the last few months. > > We've been repairing them by first making sure the odd omap_digest is not the > primary by setting the primary-affinity to 0 if needed, doing the repair, and > then setting the primary-affinity back to 1. > > For example PG 7.3 went inconsistent earlier today: > > # rados list-inconsistent-obj 7.3 -f json-pretty | jq -r '.inconsistents[] | > .errors, .shards' > [ > "omap_digest_mismatch" > ] > [ > { > "osd": 504, > "primary": true, > "errors": [], > "size": 0, > "omap_digest": "0x4c10ee76", > "data_digest": "0x" > }, > { > "osd": 525, > "primary": false, > "errors": [], > "size": 0, > "omap_digest": "0x26a1241b", > "data_digest": "0x" > }, > { > "osd": 556, > "primary": false, > "errors": [], > "size": 0, > "omap_digest": "0x26a1241b", > "data_digest": "0x" > } > ] > > Since the odd omap_digest is on osd.504 and osd.504 is the primary, we would > set the primary-affinity to 0 with: > > # ceph osd primary-affinity osd.504 0 > > Do the repair: > > # ceph pg repair 7.3 > > And then once the repair is complete we would set the primary-affinity back > to 1 on osd.504: > > # ceph osd primary-affinity osd.504 1 > > There doesn't appear to be any correlation between the OSDs which would point > to a hardware issue, and since it's happening on two different clusters I'm > wondering if there's a race condition that has been fixed in a later version? > > Also, what exactly is the omap digest? From what I can tell it appears to be > some kind of checksum for the omap data. Is that correct? Yeah; it's just a crc over the omap key-value data that's checked during deep scrub. Same as the data digest. I've not noticed any issues around this in Luminous but I probably wouldn't have, so will have to leave it up to others if there are fixes in since 12.2.8. -Greg ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch
> On Apr 8, 2019, at 4:38 PM, Gregory Farnum wrote:
>
> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell wrote:
>>
>> There doesn't appear to be any correlation between the OSDs which would point to a hardware issue, and since it's happening on two different clusters I'm wondering if there's a race condition that has been fixed in a later version?
>>
>> Also, what exactly is the omap digest? From what I can tell it appears to be some kind of checksum for the omap data. Is that correct?
>
> Yeah; it's just a crc over the omap key-value data that's checked during deep scrub. Same as the data digest.
>
> I've not noticed any issues around this in Luminous but I probably wouldn't have, so will have to leave it up to others if there are fixes in since 12.2.8.

Thanks for adding some clarity to that Greg!

For some added information, this is what the logs reported earlier today:

2019-04-08 11:46:15.610169 osd.504 osd.504 10.16.10.30:6804/8874 33 : cluster [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 0x26a1241b != omap_digest 0x4c10ee76 from shard 504
2019-04-08 11:46:15.610190 osd.504 osd.504 10.16.10.30:6804/8874 34 : cluster [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 0x26a1241b != omap_digest 0x4c10ee76 from shard 504

I then tried deep scrubbing it again to see whether the data was fine and the digest calculation was just having problems. It came back with the same mismatch, only with new digest values:

2019-04-08 15:56:21.186291 osd.504 osd.504 10.16.10.30:6804/8874 49 : cluster [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 0x93bac8f != omap_digest 0xab1b9c6f from shard 504
2019-04-08 15:56:21.186313 osd.504 osd.504 10.16.10.30:6804/8874 50 : cluster [ERR] 7.3 : soid 7:c09d46a1:::.dir.default.22333615.1861352:head omap_digest 0x93bac8f != omap_digest 0xab1b9c6f from shard 504

Which makes sense, but doesn't explain why the omap data is getting out of sync across multiple OSDs and clusters...

I'll see what I can figure out tomorrow, but if anyone else has some hints I would love to hear them.

Thanks,
Bryan
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
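One more data point that may help narrow it down: the omap contents of the index object named in those scrub errors can be inspected directly. Note this only reads from the acting primary, so a true per-replica comparison would need ceph-objectstore-tool against each OSD's store while the OSD is stopped.

rados -p .rgw.buckets.index listomapkeys .dir.default.22333615.1861352 | wc -l
rados -p .rgw.buckets.index listomapvals .dir.default.22333615.1861352 | head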
[ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy
Hi @all,

I'm using the Ceph rados gateway installed via ceph-ansible with the Nautilus version. The radosgw instances are behind a haproxy which adds these headers (checked via tcpdump):

X-Forwarded-Proto: http
X-Forwarded-For: 10.111.222.55

where 10.111.222.55 is the IP address of the client. The radosgw instances use the civetweb HTTP frontend. Currently it is the IP address of the haproxy itself that appears in the logs. I would like to log the client IP address from the X-Forwarded-For HTTP header instead. How can I do that?

I have tried this option in ceph.conf:

rgw_remote_addr_param = X-Forwarded-For

It doesn't work, but maybe I have read the doc wrongly. Thanks in advance for your help.

PS: I have also tried the HTTP frontend "beast" but, in that case, no HTTP requests seem to be logged at all.

--
François
___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Re: [ceph-users] Try to log the IP in the header X-Forwarded-For with radosgw behind haproxy
Refer "rgw log http headers" under http://docs.ceph.com/docs/nautilus/radosgw/config-ref/ Or even better in the code https://github.com/ceph/ceph/pull/7639 Thanks, -Pavan. On 4/8/19, 8:32 PM, "ceph-users on behalf of Francois Lafont" wrote: Hi @all, I'm using Ceph rados gateway installed via ceph-ansible with the Nautilus version. The radosgw are behind a haproxy which add these headers (checked via tcpdump): X-Forwarded-Proto: http X-Forwarded-For: 10.111.222.55 where 10.111.222.55 is the IP address of the client. The radosgw use the civetweb http frontend. Currently, this is the IP address of the haproxy itself which is mentioned in logs. I would like to mention the IP address from the X-Forwarded-For HTTP header. How to do that? I have tried this option in ceph.conf: rgw_remote_addr_param = X-Forwarded-For It doesn't work but maybe I have read the doc wrongly. Thx in advance for your help. PS: I have tried too the http frontend "beast" but, in this case, no HTTP request seems to be logged. -- François ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com