[ceph-users] how to handle incomplete PGs

2020-08-14 Thread huxia...@horebdata.cn
Dear Ceph folks, Recently I encountered incomplete PGs when replacing an OSD node with new hardware. I noticed multiple OSD ups and downs, and eventually a few PGs got stuck in the PG incomplete state. Question 1: is there a reliable way to avoid the occurrence of incomplete PGs?
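
A minimal sketch of how such PGs are usually diagnosed (the PG id below is a placeholder, and exact output varies by release):

  # list stuck PGs and see which OSDs an incomplete PG is waiting for
  ceph health detail | grep incomplete
  ceph pg dump_stuck inactive
  ceph pg 2.1f query   # 2.1f is an example PG id taken from the output above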

[ceph-users] Re: CephFS clients waiting for lock when one of them goes slow

2020-08-14 Thread Yan, Zheng
On Sat, Aug 15, 2020 at 12:32 AM wrote: > > Yes, I've seen this problem quite frequently as of late, running v13.2.10 > MDS. It seems to be dependent on the client behavior - a lot of xlock > contention on some directory, although it's hard to pin down which client is > doing what. The only rem

[ceph-users] Re: Nautilus slow using "ceph tell osd.* bench"

2020-08-14 Thread Jim Forde
Solution Failed! I reweighted all OSDs to 0.0 and then back to their original weight, and started getting back to my original ~269 IOPS. It has been about 5 days since I completed the rebalance and performance is degrading again! There is a bit of improvement, but not to where it was in Mimic.
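
For reference, one way to do that kind of drain-and-restore, with a placeholder OSD id and weights (the bench defaults are roughly 1 GiB written in 4 MiB blocks):

  ceph tell osd.* bench                     # per-OSD synthetic write benchmark
  ceph osd crush reweight osd.0 0.0         # drain one OSD
  ceph osd crush reweight osd.0 1.81929     # restore its original CRUSH weight afterwards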

[ceph-users] Re: Can't add OSD id in manual deploy

2020-08-14 Thread Eugen Block
Usually it should also accept the device path (although I haven't tried that in Octopus yet), you could try `ceph-volume lvm prepare --data /path/to/device` first and then activate it. If that doesn't work, try to create a vg and lv and try it with LVM syntax (ceph-volume lvm prepare --data
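
A rough sketch of both variants (device, VG and LV names are examples):

  # plain device path
  ceph-volume lvm prepare --data /dev/sdb
  # or pre-create the LVM objects and use vg/lv syntax
  vgcreate ceph-block-0 /dev/sdb
  lvcreate -l 100%FREE -n block-0 ceph-block-0
  ceph-volume lvm prepare --data ceph-block-0/block-0
  ceph-volume lvm activate --all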

[ceph-users] Re: Can't add OSD id in manual deploy

2020-08-14 Thread Joshua Schaeffer
On 8/14/20 10:57 AM, Eugen Block wrote: > I didn't notice that. Have you tried this multiple times with the same disk? > Do you see any other error messages in syslog? Thanks Eugen for your fast response. Yes, I have tried it multiple times, but I'm trying again right now just to be sure the ou

[ceph-users] RGW Lifecycle Processing and Promote Master Process

2020-08-14 Thread Alex Hussein-Kershaw
Hi, I've previously discussed some issues I've had with the RGW lifecycle processing. I've discovered that the root cause of my problem is that: * I'm running a multisite configuration * Life cycle processing is done on the master site each night. `radosgw-admin lc list` correctly re
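
For context, the commands typically used to inspect and manually trigger lifecycle processing (multisite behaviour may differ by release):

  radosgw-admin lc list       # per-bucket lifecycle status
  radosgw-admin lc process    # run lifecycle processing manually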

[ceph-users] Re: Resolving a pg inconsistent Issue

2020-08-14 Thread Eugen Block
Hi, I'm not sure what SUSE Support would suggest (you probably should be able to open a case?) but I'd probably go for the nodeep-scrub flag, wait for the already started deep-scrubs to finish and then trigger a pg repair, and a deep-scrub on the pg if the repair doesn't resolve it. This should
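
A sketch of that sequence (the PG id is a placeholder):

  ceph osd set nodeep-scrub     # stop new deep-scrubs from being scheduled
  ceph pg repair 7.1a           # repair the inconsistent PG
  ceph pg deep-scrub 7.1a       # re-verify if the repair alone doesn't clear it
  ceph osd unset nodeep-scrub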

[ceph-users] Re: Radosgw Multisite Sync

2020-08-14 Thread Vladimir Sigunov
To my understanding - no, you shouldn't if you are running a full site sync. Your system user (zone.user) has full access and this account should take care of everything. You only need to list particular buckets (users) for per-bucket sync flows. Vladimir -Original Message- From: Ansgar

[ceph-users] Re: Radosgw Multisite Sync

2020-08-14 Thread Ansgar Jazdzewski
Hi, it looks like only buckets from my sub-tenant user are not in sync: <...> radosgw-admin --tenant tmp --uid test --display-name "Test User" --access_key 1VIH8RUV7OD5I3IWFX5H --secret 0BvSbieeHhKi7gLHyN8zsVPHIzEFRwEXZwgj0u22 user create <..> Do I have to create a new group/flow/pipe for ea
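
For reference, the rough shape of a zonegroup-level sync policy, using the zone names from this thread and placeholder group/flow/pipe ids (check the multisite sync policy docs for the exact flags in your release):

  radosgw-admin sync group create --group-id=group1 --status=enabled
  radosgw-admin sync group flow create --group-id=group1 --flow-id=flow-mirror \
      --flow-type=symmetrical --zones=node01,node02
  radosgw-admin sync group pipe create --group-id=group1 --pipe-id=pipe1 \
      --source-zones='*' --source-bucket='*' --dest-zones='*' --dest-bucket='*'
  radosgw-admin period update --commit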

[ceph-users] Re: Can't add OSD id in manual deploy

2020-08-14 Thread Eugen Block
I didn't notice that. Have you tried this multiple times with the same disk? Do you see any other error messages in syslog? Quoting Joshua Schaeffer: The OSD node is the same as the monitor and manager node at the moment and has the ceph.conf file: user@node1:~$ ls -l /etc/ceph/ total

[ceph-users] Resolving a pg inconsistent Issue

2020-08-14 Thread Steven Pine
Hi Ceph users, We are running a SUSE SES 5.5 cluster that's largely based on Luminous with some Mimic backports. We've been doing some large reshuffling from adding in additional OSDs, and during this process we have an inconsistent pg; investigation suggests there was a read error. We woul

[ceph-users] Re: Can't add OSD id in manual deploy

2020-08-14 Thread Joshua Schaeffer
The OSD node is the same as the monitor and manager node at the moment and has the ceph.conf file: user@node1:~$ ls -l /etc/ceph/ total 15 -rw--- 1 root root 151 Aug 13 15:50 ceph.client.admin.keyring -rw-r--r-- 1 root root 432 Aug 13 16:09 ceph.conf -rw-r--r-- 1 root root  92 Jun 30 16:44 rb

[ceph-users] Re: CephFS clients waiting for lock when one of them goes slow

2020-08-14 Thread pchoi
Yes, I've seen this problem quite frequently as of late, running v13.2.10 MDS. It seems to be dependent on the client behavior - a lot of xlock contention on some directory, although it's hard to pin down which client is doing what. The only remedy was to fail over the MDS. 1k - 4k clients 2M r
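
A minimal sketch of that failover step (rank 0 is an example; a standby MDS must be available):

  ceph mds fail 0     # fail the active rank so a standby takes over
  ceph fs status      # watch replay/rejoin until the filesystem is active again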

[ceph-users] Re: Can't add OSD id in manual deploy

2020-08-14 Thread Eugen Block
The OSD node also needs the ceph.conf, it seems that is not the case in your setup. Quoting Joshua Schaeffer: Hey all, I'm trying to deploy Ceph 15.2.4 on Ubuntu 20.04 and I am going through the manual deploy process [1]. I was able to successfully bootstrap the monitor and manager a
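
A quick sanity check on the OSD node, assuming a standard manual-deploy layout and that the admin keyring is present there:

  ls -l /etc/ceph/ceph.conf /var/lib/ceph/bootstrap-osd/ceph.keyring
  ceph auth get client.bootstrap-osd   # should match the keyring on the OSD node
  ceph -s                              # confirms the node can reach the monitors at all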

[ceph-users] Re: osd fast shutdown provokes slow requests

2020-08-14 Thread Dan van der Ster
I think the best course of action would be to open a tracker ticket with details about your environment and your observations, then the devs could try to see if something was overlooked with this change. -- dan On Fri, Aug 14, 2020 at 5:48 PM Manuel Lausch wrote: > > Hi, > > I thought the "fail"

[ceph-users] Re: osd fast shutdown provokes slow requests

2020-08-14 Thread Manuel Lausch
Hi, I thought the "fail" needs to be propagated as well. Am I wrong? Who can have a look at whether a mark-down message in "fast shutdown" mode is a possibility? I do not have the expertise to say if this would break something else. But if this is possible I would vote for it. Thanks Manuel On Fri, 1

[ceph-users] Can't add OSD id in manual deploy

2020-08-14 Thread Joshua Schaeffer
Hey all, I'm trying to deploy Ceph 15.2.4 on Ubuntu 20.04 and I am going through the manual deploy process [1]. I was able to successfully bootstrap the monitor and manager and am now trying to add the OSDs, but the `ceph-volume` command is hanging when running `ceph osd new`. It appears the c

[ceph-users] Re: Radosgw Multisite Sync

2020-08-14 Thread Ansgar Jazdzewski
Hi, > As I can understand, we are talking about Ceph 15.2.x Octopus, right? Yes, I'm on Ceph 15.2.4 > What is the number of zones/realms/zonegroups? ATM I run just a small test on my local machine: one zonegroup (global) with the zones node01 and node02, and just one realm > Is Ceph healthy? (ceph

[ceph-users] Re: Radosgw Multisite Sync

2020-08-14 Thread Vladimir Sigunov
Hi Ansgar, As far as I understand, we are talking about Ceph 15.2.x Octopus, right? What is the number of zones/realms/zonegroups? Is Ceph healthy? (ceph -s and ceph health detail) What does radosgw-admin sync status say? Do you see your zone.user (or whatever you named it) in both zones with the
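
Roughly, the checks being asked for:

  ceph -s
  ceph health detail
  radosgw-admin sync status
  radosgw-admin user info --uid=zone.user   # run in both zones; the uid is whatever you chose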

[ceph-users] Re: SED drives, how to fio test all disks, poor performance

2020-08-14 Thread Ed Kalk
Screenshot attached showing the IOPS and latency from iostat -xtc 2. On 8/14/2020 9:09 AM, Ed Kalk wrote: ubuntu@ubuntu:/mnt$ sudo fio --filename=/mnt/sda1/file1.fio:/mnt/sdb1/file2.fio:/mnt/sdc1/file3.fio:/mnt/sdd1/file4.fio:/mnt/sde1/file5.fio:/mnt/sdf1/file6.fio:/mnt/sdg1/file7.fio:/mnt/sdh1/
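
For comparison, a single-target fio job in the same spirit (path and options are illustrative, not the poster's exact command):

  sudo fio --name=disktest --filename=/mnt/sda1/file1.fio --size=4G \
      --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
      --runtime=60 --time_based --group_reporting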

[ceph-users] How to separate WAL DB and DATA using cephadm or other method?

2020-08-14 Thread Popoi Zen
Hi, I just want to know if it is possible to separate my NVMe disk and HDD disk roles with cephadm. For example, I have an NVMe partition: /dev/nvme0n0 and an HDD: /dev/sdb. How do I pair these 2 disks, so I can use /dev/sdb as the OSD and the whole /dev/nvme0n0 as WAL/DB?
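
One way to express this with cephadm is an OSD service spec; a sketch using the device paths from the question (field names may differ slightly between releases):

  # osd_spec.yml
  service_type: osd
  service_id: hdd_with_nvme_db
  placement:
    host_pattern: '*'
  data_devices:
    paths:
      - /dev/sdb
  db_devices:
    paths:
      - /dev/nvme0n0

  # then apply it:
  ceph orch apply osd -i osd_spec.yml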

[ceph-users] Re: osd fast shutdown provokes slow requests

2020-08-14 Thread Dan van der Ster
Hi, I suppose the idea is that it's quicker to fail via the connection refused setting than by waiting for an osdmap to be propagated across the cluster. It looks simple enough in OSD.cc to also send the down message to the mon even with fast shutdown enabled. But I don't have any clue if that wo

[ceph-users] Re: osd fast shutdown provokes slow requests

2020-08-14 Thread Manuel Lausch
Hi Dan, thank you for the link. I read it as well as the linked conversation in the rook project. I don't get why the fast shutdown should be better than the "normal" shutdown, in which the OSD announces its shutdown directly. Are there cases where the shutdown of the OSD takes longer until its

[ceph-users] Re: osd fast shutdown provokes slow requests

2020-08-14 Thread Dan van der Ster
There's a bit of discussion on this at the original PR: https://github.com/ceph/ceph/pull/31677 Sage claims the IO interruption should be smaller with osd_fast_shutdown than without. -- dan On Fri, Aug 14, 2020 at 10:08 AM Manuel Lausch wrote: > > Hi Dan, > > stopping a single OSD took mostly 1
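
For anyone who wants to compare the two behaviours on a test cluster, the relevant switch appears to be osd_fast_shutdown (osd.0 is an example daemon):

  ceph config set osd osd_fast_shutdown false   # fall back to the "announce shutdown" path
  ceph config show osd.0 osd_fast_shutdown      # check the value a running OSD uses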

[ceph-users] Re: OSD memory leak?

2020-08-14 Thread Dan van der Ster
Hi Frank, I'm having trouble getting the exact version of ceph you used to create this heap profile. Could you run the google-pprof --text steps at [1] and share the output? Thanks, Dan [1] https://docs.ceph.com/docs/master/rados/troubleshooting/memory-profiling/ On Tue, Aug 11, 2020 at 2:37 P
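
The steps from that page, in short (the binary and dump paths are examples; the dump lands in the OSD's log directory):

  ceph tell osd.0 heap start_profiler
  ceph tell osd.0 heap dump
  google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap
  ceph tell osd.0 heap stop_profiler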

[ceph-users] Radosgw Multisite Sync

2020-08-14 Thread Ansgar Jazdzewski
Hi Folks, I'm trying to move from our own custom bucket synchronization to the RADOS Gateway built-in one. The multisite setup is working: https://docs.ceph.com/docs/master/radosgw/multisite/ All buckets and users are visible in both clusters. Next I tried to set up the multisite sync https://docs.cep

[ceph-users] Re: osd fast shutdown provokes slow requests

2020-08-14 Thread Manuel Lausch
Hi Dan, stopping a single OSD mostly took 1 to 2 seconds between the stop and the first report in ceph.log. Stopping a whole node, in this case 24 OSDs, in most cases took 5 to 7 seconds. After the reporting, peering begins, but this is quite fast. Since I have the fast shutdown disabled. Th

[ceph-users] Re: Single node all-in-one install for testing

2020-08-14 Thread Eugen Block
Hi, you should be able to add the OSD manually directly on the host with: ceph-volume lvm create --data {vg name/lv name} Regards, Eugen Quoting "Richard W.M. Jones": I have one spare machine with a single 1TB disk on it, and I'd like to test a local Ceph install. This is just for testing, I
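
A sketch of that, assuming a spare device /dev/sdb is dedicated to the test:

  vgcreate ceph-vg /dev/sdb
  lvcreate -l 100%FREE -n ceph-lv ceph-vg
  ceph-volume lvm create --data ceph-vg/ceph-lv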

[ceph-users] Re: CephFS clients waiting for lock when one of them goes slow

2020-08-14 Thread Eugen Block
Hi, 1.5 GB of MDS cache for 20 clients is not enough, I would start by increasing that to 4 or 8 GB and see if the problems still occur. Regards, Eugen Quoting Petr Belyaev: Hi all, I have a very small Ceph setup - 3 OSDs, 3 MDS, 3 MON, CephFS with ~20 ceph-fuse clients connected. Al
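
The corresponding setting (value in bytes; 4 GiB shown as an example):

  ceph config set mds mds_cache_memory_limit 4294967296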