Dear Ceph folks,
Recently I encountered incomplete PGs when replacing an OSD node with new
hardware. I noticed multiple OSD ups and downs, and eventually a few PGs got
stuck in the incomplete state.
Question 1: is there a reliable way to avoid the occurrence of incomplete PGs?
On Sat, Aug 15, 2020 at 12:32 AM wrote:
>
> Yes, I've seen this problem quite frequently as of late, running v13.2.10
> MDS. It seems to be dependent on the client behavior - a lot of xlock
> contention on some directory, although it's hard to pin down which client is
> doing what. The only remedy was to fail over the MDS.
Solution Failed!
I rebalanced all OSDs to 0.0 and then back to their original weight, and
initially got back to my original ~269 IOPS.
It has been about 5 days since I completed the rebalance and performance is
degrading again! There is a bit of improvement, but not to where it was in Mimic.
Usually it should also accept the device path (although I haven't
tried that in Octopus yet), so you could try `ceph-volume lvm prepare
--data /path/to/device` first and then activate it. If that doesn't
work, create a VG and LV yourself and try it with LVM syntax
(`ceph-volume lvm prepare --data {vg name/lv name}`).
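A minimal sketch of the LVM variant (the VG/LV names are only examples):

# create a VG and LV on the raw device
vgcreate ceph-block-0 /dev/sdb
lvcreate -l 100%FREE -n block-0 ceph-block-0

# prepare the OSD with LVM syntax, then activate it
ceph-volume lvm prepare --data ceph-block-0/block-0
ceph-volume lvm list        # note the OSD id and fsid
ceph-volume lvm activate --all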
On 8/14/20 10:57 AM, Eugen Block wrote:
> I didn't notice that. Have you tried this multiple times with the same disk?
> Do you see any other error messages in syslog?
Thanks Eugen for your fast response. Yes, I have tried it multiple times, but
I'm trying again right now just to be sure the ou
Hi,
I've previously discussed some issues I've had with the RGW lifecycle
processing. I've discovered that the root cause of my problem is that:
* I'm running a multisite configuration
* Lifecycle processing is done on the master site each night.
`radosgw-admin lc list` correctly re
Hi,
I'm not sure what SUSE Support would suggest (you probably should be
able to open a case?), but I'd probably set the nodeep-scrub flag, wait
for the already started deep-scrubs to finish, and then trigger a pg
repair, plus a deep-scrub on the pg if the repair doesn't resolve it.
This should
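For reference, that sequence would look roughly like this (the pg id 1.2f is
just a placeholder for your inconsistent pg):

ceph osd set nodeep-scrub        # stop new deep-scrubs from being scheduled
# ... wait for the running deep-scrubs to finish (watch ceph -s) ...
ceph pg repair 1.2f              # repair the inconsistent pg
ceph pg deep-scrub 1.2f          # re-check it if the repair didn't resolve it
ceph osd unset nodeep-scrub      # clear the flag afterwards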
As I understand it - no, you shouldn't if you are running full site
sync. Your system user (zone.user) has full access and that account
should take care of everything. You should list particular buckets
(users) only for per-bucket sync flows.
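For reference, a per-bucket sync policy as shown in the multisite sync policy
docs looks roughly like this (bucket, tenant and group ids are only examples,
and I have not verified this with tenanted users):

radosgw-admin sync group create --bucket=tmp/mybucket \
    --group-id=mybucket-group --status=enabled
radosgw-admin sync group pipe create --bucket=tmp/mybucket \
    --group-id=mybucket-group --pipe-id=pipe1 \
    --source-zones='*' --dest-zones='*'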
Vladimir
-----Original Message-----
From: Ansgar
Hi,
it looks like only the buckets from my sub-tenant user are not in sync:
<...>
radosgw-admin --tenant tmp --uid test --display-name "Test User" --access_key 1VIH8RUV7OD5I3IWFX5H --secret 0BvSbieeHhKi7gLHyN8zsVPHIzEFRwEXZwgj0u22 user create
<..>
Do I have to create a new group/flow/pipe for ea
I didn't notice that. Have you tried this multiple times with the same
disk? Do you see any other error messages in syslog?
Quoting Joshua Schaeffer:
The OSD node is the same as the monitor and manager node at the
moment and has the ceph.conf file:
user@node1:~$ ls -l /etc/ceph/
total
Hi Ceph users,
We are running a SUSE SES 5.5 cluster that's largely based on Luminous with
some Mimic backports.
We've been doing some large reshuffling after adding additional OSDs, and
during this process we have ended up with an inconsistent pg; investigation
suggests there was a read error.
We woul
The OSD node is the same as the monitor and manager node at the moment and has
the ceph.conf file:
user@node1:~$ ls -l /etc/ceph/
total 15
-rw------- 1 root root 151 Aug 13 15:50 ceph.client.admin.keyring
-rw-r--r-- 1 root root 432 Aug 13 16:09 ceph.conf
-rw-r--r-- 1 root root 92 Jun 30 16:44 rb
Yes, I've seen this problem quite frequently as of late, running v13.2.10 MDS.
It seems to be dependent on the client behavior - a lot of xlock contention on
some directory, although it's hard to pin down which client is doing what. The
only remedy was to fail over the MDS.
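For anyone hitting the same thing, failing over the active MDS is usually just
something like (rank 0 here as an example):

ceph mds fail 0    # fail the active MDS for rank 0 so a standby takes over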
1k - 4k clients
2M r
The OSD node also needs the ceph.conf; it seems that is not the case
in your setup.
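A minimal sketch of what the OSD node needs (the values are placeholders; copy
the real fsid and mon addresses from your monitor node):

# /etc/ceph/ceph.conf
[global]
fsid = 11111111-2222-3333-4444-555555555555
mon_host = 192.168.0.10
# ceph-volume also needs the bootstrap-osd keyring in
# /var/lib/ceph/bootstrap-osd/ceph.keyring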
Quoting Joshua Schaeffer:
Hey all,
I'm trying to deploy Ceph 15.2.4 on Ubuntu 20.04 and I am going
through the manual deploy process [1]. I was able to successfully
bootstrap the monitor and manager a
I think the best course of action would be to open a tracker ticket
with details about your environment and your observations, then the
devs could try to see if something was overlooked with this change.
-- dan
On Fri, Aug 14, 2020 at 5:48 PM Manuel Lausch wrote:
>
> Hi,
>
> I thought the "fail"
Hi,
I thought the "fail" needs to be propagated as well. Am I wrong?
Who can have a look at whether a mark-down message in "fast shutdown" mode is a
possibility? I don't have the expertise to say whether this would break
something else. But if it is possible, I would vote for it.
Thanks
Manuel
On Fri, 1
Hey all,
I'm trying to deploy Ceph 15.2.4 on Ubuntu 20.04 and I am going through the
manual deploy process [1]. I was able to successfully bootstrap the monitor and
manager and am now trying to add the OSDs, but the `ceph-volume` command is
hanging when running `ceph osd new`. It appears the c
Hi,
> As I understand it, we are talking about Ceph 15.2.x Octopus, right?
Yes, I am on Ceph 15.2.4.
> What is the number of zones/realms/zonegroups?
At the moment I'm just running a small test on my local machine: one zonegroup
(global) with the zones node01 and node02, and just one realm.
> Is Ceph healthy? (ceph
Hi Ansgar,
As I understand it, we are talking about Ceph 15.2.x Octopus, right?
What is the number of zones/realms/zonegroups?
Is Ceph healthy? (ceph -s and ceph health detail )
What does radosgw-admin sync status say?
Do you see your zone.user (or whatever you name it) in both zones with
the
Screenshot attached showing the IOPS and latency from iostat -xtc 2.
On 8/14/2020 9:09 AM, Ed Kalk wrote:
ubuntu@ubuntu:/mnt$ sudo fio
--filename=/mnt/sda1/file1.fio:/mnt/sdb1/file2.fio:/mnt/sdc1/file3.fio:/mnt/sdd1/file4.fio:/mnt/sde1/file5.fio:/mnt/sdf1/file6.fio:/mnt/sdg1/file7.fio:/mnt/sdh1/
Hi, I just want to know if it is possible to separate my NVMe disk and HDD
disk roles with cephadm. For example, I have an NVMe partition /dev/nvme0n0 and
an HDD /dev/sdb. How do I pair those two disks, so I can use /dev/sdb as an OSD
and the whole /dev/nvme0n0 as the WAL/DB?
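From the cephadm docs it looks like an OSD service spec along these lines
should do it (untested sketch, the service_id and the rotational filters are
just examples), applied with `ceph orch apply osd -i osd_spec.yml` - is that
the right approach?

# osd_spec.yml
service_type: osd
service_id: hdd_with_nvme_db
placement:
  host_pattern: '*'
data_devices:
  rotational: 1     # HDDs become the data devices
db_devices:
  rotational: 0     # the NVMe is used for the DB/WAL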
Hi,
I suppose the idea is that it's quicker to fail via the connection
refused setting than by waiting for an osdmap to be propagated across
the cluster.
It looks simple enough in OSD.cc to also send the down message to the
mon even with fast shutdown enabled. But I don't have any clue if that
wo
Hi Dan,
thank you for the link. I read it, as well as the linked conversation in
the rook project.
I don't understand why the fast shutdown should be better than the "normal"
shutdown, in which the OSD announces its shutdown directly.
Are there cases where the shutdown of the OSD takes longer until its
There's a bit of discussion on this at the original PR:
https://github.com/ceph/ceph/pull/31677
Sage claims the IO interruption should be smaller with
osd_fast_shutdown than without.
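For what it's worth, the option can be toggled for testing with something
like:

ceph config set osd osd_fast_shutdown false   # fall back to the "normal" shutdown path
ceph config get osd osd_fast_shutdown         # verify the current value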
-- dan
On Fri, Aug 14, 2020 at 10:08 AM Manuel Lausch wrote:
>
> Hi Dan,
>
> stopping a single OSD took mostly 1
Hi Frank,
I'm having trouble getting the exact version of ceph you used to
create this heap profile.
Could you run the google-pprof --text steps at [1] and share the output?
Thanks, Dan
[1] https://docs.ceph.com/docs/master/rados/troubleshooting/memory-profiling/
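The steps at [1] boil down to roughly the following (the paths are the
defaults and may differ on your system):

ceph tell osd.0 heap start_profiler    # start the heap profiler on one OSD
ceph tell osd.0 heap dump              # write a .heap file under /var/log/ceph/
google-pprof --text /usr/bin/ceph-osd /var/log/ceph/osd.0.profile.0001.heap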
On Tue, Aug 11, 2020 at 2:37 P
Hi Folks,
I'm trying to move from our own custom bucket synchronization to the
built-in rados-gateway one.
The multisite setup is working: https://docs.ceph.com/docs/master/radosgw/multisite/
All buckets and users are visible in both clusters.
Next I tried to set up the multi-site sync:
https://docs.cep
Hi Dan,
stopping a single OSD mostly took 1 to 2 seconds between the stop and the
first reporting in ceph.log. Stopping a whole node, in this case 24
OSDs, took in most cases 5 to 7 seconds. After the reporting,
peering begins, but this is quite fast.
Since I have the fast shutdown disabled. Th
Hi,
you should be able to add the OSD manually directly on the host with:
ceph-volume lvm create --data {vg name/lv name}
Regards,
Eugen
Quoting "Richard W.M. Jones":
I have one spare machine with a single 1 TB disk on it, and I'd like to test
a local Ceph install. This is just for testing, I
Hi,
1.5 GB of MDS cache for 20 clients is not enough; I would start by
increasing that to 4 or 8 GB and see if the problems still occur.
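For example, something like this should bump it to 4 GB (adjust the value to
what your RAM allows):

ceph config set mds mds_cache_memory_limit 4294967296   # 4 GiB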
Regards,
Eugen
Quoting Petr Belyaev:
Hi all,
I have a very small Ceph setup - 3 OSDs, 3 MDS, 3 MON, CephFS with
~20 ceph-fuse clients connected. Al