On 11/08/2020 20:41, Kevin Myers wrote:
Replica count of 2 is a surefire way to a crisis!
It is :-)
Sent from my iPad
On 11 Aug 2020, at 18:45, Martin Palma wrote:
Hello,
after an unexpected power outage our production cluster has 5 PGs
inactive and incomplete. The OSDs on which the
Hi All
We had a cluster (v13.2.4) with 32 OSDs in total. At first, an OSD (osd.18)
in the cluster was down, so we tried to remove it and add a new one
(osd.32) with a new ID. We unplugged the disk (osd.18), plugged a new disk
into the same slot, and added osd.32 to the cluster. Then, osd.32 was
b
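For reference, the usual disk-replacement sequence looks roughly like this; a sketch only, with osd.18 and /dev/sdX as placeholders for the failed OSD and the new device:
ceph osd out osd.18                        # stop mapping new data to the failed OSD
systemctl stop ceph-osd@18                 # on the OSD host, stop the daemon if it is still running
ceph osd purge 18 --yes-i-really-mean-it   # remove it from the CRUSH map, OSD map and auth
ceph-volume lvm create --data /dev/sdX     # on the OSD host, prepare and activate the new disk
Reusing the old ID (ceph osd destroy plus ceph-volume's --osd-id option) avoids renumbering, but creating osd.32 with a new ID as described above also works.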
Hi,
I am not sure, but perhaps this could be an effect of the "balancer" module, if
you use it?
HTH
Mehmet
On 10 August 2020 17:28:27 MESZ, David Orman wrote:
> We've gotten a bit further: after evaluating how this remapped count
> was determined (pg_temp), we've found the PGs counted as being re
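A quick way to check whether the balancer is behind a moving remapped/pg_temp count, roughly (assuming Nautilus or later):
ceph balancer status   # shows the mode and whether it is currently active
ceph balancer off      # pause it temporarily and see if the remapped count settles
ceph balancer on       # re-enable once done testing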
We're happy to announce the availability of the eleventh release in the
Nautilus series. This release brings a number of bugfixes across all
major components of Ceph. We recommend that all Nautilus users upgrade
to this release.
Notable Changes
---
* RGW: The `radosgw-admin` sub-comma
Replica count of 2 is a surefire way to a crisis!
Sent from my iPad
> On 11 Aug 2020, at 18:45, Martin Palma wrote:
>
> Hello,
> after an unexpected power outage our production cluster has 5 PGs
> inactive and incomplete. The OSDs on which these 5 PGs are located all
> show "stuck requests a
Hello,
after an unexpected power outage our production cluster has 5 PGs
inactive and incomplete. The OSDs on which these 5 PGs are located all
show "stuck requests are blocked":
Reduced data availability: 5 pgs inactive, 5 pgs incomplete
98 stuck requests are blocked > 4096 sec. Implicated os
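A minimal diagnostic sketch for a situation like this; 2.1a and osd.12 are placeholders, substitute the IDs from your own health output:
ceph health detail            # lists the inactive/incomplete PG IDs and the implicated OSDs
ceph pg dump_stuck inactive   # the same PGs in tabular form
ceph pg 2.1a query            # inspect "recovery_state" to see what the PG is waiting for
ceph osd tree                 # confirm whether the implicated OSDs (e.g. osd.12) are up and in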
Hi Mark,
here is a first collection of heap profiling data (valid 30 days):
https://files.dtu.dk/u/53HHic_xx5P1cceJ/heap_profiling-2020-08-03.tgz?l
This was collected with the following config settings:
osd dev osd_memory_cache_min 805306368
osd
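For reference, the profiles can be gathered with the built-in tcmalloc hooks; osd.0 is just an example daemon:
ceph tell osd.0 heap start_profiler   # begin collecting heap profiles
ceph tell osd.0 heap stats            # print current heap usage
ceph tell osd.0 heap dump             # write a profile file under the OSD's log directory
ceph tell osd.0 heap stop_profiler    # stop profiling when done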
I tried removing the daemon first, and that kind of blew up.
ceph orch daemon rm --force mon.tempmon
ceph orch host rm tempmon
Now there are two problems.
1. Ceph is still looking for it:
services:
mon: 4 daemons, quorum ceph1,ceph2,ceph3 (age 3s), out of quorum:
tempmon
mgr: ceph1.oqptlg(
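If the monitor is gone from the host but still listed in the monmap, something like this should clear it (a sketch, using the tempmon name from above):
ceph mon dump             # confirm tempmon is still in the monmap
ceph mon remove tempmon   # drop it so quorum no longer expects 4 monitors
ceph -s                   # the mon line should now show 3 daemons, none out of quorum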
> Hi,
> you can change the MDS setting to be less strict [1]:
> According to [1] the default is 300 seconds to be evicted. Maybe give
> the less strict option a try?
Thanks for your reply. I already set mds_session_blacklist_on_timeout to false.
This seems to have helped somewhat, but still
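For what it's worth, a quick way to double-check the value the MDS daemons actually use, and whether clients still end up blacklisted (a sketch):
ceph config get mds mds_session_blacklist_on_timeout   # should print false after the change
ceph osd blacklist ls                                   # any clients still being blacklisted show up here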
I have one spare machine with a single 1 TB disk on it, and I'd like to test
a local Ceph install. This is just for testing, I don't care that it
won't have redundancy, failover, etc. Is there any canonical
documentation for this case?
- - -
Longer story is this morning I found this documenta
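A rough single-node sketch under Octopus/cephadm; the IP, hostname, and device are placeholders, and the exact flags and size-1 safeguards vary by release:
cephadm bootstrap --mon-ip 192.168.1.10          # one mon + mgr on this host
ceph config set global osd_pool_default_size 1   # single copy only; fine for a throwaway test
ceph config set global osd_pool_default_min_size 1
ceph orch daemon add osd testhost:/dev/sdb       # turn the spare 1 TB disk into an OSD
Newer releases may additionally require mon_allow_pool_size_one before a pool will accept size 1.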
I'm happy to announce another release of the go-ceph API
bindings. This is a regular release following our every-two-months release
cadence.
https://github.com/ceph/go-ceph/releases/tag/v0.5.0
The bindings aim to play a similar role to the "pybind" python bindings in the
ceph tree but for
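To try the new tag in an existing Go module (go-ceph builds with cgo, so the librados/librbd/libcephfs development headers need to be installed):
go get github.com/ceph/go-ceph@v0.5.0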
Hello,
When the connection is lost between the kernel client and the MDS, a few things happen:
1.
Caps become stale:
Aug 11 11:08:14 admin-cap kernel: [308405.227718] ceph: mds0 caps stale
2.
MDS evicts client for being unresponsive:
MDS log: 2020-08-11 11:12:08.923 7fd1f45ae700 0 log_channel(cluster) log [WRN]
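A few commands that help when reconstructing this sequence; mds.0 and the client address are placeholders (and "blacklist" became "blocklist" in later releases):
ceph tell mds.0 session ls                # list client sessions and their state
ceph osd blacklist ls                     # evicted clients show up here
ceph osd blacklist rm 10.0.0.5:0/123456   # let a specific client back in before the entry expires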
On 8/11/20 2:52 AM, Wido den Hollander wrote:
On 11/08/2020 00:40, Michael Thomas wrote:
On my relatively new Octopus cluster, I have one PG that has been
perpetually stuck in the 'unknown' state. It appears to belong to the
device_health_metrics pool, which was created automatically by the
Hi,
Our production cluster runs Luminous.
Yesterday, one of our OSD-only hosts came up with its clock about 8
hours wrong(!) having been out of the cluster for a week or so.
Initially, Ceph seemed entirely happy, and then after an hour or so it
all went south (OSDs started logging about bad aut
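Since cephx tickets are time-sensitive, the clock and its sync source on the affected host are the first things worth checking; a sketch assuming chrony:
ceph time-sync-status   # mon-side view of clock skew
chronyc tracking        # offset and sync source on the OSD host
chronyc makestep        # step the clock immediately rather than slewing away 8 hours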
Hi,
you can change the MDS setting to be less strict [1]:
It is possible to respond to slow clients by simply dropping their
MDS sessions, but permit them to re-open sessions and permit them to
continue talking to OSDs. To enable this mode, set
mds_session_blacklist_on_timeout to false on
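With the centralized config (Nautilus and later) that translates to roughly:
ceph config set mds mds_session_blacklist_on_timeout false   # do not blacklist clients whose sessions time out
ceph config set mds mds_session_blacklist_on_evict false     # companion option covered on the same documentation page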
Of course I found the cause shortly after sending the message …
The scrubbing parameters need to move from the [osd] section to the [global]
section, see https://www.suse.com/support/kb/doc/?id=19621
Health is back to OK after restarting osds, mons and mgrs.
Cheers,
Dirk
On Tuesday, 11.
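With the centralized config database the same change can be made without touching ceph.conf at all; the two options below are just examples of typical scrub-window settings:
ceph config set global osd_scrub_begin_hour 19   # example value
ceph config set global osd_scrub_end_hour 6      # example value
ceph config get osd osd_scrub_begin_hour         # verify what the OSDs will pick up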
On 11/08/2020 00:40, Michael Thomas wrote:
On my relatively new Octopus cluster, I have one PG that has been
perpetually stuck in the 'unknown' state. It appears to belong to the
device_health_metrics pool, which was created automatically by the mgr
daemon(?).
The OSDs that the PG maps to
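A few commands that narrow down where an "unknown" PG should live; 1.0 is a placeholder PG ID:
ceph osd pool ls detail   # pool id and pg_num for device_health_metrics
ceph pg map 1.0           # which OSDs the PG is expected to map to
ceph pg 1.0 query         # fails or hangs if no OSD has ever reported the PG
If nothing ever claims it, ceph osd force-create-pg can recreate the PG empty; that is only tolerable here because device_health_metrics holds regenerable metrics.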
Hi,
For some time now (I think since the upgrade to Nautilus) we get
X pgs not deep scrubbed in time
I deep-scrubbed the pgs when the error occurred and expected the cluster to
recover over time, but no such luck. The warning comes up again and again.
In our spinning rust cluster we allow deep scrubbing o
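Two things that usually help with this warning on HDD-only clusters; the PG ID and interval are examples:
ceph health detail | grep 'not deep-scrubbed'            # list the affected PGs
ceph pg deep-scrub 3.1f                                  # kick off a deep scrub on one of them manually
ceph config set global osd_deep_scrub_interval 1209600   # widen the interval (seconds) if the disks cannot keep up within a week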