Hi Igor,
we store 400TB of backups (RBD snapshots) on the cluster; depending on the
schedule we replace all data every one to two weeks, so we are deleting
data every day.
Yes, the OSDs are killed with messages like "heartbeat_check: no reply
from 10.244.0.27:6852 osd.37 ever...", if that is what you mean.
That's a known issue. You probably did "enable application cephfs" on the
pools. This prevents a metadata tag from being applied correctly. If you google
your problem, you will find threads on this with fixes; there was at least one
this year.
Also, you could just start from scratch one more time.
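(For reference, a rough sketch of checking and fixing the application tags by
hand; pool and fs names below are only placeholders, and the exact key/value
pairs were spelled out in those threads:)
$ ceph osd pool application get cephfs_data
$ ceph osd pool application get cephfs_metadata
# if the tags are missing or wrong, they can be set manually, e.g.:
$ ceph osd pool application set cephfs_data cephfs data <fs_name>
$ ceph osd pool application set cephfs_metadata cephfs metadata <fs_name>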
We have had multiple clusters experiencing the following situation over the
past few months, on both 14.2.6 and 14.2.11. In a few instances it seemed
random, in a second situation we had a temporary networking disruption, and in
a third situation we accidentally made some OSD changes which caused certain
Hey Timothy,
Did you ever resolve this issue, and if so, how?
> Thank you. I looked through both logs and noticed this in the cancel one:
>
> osd_op(unknown.0.0:4164 41.2 41:55b0279d:reshard::reshard.09:head
> [call
> rgw.reshard_remove] snapc 0=[] ondisk+write+known_if_redirected e2498
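(For completeness, a sketch of the reshard admin commands usually involved when
cleaning up a stuck reshard; the bucket name is a placeholder:)
$ radosgw-admin reshard list
$ radosgw-admin reshard status --bucket=<bucket>
$ radosgw-admin reshard cancel --bucket=<bucket>
# leftover instances from earlier reshards can be inspected and removed:
$ radosgw-admin reshard stale-instances list
$ radosgw-admin reshard stale-instances rm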
Hi Paul,
any chance you initiated massive data removal recently?
Are there any suicide timeouts in OSD logs prior to OSD failures? Any
log output containing "slow operation observed" there?
Please also note the following PR and tracker comments which might be
relevant for your case.
https
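Something along these lines should show whether those messages are present
(log paths assume a default package install; osd.37 is just taken from the
earlier message as an example):
$ grep -l 'slow operation observed' /var/log/ceph/ceph-osd.*.log
$ grep 'suicide timeout' /var/log/ceph/ceph-osd.*.log
# recent slow ops can also be pulled from a running OSD's admin socket:
$ ceph daemon osd.37 dump_historic_slow_ops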
This is a known issue with RocksDB/BlueFS, discussed multiple times on
this mailing list...
This should improve starting with Nautilus v14.2.12, thanks to the following PRs:
https://github.com/ceph/ceph/pull/33889
https://github.com/ceph/ceph/pull/37091
Please note these PRs don't fix existing sp
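(For OSDs that are already affected, the advice in those threads has been a
manual compaction of the OSD's RocksDB; a rough sketch, with the OSD id as a
placeholder and the OSD stopped for the offline variant:)
$ systemctl stop ceph-osd@<id>
$ ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
$ systemctl start ceph-osd@<id>
# or online via the admin socket:
$ ceph daemon osd.<id> compact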
Hi,
I don't know how this happened, but it seems the second node's hosts file
(/etc/hosts) was broken and "host-1" thought of itself as "host". Fixing
/etc/hosts also fixed this issue.
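For reference, the mismatch is easy to spot by comparing what the orchestrator
thinks the hosts are called with what each node reports about itself; roughly
(host names are just examples):
$ ceph orch host ls
$ hostname
$ getent hosts host host-1
# on the broken node, /etc/hosts mapped its own IP to the wrong name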
Thanks,
Gencer.
On 19.11.2020 17:33:52, "Gencer Genç" wrote:
Hi,
I ran those commands as usual:
$ ceph orch hos
Hi,
We are having slow OSDs... a hot topic to search for... I've tried to
dive as deep as I can, but I need to know which debug settings will help me
dive even deeper...
Okay, the situation:
- After expansion, lots of backfill operations are running, spread over the
OSDs.
- max_backfills is set
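A possible starting point for deeper debugging, sketched with example values
and an arbitrary OSD id:
$ ceph tell osd.* injectargs '--debug_osd 10 --debug_ms 1'
# or persistently:
$ ceph config set osd debug_osd 10
# and inspect the slowest recent ops on a suspect OSD:
$ ceph daemon osd.12 dump_ops_in_flight
$ ceph daemon osd.12 dump_historic_slow_ops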
Hi,
I ran those commands as usual:
$ ceph orch host ls
The result is as expected, with host names and addresses.
$ ceph orch ls
Again, the expected result as before.
Then I started to upgrade via this command:
$ ceph orch upgrade start --ceph-version 15.2.6
It failed with attached logs. Please see log
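(A sketch of how the upgrade state can be inspected and paused while looking at
the logs; nothing here is specific to this failure:)
$ ceph orch upgrade status
$ ceph orch upgrade pause
$ ceph log last cephadm
$ ceph orch upgrade resume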
Hi All,
I've been using Ceph block and object storage for years but am just
wandering into CephFS now (Nautilus, all servers on 14.2.9).
I created small data and metadata pools and a new filesystem, and used:
ceph fs authorize client. / rw
creating two new users to mount it; both can, one using fuse (
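For comparison, the fully spelled-out version of those steps would look roughly
like this; the fs name, user names, monitor host and paths are all placeholders:
$ ceph fs authorize cephfs client.user1 / rw
$ ceph-fuse -n client.user1 /mnt/cephfs
# second user via the kernel client:
$ mount -t ceph mon1:6789:/ /mnt/cephfs -o name=user2,secretfile=/etc/ceph/user2.secret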
Hello,
I thought I'd post an update.
Setting the pg_log size to 500 and running the offline trim operation
sequentially on all OSDs seems to help. With our current setup, it takes about
12-48h per node, depending on the PGs per OSD. The PG counts per OSD we have
are ~180-750, with a majority
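For anyone wanting to try the same, a rough sketch of the settings and the
offline trim (OSD id and PG id are placeholders; the OSD must be stopped for
ceph-objectstore-tool, and the trim is per PG):
$ ceph config set osd osd_min_pg_log_entries 500
$ ceph config set osd osd_max_pg_log_entries 500
$ systemctl stop ceph-osd@<id>
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --op list-pgs
$ ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> --op trim-pg-log
$ systemctl start ceph-osd@<id>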
Hi all,
there seems to be a bug in how beacon time-outs are computed. After waiting for
a full time-out period of 86400s=24h, the problem disappeared. It looks like
received beacons are only counted properly after a MON was up for the grace
period. I have no other explanation.
Best regards,
==
On Thu, Nov 19, 2020 at 3:39 AM David Galloway wrote:
>
> This is the 6th backport release in the Octopus series. This release
> fixes a security flaw affecting Messenger V2 for Octopus & Nautilus. We
> recommend all users update to this release.
>
> Notable Changes
> ---
> * CVE 2020-
I don't think it's a problem. But it's also useless.
k
Sent from my iPhone
> On 18 Nov 2020, at 07:06, Szabo, Istvan (Agoda) wrote:
>
> Is it a problem if ec_overwrite is enabled on the data pool?
> https://docs.ceph.com/en/latest/rados/operations/erasure-code/#erasure-coding-with-overwr
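(For anyone who wants to check their own setup, the flag can be queried and set
per pool; note it cannot be turned off again once enabled:)
$ ceph osd pool get <pool> allow_ec_overwrites
$ ceph osd pool set <pool> allow_ec_overwrites true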
We are doing that as well. But we need to be able to check specific buckets
additionally. For that we use this second approach.
Since we double-check all output from our script anyway (to see if NoSuchKey
actually happens), we can rule out false positives.
So far all the files detected this wa
I would recommend you get a dump with rados ls -p poolname (it can be
several GB; mine is 61GB) and grep (or ack, which is faster) for the
names there, to get an overview of what is there and what isn't. Looking
up the names directly can easily give you the wrong picture, because it
is kinda complicated
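Concretely, something like this (pool and object names are placeholders; the
dump can take a while on large pools):
$ rados ls -p default.rgw.buckets.data > rados-objects.txt
$ grep 'my-object-name' rados-objects.txt
# ack is noticeably faster on multi-GB dumps:
$ ack 'my-object-name' rados-objects.txt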
Thanks, we are currently scanning our object storage. It looks like we can
detect the missing objects that return “No Such Key” by looking at all
“__multipart_” objects returned by radosgw-admin bucket radoslist and checking
if they exist using rados stat. We are currently not looking at shadow objects
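A sketch of such a check, with bucket and pool names as placeholders; it only
covers the multipart head objects:
$ radosgw-admin bucket radoslist --bucket=mybucket > radoslist.txt
$ grep '__multipart_' radoslist.txt | while IFS= read -r obj; do
    rados -p default.rgw.buckets.data stat "$obj" >/dev/null 2>&1 || echo "missing: $obj"
  done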
- The head object had a size of 0.
- There was an object with ’shadow’ in its name, belonging to that path.
That is normal. What is not normal is if there are NO shadow objects.
On 18/11/2020 10:06, Denis Krienbühl wrote:
It looks like a single-part object. But we did replace that object last