I use the same technique for my normal snapshot backups. But this is regarding an
Autodesk database. In order to have full support from Autodesk in case things go
wrong, I need to follow Autodesk's recommendations, that is, to do a data backup
(db dump + file store copy) with their tool, the ADMS Co
Hi,
my OSDs are running on Odroid HC4s, which only have about 4 GB of memory,
and every 10 minutes a random OSD crashes due to running out of memory. Sadly the
whole machine becomes unresponsive when the memory fills up completely, so no
SSH access or Prometheus output in the meantime.
After the osd success
For example, one of my latest OSD crashes looks like this in dmesg:
[Dec 2 08:26] bstore_mempool invoked oom-killer:
gfp_mask=0x24200ca(GFP_HIGHUSER_MOVABLE), nodemask=0, order=0,
oom_score_adj=0
[ +0.06] bstore_mempool
cpuset=ed46e6fa52c1e40f13389b349c54e62dcc8c65d76c4c7860e2ff7c39444d14cc
mem
> my OSDs are running on Odroid HC4s, which only have about 4 GB of memory,
> and every 10 minutes a random OSD crashes due to running out of memory. Sadly the
> whole machine becomes unresponsive when the memory fills up completely, so no
> SSH access or Prometheus output in the meantime.
> I've set the mem
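The quoted message breaks off at "I've set the mem…", presumably a memory-related OSD setting. For reference, a minimal sketch of capping OSD memory usage on small-RAM nodes; the values below are illustrative, not from the thread:

$ ceph config set osd osd_memory_target 1610612736    # aim for roughly 1.5 GiB per OSD daemon
$ ceph config set osd bluestore_cache_autotune true   # let BlueStore shrink its caches toward the target

Note that osd_memory_target is a best-effort target rather than a hard limit, so some headroom (or swap) is still needed to keep the kernel OOM killer away.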
Can I get rid of PGs after trying to decrease the number on the pool again?
Doing a backup and nuking the cluster seems a little too much work for me :)
$ sudo ceph osd pool get cephfs_data pg_num
pg_num: 128
$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data pg_
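For context (the message above is cut off): a later reply in this thread points at the PG autoscaler, which can silently revert manual pg_num changes. A quick way to check, assuming the same pool name:

$ sudo ceph osd pool get cephfs_data pg_autoscale_mode   # 'on' means the autoscaler may override manual pg_num changes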
Hi,
we are currently encountering a lot of broken / orphan multipart uploads.
When I try to fetch the multipart uploads via s3cmd, it just never finishes.
Debug output looks like this and basically never changes:
DEBUG: signature-v4 headers: {'x-amz-date': '20221202T105838Z',
'Authorization':
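For reference (not from the original message): s3cmd can also list and abort multipart uploads per bucket, which is one way to chase these down; bucket and object names below are placeholders:

$ s3cmd multipart s3://BUCKET                  # list in-progress multipart uploads and their upload IDs
$ s3cmd abortmp s3://BUCKET/OBJECT UPLOAD_ID   # abort one upload, releasing its parts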
Hi Eric,
sadly it took too long from the customer complaining until it reached my
desk, so there are no RGW client logs.
We are currently improving our logging situation to move the logs to
Graylog.
Currently it looks like the GC removed RADOS objects it should not
have removed, due to this
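The message is cut off above; as a rough sketch of how missing tail objects can be confirmed (pool, bucket, and object names are placeholders, not from the thread):

# radosgw-admin object stat --bucket=BUCKET --object=OBJECT    # dumps the manifest, including the tail/multipart layout
# rados -p default.rgw.buckets.data stat 'OID_FROM_MANIFEST'   # reports "No such file or directory" if GC already removed it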
Hello,
I still do not really understand why this error message comes up.
The error message contains two significant numbers. The first one, which
is easy to understand, is the maximum number of PGs per OSD, a
precompiled config variable (mon_max_pg_per_osd). The value on my
cluster is 250. This
Hi Rainer,
there is indeed a bit of a mess in terminology. The number mon_max_pg_per_osd
means "the maximum number of PGs an OSD is a member of", which is equal to "the
number of PG shards an OSD holds". Unfortunately, this confusion is endemic in
the entire documentation and one needs to look
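A short sketch of how to see both numbers side by side (standard ceph CLI, nothing cluster-specific assumed):

$ ceph config get mon mon_max_pg_per_osd   # the configured limit (250 in the cluster discussed above)
$ ceph osd df                              # the PGS column shows how many PG shards each OSD currently holds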
Hi,
can you elaborate a bit on what happened and why "a few reboots" were
required? 64% inactive PGs and 700 unknown PGs don't look too good.
Has this improved a bit since your post? If ceph orch commands are not
responding, it could point to a broken mgr; do you see anything in the
logs of
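Not part of the original reply, but the usual first checks when the orchestrator seems stuck look roughly like this (daemon names are placeholders):

$ ceph -s                                 # overall health, including whether a mgr is active
$ ceph mgr fail                           # fail over to a standby mgr, which often unsticks 'ceph orch'
$ cephadm logs --name mgr.<host>.<id>     # run on the mgr host to see that daemon's journal logs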
I successfully set up a stretched cluster, except that the CRUSH rule mentioned in the
docs wasn't correct. The parameters "min_size" and "max_size" should be
removed, or else the rule can't be imported (a corrected version is sketched just
below this message).
Second, there should be a mention that setting the monitor CRUSH location takes
some time and kn
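For reference, a sketch of the stretch rule from the docs with the min_size and max_size lines dropped, as described above; the site bucket names are the documentation's example names, not from this message:

rule stretch_rule {
        id 1
        type replicated
        step take site1
        step chooseleaf firstn 2 type host
        step emit
        step take site2
        step chooseleaf firstn 2 type host
        step emit
}

It can then be compiled with crushtool -c and injected with ceph osd setcrushmap -i as usual.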
> Hello,
>
> I still do not really understand why this error message comes up.
> The error message contains two significant numbers. The first one, which is
> easy to understand, is the maximum number of PGs per OSD, a precompiled
> config variable (mon_max_pg_per_osd). The value on my cluster
Thanks for the hint, I tried turning that off:
$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: on
$ sudo ceph osd pool set cephfs_data pg_autoscale_mode off
set pool 9 pg_autoscale_mode to off
$ sudo ceph osd pool get cephfs_data pg_autoscale_mode
pg_autoscale_mode: off
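Not part of the original message: with the autoscaler off, the earlier decrease can be retried and, on Nautilus or newer where PG merging is supported, it will take effect gradually:

$ sudo ceph osd pool set cephfs_data pg_num 16
$ sudo ceph osd pool get cephfs_data pg_num    # re-check over time; the value steps down as PGs are merged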
This can't be done in a very nice way currently. There's actually an open
PR against main to allow setting the crush location for mons in the service
spec, specifically because others found that this was annoying as well. What
I think should work as a workaround is: go to the host where the mon that
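The rest of the workaround is cut off above. For reference, the documented command for assigning a monitor's CRUSH location (as used when setting up stretch mode) looks like this, with placeholder names:

$ ceph mon set_location <mon-name> datacenter=<datacenter-name>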
That isn't a great solution indeed, but I'll try it. Would this also
be necessary to replace the Tiebreaker?
From: Adam King
Sent: Friday, December 2, 2022 2:48:19 PM
To: Sake Paulusma
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] How to replace or
yes, I think so. I think the context in which this originally came up was
somebody trying to replace the tiebreaker mon.
On Fri, Dec 2, 2022 at 9:08 AM Sake Paulusma wrote:
> That isn't a great solution indeed, but I'll try it. Would this
> also be necessary to replace the Tiebreaker?
The instructions work great; the monitor is added to the monmap now.
I asked about the Tiebreaker because there is a special command to replace the
current one. But this manual intervention is probably still needed to first set
the correct location. Will report back later when I replace the curr
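The special command isn't named in the excerpt above; it presumably refers to the stretch-mode tiebreaker replacement command, which (with a placeholder name) looks like:

$ ceph mon set_new_tiebreaker <new-mon-name>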
Dear Mark,
thank you very much for all of this information. I learned a lot! In
particular, that I need to learn more about pinning.
In the end, I want to run the whole thing in production with real-world
workloads. My main aim in running the benchmark is to ensure that my
hardware and OS are corre
I am currently going over all our buckets, which takes some time:
# for BUCKET in $(radosgw-admin bucket stats | jq -r '.[] | .bucket'); do
    radosgw-admin bi list --bucket ${BUCKET} \
      | jq -r '.[] | select(.idx? | match("_multipart.*")) | .idx + ", " + .entry.meta.mtime' \
      > ${BUCKET}.multiparts
  done
An
We have a large cluster (10PB) which is about 30% full at this point. We
recently fixed a configuration issue that then triggered the pg autoscaler to
start moving around massive amounts of data (85% misplaced objects - about 7.5B
objects). The misplaced % is dropping slowly (about 10% each
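The message is cut off above, but the knobs usually involved when trying to speed up or slow down this kind of rebalancing are the backfill/recovery limits, roughly like this (values are illustrative; on newer releases the mClock scheduler may cap or override them):

$ ceph config set osd osd_max_backfills 2         # concurrent backfill reservations per OSD
$ ceph config set osd osd_recovery_max_active 3   # concurrent recovery ops per OSD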
hi,
maybe someone here can help me debug an issue we faced today.
Today one of our clusters came to a grinding halt with 2/3 of our OSDs
reporting slow ops.
The only option to get it back to work quickly was to restart all OSD daemons.
The cluster is an Octopus cluster with 150 enterprise SSD OSDs.
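Not from the original message, but for reference, slow ops can be inspected per daemon while they are happening (the OSD id is a placeholder):

$ ceph health detail                        # lists which OSDs are reporting slow ops
$ ceph daemon osd.<id> dump_ops_in_flight   # run on the OSD's host: ops currently stuck, with state and age
$ ceph daemon osd.<id> dump_historic_ops    # recently completed ops, including slow ones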