Cc: Dan van der Ster; Patrick Donnelly; Bailey Allison; Spencer Macphee
Subject: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Hi all,
we took a log with debug_journaler=20 set and managed to track
the deadlock down to the line
https://github.com/ceph/ceph/blob/pacific/s
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: Sunday, January 19, 2025 5:35 PM
To: ceph-users@ceph.io
Cc: Dan van der Ster; Patrick Donnelly; Bailey Allison; Spencer Macphee
Subject: [ceph-users] Re: Help needed, ceph fs down due to
Thanks Stephan
Hi Mohammad,
this seems to be a bug in the current squid version.
https://tracker.ceph.com/issues/69527
Cheers
Stephan
On Mon, 20 Jan 2025 at 11:56, Saif Mohammad <
samdto...@gmail.com>:
> Hello Community,
>
> We are trying to set ACL for one of the objects by s3cmd tool within t
are affected.
Thanks for your help and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: Saturday, January 18, 2025 2:21 PM
To: Frédéric Nass; ceph-users@ceph.io
Cc: Dan van der Ster; Patrick
Hi all,
looking at the log data (see snippet at end) we suspect a classic
"producer–consumer" deadlock since it seems that the same thread that is
filling the purge queue at PurgeQueue.cc:L335:journaler.append_entry(bl) in
function PurgeQueue::push is also responsible for processing it but the
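A sketch for watching the purge queue from the outside while this happens (daemon name illustrative; counter names may differ slightly between releases):
ceph tell mds.ceph-12 perf dump purge_queue
# pq_executing / pq_executed should keep increasing if the queue is being drained
ceph tell mds.ceph-12 objecter_requests    # OSD requests the MDS is currently waiting on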
ceph-users@ceph.io
Cc: Dan van der Ster; Patrick Donnelly; Bailey Allison; Spencer Macphee
Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Hi Frank,
More than ever. You should open a tracker and post debug logs there so anyone
can have a look.
Regards
; Spencer Macphee
Subject: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Dear all,
a quick update and some answers. We set up a dedicated host for running an MDS
and debugging the problem. On this host we have 750G RAM, 4T swap and 4T log,
both on fast SSDs. Plan is to monitor with "perf top" the MDS becoming the
designated MDS for the problematic rank and also pull
Hi,
basically it's that easy [0] when only one or a few hosts are
reinstalled but the cluster is otherwise operative:
ceph cephadm osd activate ...
If your cluster has lost all monitors, it can get difficult. You can
rebuild the mon store [1] by collecting required information from
*ALL* O
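For the simple case, a sketch of the activate path (hostname and address are illustrative):
# make the reinstalled host known to the orchestrator again, then adopt its OSDs
ceph orch host add ceph-node-04 192.0.2.14
ceph cephadm osd activate ceph-node-04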
he current state the MDS is in,
but you may want to consider this move if you can.
Regards,
Frédéric.
From: Frank Schilder
Sent: Sunday, January 12, 2025 12:07 AM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: [ceph-users] Re: Help needed, ceph fs down due to
10:43 PM
To: Eugen Block
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Hi Eugen,
thanks and yes, let's try one thing at a time. I will report back.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 10
ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Personally, I would only try one change at a time and wait for a
result. Otherwise it can get difficult to tell what exactly helped and
what didn't.
I have never played with auth_service_ticket_ttl yet, so
ng 109, rum S14
From: Eugen Block
Sent: Saturday, January 11, 2025 7:59 PM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Hi Frank,
not sure if this already has been mentioned, but this one has 60
seconds timeout:
mds_beacon_mon_down_grace
ceph config help mds_beacon_mon_down_grace
mds_beacon_mon_down_grace - toleran
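A hedged example of checking and bumping it while the MDS is known to be busy (the value is illustrative, in seconds):
ceph config get mds mds_beacon_mon_down_grace
ceph config set mds mds_beacon_mon_down_grace 300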
From: Frank Schilder
Sent: Saturday, January 11, 2025 12:46 PM
To: Dan van der Ster
Cc: Bailey Allison; ceph-users@ceph.io
Subject: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Hi all,
my hopes are down again. The MDS might look busy but I'm not sure it's doing
anything interesting. I now see a lot of these in the log (stripped the
heartbeat messages):
2025-01-11T12:35:50.712
m S14
From: Frank Schilder
Sent: Saturday, January 11, 2025 11:41 AM
To: Dan van der Ster
Cc: Bailey Allison; ceph-users@ceph.io
Subject: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Hi all,
new update: after sleeping after the final MDS restart the MDS is
regards!
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Frank Schilder
Sent: Saturday, January 11, 2025 2:36 AM
To: Dan van der Ster
Cc: Bailey Allison; ceph-users@ceph.io
Subject: [ceph-users] Re: Help needed, ceph fs down due to
he
MDS idle yet unresponsive".
Thanks for your help so far! Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Dan van der Ster
Sent: Saturday, January 11, 2025 3:04 AM
To: Frank Schilder
Cc: Bailey Allison; ceph
some progress with
> trimming the stray items? However, I can't do 850 restarts in this fashion,
> there has to be another way.
>
> I would be really grateful for any help regarding getting the system in a
> stable state for further troubleshooting. I would really block all cl
ystem and
trim the stray items is dearly needed. Alternatively, is there a way to do
off-line trimming?
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
____
From: Dan van der Ster
Sent: Friday, January 10, 2025 11:32 PM
To: Frank Schilder
Cc: Bailey Allison; cep
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ____
> From: Bailey Allison
> Sent: Friday, January 10, 2025 10:23 PM
> To: Frank Schilder; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Help needed, ceph fs down due
+1 to this, and the doc mentioned.
Just be aware that, depending on the version, the heartbeat grace parameter is
different: I believe for 16 and below it's the one I mentioned, and it has
to be set at the mon level, and for 17 and newer it is what Spencer
mentioned. The doc he provided also mentions s
mds_beacon_grace is, perhaps confusingly, not an MDS configuration. It's
applied to MONs. As you've injected it into the MDS, that is likely why the
heartbeat is still failing:
This has the effect of having the MDS continue to send beacons to the
monitors
even when its internal "heartbeat" mechanis
You could try some of the steps here Frank:
https://docs.ceph.com/en/quincy/cephfs/troubleshooting/#avoiding-recovery-roadblocks
mds_heartbeat_reset_grace is probably the only one really relevant to your
scenario.
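To make that concrete, a sketch combining the advice in this thread (values are illustrative, not recommendations; the second option only exists on 17/Quincy and newer):
ceph config set mon mds_beacon_grace 600
ceph config set mds mds_heartbeat_reset_grace 3600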
On Fri, Jan 10, 2025 at 1:30 PM Frank Schilder wrote:
> Hi all,
>
> we seem to ha
oceed.
Thanks so far and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Bailey Allison
Sent: Friday, January 10, 2025 10:23 PM
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Help needed, ceph f
___
From: Bailey Allison
Sent: Friday, January 10, 2025 10:05 PM
To: ceph-users@ceph.io; Frank Schilder
Subject: Re: [ceph-users] Re: Help needed, ceph fs down due to large stray dir
Frank,
You mentioned previously a large number of strays on the mds rank. Are
you able to check the rank again to see how many strays there are again?
We've previously had a similar issue, and once the MDS came back up we
had to stat the filesystem to decrease the number of strays, and which
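A hedged sketch of that "stat everything" approach (mount point is illustrative; this walks the whole tree, so expect load):
cd /mnt/cephfs && find . -ls > /dev/null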
Hi all,
I got the MDS up. However, after quite some time it's sitting with almost no CPU
load:
top - 21:40:02 up 2:49, 1 user, load average: 0.00, 0.02, 0.34
Tasks: 606 total, 1 running, 247 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.1 sy, 0.0 ni, 99.9 id, 0.0 wa, 0.0 hi, 0.0
Hi Frank,
What is the state of the mds currently? We are probably at a point where
we do a bit of hoping and waiting for it to come back up.
Regards,
Bailey Allison
Service Team Lead
45Drives, Ltd.
866-594-7199 x868
On 1/10/25 15:51, Frank Schilder wrote:
Hi all,
I seem to have gotten the MD
Hi all,
I seem to have gotten the MDS up to the point that it reports stats. Does this
mean anything:
2025-01-10T20:50:25.256+0100 7f87ccd5f700 1 heartbeat_map is_healthy 'MDSRank'
had timed out after 15.00954s
2025-01-10T20:50:25.256+0100 7f87ccd5f700 0 mds.beacon.ceph-12 Skipping beacon
I had a similar issue some months ago that ended up using around 300
gigabytes of RAM for a similar number of strays.
You can get an idea of the strays kicking around by checking the
omapkeys of the stray objects in the cephfs metadata pool. Strays are
tracked in objects: 600., 601.000
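A sketch of that check (metadata pool name is an assumption; the stray directories span ten objects):
for i in $(seq 0 9); do
  echo -n "60${i}.00000000: "
  rados -p cephfs_metadata listomapkeys "60${i}.00000000" | wc -l
done
If the MDS is responsive, 'ceph tell mds.<id> perf dump mds_cache' also exposes a num_strays counter.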
Hi Patrick and others,
thanks for your fast reply. The problem we are in comes from forward scrub
ballooning and the memory overuse did not go away even after aborting the
scrub. The "official" way to evaluate strays I got from Neha was to restart the
rank.
I did not expect that the MDS needs
Hi Frank,
Are you able to share any logs from the mds that's crashing? And just to
confirm: the rank goes into up:active before it eventually OOMs?
This sounds familiar-ish but I'm also recovering after a nearly 24-hour
bender of another ceph-related recovery. Trying to rack my brain for
simil
Hi Frank,
On Fri, Jan 10, 2025 at 12:31 PM Frank Schilder wrote:
>
> Hi all,
>
> we seem to have a serious issue with our file system, ceph version is pacific
> latest. After a large cleanup operation we had an MDS rank with 100Mio stray
> entries (yes, one hundred million). Today we restarted
On 17-10-2024 15:16, Nico Schottelius wrote:
Stefan Kooman writes:
On 16-10-2024 03:02, Harry G Coin wrote:
Thanks for the notion! I did that, the result was no change to the
problem, but with the added ceph -s complaint "Public/cluster
network defined, but can not be found on any host" --
Stefan Kooman writes:
> On 16-10-2024 03:02, Harry G Coin wrote:
>> Thanks for the notion! I did that, the result was no change to the
>> problem, but with the added ceph -s complaint "Public/cluster
>> network defined, but can not be found on any host" -- with
>> otherwise totally normal clust
On 16-10-2024 03:02, Harry G Coin wrote:
Thanks for the notion! I did that, the result was no change to the
problem, but with the added ceph -s complaint "Public/cluster network
defined, but can not be found on any host" -- with otherwise totally
normal cluster operations. Go figure. How ca
Hi Frédéric
All was normal in v18, after 19.2 the problem remains even though the
addresses are different:
cluster_network global: fc00:1000:0:b00::/64
public_network global: fc00:1002:c7::/64
Also, after rebooting everything in sequence, it only complains that the
27 osd that are both up,
Hi Harry,
Do you have a 'cluster_network' set to the same subnet as the 'public_network'
like in the issue [1]? It doesn't make much sense to set up a cluster_network when
it's not different from the public_network.
Maybe that's what triggers the OSD_UNREACHABLE recently coded here [2] (even
thoug
Thanks for the notion! I did that, the result was no change to the
problem, but with the added ceph -s complaint "Public/cluster network
defined, but can not be found on any host" -- with otherwise totally
normal cluster operations. Go figure. How can ceph -s be so totally
wrong, the dashbo
Try failing over to a standby mgr
> On Oct 14, 2024, at 9:33 PM, Harry G Coin wrote:
>
> I need help to remove a useless "HEALTH ERR" in 19.2.0 on a fully dual stack
> docker setup with ceph using ip v6, public and private nets separated, with a
> few servers. After upgrading from an error
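For reference, the failover itself is a single command (a sketch; recent releases pick the active mgr automatically):
ceph mgr stat     # see which mgr is currently active
ceph mgr fail     # hand over to a standby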
nce
4 Brindabella Cct
Brindabella Business Park
Canberra Airport, ACT 2609
www.raytheonaustralia.com.au
LinkedIn | Twitter | Facebook | Instagram
-Original Message-
From: Adam King
Sent: Monday, September 23, 2024 8:36 AM
To: Kozakis, Anestis
Cc: ceph-users
Subject: [External] [ceph-user
Cephadm stored the key internally within the cluster and it can be grabbed
with `ceph config-key get mgr/cephadm/ssh_identity_key`. As for if you
already have keys setup, I'd recommend passing filepaths to those keys to
the `--ssh-private-key` and `--ssh-public-key` flags the bootstrap command
has
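A sketch of both paths (file names and the mon IP are illustrative):
# pull the key cephadm stored internally
ceph config-key get mgr/cephadm/ssh_identity_key > cephadm-ssh-key
# or bootstrap reusing keys you already have
cephadm bootstrap --mon-ip 192.0.2.10 \
  --ssh-private-key /root/.ssh/id_rsa --ssh-public-key /root/.ssh/id_rsa.pub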
Hi,
if you assigned the SSD to be for block.db it won't be available from
the orchestrator's point of view as a data device. What you could try
is to manually create a partition or LV on the remaining SSD space and
then point the service spec to that partition/LV via path spec. I
haven't
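A hedged sketch of that approach (VG/LV names, size and host are assumptions):
lvcreate -L 200G -n osd-extra vg_ssd
ceph orch apply -i - <<'EOF'
service_type: osd
service_id: ssd_extra
placement:
  hosts:
    - ceph-node-01
spec:
  data_devices:
    paths:
      - /dev/vg_ssd/osd-extra
EOF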
Hi,
I believe our KL studio has hit this same bug after deleting a pool that
was used only for testing.
So, is there any procedure to get rid of those bad journal events and get
the mds back to rw state?
Thanks,
---
Olli Rajala - Lead TD
Anima Vitae Ltd.
www.anima.fi
-
> Hi,
>
> just one question coming to mind, if you intend to migrate the images
> separately, is it really necessary to set up mirroring? You could just 'rbd
> export' on the source cluster and 'rbd import' on the destination cluster.
That can be slower if using a pipe, and require staging sp
- On 11 Jul 24, at 20:50, Dave Hall kdh...@binghamton.edu wrote:
> Hello.
>
> I would like to use mirroring to facilitate migrating from an existing
> Nautilus cluster to a new cluster running Reef. Right now I'm looking at
> RBD mirroring. I have studied the RBD Mirroring section of th
Hi,
just one question coming to mind, if you intend to migrate the images
separately, is it really necessary to set up mirroring? You could just
'rbd export' on the source cluster and 'rbd import' on the destination
cluster.
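A hedged sketch of that export/import path, streamed over SSH so nothing has to be staged locally (pool/image names and the host are illustrative):
rbd export rbd/myimage - | ssh dest-host 'rbd import - rbd/myimage'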
Quoting Anthony D'Atri:
I would like to use mirroring to
>
> I would like to use mirroring to facilitate migrating from an existing
> Nautilus cluster to a new cluster running Reef. Right now I'm looking at
> RBD mirroring. I have studied the RBD Mirroring section of the
> documentation, but it is unclear to me which commands need to be issued on
> ea
First, thanks Xiubo for your feedback!
To go further on the points raised by Sake:
- How does this happen? -> There were no preliminary signs before the incident
- Is this avoidable? -> Good question, I'd also like to know how!
- How to fix the issue? -> So far, no fix nor workaround from w
Hi Xiubo
Thank you for the explanation! This won't be an issue for us, but it made me think
twice :)
Kind regards,
Sake
> On 04-06-2024 12:30 CEST, Xiubo Li wrote:
>
>
> On 6/4/24 15:20, Sake Ceph wrote:
> > Hi,
> >
> > A little break into this thread, but I have some questions:
> > * How d
On 6/4/24 15:20, Sake Ceph wrote:
Hi,
A little break into this thread, but I have some questions:
* How does this happen, that the filesystem gets into readonly mode
For the detailed explanation you can refer to the ceph PR:
https://github.com/ceph/ceph/pull/55421.
* Is this avoidable?
* How-
Hi,
A little break into this thread, but I have some questions:
* How does this happen, that the filesystem gets into readonly mode
* Is this avoidable?
* How to fix the issue, because I didn't see a workaround in the mentioned
tracker (or I missed it)
* With this bug around, should you use c
Hi Nicolas,
This is a known issue and Venky is working on it, please see
https://tracker.ceph.com/issues/63259.
Thanks
- Xiubo
On 6/3/24 20:04, nbarb...@deltaonline.net wrote:
Hello,
First of all, thanks for reading my message. I set up a Ceph version 18.2.2 cluster with
4 nodes, everythin
On Tue, May 28, 2024 at 8:54 AM Noe P. wrote:
>
> Hi,
>
> we ran into a bigger problem today with our ceph cluster (Quincy,
> Alma8.9).
> We have 4 filesystems and a total of 6 MDs, the largest fs having
> two ranks assigned (i.e. one standby).
>
> Since we often have the problem of MDs lagging be
Hi,
just for the archives:
On Tue, 5 Mar 2024, Anthony D'Atri wrote:
* Try applying the settings to global so that mons/mgrs get them.
Setting osd_deep_scrub_interval at global instead of at osd immediately turns
health to OK and removes the false warning from PGs not scrubbed in time.
HTH,
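For the archives as well, the corresponding command (604800 s is just the one-week default; adjust as needed):
ceph config set global osd_deep_scrub_interval 604800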
I had the same problem as you.
The only solution that worked for me was to set it on the pools:
# assuming $smaxi/$smini/$sdeepi hold the desired intervals in seconds
for pool in $(ceph osd pool ls); do
ceph osd pool set "$pool" scrub_max_interval "$smaxi"
ceph osd pool set "$pool" scrub_min_interval "$smini"
ceph osd pool set "$pool" deep_scrub_interval "$sdeepi"
done
Hi Anthony,
thanks for the tips. I reset all the values but osd_deep_scrub_interval
to their defaults as reported at
https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/ :
# ceph config set osd osd_scrub_sleep 0.0
# ceph config set osd osd_scrub_load_threshold 0.5
# ceph config
* Try applying the settings to global so that mons/mgrs get them.
* Set your shallow scrub settings back to the default. Shallow scrubs take
very few resources
* Set your randomize_ratio back to the default, you’re just bunching them up
* Set the load threshold back to the default, I can’t ima
Hi Anthony and everyone else
We have found the issue. Because the new 20x 14 TiB OSDs were onboarded
onto a single node, there was not only an imbalance in the capacity of each
OSD but also between the nodes (other nodes each have around 15x 1.7TiB).
Furthermore, CRUSH rule sets default failure do
> I have recently onboarded new OSDs into my Ceph Cluster. Previously, I had
> 44 OSDs of 1.7TiB each and was using it for about a year. About 1 year ago,
> we onboarded an additional 20 OSDs of 14TiB each.
That's a big difference in size. I suggest increasing mon_max_pg_per_osd to
1000 --
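As a sketch, that would be (value taken from the suggestion above):
ceph config set global mon_max_pg_per_osd 1000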
Hi Jasper,
I suggest disabling all the crush-compat and reweighting approaches.
They rarely work out.
The state of the art is:
ceph balancer on
ceph balancer mode upmap
ceph config set mgr mgr/balancer/upmap_max_deviation 1
Cheers, Dan
--
Dan van der Ster
CTO
Clyso GmbH
p: +49 89 215252722 |
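A quick way to confirm the balancer picked up those settings (a sketch):
ceph balancer status
ceph balancer eval    # current distribution score; lower is better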
On Wed, Jan 31, 2024 at 3:43 AM garcetto wrote:
>
> good morning,
> I was struggling to understand why I cannot find this setting on
> my Reef version; is it because it is only in the latest dev Ceph version and not
> before?
That's right, this new feature will be part of the Squid release. We
Thank you Eugen! This worked :)
> On 09-11-2023 14:55 CET, Eugen Block wrote:
>
>
> It's the '#' character, everything after (including '#' itself) is cut
> off. I tried with single and double quotes which also failed. But as I
> already said, use a simple password and then change it with
It's the '#' character, everything after (including '#' itself) is cut
off. I tried with single and double quotes which also failed. But as I
already said, use a simple password and then change it within grafana.
That way you also don't have the actual password lying around in clear
text in
I just tried it on a 17.2.6 test cluster, although I don't have a
stack trace the complicated password doesn't seem to be applied (don't
know why yet). But since it's an "initial" password you can choose
something simple like "admin", and during the first login you are
asked to change it an
I tried everything at this point, even waited an hour, still no luck. Got it
working once by accident, but with a placeholder for a password. Tried with the
correct password, nothing, and trying again with the placeholder didn't work
anymore.
So I thought to switch the manager, maybe something is
Usually, removing the grafana service should be enough. I also have
this directory (custom_config_files/grafana.) but it's
empty. Can you confirm that after running 'ceph orch rm grafana' the
service is actually gone ('ceph orch ls grafana')? The directory
underneath /var/lib/ceph/{fsid}/gr
Using podman version 4.4.1 on RHEL 8.8, Ceph 17.2.7
I used 'podman system prune -a -f' and 'podman volume prune -f' to clean up
files, but this leaves a lot of files behind in
/var/lib/containers/storage/overlay and an empty folder
/var/lib/ceph//custom_config_files/grafana..
Found those files with
What doesn't work exactly? For me it did...
Quoting Sake Ceph:
Too bad, that doesn't work :(
On 09-11-2023 09:07 CET, Sake Ceph wrote:
Hi,
Well, to get promtail working with Loki, you need to set up a
password in Grafana.
But promtail wasn't working with the 17.2.6 release, the URL was
Too bad, that doesn't work :(
> On 09-11-2023 09:07 CET, Sake Ceph wrote:
>
>
> Hi,
>
> Well, to get promtail working with Loki, you need to set up a password in
> Grafana.
> But promtail wasn't working with the 17.2.6 release, the URL was set to
> containers.local. So I stopped using it, bu
Hi,
Well, to get promtail working with Loki, you need to set up a password in
Grafana.
But promtail wasn't working with the 17.2.6 release, the URL was set to
containers.local. So I stopped using it, but forgot to click on save in KeePass
:(
I didn't configure anything special in Grafana, the
Hi,
You mean you forgot your password? You can remove the service with
'ceph orch rm grafana', then re-apply your grafana.yaml containing the
initial password. Note that this would remove all of the grafana
configs or custom dashboards etc., you would have to reconfigure them.
So before do
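As commands, the cycle described above looks roughly like this (the spec file name is an assumption):
ceph orch rm grafana
ceph orch apply -i grafana.yaml
ceph orch ps --daemon-type grafana    # confirm the daemon came back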
On Tue, Aug 8, 2023 at 1:18 AM Zhang Bao wrote:
>
> Hi, thanks for your help.
>
> I am using ceph Pacific 16.2.7.
>
> Before my Ceph got stuck at `ceph fs status fsname`, one of my cephfs became
> readonly.
Probably the ceph-mgr is stuck (the "volumes" plugin) somehow talking
to the read-only CephFS
On Mon, Aug 7, 2023 at 6:12 AM Zhang Bao wrote:
>
> Hi,
>
> I have a ceph stuck at `ceph --verbose stats fs fsname`. And in the
> monitor log, I can found something like `audit [DBG] from='client.431973 -'
> entity='client.admin' cmd=[{"prefix": "fs status", "fs": "fsname",
> "target": ["mon-mg
I created a tracker issue, maybe that will get some attention:
https://tracker.ceph.com/issues/61861
Quoting Michel Jouvin:
Hi Eugen,
Thank you very much for these detailed tests that match what I
observed and reported earlier. I'm happy to see that we have the
same understanding of ho
Hi,
adding the dev mailing list, hopefully someone there can chime in. But
apparently the LRC code hasn't been maintained for a few years
(https://github.com/ceph/ceph/tree/main/src/erasure-code/lrc). Let's
see...
Quoting Michel Jouvin:
Hi Eugen,
Thank you very much for these detaile
Hi Eugen,
Thank you very much for these detailed tests that match what I observed
and reported earlier. I'm happy to see that we have the same
understanding of how it should work (based on the documentation). Is
there any other way than this list to get in contact with the plugin
developers
Hi, I have a real hardware cluster for testing available now. I'm not
sure whether I'm completely misunderstanding how it's supposed to work
or if it's a bug in the LRC plugin.
This cluster has 18 HDD nodes available across 3 rooms (or DCs), I
intend to use 15 nodes to be able to recover if o
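For anyone reproducing this, a minimal sketch of creating an LRC profile and a pool from it (the k=4/m=2/l=3 values mirror the documentation example, not the layout debated in this thread):
ceph osd erasure-code-profile set lrc_test plugin=lrc k=4 m=2 l=3 \
  crush-failure-domain=host crush-locality=room
ceph osd pool create lrc_test_pool erasure lrc_test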
Hi,
I realize that the crushmap I attached to one of my emails, probably
required to understand the discussion here, has been stripped by
mailman. To avoid polluting the thread with a long output, I put it at
https://box.in2p3.fr/index.php/s/J4fcm7orfNE87CX. Download it if you are
inte
Hi Patrick,
The disaster recovery process with the cephfs-data-scan tool didn't fix our MDS
issue. It still kept crashing. I've uploaded a detailed MDS log with the ID below.
The restore procedure below didn't get it working either. Should I set
mds_go_bad_corrupt_dentry to false along with
mds_ab
Hi Patrick,
Thanks for the instructions. We started the MDS recovery scan with the commands
below, following the link below. The first pass, scan_extents, has finished and we're
waiting on scan_inodes. We probably shouldn't interrupt the process. If this
procedure fails, I'll follow your steps and let
Hello Justin,
Please do:
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
Then wait for a crash. Please upload the log.
To restore your file system:
ceph config set mds mds_abort_on_newly_corrupt_dentry false
Let the MDS purge the strays and then try:
ceph config set mds mds_a
Hi Patrick,
Sorry to keep bothering you, but I found that the MDS service kept crashing even
though the cluster shows the MDS is up. I attached another log of the MDS server - eowyn - below.
Look forward to hearing more insights. Thanks a lot.
https://drive.google.com/file/d/1nD_Ks7fNGQp0GE5Q_x8M57HldYurPhuN/view
Sorry Patrick, the last email was restricted due to its attachment size. I attached a link
for you to download the log. Thanks.
https://drive.google.com/drive/folders/1bV_X7vyma_-gTfLrPnEV27QzsdmgyK4g?usp=sharing
Justin Li
Senior Technical Officer
School of Information Technology
Faculty of Science, Enginee
Thanks Patrick. We're making progress! After issuing the ceph config command below
that you gave me, the cluster health shows HEALTH_WARN and the MDS is back up. However,
cephfs can't be mounted, showing the error below. The Ceph mgr portal also shows a 500
internal error when I try to browse the cephfs folder. I'll be u
Hello Justin,
On Tue, May 23, 2023 at 4:55 PM Justin Li wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to pacific, MDS were offline and could not get
> back on. Checked the MDS log and found below. See cluster info from below as
> well. Appreciate it if anyone can point me to the right d
Thanks for replying, Greg. I'll give you the detailed sequence of what I did during the
upgrade below.
Step 1: upgrade ceph mgr and Monitor --- reboot. Then mgr and mon are all up
running.
Step 2: upgrade one OSD node --- reboot and OSDs are all up.
Step 3: upgrade a second OSD node named OSD-node2. I did
On Tue, May 23, 2023 at 1:55 PM Justin Li wrote:
>
> Dear All,
>
> After an unsuccessful upgrade to pacific, MDS were offline and could not get
> back on. Checked the MDS log and found below. See cluster info from below as
> well. Appreciate it if anyone can point me to the right direction. Thank
Hi Eugen,
My LRC pool is also somewhat experimental, so nothing is really urgent. If you
manage to do some tests that help me understand the problem, I remain
interested. I propose to keep this thread for that.
Zitat, I shared my crush map in the email you answered if the attachment
was not su
Hi, I don’t have a good explanation for this yet, but I’ll soon get
the opportunity to play around with a decommissioned cluster. I’ll try
to get a better understanding of the LRC plugin, but it might take
some time, especially since my vacation is coming up. :-)
I have some thoughts about th
Hi,
I've been following this thread with interest as it seems like a unique use
case to expand my knowledge. I don't use LRC or anything outside basic
erasure coding.
What is your current crush steps rule? I know you made changes since your
first post and had some thoughts I wanted to share, but
Hi Eugen,
Yes, sure, no problem to share it. I attach it to this email (as it may
clutter the discussion if inline).
If somebody on the list has some clue on the LRC plugin, I'm still
interested in understanding what I'm doing wrong!
Cheers,
Michel
On 04/05/2023 at 15:07, Eugen Block wrote
Subject: [ceph-users] Re: Help needed to configure erasure coding LRC plugin
Hi,
I don't think you've shared your osd tree yet, could you do that?
Apparently nobody else but us reads this thread or nobody reading this
uses the LRC plugin. ;-)
Thanks,
Eugen
Quoting Michel Jouvin:
Hi,
I had to restart one of my OSD servers today and the problem showed
up again
Hi,
I had to restart one of my OSD servers today and the problem showed up
again. This time I managed to capture "ceph health detail" output
showing the problem with the 2 PGs:
[WRN] PG_AVAILABILITY: Reduced data availability: 2 pgs inactive, 2 pgs down
pg 56.1 is down, acting
[208,65,73,