Hi all,
I need your help! Our FS is degraded.
Health: mds.1 is damaged
ceph tell mds.1 damage ls
resolve_mds: gid 1 not in mds map
Best regards, Sake
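For anyone hitting the same error: "ceph tell" seems to resolve a bare "1" as a daemon GID, not a rank. A minimal sketch of addressing the damaged rank instead, assuming the filesystem is simply called "cephfs" (substitute your own fs or daemon name from "ceph fs status"):

  # List fs names and the daemon holding each rank
  ceph fs status
  # Query damage for rank 0 of that filesystem
  ceph tell mds.cephfs:0 damage ls
  # or address the daemon by the name shown in "ceph fs status", e.g.:
  # ceph tell mds.cephfs.host1.abcdef damage ls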
load balance folders over the active MDS nodes. The load is currently around 500 IOPS and 50 MB of traffic, or even lower. After the holidays I'm going to see what I can achieve with manually pinning directories to MDS ranks. Best regards, Sake
On 31 Dec 2023 09:01, David Yang wrote: I hope this mes
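For reference, a minimal sketch of what such a manual pin looks like, assuming the filesystem is mounted at /mnt/cephfs and the directory name is just a placeholder:

  # Pin a directory (and everything below it) to MDS rank 1
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/app2
  # A value of -1 removes the pin again
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/app2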
Hi! I would like to build a simple PowerShell script which monitors the quotas set on certain directories. Is this possible via the RESTful API? Some extra information:
Ceph version 17.2.6
Deployed via cephadm, with mgr nodes exposing an accessible REST API.
Folder structure: / Folder 1/ Folder 2/
Not sure why my message shows up as an HTML attachment.
Best regards, Sake
On Jun 14, 2023 08:53, Sake wrote: Hi! I would like to build a simple PowerShell script which monitors the quotas set on certain directories. Is this possible via the RESTful API? Some extra information: Ceph version
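One possible sketch with curl against the dashboard REST API; the quota endpoint, field names, host and credentials below are assumptions (check the live API reference under /docs on the active mgr), and the same two calls translate directly to Invoke-RestMethod in PowerShell:

  # Obtain a token from the dashboard API (host/user/password are placeholders; uses jq)
  TOKEN=$(curl -sk -X POST "https://mgr-host:8443/api/auth" \
    -H 'Content-Type: application/json' \
    -d '{"username":"api-user","password":"secret"}' | jq -r .token)
  # Read the quota attributes of one directory (endpoint assumed; fs_id 1)
  curl -sk "https://mgr-host:8443/api/cephfs/1/quota?path=/Folder1" \
    -H 'Accept: application/vnd.ceph.api.v1.0+json' \
    -H "Authorization: Bearer $TOKEN"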
Which version do you use? Quincy currently has incorrect values for its new IOPS scheduler; this will be fixed in the next release (hopefully soon). But there are workarounds, please check the mailing list about this; I'm in a hurry so can't point directly to the correct post. Best regards, Sake
On
option will be gone, but the recovery speed will be fixed :)
Best regards, Sake
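For reference, the workaround that kept coming up in these threads was switching the mClock profile while recovery is running; a rough sketch, not the full tuning from those posts:

  # Check which scheduler is in use
  ceph config get osd osd_op_queue
  # Favour recovery over client traffic for the duration of the recovery
  ceph config set osd osd_mclock_profile high_recovery_ops
  # Switch back afterwards
  ceph config set osd osd_mclock_profile high_client_ops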
ster is healthy (via REST API); only HEALTH_OK is good
7. Done
For upgrading the OS we have something similar, but exiting maintenance mode is
broken (with 17.2.7) :(
I need to check the tracker for similar issues and if I can't find anything, I
will create a ticket.
Kind regards,
Sake
> O
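For reference, the maintenance-mode part of such a playbook boils down to two cephadm calls (the host name is a placeholder):

  # Before patching/rebooting a host
  ceph orch host maintenance enter host01
  # After the host is back up
  ceph orch host maintenance exit host01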
I needed to do some cleaning before I could share this :)
Maybe you or someone else can use it.
Kind regards,
Sake
> Op 14-06-2024 03:53 CEST schreef Michael Worsham
> :
>
>
> I'd love to see what your playbook(s) looks like for doing t
Edit: someone made some changes which broke some tasks when selecting the
cephadm host to use. Just keep in mind it's an example.
> Op 14-06-2024 10:28 CEST schreef Sake Ceph :
>
>
> I needed to do some cleaning before I could share this :)
> Maybe you or someone else ca
Not yet released. Every x.1.z release is a release candidate. Always wait for the
x.2.z release (in this case 19.2.0) and the official release notes on
docs.ceph.com :-)
> Op 21-07-2024 18:32 CEST schreef Nicola Mori :
>
>
> Dear Ceph users,
>
> on quay.io I see available images for 19.1.0. Y
What I read on the Slack channel is that the publication job got stuck late in
the day and the restart finished late. I guess they'll announce the new version
today.
Kind regards,
Sake
> Op 24-07-2024 13:05 CEST schreef Alfredo Rezinovsky :
>
>
> Ceph dashboard offers
screenshots, not sure how to share.
Kind regards,
Sake
for the clients :(
Second, it happens in our production environment and I won't be making any
changes there for a test.
I will try to replicate it in our staging environment, but that one has a lot less
load on it.
Kind regards,
Sake
> Op 31-08-2024 09:15 CEST schreef Alexander Pat
nderlying host is running on RHEL 8. An upgrade to RHEL 9 is planned, but we hit
some issues with automatically upgrading hosts.
Kind regards,
Sake
Oh, it got worse after the upgrade to Reef (we were running Quincy). With Quincy the
memory usage was also often around 95% with some swap usage, but it never
exceeded both to the point of crashing.
Kind regards,
Sake
> Op 31-08-2024 09:15 CEST schreef Alexander Patrakov :
>
>
I was talking about the hosts that the MDS containers are running on. The
clients are all RHEL 9.
Kind regards,
Sake
> Op 31-08-2024 08:34 CEST schreef Alexander Patrakov :
>
>
> Hello Sake,
>
> The combination of two active MDSs and RHEL8 does ring a bell, and I
>
.
But why does the MDS keep all this information in its memory? If it isn't accessed
for more than 20 hours, it should release it in my opinion (even a lot earlier,
like after an hour).
Kind regards,
Sake
> Op 02-09-2024 09:33 CEST schreef Eugen Block :
>
>
> Can you tell if the
But the client which is doing the rsync doesn't hold any caps after the rsync.
cephfs-top shows 0 caps. Even a system reboot of the client doesn't make a
difference.
Kind regards,
Sake
> Op 03-09-2024 04:01 CEST schreef Alexander Patrakov :
>
>
> MDS cannot release an
After the upgrade from 17.2.7 to 18.2.4 a lot of graphs are empty. For example,
the OSD latency under OSD device details and the OSD Overview show a lot of
'No data' messages.
I deployed ceph-exporter on all hosts; am I missing something? I even did a
redeploy of Prometheus.
Kind regards,
Sake
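A quick way to narrow this down is to check whether the per-host exporters are actually up and answering on their default port (9926, see further down this thread); the host name is a placeholder:

  # Are ceph-exporter daemons running on every host?
  ceph orch ps | grep ceph-exporter
  # Does one of them answer locally?
  curl -s http://host01:9926/metrics | head
  # If a host is missing one, (re)apply the service
  ceph orch apply ceph-exporter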
Hi Frank,
That option is set to false (I didn't enable security for the monitoring
stack).
Kind regards,
Sake
> Op 04-09-2024 20:17 CEST schreef Frank de Bot (lists) :
>
>
> Hi Sake,
>
> Do you have the config mgr/cephadm/secure_monitoring_stack to true? If
> so, t
mention missing a configuration, but I can't find any specific
configuration options in the docs:
https://docs.ceph.com/en/reef/mgr/prometheus/ and
https://docs.ceph.com/en/reef/cephadm/services/monitoring/
Kind regards,
Sake
> Op 05-09-2024 10:00 CEST schreef Pierre Riteau :
>
>
>
That is working, but I noticed the firewall isn't opened for that port.
Shouldn't cephadm manage this, like it does for all the other ports?
Kind regards,
Sake
> Op 06-09-2024 16:14 CEST schreef Björn Lässig :
>
>
> Am Mittwoch, dem 04.09.2024 um 20:01 +0200 schrie
After opening port 9926 manually, the Grafana dashboards show the data.
So is this a bug?
Kind regards,
Sake
> Op 06-09-2024 17:39 CEST schreef Sake Ceph :
>
>
> That is working, but I noticed the firewall isn't opened for that port.
> Shouldn't cephadm manage this,
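For reference, the manual workaround on RHEL-family hosts looks roughly like this, until cephadm opens the port itself (see the tracker mentioned further down):

  # Open the ceph-exporter port on every host running the daemon
  sudo firewall-cmd --permanent --add-port=9926/tcp
  sudo firewall-cmd --reload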
DEBUG /bin/firewall-cmd: stdout success
When looking in the deploy configuration, Grafana shows 'ports': [3000], but
ceph-exporter shows 'ports': []
Kind regards,
Sake
> Op 09-09-2024 10:50 CEST schreef Eugen Block :
>
>
> Sorry, clicked "send" too
We're using the default :) I'm talking about the deployment configuration which is
shown in the log files when deploying grafana/ceph-exporter.
I got the same configuration as you for ceph-exporter (the default) when
exporting the service.
Kind regards,
Sake
> Op 09-09-2024 12:04
We're using RHEL 8 and 9, and on both the port was not open.
It's just strange that it isn't working for ceph-exporter but works fine for
everything else.
Kind regards,
Sake
> Op 09-09-2024 14:03 CEST schreef Eugen Block :
>
>
> Those two daemons are handled diff
;
> (http://quay.ceph.io/ceph-ci/ceph@sha256:02ce7c1aa356b524041713a3603da8445c4fe00ed30cb1c1f91532926db20d3c')],
> 'rank': None, 'rank_generation': None,
>
>
> I opened the following tracker to fix the issue:
> https://tracker.ceph.com/issues/6797
regards,
Sake
credentials error
on an environment where I tried to use Grafana with Loki in the past (with 17.2.6
of Ceph/cephadm). I changed the password within Grafana in the past, but how
can I overwrite this now? Or is there a way to clean up all Grafana files?
Best regards,
Sake
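The cleanup discussed below boils down to removing the service and its data directory before redeploying; a rough sketch, where the fsid and hostname are placeholders and the data path is assumed to follow the standard cephadm layout:

  # Remove the grafana service
  ceph orch rm grafana
  # On the host that ran grafana, wipe its cephadm data directory (path assumed)
  sudo rm -rf /var/lib/ceph/<fsid>/grafana.<hostname>
  # Redeploy with a fresh (default) configuration
  ceph orch apply grafana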
afana, the default dashboards are
great! So a wipe isn't a problem, it's what I want.
Best regards,
Sake
> Op 09-11-2023 08:19 CET schreef Eugen Block :
>
>
> Hi,
> you mean you forgot your password? You can remove the service with
> 'ceph orch rm grafana
Too bad, that doesn't work :(
> Op 09-11-2023 09:07 CET schreef Sake Ceph :
>
>
> Hi,
>
> Well, to get promtail working with Loki, you need to set up a password in
> Grafana.
> But promtail wasn't working with the 17.2.6 release; the URL was set to
> co
..
Found those files with 'find / -name *grafana*'.
> Op 09-11-2023 09:53 CET schreef Eugen Block :
>
>
> What doesn't work exactly? For me it did...
>
> Zitat von Sake Ceph :
>
> > Too bad, that doesn't work :(
> >> Op 09-11-2023 09:07 CET s
I tried everything at this point, even waited an hour, still no luck. Got it
working once by accident, but with a placeholder for a password. Tried with the
correct password, nothing, and trying again with the placeholder didn't work
anymore.
So I thought to switch the manager, maybe something is
I believe they are working on it, or want to work on it, to revert from a
stretched cluster, because of the reason you mention: if the other datacenter
is totally burned down, you may want to switch to a one-datacenter setup for
the time being.
Best regards,
Sake
> Op 09-11-2023 11:18 CET schr
> > stack trace the complicated password doesn't seem to be applied
> > (don't know why yet). But since it's an "initial" password you can
> > choose something simple like "admin", and during the first login you
> > are asked to change it
Don't forget that with stretch mode, OSDs only communicate with MONs in the same DC
and the tiebreaker only communicates with the other MONs (to prevent split-brain
scenarios).
A little late response, but I wanted you to know this :)
583} state up:resolve seq 571
join_fscid=2 addr
[v2:10.233.127.18:6800/3627858294,v1:10.233.127.18:6801/3627858294] compat
{c=[1],r=[1],i=[7ff]}]
Best regards,
Sake
ceph fs reset atlassian-prod --yes-i-really-mean-it
This brought the fs back online and the servers/applications are working again.
Question: can I increase max_mds and activate standby_replay again?
Will collect logs, maybe we can pinpoint the cause.
Best regards,
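For completeness, putting the layout back afterwards is just the usual fs settings; a sketch using the fs name from the command above:

  # Scale back to two active MDS daemons and re-enable standby-replay
  ceph fs set atlassian-prod max_mds 2
  ceph fs set atlassian-prod allow_standby_replay true
  ceph fs status atlassian-prod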
That wasn't really clear in the docs :(
> Op 21-12-2023 17:26 CET schreef Patrick Donnelly :
>
>
> On Thu, Dec 21, 2023 at 3:05 AM Sake Ceph wrote:
> >
> > Hi David
> >
> > Reducing max_mds didn't work. So I executed a fs reset:
> > c
d /app4 to rank 3?
I would like to load balance the subfolders of /app1 across 2 (or 3) MDS servers.
Best regards,
Sake
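Spreading the immediate subfolders of one directory over the active ranks is what the ephemeral distributed pin is for; a sketch, assuming the filesystem is mounted at /mnt/cephfs:

  # Hash each child directory of /app1 across the active MDS ranks
  setfattr -n ceph.dir.pin.distributed -v 1 /mnt/cephfs/app1
  # Setting the value back to 0 disables it again
  setfattr -n ceph.dir.pin.distributed -v 0 /mnt/cephfs/app1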
s and assign folder App2 to rank 1.
2. Is there already a feature request for pinning directories via the
dashboard? Again, I couldn't find a request.
3. I believe in the past you needed to remove the manual pins before an
upgrade, is this still the case?
Best regards,
Sake
> Op 22-12-2
es/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01
rm: cannot remove
'/mnt/shared_disk-app1/shared/data/repositories/11271/objects/41/8f82507a0737c611720ed224bcc8b7a24fda01':
Input/output error
Best regards,
Sake
After upgrading to 17.2.7 our load balancers can't check the status of the
manager nodes for the dashboard. After some troubleshooting I noticed only TLS
1.3 is available for the dashboard.
Looking at the source (Quincy), the TLS config got changed from 1.2 to 1.3.
Searching in the tracker I found
gards,
Sake
> Op 25-01-2024 15:22 CET schreef Nizamudeen A :
>
>
> Hi,
>
> I'll re-open the PR and will merge it to Quincy. Btw i want to know if the
> load balancers will be supporting tls 1.3 in future. Because we were planning
> to completely drop the tls1.2
I would say drop it for the Squid release, or if you keep it in Squid but are going
to disable it in a minor release later, please make a note in the release notes
when the option is removed.
Just my 2 cents :)
Best regards,
Sake
Hi Matthew,
Cephadm doesn't clean up old container images, at least with Quincy. After an
upgrade we run the following commands:
sudo podman system prune -a -f
sudo podman volume prune -f
But if someone has a better advice, please tell us.
Kind regards,
Sake
> Op 19-04-2024 10
Just a question: is it possible to block or disable all clients? Just to
prevent load on the system.
Kind regards,
Sake
> Op 22-04-2024 20:33 CEST schreef Erich Weiler :
>
>
> I also see this from 'ceph health detail':
>
> # ceph health detail
> HEALTH_W
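On blocking clients: one hedged sketch, assuming your release already has the refuse_client_session flag (check the "ceph fs set" usage first); otherwise evicting sessions per MDS is the fallback:

  # Stop accepting new client sessions on the filesystem
  ceph fs set <fs_name> refuse_client_session true
  # List and evict sessions that are still connected
  ceph tell mds.<fs_name>:0 client ls
  ceph tell mds.<fs_name>:0 client evict id=<client_id>
  # Allow clients again afterwards
  ceph fs set <fs_name> refuse_client_session false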
100 GB of RAM! Damn, that's a lot for a filesystem in my opinion, or am I wrong?
Kind regards,
Sake
> Op 22-04-2024 21:50 CEST schreef Erich Weiler :
>
>
> I was able to start another MDS daemon on another node that had 512GB
> RAM, and then the active MDS eventually
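Worth noting: the MDS memory footprint follows the configured cache target (plus overhead), so this is tunable; a sketch, where the 16 GiB value is only an example:

  # Show the current cache target (the default is 4 GiB)
  ceph config get mds mds_cache_memory_limit
  # Raise it, e.g. to 16 GiB
  ceph config set mds mds_cache_memory_limit 17179869184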
but I really need some
fixes from this release.
Kind regards,
Sake
I don't have access to Slack, but thank you for all your work! Fingers crossed
for a quick release.
Kind regards,
Sake
> Op 23-05-2024 16:20 CEST schreef Yuri Weinstein :
>
>
> We are still working on the last-minute fixes, see this for details
> https://ceph-storag
Hi
Isn't this just the limit of one HDD, or of the other HDDs providing the data?
Don't forget, recovery will drop even more for the last few objects. At least I
noticed this when replacing a drive in my (little) cluster.
Kind regards,
Sake
> Op 26-05-2024 09:36 CES
u use cephfs with reef?
Kind regards,
Sake
> Op 04-06-2024 04:04 CEST schreef Xiubo Li :
>
>
> Hi Nicolas,
>
> This is a known issue and Venky is working on it, please see
> https://tracker.ceph.com/issues/63259.
>
> Thanks
> - Xiubo
>
> On 6/3
Hi Xiubo
Thank you for the explanation! This won't be an issue for us, but it made me think
twice :)
Kind regards,
Sake
> Op 04-06-2024 12:30 CEST schreef Xiubo Li :
>
>
> On 6/4/24 15:20, Sake Ceph wrote:
> > Hi,
> >
> > A little break into this thread, b
y has some advice I would gladly hear about it!
Kind regards,
Sake
arting the containers. The startup fails because it can't pull
the container image, as authentication is required (our instance is offline
and we're using a local image registry with authentication).
Kind regards,
Sake
> Op 04-06-2024 14:40 CEST schreef Robert Sander :
&g
ossible to achieve this at the moment
(automatically or manually)?
Thanks,
Sake
Last Friday I upgraded the Ceph cluster from 17.2.3 to 17.2.5 with "ceph orch
upgrade start --image
localcontainerregistry.local.com:5000/ceph/ceph:v17.2.5-20221017". After
some time, an hour?, I got a health warning: CEPHADM_REFRESH_FAILED: failed
to probe daemons or devices. I'm using only C
October 24, 2022 5:50:20 PM
To: Sake Paulusma
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Failed to probe daemons or devices
Hello Sake,
Could you share the output of vgs / lvs commands?
Also, I would suggest you to open a tracker [1]
Thanks!
[1]
https://tracker.ceph.com/projects/cep
I fixed the issue by removing the blank/unlabeled disk. It is still a bug,
so hopefully it can get fixed for someone else who can't easily remove a disk :)
Hi
I noticed that cephadm would update the grafana-frontend-api-url with version
17.2.3, but it looks broken with version 17.2.5. It isn't a big deal to update
the URL myself, but it's quite irritating to do when in the past it corrected
itself.
Best reg
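Updating it by hand is a single dashboard command; the host and port below are placeholders:

  # Point the dashboard at the Grafana instance cephadm deployed
  ceph dashboard set-grafana-api-url https://grafana-host:3000
  # Verify
  ceph dashboard get-grafana-api-url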
n the docs for stretch mode
with the command "ceph mon set_location datacenter=". This only
results in the following error:
Error ENOENT: mon.oqsoel11437 does not exist
So how can I add/replace a monitor in a stretched cluster?
Best regards,
Sake
That isn't a great solution indeed, but I'll try it. Would this also
be necessary to replace the tiebreaker?
From: Adam King
Sent: Friday, December 2, 2022 2:48:19 PM
To: Sake Paulusma
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] How to
The instructions work great; the monitor is added to the monmap now.
I asked about the tiebreaker because there is a special command to replace the
current one. But this manual intervention is probably still needed to first set
the correct location. Will report back later when I replace the curr
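For reference, the two commands involved once the monitor exists in the monmap are roughly the following; the names and the datacenter bucket are placeholders, and the exact syntax should be checked against the stretch mode docs for your release:

  # Give the new monitor its CRUSH location
  ceph mon set_location mon05 datacenter=dc2
  # Replacing the tiebreaker has its own command
  ceph mon set_new_tiebreaker mon05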
Hello,
I configured a stretched cluster across two datacenters. It's working fine, except
this weekend the raw capacity exceeded 50% and the error
POOL_TARGET_SIZE_BYTES_OVERCOMMITED showed up.
The command "ceph df" is showing the correct cluster size, but "ceph osd pool
autoscale-status" is showi
The RATIO for cephfs.application-acc.data shouldn't be over 1.0; I believe this
triggered the error.
All weekend I was thinking about this issue, but couldn't find an option to
correct this.
But minutes after posting I found a blog about the autoscaler
(https://ceph.io/en/news/blog/2022/autosc
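The relevant pool-level knobs are target_size_bytes and target_size_ratio; a sketch, with the pool name taken from the message above and the ratio only an example:

  # See what the autoscaler thinks of each pool
  ceph osd pool autoscale-status
  # Drop a fixed byte target and use a ratio of the cluster instead
  ceph osd pool set cephfs.application-acc.data target_size_bytes 0
  ceph osd pool set cephfs.application-acc.data target_size_ratio 0.8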
us, the cluster size
From: Gregory Farnum
Sent: Monday, February 13, 2023 5:32:18 PM
To: Sake Paulusma
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Health warning - POOL_TARGET_SIZE_BYTES_OVERCOMMITED
On Mon, Feb 13, 2023 at 4:16 AM Sake Paulusma wrote:
&
From: Sake Paulusma
Sent: Monday, February 13, 2023 6:52:45 PM
To: Gregory Farnum
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Health warning - POOL_TARGET_SIZE_BYTES_OVERCOMMITED
Hey Greg,
I'm just analyzing this issue and it isn't strange that the total cluster size is
half the tot
We noticed extremely slow performance when remapping is necessary. We didn't do
anything special other than assigning the correct device_class (to ssd). When
checking ceph status, we notice the number of objects recovering is around
17-25 (with watch -n 1 -c ceph status).
How can we increase
Hi,
The config shows "mclock_scheduler" and I already switched to the
high_recovery_ops profile; this does increase the recovery ops, but only a little.
You mention there is a fix in 17.2.6+, but we're running 17.2.6 (this
cluster was created on this version). Any more ideas?
Best regards
Just to add:
high_client_ops: around 8-13 objects per second
high_recovery_ops: around 17-25 objects per second
Both observed with "watch -n 1 -c ceph status"
Best regards
Thanks for the input! Changing this value indeed increased the recovery
speed from 20 objects per second to 500!
Now something strange:
1. We needed to change the class for our drives manually to ssd.
2. The setting "osd_mclock_max_capacity_iops_ssd" was set to 0. With osd bench
described in t
Did an extra test, shutting down an OSD host and forcing a recovery. Only
using the IOPS setting I got 500 objects a second, but also using the
bytes_per_usec setting, I got 1200 objects a second!
Maybe there should also be an investigation into this performance issue.
Best regards
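For reference, the two manual steps from this thread look roughly like this per OSD; the OSD id and the IOPS figure are placeholders (the thread used "ceph tell osd.N bench" output to pick a value):

  # 1. Force the device class to ssd
  ceph osd crush rm-device-class osd.12
  ceph osd crush set-device-class ssd osd.12
  # 2. Override the capacity value mclock uses for that OSD
  ceph config set osd.12 osd_mclock_max_capacity_iops_ssd 20000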
isn't using the device class override value.
Best regards,
Sake
From: Sridhar Seshasayee
Sent: Wednesday, May 24, 2023 11:34:02 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: Slow recovery on Quincy
As someone in this thread noted, the cost related con
If I glance at the commits to the quincy branch, shouldn't the mentioned
configurations be included in 17.2.7?
The requested command output:
[ceph: root@mgrhost1 /]# ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy (stable)
[ceph: root@mgrhost1 /]# ceph config s
Thanks, will keep an eye out for this version. Will report back to this thread
about these options and the recovery time/number of objects per second for
recovery.
Again, thank you all for the information and answers!
keep a
failed deployment, provide, just like Option 1, clear instructions on how to clean
up the failed deployment.
With the above additions, I would prefer Option 1, because there's almost
always a reason a deployment fails and I would like to investigate directly why
it happened.
Hi!
I noticed the same: the snapshot scheduler seemed to do nothing, but after
a manager failover the creation of snapshots started to work (including the
retention rules).
Best regards,
Sake
From: Lokendra Rathour
Sent: Monday, May 29, 2023 10:11:54
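For reference, forcing that failover is a one-liner; the snap_schedule module simply restarts on the standby mgr that takes over:

  # Fail the active mgr so a standby takes over
  ceph mgr fail
  # Check which mgr is active now
  ceph -s | grep mgr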
er bug
report?
And does someone know a workaround to set the correct URL for the time being?
Best regards,
Sake
large
(15GB/9GB); 0 inodes in use by clients, 0 stray files
=== Full health status ===
[WARN] MDS_CACHE_OVERSIZED: 1 MDSs report oversized cache
mds.atlassian-prod.mds4.qlvypn(mds.0): MDS cache is too large
(15GB/9GB); 0 inodes in use by clients, 0 stray files
Best regards,
Thank you Patrick for responding and fixing the issue! Good to know the issue is
known and being worked on :-)
> Op 21-07-2023 15:59 CEST schreef Patrick Donnelly :
>
>
> Hello Sake,
>
> On Fri, Jul 21, 2023 at 3:43 AM Sake Ceph wrote:
> >
> > At 01:27 this morn
Hi all
We successfully deployed a stretched cluster and all is working fine. But is it
possible to assign the active MDS services to one DC and the standby-replay to
the other?
We're running 18.2.4, deployed via cephadm, using 4 MDS servers with 2 active
MDS on pinned ranks and 2 in standby-re
gt;
> You'll have to monitor each MDS state and restart any non-local active MDSs
> to reverse roles.
>
> Regards,
> Frédéric.
>
> - Le 29 Oct 24, à 14:06, Sake Ceph c...@paulusma.eu a écrit :
>
> > Hi all
> > We deployed successfully a stretched clus
We're looking for multiple MDS daemons to be active in zone A and
standby(-replay) in zone B.
This scenario would also benefit people who have more powerful hardware in
zone A than in zone B.
Kind regards,
Sake
> Op 31-10-2024 15:50 CET schreef Adam King :
>
>
>
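A monitoring loop along the lines Frédéric describes above could look like this; a sketch only, where the fs name, the zone-B host pattern and the daemon name are assumptions:

  # Which daemon is active for each rank?
  ceph fs status cephfs
  # If an active MDS runs on a zone-B host, restart it so a standby in
  # zone A takes over the rank (daemon name is a placeholder)
  ceph orch daemon restart mds.cephfs.zoneb-host1.xyzabc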
I stumbled on this problem earlier: port 9926 isn't being opened. See also the
thread "Grafana dashboards is missing data".
A tracker is already opened to fix the issue:
https://tracker.ceph.com/issues/67975
> Op 25-11-2024 13:44 CET schreef Kilian Ries :
>
>
> Prometheus metrics seem to be broke