Hello Peter,
your irony is perfect, it is worth noticing.
The meaning of my previous post was that the Ceph cluster didn't fulfill
my needs and, although I had set the mClock profile to
"high_client_ops" (because I have plenty of time for rebalancing
and scrubbing), my clients ran into problems.
And there
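For reference, a sketch of how such a profile is set, assuming a release where mClock is the active OSD scheduler (verify the option name on your version):

ceph config set osd osd_mclock_profile high_client_ops
ceph config show osd.0 osd_mclock_profile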
Hi,
it's really as easy as it sounds (fresh test cluster on 18.2.1 without
any pools yet):
ceph:~ # ceph fs volume create cephfs
(wait a minute or two)
ceph:~ # ceph fs status
cephfs - 0 clients
======
RANK  STATE  MDS  ACTIVITY  DNS  INOS  DIRS  CAPS
 0
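For reference, "ceph fs volume create" also creates the backing pools; on a fresh cluster like this the check would look roughly as follows (pool names assumed from the default mgr/volumes naming scheme):

ceph:~ # ceph osd pool ls
cephfs.cephfs.meta
cephfs.cephfs.data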
Hello Jan,
> The meaning of my previous post was that the Ceph cluster didn't fulfill
> my needs and, although I had set the mClock profile to
> "high_client_ops" (because I have plenty of time for rebalancing
> and scrubbing), my clients ran into problems.
>
As far as the question around mClock is concerned
It's reasonable enough.
Actually, I expected the client to have just thousands of
"PG-to-OSD" mappings.
Nevertheless, it's so heavy that the client calculates the location on
demand, right?
If a client with an outdated map sends a request to the wrong OSD,
then does the OSD handle it somehow thro
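For what it's worth, the mapping a client computes can be reproduced from the CLI; a sketch with assumed pool and object names:

ceph osd map cephfs_data someobject
# prints the PG and the current up/acting OSD set for that object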
On 25/01/2024 at 08:42:19+, Eugen Block wrote:
> Hi,
>
> it's really as easy as it sounds (fresh test cluster on 18.2.1 without any
> pools yet):
>
> ceph:~ # ceph fs volume create cephfs
Yes... I already tried that with the label and it works fine.
But I prefer to use «my» pools, because I have
Albert,
I have never used EC for the (root) data pool.
On Thu, Jan 25, 2024 at 12:08, Albert Shih wrote:
> On 25/01/2024 at 08:42:19+, Eugen Block wrote:
> > Hi,
> >
> > it's really as easy as it sounds (fresh test cluster on 18.2.1 without
> any
> > pools yet):
> >
> > ceph:~ # ceph fs volume creat
Did you set the ec-overwrites flag for the pool as mentioned in the docs?
https://docs.ceph.com/en/latest/cephfs/createfs/#using-erasure-coded-pools-with-cephfs
If you plan to use pre-created pools anyway then the slightly more
manual method is the way to go.
You can set the pg_num (and pgp_nu
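For reference, the overwrites flag from the linked docs is set like this (pool name assumed):

ceph osd pool set cephfs_data allow_ec_overwrites true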
I'm not sure if using EC as the default data pool for CephFS is still
discouraged, as stated in the output when attempting to do that; the
docs don't mention it (at least not in the link I sent in the last
mail):
ceph:~ # ceph fs new cephfs cephfs_metadata cephfs_data
Error EINVAL: pool 'cephf
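The pattern the docs do describe is a replicated default data pool with the EC pool attached as an additional data pool; a sketch with assumed pool, filesystem, and mount-point names:

ceph fs add_data_pool cephfs cephfs_data_ec
setfattr -n ceph.dir.layout.pool -v cephfs_data_ec /mnt/cephfs/somedir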
Hello Jos.
I checked the diff and noticed the difference:
https://github.com/ceph/ceph/pull/52127/files
Thank you for the guide link and for the fix.
Have a great day.
Regards.
On Tue, Jan 23, 2024 at 11:07, Jos Collin wrote:
> This fix is in the mds.
> I think you need to read
> https:/
Hello Eugen.
I read all of your MDS-related topics, and thank you so much for your effort
on this.
There is not much information and I couldn't find an MDS tuning guide at
all. It seems that you are the right person to discuss MDS debugging and
tuning with.
Do you have any documents or may I learn w
On Thu, Jan 25, 2024 at 11:57, Henry lol wrote:
>
> It's reasonable enough.
> Actually, I expected the client to have just thousands of
> "PG-to-OSD" mappings.
Yes, but the filename-to-PG mapping is done with a pseudorandom algorithm.
> Nevertheless, it's so heavy that the client calculates the location on
> deman
Hello Sridhar,
On Thu, Jan 25, 2024 at 09:53:26 CET, Sridhar Seshasayee wrote:
> Hello Jan,
>
> The meaning of my previous post was that the Ceph cluster didn't fulfill
> my needs and, although I had set the mClock profile to
> "high_client_ops" (because I have plenty of time for rebalancing
> and scrub
If the root is EC, it is likely not possible to apply the
disaster recovery procedure (no layout/parent xattrs on the data pool).
Regards,
*David CASIER*
On Thu.
Oh right, I forgot about that, good point! But if that is (still) true
then this should definitely be in the docs as a warning for EC pools
in cephfs!
Quoting "David C.":
If the root is EC, it is likely not possible to apply the
disaster recovery procedure (no layout/
We are heavily impacted by this issue with the MGR in Pacific.
This has to be fixed.
As someone suggested in the issue tracker, we limited the memory usage
of the MGR in the systemd unit (MemoryLimit=16G) in order to kill the
MGR before it consumes all the memory of the server and impacts other
serv
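For anyone replicating that workaround, a sketch of such a systemd drop-in; the path and unit name are assumptions, adjust for your deployment:

# /etc/systemd/system/ceph-mgr@.service.d/override.conf
[Service]
MemoryLimit=16G

followed by "systemctl daemon-reload" and a restart of the MGR unit.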
There is no definitive answer wrt MDS tuning. As mentioned
everywhere, it's about finding the right setup for your specific
workload. If you can synthesize your workload (maybe scaled down a bit),
try optimizing it in a test cluster without interrupting your
developers too much.
But wha
It would be a pleasure to complete the documentation, but we would need to
test or have someone confirm what I have assumed.
Concerning the warning, I think we should not talk about the disaster
recovery procedure.
While that procedure has already saved some entities, it has also
put entities at r
After upgrading to 17.2.7 our load balancers can't check the status of the
manager nodes for the dashboard. After some troubleshooting I noticed only TLS
1.3 is available for the dashboard.
Looking at the source (Quincy), the TLS config was changed from 1.2 to 1.3.
Searching in the tracker I found
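A quick way to confirm what the dashboard negotiates (hostname and port here are assumptions):

echo | openssl s_client -connect mgr-host:8443 -tls1_2
# fails if only TLS 1.3 is offered; retry with -tls1_3 to compare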
Hi,
I'll re-open the PR and will merge it to Quincy. Btw, I want to know if the
load balancers will support TLS 1.3 in the future, because we were
planning to completely drop TLS 1.2 support from the dashboard for
security reasons. (But so far we are planning to keep it as it is, at least
for
Hi Nizamudeen,
Thank you for your quick response!
The load balancers support TLS 1.3, but the administrators need to reconfigure
the healthchecks. The only problem is that it's a global change for all load
balancers... so it's not something they can change overnight; they need to plan and test for it.
Best regards,
Ah okay, thanks for the clarification.
In that case we'll probably need to keep this 1.2 fix for Squid, I guess.
I'll check and update as necessary.
On Thu, Jan 25, 2024, 20:12 Sake Ceph wrote:
> Hi Nizamudeen,
>
> Thank you for your quick response!
>
> The load balancers support TLS 1.3,
I would say drop it for the Squid release; or if you keep it in Squid but are
going to disable it in a minor release later, please make a note in the
release notes when the option is removed.
Just my 2 cents :)
Best regards,
Sake
Understood, thank you.
On Thu, Jan 25, 2024, 20:24 Sake Ceph wrote:
> I would say drop it for the Squid release; or if you keep it in Squid but
> are going to disable it in a minor release later, please make a note in the
> release notes when the option is removed.
> Just my 2 cents :)
>
> Best rega
I will try my best to explain my situation.
I don't have a separate MDS server. I have 5 identical nodes; 3 of them are
mons, and I use the other 2 as active and standby MDS. (Currently I have
leftovers from max_mds 4.)
root@ud-01:~# ceph -s
cluster:
id: e42fd4b0-313b-11ee-9a00-31da71873773
Hi Ceph Users
I am encountering a problem with the RGW Admin Ops Socket.
I am setting up the socket as follows:
rgw_enable_ops_log = true
rgw_ops_log_socket_path = /tmp/ops/rgw-ops.socket
rgw_ops_log_data_backlog = 16Mi
It seems like the socket fills up over time and it doesn't seem to get
flush
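For comparison, a minimal consumer to keep the socket drained, using the socket path from the config above (socat assumed to be installed):

socat - UNIX-CONNECT:/tmp/ops/rgw-ops.socket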
Gotcha!
I've got the point. After re-running the certificate creation with:
ceph restful create-self-signed-cert
I get this error:
Module 'cephadm' has failed: Expected 4 octets in
'fd30:::0:1101:2:0:501'
*Ouch, 4 octets = an IPv4 address was expected... some nice code in perspective.*
I
Hi Marc,
The ops log code is designed to discard data if the socket is
flow-controlled, iirc. Maybe we just need to handle the signal.
Of course, you should have something consuming data on the socket, but it's
still a problem if radosgw exits unexpectedly.
Matt
On Thu, Jan 25, 2024 at 10:08 A
It would be cool, actually, to have the metrics working in 18.2.2 for
IPv6-only clusters.
Otherwise, everything works fine on my side.
Regards,
*David CASIER*
On Thu, Jan 25, 2024 at
Hi
I am using a Unix socket client to connect to it and read the data
from it.
Do I need to do anything like signalling on the socket that the data has
been read? Or am I not reading fast enough, so data is backing up?
What I am also noticing is that at some point (probably after something
with the
I understand that your MDS shows a high CPU usage, but other than that
what is your performance issue? Do users complain? Do some operations
take longer than expected? Are OSDs saturated during those phases?
Because the cache pressure messages don’t necessarily mean that users
will notice.
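Two quick checks for the saturation question, for what it's worth:

ceph osd perf    # per-OSD commit/apply latencies
ceph status      # shows client I/O rates during those phases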
Oh! So that's why data imbalance occurs in Ceph.
I totally misunderstood Ceph's placement algorithm until just now.
Thank you so much for your detailed explanation :)
Sincerely,
On Thu, Jan 25, 2024 at 9:32 PM, Janne Johansson wrote:
>
> Den tors 25 jan. 2024 kl 11:57 skrev Henry lol :
> >
> > It's reasonab
On 1/25/24 13:32, Janne Johansson wrote:
> It doesn't take OSD usage into consideration except at creation time
> or OSD in/out/reweighing (or manual displacements with upmap and so
> forth), so this is why "ceph df" will tell you a pool has X free
> space, where X is "smallest free space on the OSDs on
More and more I am annoyed with the 'dumb' design decisions of Red Hat. Just now
I have an issue on an 'air-gapped' VM where I am unable to start a docker/podman
container, because it tries to contact the repository to update the image and,
instead of using the on-disk image, it just fails. (Not to m
Every user has one subvolume and I only have one pool.
At the beginning we were using each subvolume for the LDAP home directory +
user data.
When a user logged in to any docker container on any host, it used the
cluster for home; and for user-related data we had a second directory in the
same subvolume.
Ti
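For context, per-user subvolumes like that are typically created and resolved like this (volume and subvolume names assumed):

ceph fs subvolume create cephfs user1
ceph fs subvolume getpath cephfs user1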
On Thu, Jan 25, 2024 at 17:47, Robert Sander wrote:
> > forth), so this is why "ceph df" will tell you a pool has X free
> > space, where X is "smallest free space on the OSDs on which this pool
> > lies, times the number of OSDs". Given the pseudorandom placement of
> > objects to PGs, there is n
On 25.01.2024 18:19, Marc wrote:
More and more I am annoyed with the 'dumb' design decisions of Red Hat.
Just now I have an issue on an 'air-gapped' VM where I am unable to
start a docker/podman container, because it tries to contact the
repository to update the image and, instead of using the on-d
These are client-side metrics from a client warned as "failing to respond to
cache pressure".
root@datagen-27:/sys/kernel/debug/ceph/e42fd4b0-313b-11ee-9a00-31da71873773.client1282187#
cat bdi/stats
BdiWriteback:      0 kB
BdiReclaimable:    0 kB
BdiDirtyThresh:    0 kB
Di
For the OP - IBM appears to have some relevant info in their Ceph docs:
https://www.ibm.com/docs/en/storage-ceph/5?topic=cluster-performing-disconnected-installation
Questions:
Is it possible to reset “container_image” after the cluster has been deployed?
sudo ceph config dump |grep conta
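If it helps: in a cephadm-managed cluster the image can be changed after deployment; a sketch with an assumed local registry and tag:

ceph config set global container_image registry.local:5000/ceph/ceph:v16.2.14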
Hi,
I got those metrics back after setting:
reef01:~ # ceph config set mgr mgr/prometheus/exclude_perf_counters false
reef01:~ # curl http://localhost:9283/metrics | grep ceph_osd_op | head
Yeah, it's mentioned in the upgrade docs [2]:
Monitoring & Alerting
Ceph-exporter: Now the performance metrics for Ceph daemons
are exported by ceph-exporter, which deploys on each daemon rather
than using prometheus exporter. This will reduce performance
bottlenecks.
[2] https:/
Ah, there they are (different port):
reef01:~ # curl http://localhost:9926/metrics | grep ceph_osd_op | head
>
>>> forth), so this is why "ceph df" will tell you a pool has X free
>>> space, where X is "smallest free space on the OSDs on which this pool
>>> lies, times the number of OSDs".
To be even more precise, this depends on the failure domain. With the typical
"rack" failure domain, say you u
Hello team,
I have a cluster in production composed of 3 OSD servers with 20 disks
each, deployed using ceph-ansible on Ubuntu, and the version is Pacific.
These days it is in WARN state caused by PGs which are not deep-scrubbed in
time. I tried to deep-scrub some PGs manually, but it seems that
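For reference, the affected PGs can be listed and scrubbed one by one (the PG ID below is an example):

ceph health detail | grep 'not deep-scrubbed'
ceph pg deep-scrub 6.78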
We had the same problem. It turned out that one disk was slowly dying. It
was easy to identify with the commands (in your case):
ceph pg dump | grep -F 6.78
ceph pg dump | grep -F 6.60
…
This command shows the OSDs of a PG in square brackets. If the same OSD
number always appears, then you've found th
It seems they are different OSDs, as shown here. How have you managed to
sort this out?
ceph pg dump | grep -F 6.78
dumped all
6.78  44268  0  0  0  0  178679640118  0  0  10099  10099  active+clean  2024-01-26T03:51:26.781438+0200  1
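A shorter way to see just the up/acting set of a single PG (same PG ID as above):

ceph pg map 6.78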