On 11/23/23 11:25, zxcs wrote:
Thanks a ton, Xiubo!
it does not disappear,
even after we unmount the Ceph directory on these two old OS nodes.
After dumping the ops in flight, we can see some requests, and the earliest complains
"failed to authpin, subtree is being exported".
And how can we avoid this? Could you please help to shed some light here?
On 20/11/2023 at 09:24:41+, Frank Schilder wrote:
Hi,
Thanks everyone for your answers.
>
> we are using something similar for ceph-fs. For a backup system your setup
> can work, depending on how you back up. While HDD pools have poor IOP/s
> performance, they are very good for streaming
On 22-11-2023 15:54, Stefan Kooman wrote:
Hi,
In an IPv6-only deployment the ceph-exporter daemons are not listening on
IPv6 address(es). This can be fixed by editing the unit.run file of the
ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::".
Is this configurable? So that cephadm de
Hi Frank,
Locally I ran some tests using copy2 and copy, but they all worked
well for me.
Could you write a reproducing script?
Thanks
- Xiubo
On 11/10/23 22:53, Frank Schilder wrote:
It looks like the cap update request was dropped on the floor in the MDS.
[...]
If you can reproduce i
I just raised one tracker to follow this:
https://tracker.ceph.com/issues/63510
Thanks
- Xiubo
On 11/10/23 22:53, Frank Schilder wrote:
It looks like the cap update request was dropped on the floor in the MDS.
[...]
If you can reproduce it, then please provide the mds logs by setting:
[...]
I
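The exact settings are elided above; for context, here is a minimal sketch of how MDS
debug logging is commonly raised for this kind of report. The levels are an assumption,
not necessarily what was asked for, and should be reverted afterwards because level 20
is very verbose:

  # raise MDS debug logging while reproducing the problem (assumed levels)
  ceph config set mds debug_mds 20
  ceph config set mds debug_ms 1
  # reproduce, collect /var/log/ceph/ceph-mds.*.log, then revert
  ceph config rm mds debug_mds
  ceph config rm mds debug_ms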
Thanks a ton, Xiubo!
it does not disappear,
even after we unmount the Ceph directory on these two old OS nodes.
After dumping the ops in flight, we can see some requests, and the earliest complains
"failed to authpin, subtree is being exported".
And how can we avoid this? Could you please help to shed some light here?
Th
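For reference, a minimal sketch of how the in-flight MDS ops mentioned above can be
dumped, assuming access to the admin socket on the MDS host (the daemon name is a
placeholder):

  # list the ops currently in flight on the affected MDS; blocked requests
  # show the "failed to authpin, subtree is being exported" flag point
  ceph daemon mds.<name> dump_ops_in_flight
  # recent slow ops, if a longer history is needed
  ceph daemon mds.<name> dump_historic_ops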
Hi,
In an IPv6-only deployment the ceph-exporter daemons are not listening on
IPv6 address(es). This can be fixed by editing the unit.run file of the
ceph-exporter by changing "--addrs=0.0.0.0" to "--addrs=::".
Is this configurable? So that cephadm deploys ceph-exporter with proper
unit.run a
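Until this is configurable through cephadm, a rough sketch of the manual workaround
described above (the FSID, hostname and service name are placeholders, and the change
may be overwritten when the daemon is redeployed):

  # make the cephadm-generated unit.run bind to the IPv6 wildcard address
  sed -i 's/--addrs=0.0.0.0/--addrs=::/' \
      /var/lib/ceph/<fsid>/ceph-exporter.<hostname>/unit.run
  # restart the daemon so the new bind address takes effect
  systemctl restart ceph-<fsid>@ceph-exporter.<hostname>.service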
Hi
we are running Ceph Pacific 16.2.13.
We had a full CephFS filesystem, and after adding new hardware we tried to start
it, but our MDS daemons are pushed to standby and removed from the MDS
map.
The filesystem was broken, so we repaired it with:
# ceph fs fail cephfs
# cephfs-journal-tool --rank=cephfs:0
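The last command above is cut off; for context, the upstream CephFS disaster-recovery
procedure that it appears to be part of goes roughly as follows. This is a sketch of
the documented sequence, not necessarily the exact commands that were run, and the
journal should be exported first as a backup:

  # back up the journal before any destructive step
  cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
  # recover what is recoverable from the journal, then reset it
  cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
  cephfs-journal-tool --rank=cephfs:0 journal reset
  # reset the session table and bring the filesystem back online
  cephfs-table-tool cephfs:all reset session
  ceph fs reset cephfs --yes-i-really-mean-it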
Thanks for this. This looks similar to what we're observing, although we
don't use the API apart from its use by the Ceph deployment itself - which I
guess still counts.
/Z
On Wed, 22 Nov 2023, 15:22 Adrien Georget,
wrote:
> Hi,
>
> This memory leak with ceph-mgr seems to be due to a change in Ce
Yes, we use docker, though we haven't had any issues because of it. I don't
think that docker itself can cause mgr memory leaks.
/Z
On Wed, 22 Nov 2023, 15:14 Eugen Block, wrote:
> One other difference is you use docker, right? We use podman, could it
> be some docker restriction?
>
> Quoting
Hi,
This memory leak with ceph-mgr seems to be due to a change in Ceph 16.2.12.
Check this issue : https://tracker.ceph.com/issues/59580
We are also affected by this, with or without containerized services.
Cheers,
Adrien
On 22/11/2023 at 14:14, Eugen Block wrote:
One other difference is you
One other difference is you use docker, right? We use podman, could it
be some docker restriction?
Quoting Zakhar Kirpichenko:
It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
give or take, i
It's a 6-node cluster with 96 OSDs, not much I/O, mgr . Each node has 384
GB of RAM, each OSD has a memory target of 16 GB, about 100 GB of memory,
give or take, is available (mostly used by page cache) on each node during
normal operation. Nothing unusual there, tbh.
No unusual mgr modules or set
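For reference, a sketch of how the per-OSD memory target mentioned above is typically
set and checked (16 GB expressed in bytes; standard ceph config usage):

  # set a 16 GB memory target for all OSDs
  ceph config set osd osd_memory_target 17179869184
  # check what a given OSD resolves it to
  ceph config get osd.0 osd_memory_target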
What does your hardware look like memory-wise? Just for comparison,
one customer cluster has 4,5 GB in use (middle-sized cluster for
openstack, 280 OSDs):
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 6077 ceph      20   0 6357560 4,522g  22316 S 12,00  1,79
I've disabled the progress module entirely and will see how it goes.
Otherwise, mgr memory usage keeps increasing slowly; from past experience
it will stabilize at around 1.5-1.6 GB. Other than this event warning, it's
unclear what could have caused random memory ballooning.
/Z
On Wed, 22 Nov 202
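For reference, a sketch of how the progress module can be switched off and back on
again; as noted above, disabling it only rules the module out:

  # turn off progress event reporting entirely
  ceph progress off
  # re-enable it once the mgr memory behaviour is understood
  ceph progress on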
There are some unhandled race conditions in the MDS cluster in rare
circumstances.
We had this issue with mimic and octopus and it went away after manually
pinning sub-dirs to MDS ranks; see
https://docs.ceph.com/en/nautilus/cephfs/multimds/?highlight=dir%20pin#manually-pinning-directory-trees-
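For completeness, the manual pinning from the linked documentation boils down to
setting an extended attribute on the directory; a minimal sketch (the mount point and
rank are placeholders):

  # pin a directory tree to MDS rank 1; descendants inherit the pin
  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir
  # a value of -1 removes the pin again
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/some/dir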
On 11/22/23 16:02, zxcs wrote:
Hi, experts,
we are using CephFS 16.2.* with multiple active MDS, and recently we have
two nodes mounted with ceph-fuse due to their old OS.
One node runs a Python script with `glob.glob(path)`, and another client
does a `cp` operation on the same path.
Hi,
we saw this a year ago in a Nautilus cluster with multi-active MDS
as well. It turned up only once within several years and we decided
not to look too closely at that time. How often do you see it? Is it
reproducible? In that case I'd recommend creating a tracker issue.
Regards,
I see these progress messages all the time. I don't think they cause
it, but I might be wrong. You can disable it just to rule that out.
Quoting Zakhar Kirpichenko:
Unfortunately, I don't have a full stack trace because there's no crash
when the mgr gets oom-killed. There's just the mgr lo
Unfortunately, I don't have a full stack trace because there's no crash
when the mgr gets oom-killed. There's just the mgr log, which looks
completely normal until about 2-3 minutes before the oom-kill, when
tcmalloc warnings show up.
I'm not sure that it's the same issue that is described in the t
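In case it helps narrow this down, a sketch of what can be captured around such an
event. The mgr daemon name is a placeholder, and the heap command assumes the mgr
admin socket exposes the tcmalloc heap statistics, which the "large alloc" warnings
suggest:

  # confirm the OOM kill and its timing from the kernel log
  journalctl -k | grep -i -E 'out of memory|oom-kill'
  # watch the mgr's resident memory over time
  ps -C ceph-mgr -o pid,rss,vsz,cmd
  # tcmalloc heap statistics via the admin socket
  ceph daemon mgr.<name> heap stats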
Do you have the full stack trace? The pastebin only contains the
"tcmalloc: large alloc" messages (same as in the tracker issue). Maybe
comment in the tracker issue directly since Radek asked for someone
with a similar problem in a newer release.
Quoting Zakhar Kirpichenko:
Thanks, Eug
Hello Eugen,
thanks for the validation. Actually, I use plain HTTP because I do not have
much time to look for a solution.
But I will check a new cert ASAP.
Christoph
On Fri, 17 Nov 2023 at 12:57, Eugen Block wrote:
> I was able to reproduce the error with a self-signed elliptic curves
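Since the reproduced error involves an elliptic-curve self-signed certificate, one
quick check is whether an RSA-based self-signed certificate behaves differently; a
sketch with openssl (file names, lifetime and CN are arbitrary):

  # generate a throwaway RSA key and self-signed certificate for testing
  openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
      -keyout test-key.pem -out test-cert.pem \
      -subj "/CN=ceph.example.com"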
Hi, experts,
we are using CephFS 16.2.* with multiple active MDS, and recently we have
two nodes mounted with ceph-fuse due to their old OS.
One node runs a Python script with `glob.glob(path)`, and another client
does a `cp` operation on the same path.
Then we see some log about
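Since a reproducing script was asked for earlier in the thread, a rough sketch of the
workload described above, run from two different clients against the same CephFS path
(the path is a placeholder and this is only an approximation of the real workload):

  # client A (old OS, ceph-fuse mount): repeatedly glob the shared path
  while true; do
      python3 -c "import glob; print(len(glob.glob('/mnt/cephfs/shared/*')))"
  done

  # client B (another mount of the same path): repeatedly copy into it
  while true; do
      cp -r /mnt/cephfs/shared/src "/mnt/cephfs/shared/dst.$RANDOM"
  done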