Hi Venky,
Yep, that's what I figured out too - but it also must have triggered some
underlying issue where the MDS goes into some state where each of these
updatedb runs accumulates more and more of that constant write IO to the
metadata pool, which never settles. Feels like it was constantly flushing
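One way to keep updatedb off the CephFS mounts on the clients - a minimal sketch, assuming an mlocate/plocate-style /etc/updatedb.conf; the mount point below is a placeholder:
# /etc/updatedb.conf - keep the nightly index scan off CephFS
# "ceph" covers kernel mounts, "fuse.ceph-fuse" covers ceph-fuse mounts
PRUNEFS="NFS nfs nfs4 ceph fuse.ceph-fuse"
# belt and braces: also prune the mount point itself (placeholder path)
PRUNEPATHS="/tmp /var/spool /mnt/cephfs"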
Hi Olli,
On Tue, Jul 2, 2024 at 7:51 PM Olli Rajala wrote:
>
> Hi - mostly as a note to future me and for anyone else looking for the same
> issue...
>
> I finally solved this a couple of months ago. No idea what is wrong with
> Ceph but the root cause that was triggering this MDS issue was that I
This was common in the NFS days, and some Linux distributions deliberately slewed
the execution time. find over an NFS mount was a sure-fire way to horque the
server (e.g. a Convex C1).
IMHO, since the tool relies on a static index it isn't very useful, and I
routinely remove any variant from my systems.
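If anyone wants to do the same, a hedged sketch - package and timer names differ between distributions:
apt-get purge mlocate plocate locate              # Debian/Ubuntu-style removal
# or keep the tool but stop the periodic scan; the timer name varies
# (plocate-updatedb.timer, mlocate-updatedb.timer, updatedb.timer, ...)
systemctl disable --now plocate-updatedb.timer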
Hi - mostly as a note to future me and for anyone else looking for the same
issue...
I finally solved this a couple of months ago. No idea what is wrong with
Ceph, but the root cause that was triggering this MDS issue was that I had
several workstations and a couple of servers where the updatedb of "locate"
Hi,
One thing I now noticed in the mds logs is that there's a ton of entries
like this:
2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache projecting to
[d345,d346] n(v1638 rc2022-12-11T18:20:49.317400+0200 b787972591
694=484+210)
2022-12-11T18:20:49.321+0200 7fdd0edde700 20 mds.0.cache
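For context, those level-20 mds.0.cache lines only appear with MDS debug logging turned way up - something like this, adjusting the daemon name and levels to taste:
ceph config set mds debug_mds 20                        # all MDS daemons, persistent
ceph tell mds.`hostname` config set debug_mds 20/20     # this daemon only, at runtime
# remember to turn it back down afterwards - the log volume is huge
ceph tell mds.`hostname` config set debug_mds 1/5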
Hi,
I'm still totally lost with this issue, and lately I've had a couple of
incidents where the write bandwidth has suddenly jumped to even crazier levels.
See the graph here:
https://gist.github.com/olliRJL/3e97e15a37e8e801a785a1bd5358120d
The points where it drops to something manageable again are
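If anyone wants to watch the same write bandwidth straight from Ceph rather than from external monitoring, something like this should do - cephfs_metadata is a placeholder pool name:
watch -n 5 ceph osd pool stats cephfs_metadata    # per-pool client IO rates
rados df                                          # cumulative per-pool read/write totals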
On Fri, Nov 11, 2022 at 3:06 AM Olli Rajala wrote:
>
> Hi Venky,
>
> I have indeed observed the output of the different sections of perf dump like
> so:
> watch -n 1 ceph tell mds.`hostname` perf dump objecter
> watch -n 1 ceph tell mds.`hostname` perf dump mds_cache
> ...etc...
>
> ...but without
Hi Venky,
I have indeed observed the output of the different sections of perf dump
like so:
watch -n 1 ceph tell mds.`hostname` perf dump objecter
watch -n 1 ceph tell mds.`hostname` perf dump mds_cache
...etc...
...but without any proper understanding of what is a normal rate for some
number to
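One way to turn those cumulative counters into per-second rates is to just diff them over a fixed interval - a rough sketch, assuming jq is installed and using the objecter op_w counter as an example:
# prints writes-per-second every 5s; the first sample is meaningless
prev=0
while true; do
  cur=$(ceph tell mds.$(hostname) perf dump objecter | jq '.objecter.op_w')
  echo "op_w/s: $(( (cur - prev) / 5 ))"
  prev=$cur
  sleep 5
done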
Hi Olli,
On Mon, Oct 17, 2022 at 1:08 PM Olli Rajala wrote:
>
> Hi Patrick,
>
> With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
> "ceph tell mds.pve-core-1 objecter_requests"? Both these show very few
> requests/ops - many times just returning empty lists. I'm pretty sure that
Hi Milind,
Here are the top output and a pstack backtrace:
https://gist.github.com/olliRJL/5f483c6bc4ad50178c8c9871370b26d3
https://gist.github.com/olliRJL/b83a743eca098c05d244e5c1def9046c
I uploaded the debug log using ceph-post-file - hope someone can access
that :)
ceph-post-file: 30f9b38
maybe,
- use the top program to look at a threaded listing of the ceph-mds
process and see which thread(s) are consuming the most CPU
- use gstack to attach to the ceph-mds process and dump the backtrace
into a file; we can then map the thread with the highest CPU consumption to the
gstack output
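Concretely, something along these lines - assuming a single ceph-mds process on the host:
top -H -p $(pidof ceph-mds)                       # per-thread CPU view of the MDS
gstack $(pidof ceph-mds) > ceph-mds.backtrace     # dump all thread backtraces to a file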
I might have spoken too soon :(
Now, about 60h after dropping the caches, the write bandwidth has gone up
linearly from those initial hundreds of kB/s to nearly 10MB/s.
I don't think this could be caused by the cache just filling up again
either. After dropping the cache I tested if filling up
Oh Lordy,
Seems like I finally got this resolved. And all it needed in the end was to
drop the mds caches with:
ceph tell mds.`hostname` cache drop
The funny thing is that whatever the issue with the cache was, it had
persisted through several Ceph upgrades and node reboots. It's been a live
production
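For anyone trying the same thing, it's worth snapshotting the cache stats around the drop so you can see whether it actually did anything:
ceph tell mds.`hostname` cache status    # cache usage before
ceph tell mds.`hostname` cache drop      # the command that resolved it here
ceph tell mds.`hostname` cache status    # confirm usage actually went down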
I tried my luck and upgraded to 17.2.4 but unfortunately that didn't make
any difference here either.
I also looked again at all kinds of client op and request stats and
whatnot, which only made me even more certain that this IO is not caused by
any clients.
What internal MDS operation or mechanism
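One MDS-internal thing worth watching is the journal - a hedged sketch, with the section name taken from perf dump mds_log:
watch -n 5 ceph tell mds.`hostname` perf dump mds_log
# evadd/segadd climbing while clients are idle would suggest the MDS is
# journaling internally rather than serving client metadata ops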
Hi Patrick,
With "objecter_ops" did you mean "ceph tell mds.pve-core-1 ops" and/or
"ceph tell mds.pve-core-1 objecter_requests"? Both these show very few
requests/ops - many times just returning empty lists. I'm pretty sure that
this I/O isn't generated by any clients - I've earlier tried to isolate
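For completeness, a quick way to count what those two commands return, assuming jq is available - both outputs carry an "ops" array:
ceph tell mds.pve-core-1 ops | jq '.ops | length'
ceph tell mds.pve-core-1 objecter_requests | jq '.ops | length'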
Hello Olli,
On Thu, Oct 13, 2022 at 5:01 AM Olli Rajala wrote:
>
> Hi,
>
> I'm seeing constant 25-50MB/s writes to the metadata pool even when all
> clients and the cluster are idling and in a clean state. This surely can't be
> normal?
>
> There are no apparent issues with the performance of the cluster
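A couple of quick checks that help rule clients out in a situation like this:
ceph fs status                          # per-rank request rate and client count
ceph tell mds.`hostname` session ls     # per-client session details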