On 25 January 2018 at 04:53, Warren Wang wrote:
> Forgot to mention another hint. If kswapd is constantly using CPU, and your
> sar -r ALL and sar -B stats look like it's thrashing, kswapd is probably
> busy evicting things from memory in order to make a larger-order allocation.
>
> The other thing I can think of is if you have OSDs locking up and getting
> corrupted, there is a severe XFS bug where the kernel will throw a NULL
> pointer dereference under heavy memory pressure. Again, it's due to memory
> issues, but you wi
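For anyone reproducing Warren's check, a minimal sketch, assuming the sysstat
tools (pidstat, sar) are installed; the 5-second interval is arbitrary:

    # Is kswapd burning CPU? (one kswapd thread per NUMA node)
    $ pidstat -p $(pgrep -d, kswapd) 5

    # Paging rates: sustained pgscank/s and pgsteal/s with low %vmeff
    # is the "kswapd busy evicting" pattern described above
    $ sar -B 5

    # Memory and swap utilization over the same window
    $ sar -r ALL 5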
+1 to Warren's advice on checking for memory fragmentation. Are you
seeing kmem allocation failures in dmesg on these hosts?
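A quick sketch of that check; the exact message text varies by kernel version:

    # Allocation failures log the failing order, e.g.
    # "page allocation failure: order:4"
    $ dmesg -T | grep -i 'allocation failure'

    # The ring buffer may have wrapped; the journal keeps older kernel messages
    $ journalctl -k | grep -i 'allocation failure'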
On 24 January 2018 at 10:44, Warren Wang wrote:
> Check /proc/buddyinfo for memory fragmentation. We have some pretty severe
> memory frag issues with Ceph, to the point where we keep an excessive
> min_free_kbytes configured (8GB), and are starting to order more memory
> than we actually need. If you have a lot of objects, you may find that you
> need to
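For reference, a sketch of both knobs; the 8GB reserve is Warren's figure,
not a general recommendation:

    # Free pages per order (columns are order 0..10 per zone). Many order-0/1
    # pages but near-empty high-order columns indicates fragmentation.
    $ cat /proc/buddyinfo

    # Reserve 8 GB (8388608 kB) so the kernel keeps contiguous memory free
    $ echo 'vm.min_free_kbytes = 8388608' | sudo tee /etc/sysctl.d/99-minfree.conf
    $ sudo sysctl --system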
On Tuesday, 23 January 2018 at 21:13, linco...@uchicago.edu wrote:
> Hi Sam,
>
> What happens if you just disable swap altogether? i.e., with `swapoff -a`
>
> --Lincoln
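If you go that route, a sketch for making it persistent; the sed edit assumes
standard fstab swap entries, so check the result before rebooting:

    # Disable all swap devices immediately
    $ sudo swapoff -a

    # Comment out swap lines so it stays off across reboots
    $ sudo sed -i.bak '/\sswap\s/s/^/#/' /etc/fstab

    # Softer alternative: keep swap but strongly discourage its use
    $ echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-swappiness.conf
    $ sudo sysctl --system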
On Tue, 2018-01-23 at 19:54, Samuel Taylor Liston wrote:
> We have a 9-node cluster (16 8TB OSDs per node) running Jewel on CentOS
> 7.4. The OSDs are configured with encryption. The cluster is accessed via
> two RGWs, and there are 3 mon servers. The data pool is using 6+3 erasure
> coding.
>
> About 2 weeks ago I found two of the nine servers wedged and had