Hi,
I'm also seeing a slow memory increase over time with my BlueStore NVMe OSDs
(3.2 TB each), with default ceph.conf settings (Ceph 12.2.2).
Each OSD starts at around 5 GB of memory and grows to around 8 GB.
Currently I restart them about once a month to free the memory.
Here is a dump of osd.0 after one week of running:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ceph 2894538 3.9 9.9 7358564 6553080 ? Ssl Mar01 303:03
/usr/bin/ceph-osd -f --cluster ceph --id 0 --setuser ceph --setgroup ceph
root@ceph4-1:~# ceph daemon osd.0 dump_mempools
{
    "bloom_filter": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_alloc": {
        "items": 84070208,
        "bytes": 84070208
    },
    "bluestore_cache_data": {
        "items": 168,
        "bytes": 2908160
    },
    "bluestore_cache_onode": {
        "items": 947820,
        "bytes": 636935040
    },
    "bluestore_cache_other": {
        "items": 101250372,
        "bytes": 2043476720
    },
    "bluestore_fsck": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_txc": {
        "items": 8,
        "bytes": 5760
    },
    "bluestore_writing_deferred": {
        "items": 85,
        "bytes": 1203200
    },
    "bluestore_writing": {
        "items": 7,
        "bytes": 569584
    },
    "bluefs": {
        "items": 1774,
        "bytes": 106360
    },
    "buffer_anon": {
        "items": 68307,
        "bytes": 17188636
    },
    "buffer_meta": {
        "items": 284,
        "bytes": 24992
    },
    "osd": {
        "items": 333,
        "bytes": 4017312
    },
    "osd_mapbl": {
        "items": 0,
        "bytes": 0
    },
    "osd_pglog": {
        "items": 1195884,
        "bytes": 298139520
    },
    "osdmap": {
        "items": 4542,
        "bytes": 384464
    },
    "osdmap_mapping": {
        "items": 0,
        "bytes": 0
    },
    "pgmap": {
        "items": 0,
        "bytes": 0
    },
    "mds_co": {
        "items": 0,
        "bytes": 0
    },
    "unittest_1": {
        "items": 0,
        "bytes": 0
    },
    "unittest_2": {
        "items": 0,
        "bytes": 0
    },
    "total": {
        "items": 187539792,
        "bytes": 3089029956
    }
}
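For reading dumps like the one above, ranking the pools by bytes makes the big consumers obvious. A minimal sketch (the numbers are copied from the osd.0 dump above; in practice you would load the full JSON saved with `ceph daemon osd.0 dump_mempools > dump.json`, and the file name here is just an example):

```python
# Subset of the osd.0 dump above; in practice, load the full JSON, e.g.:
#   import json; pools = json.load(open("dump.json"))
pools = {
    "bluestore_alloc":       {"items": 84070208,  "bytes": 84070208},
    "bluestore_cache_onode": {"items": 947820,    "bytes": 636935040},
    "bluestore_cache_other": {"items": 101250372, "bytes": 2043476720},
    "osd_pglog":             {"items": 1195884,   "bytes": 298139520},
}

# Rank pools by bytes, largest first
for name, pool in sorted(pools.items(), key=lambda kv: -kv[1]["bytes"]):
    print(f"{name:24s} {pool['bytes'] / 2**20:8.1f} MiB")
```

Here bluestore_cache_other dominates at roughly 1.9 GiB.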
Another OSD after one month:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
ceph 1718009 2.5 11.7 8542012 7725992 ? Ssl 2017 2463:28
/usr/bin/ceph-osd -f --cluster ceph --id 5 --setuser ceph --setgroup ceph
root@ceph4-1:~# ceph daemon osd.5 dump_mempools
{
    "bloom_filter": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_alloc": {
        "items": 98449088,
        "bytes": 98449088
    },
    "bluestore_cache_data": {
        "items": 759,
        "bytes": 17276928
    },
    "bluestore_cache_onode": {
        "items": 884140,
        "bytes": 594142080
    },
    "bluestore_cache_other": {
        "items": 116375567,
        "bytes": 2072801299
    },
    "bluestore_fsck": {
        "items": 0,
        "bytes": 0
    },
    "bluestore_txc": {
        "items": 6,
        "bytes": 4320
    },
    "bluestore_writing_deferred": {
        "items": 99,
        "bytes": 1190045
    },
    "bluestore_writing": {
        "items": 11,
        "bytes": 4510159
    },
    "bluefs": {
        "items": 1202,
        "bytes": 64136
    },
    "buffer_anon": {
        "items": 76863,
        "bytes": 21327234
    },
    "buffer_meta": {
        "items": 910,
        "bytes": 80080
    },
    "osd": {
        "items": 328,
        "bytes": 3956992
    },
    "osd_mapbl": {
        "items": 0,
        "bytes": 0
    },
    "osd_pglog": {
        "items": 1118050,
        "bytes": 286277600
    },
    "osdmap": {
        "items": 6073,
        "bytes": 551872
    },
    "osdmap_mapping": {
        "items": 0,
        "bytes": 0
    },
    "pgmap": {
        "items": 0,
        "bytes": 0
    },
    "mds_co": {
        "items": 0,
        "bytes": 0
    },
    "unittest_1": {
        "items": 0,
        "bytes": 0
    },
    "unittest_2": {
        "items": 0,
        "bytes": 0
    },
    "total": {
        "items": 216913096,
        "bytes": 3100631833
    }
}
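One thing worth noting about the two dumps: between one week and one month the mempool totals barely move (~2.88 GiB vs ~2.89 GiB) while RSS grows from ~6.3 GiB to ~7.4 GiB, so most of the growth is outside the mempool accounting. A rough comparison (a sketch only; the numbers are copied from the ps and dump output above, and RSS is in KiB as ps reports it):

```python
# RSS from ps (KiB) and mempool totals (bytes), copied from the output above
osds = {
    "osd.0 (1 week)":  {"rss_kib": 6553080, "mempool_bytes": 3089029956},
    "osd.5 (1 month)": {"rss_kib": 7725992, "mempool_bytes": 3100631833},
}

for name, d in osds.items():
    rss = d["rss_kib"] * 1024
    pools = d["mempool_bytes"]
    # Whatever RSS the mempools don't account for lives elsewhere in the
    # heap (allocator overhead, fragmentation, non-mempool allocations).
    print(f"{name}: RSS {rss / 2**30:.2f} GiB, "
          f"mempools {pools / 2**30:.2f} GiB, "
          f"unaccounted {(rss - pools) / 2**30:.2f} GiB")
```

That works out to roughly 3.4 to 4.5 GiB per OSD unaccounted for by the mempools, and the RSS growth over the month is almost entirely on that unaccounted side rather than in the caches.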
----- Original Message -----
From: "Kjetil Joergensen" <[email protected]>
To: "ceph-users" <[email protected]>
Sent: Wednesday, March 7, 2018 01:07:06
Subject: Re: [ceph-users] Memory leak in Ceph OSD?
Hi,
addendum: We're running 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b).
The workload is a mix of 3x-replicated & EC-coded pools (RBD, CephFS, RGW).
-KJ
On Tue, Mar 6, 2018 at 3:53 PM, Kjetil Joergensen <[email protected]> wrote:
Hi,
so.. +1
We don't run compression as far as I know, so that wouldn't be it. We do
actually run a mix of bluestore & filestore, due to the rest of the cluster
predating a stable bluestore by some amount.
The interesting part is that the behavior seems to be specific to our
bluestore nodes.
Below: yellow line, a node with 10 x ~4 TB SSDs; green line, 8 x 800 GB SSDs;
blue line, dump_mempools total bytes for all the OSDs running on the
yellow-line node. The big dips are forced restarts, after having previously
suffered through the after-effects of letting Linux deal with it by
OOM->SIGKILL.
A gross extrapolation: "right now", the "memory used" seems to be close enough
to the sum of the RSS of the ceph-osd processes running on the machines.
-KJ
On Thu, Mar 1, 2018 at 7:18 PM, Alex Gorbachev <[email protected]> wrote:
On Thu, Mar 1, 2018 at 5:37 PM, Subhachandra Chandra
<[email protected]> wrote:
> Even with bluestore we saw memory usage plateau at 3-4GB with 8TB drives
> filled to around 90%. One thing that does increase memory usage is the
> number of clients simultaneously sending write requests to a particular
> primary OSD if the write sizes are large.
We have not seen a memory increase on Ubuntu 16.04, but I have also
repeatedly observed the following phenomenon:
When doing a vMotion in ESXi of a large 3 TB file (this generates a lot
of small-size IO requests) to a Ceph pool with compression set to
force, after some time the Ceph cluster shows a large number of
blocked requests, and eventually timeouts become very large (to the
point where ESXi aborts the IO due to timeouts). After the abort, the
blocked/slow request messages disappear. There are no OSD errors. I
have OSD logs if anyone is interested.
This does not occur when compression is unset.
--
Alex Gorbachev
Storcium
>
> Subhachandra
>
> On Thu, Mar 1, 2018 at 6:18 AM, David Turner <[email protected]> wrote:
>>
>> With default memory settings, the general rule is 1GB ram/1TB OSD. If you
>> have a 4TB OSD, you should plan to have at least 4GB ram. This was the
>> recommendation for filestore OSDs, but it was a bit much memory for the
>> OSDs. From what I've seen, this rule is a little more appropriate with
>> bluestore now and should still be observed.
>>
>> Please note that memory usage in a HEALTH_OK cluster is not the same
>> amount of memory that the daemons will use during recovery. I have seen
>> deployments with 4x memory usage during recovery.
>>
>> On Thu, Mar 1, 2018 at 8:11 AM Stefan Kooman <[email protected]> wrote:
>>>
>>> Quoting Caspar Smit ([email protected]):
>>> > Stefan,
>>> >
>>> > How many OSDs and how much RAM are in each server?
>>>
>>> Currently 7 OSDs, 128 GB RAM. Max will be 10 OSDs in these servers. 12
>>> cores (at least one core per OSD).
>>>
>>> > bluestore_cache_size=6G will not mean each OSD is using max 6GB RAM
>>> > right?
>>>
>>> Apparently. Sure, they will use more RAM than just the cache to function
>>> correctly. I figured 3 GB per OSD would be enough ...
>>>
>>> > Our bluestore HDD OSDs with bluestore_cache_size at 1G use ~4GB of
>>> > total RAM. The cache is part of the memory usage of bluestore OSDs.
>>>
>>> A factor of 4 is quite high, isn't it? What is all this RAM used for
>>> besides the cache? RocksDB?
>>>
>>> So how should I size the amount of RAM in an OSD server for 10 bluestore
>>> SSDs in a replicated setup?
>>>
>>> Thanks,
>>>
>>> Stefan
>>>
>>> --
>>> | BIT BV http://www.bit.nl/ Kamer van Koophandel 09090351
>>> | GPG: 0xD14839C6 / +31 318 648 688 / [email protected]
>>> _______________________________________________
>>> ceph-users mailing list
>>> [ mailto:[email protected] | [email protected] ]
>>> [ http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com |
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ]
--
Kjetil Joergensen <[email protected]>
SRE, Medallia Inc