Thread creation depends on the number of OSDs per host as well as the
cluster size. You have a very high number of OSDs (40!) on a single node,
but the good part is that you've got a small cluster (only 4 nodes).

If you have already run into the problem, then the only fix is to increase
pid_max. Remember to reserve at least a 2x or 3x buffer: during recovery an
OSD may create many more threads than usual, especially at large scale.
Using a large pid_max value doesn't hurt, since the messenger system reaps
inactive threads.
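
A minimal sketch of how to raise the limit (the sysctl.d file name is only
an example; 4194303 is simply the value quoted later in this thread):

# apply immediately, without a reboot
sysctl -w kernel.pid_max=4194303

# persist across reboots (example file name)
echo 'kernel.pid_max = 4194303' > /etc/sysctl.d/90-pid-max.conf
sysctl -p /etc/sysctl.d/90-pid-max.conf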

On such a high-density system you may also find that thread scheduling
consumes too much CPU time; sometimes OSDs are then unable to send or
process heartbeat messages and get marked out. Newer kernels do a much
better job of thread scheduling, so try a kernel upgrade if that happens.
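
If you want to check how close a host is to the limit before and during
recovery, something along these lines works (the pgrep pattern and picking
the first matching OSD are only examples):

# current system-wide thread/PID limit
cat /proc/sys/kernel/pid_max

# total threads currently running on the host
ps -eLf --no-headers | wc -l

# thread count of a single ceph-osd daemon (first match, for illustration)
ps -o nlwp= -p "$(pgrep -f ceph-osd | head -1)"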

On 6/12/14, 2:47 AM, "Maciej Bonin" <maciej.bo...@m247.com> wrote:

>We have not experienced any downsides to this approach performance or
>stability-wise, if you prefer you can experiment with the values, but I
>see no real advantage in doing so.
>
>Regards,
>Maciej Bonin
>Systems Engineer | M247 Limited
>M247.com  Connected with our Customers
>Contact us today to discuss your hosting and connectivity requirements
>ISO 27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology
>Fast 500 EMEA | Sunday Times Tech Track 100
>M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra
>Court, Manchester, M32 0QT
> 
>ISO 27001 Data Protection Classification: A - Public
> 
>
>
>-----Original Message-----
>From: Cao, Buddy [mailto:buddy....@intel.com]
>Sent: 11 June 2014 17:00
>To: Maciej Bonin; ceph-users@lists.ceph.com
>Subject: RE: pid_max value?
>
>Thanks Bonin. Do you have 48 OSDs in total, or 48 OSDs on each storage
>node? Do you think "kernel.pid_max = 4194303" is reasonable, given that it
>is a large increase over the default OS setting?
>
>
>Wei Cao (Buddy)
>
>-----Original Message-----
>From: Maciej Bonin [mailto:maciej.bo...@m247.com]
>Sent: Wednesday, June 11, 2014 10:07 PM
>To: Cao, Buddy; ceph-users@lists.ceph.com
>Subject: RE: pid_max value?
>
>Hello,
>
>The values we use are as follows:
># sysctl -p
>net.ipv4.ip_local_port_range = 1024 65535
>net.core.netdev_max_backlog = 30000
>net.core.somaxconn = 16384
>net.ipv4.tcp_max_syn_backlog = 252144
>net.ipv4.tcp_max_tw_buckets = 360000
>net.ipv4.tcp_fin_timeout = 3
>net.ipv4.tcp_max_orphans = 262144
>net.ipv4.tcp_synack_retries = 2
>net.ipv4.tcp_syn_retries = 2
>net.core.rmem_max = 8388608
>net.core.wmem_max = 8388608
>net.core.rmem_default = 65536
>net.core.wmem_default = 65536
>net.ipv4.tcp_rmem = 4096 87380 8388608
>net.ipv4.tcp_wmem = 4096 65536 8388608
>net.ipv4.tcp_mem = 8388608 8388608 8388608
>net.ipv4.route.flush = 1
>kernel.pid_max = 4194303
>
>The timeouts don't really make sense without tw reuse/recycling, but we
>found that increasing the maximums and letting the old connections hang
>gives better performance.
>somaxconn was the most important value we had to increase: with 3 mons,
>3 storage nodes, 3 VM hypervisors, 16 VMs and 48 OSDs we started running
>into major problems with servers dying left and right.
>Most of these values are lifted from some OpenStack Python script, IIRC.
>Please let us know if you find a more efficient/stable configuration;
>however, we're quite happy with this one.
>
>Regards,
>Maciej Bonin
>Systems Engineer | M247 Limited
>M247.com  Connected with our Customers
>Contact us today to discuss your hosting and connectivity requirements
>ISO 27001 | ISO 9001 | Deloitte Technology Fast 50 | Deloitte Technology
>Fast 500 EMEA | Sunday Times Tech Track 100
>M247 Ltd, registered in England & Wales #4968341. 1 Ball Green, Cobra
>Court, Manchester, M32 0QT
> 
>ISO 27001 Data Protection Classification: A - Public
> 
>
>From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of
>Cao, Buddy
>Sent: 11 June 2014 15:00
>To: ceph-users@lists.ceph.com
>Subject: [ceph-users] pid_max value?
>
>Hi, what is the recommended value for /proc/sys/kernel/pid_max? Is 32768
>enough for a Ceph cluster with 4 nodes (40 1T OSDs on each node)? My Ceph
>node has already run into a "create thread fail" problem in the OSD log,
>whose root cause is pid_max.
>
>
>Wei Cao (Buddy)
>
>_______________________________________________
>ceph-users mailing list
>ceph-users@lists.ceph.com
>http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
