Forgot to copy the list.
> I basically cobbled together the settings from examples on the internet.
>
> I modified this sysctl.conf file with the author's suggestions for 10gb nics:
> http://www.nateware.com/linux-network-tuning-for-2013.html#.VIG_44eLTII
>
> I found these sites helpful as well:
> http://fasterdata.es.net/host-tuning/linux/
>
> This may be of interest to you; it has suggestions for your Mellanox hardware:
> https://fasterdata.es.net/host-tuning/nic-tuning/mellanox-connectx-3/
>
> Fermilab website, with a link to a university research paper:
> https://indico.fnal.gov/getFile.py/access?contribId=30&sessionId=19&resId=0&materialId=paper&confId=3377
>
> This has a great answer that explains different configurations for servers vs clients. It seems to me that osds are both servers and clients, so maybe some of the client tuning would benefit osds as well. This is where I got the somaxconn setting from.
> http://stackoverflow.com/questions/410616/increasing-the-maximum-number-of-tcp-ip-connections-in-linux
>
> I forgot to mention, I'm also setting the txqueuelen for my ceph public nic and ceph private nic in the /etc/rc.local file:
>
> /sbin/ifconfig eth0 txqueuelen 10000
> /sbin/ifconfig eth1 txqueuelen 10000
>
> I do push the same sysctl.conf and rc.local to all of my clients as well. The clients are iSCSI servers which serve vmware hosts. My ceph cluster is rbd only, and I currently only have the iSCSI proxy server clients. We'll be adding some KVM hypervisors soon; I'm interested to see how they perform vs my vmware --> iSCSI Server --> Ceph setup.
>
> Regarding your sysctl.conf file:
>
> I've read on a few different sites that net.ipv4.tcp_mem should not be tuned, since the defaults are good. I have not set it, and I can't speak to the benefits/problems of setting it.
>
> You're configured to only use a 4MB TCP buffer, which is very small. It is actually smaller than the default maximum for tcp_wmem, which is 6MB. The link above suggests up to a 128MB TCP buffer for the 40gb Mellanox and/or 10gb over a WAN (not sure how to read that). I'm using a 54MB buffer, but I may increase mine to 128MB to see if there is any benefit. That 4MB buffer may be your problem.
>
> Your net.core.netdev_max_backlog is 5x bigger than mine. I think I'll increase my setting to 250000 as well.
>
> Our issue looks like http://tracker.ceph.com/issues/9844 and my crash looks like http://tracker.ceph.com/issues/9788
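> For reference, a 128MB cap would only change the buffer-related lines. A rough sketch (illustrative values only, not taken from a running cluster, so test before rolling it out):
>
> # 128MB (134217728 bytes) socket buffer ceiling
> net.core.rmem_max = 134217728
> net.core.wmem_max = 134217728
> # min / default / max used by TCP autotuning
> net.ipv4.tcp_rmem = 4096 87380 134217728
> net.ipv4.tcp_wmem = 4096 65536 134217728
>
> The kernel still autotunes each connection between the min and max values, so raising the max mainly costs memory on connections that actually need the headroom.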
> On Fri, Dec 5, 2014 at 5:35 AM, Andrei Mikhailovsky <and...@arhont.com> wrote:
>
>> Jake,
>>
>> very useful indeed.
>>
>> It looks like I had a similar problem regarding the heartbeats and, as you have mentioned, I had not seen such issues on Firefly. However, I've not seen any osd crashes.
>>
>> Could you please let me know where you got the sysctl.conf tunings from? Were they recommended by the network vendor?
>>
>> Also, did you make similar sysctl.conf changes to your host servers?
>>
>> A while ago I read the tuning guide for IP over InfiniBand, and Mellanox recommends setting something like this:
>>
>> net.ipv4.tcp_timestamps = 0
>> net.ipv4.tcp_sack = 1
>> net.core.netdev_max_backlog = 250000
>> net.core.rmem_max = 4194304
>> net.core.wmem_max = 4194304
>> net.core.rmem_default = 4194304
>> net.core.wmem_default = 4194304
>> net.core.optmem_max = 4194304
>> net.ipv4.tcp_rmem = 4096 87380 4194304
>> net.ipv4.tcp_wmem = 4096 65536 4194304
>> net.ipv4.tcp_mem = 4194304 4194304 4194304
>> net.ipv4.tcp_low_latency = 1
>>
>> which is what I have. Not sure if these are optimal.
>>
>> I can see that the values are pretty conservative compared to yours. I guess my values should be different, as I am running a 40gbit/s network with ipoib. The actual throughput on ipoib is about 20gbit/s according to iperf and the like.
>>
>> Andrei
>>
>> ------------------------------
>>
>> From: "Jake Young" <jak3...@gmail.com>
>> To: "Andrei Mikhailovsky" <and...@arhont.com>
>> Cc: ceph-users@lists.ceph.com
>> Sent: Thursday, 4 December, 2014 4:57:47 PM
>> Subject: Re: [ceph-users] Giant osd problems - loss of IO
>>
>> On Fri, Nov 14, 2014 at 4:38 PM, Andrei Mikhailovsky <and...@arhont.com> wrote:
>> >
>> > Any other suggestions why several osds are going down on Giant and causing IO to stall? This was not happening on Firefly.
>> >
>> > Thanks
>> >
>>
>> I had a very similar problem to yours, which started after I upgraded from Firefly to Giant and then later added two new osd nodes, with 7 osds on each.
>>
>> My cluster originally had 4 nodes, with 7 osds on each node, 28 osds total, running Giant. I did not have any problems at this time.
>>
>> My problems started after adding two new nodes, so I had 6 nodes and 42 total osds. It would run fine under low load, but when the request load increased, osds started to fall over.
>>
>> I was able to set debug_ms to 10 and capture the logs from a failed OSD. There were a few different reasons the osds were going down. This example shows one terminating normally, for an unspecified reason, a minute after it notices it is marked down in the map.
>>
>> Osd 25 actually marks this osd (osd 35) down. For some reason many osds cannot communicate with each other.
>>
>> There are other examples where I see the "heartbeat_check: no reply from osd.blah" message for long periods of time (hours) and neither osd crashes nor terminates.
>> >> 2014-12-01 16:27:06.772616 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:06.056972 (cutoff 2014-12-01 16:26:46.772608) >> 2014-12-01 16:27:07.772767 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:06.056972 (cutoff 2014-12-01 16:26:47.772759) >> 2014-12-01 16:27:08.772990 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:06.056972 (cutoff 2014-12-01 16:26:48.772982) >> 2014-12-01 16:27:09.559894 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:06.056972 (cutoff 2014-12-01 16:26:49.559891) >> 2014-12-01 16:27:09.773177 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:09.559087 (cutoff 2014-12-01 16:26:49.773173) >> 2014-12-01 16:27:10.773307 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:09.559087 (cutoff 2014-12-01 16:26:50.773299) >> 2014-12-01 16:27:11.261557 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:09.559087 (cutoff 2014-12-01 16:26:51.261554) >> 2014-12-01 16:27:11.773512 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:11.260129 (cutoff 2014-12-01 16:26:51.773504) >> 2014-12-01 16:27:12.773741 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:11.260129 (cutoff 2014-12-01 16:26:52.773733) >> 2014-12-01 16:27:13.773884 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:11.260129 (cutoff 2014-12-01 16:26:53.773876) >> 2014-12-01 16:27:14.163369 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:11.260129 (cutoff 2014-12-01 16:26:54.163366) >> 2014-12-01 16:27:14.507632 7f8b4fb7f700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.5:6802/2755 pipe(0x2af06940 sd=57 :51521 s=2 pgs=384 cs=1 l=0 >> c=0x2af094a0).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.511704 7f8b37af1700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.2:6812/34015988 pipe(0x2af06c00 sd=69 :41512 s=2 pgs=38842 cs=1 >> l=0 c=0x2af09600).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.511966 7f8b5030c700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.4:6802/40022302 pipe(0x30cbcdc0 sd=93 :6802 s=2 pgs=66722 cs=3 >> l=0 c=0x2af091e0).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.514744 7f8b548a5700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.2:6800/9016639 pipe(0x2af04dc0 sd=38 :60965 s=2 pgs=11747 cs=1 >> l=0 c=0x2af086e0).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.516712 7f8b349c7700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.2:6802/25277 pipe(0x2b04cc00 sd=166 :6802 s=2 pgs=62 cs=1 l=0 >> c=0x2b043080).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.516814 7f8b2bd3b700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.4:6804/16770 pipe(0x30cbd600 sd=79 :6802 s=2 pgs=607 cs=3 l=0 >> c=0x2af08c60).fault with nothing to send, going 
to standby >> 2014-12-01 16:27:14.518439 7f8b2a422700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.5:6806/31172 pipe(0x30cbc840 sd=28 :6802 s=2 pgs=22 cs=1 l=0 >> c=0x3041f5a0).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.518883 7f8b589ba700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.1:6803/4031631 pipe(0x2af042c0 sd=32 :58296 s=2 pgs=35500 cs=3 >> l=0 c=0x2af08160).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.519271 7f8b5040d700 0 -- 172.1.2.6:6802/5210 >> >> 172.1.2.2:6816/32028847 pipe(0x2af05e40 sd=49 :54016 s=2 pgs=30500 cs=1 >> l=0 c=0x2af08f20).fault with nothing to send, going to standby >> 2014-12-01 16:27:14.774081 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:14.161820 (cutoff 2014-12-01 16:26:54.774073) >> 2014-12-01 16:27:15.774290 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:14.161820 (cutoff 2014-12-01 16:26:55.774281) >> 2014-12-01 16:27:16.774480 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:14.161820 (cutoff 2014-12-01 16:26:56.774471) >> 2014-12-01 16:27:17.774670 7f8b642d1700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:14.161820 (cutoff 2014-12-01 16:26:57.774661) >> 2014-12-01 16:27:18.264884 7f8b3b1fe700 -1 osd.35 79679 heartbeat_check: >> no reply from osd.25 since back 2014-12-01 16:25:51.310319 front 2014-12-01 >> 16:27:14.161820 (cutoff 2014-12-01 16:26:58.264882) >> 2014-12-01 16:27:18.268852 7f8b4c220700 0 log_channel(default) log [WRN] >> : *map e79681 wrongly marked me down* >> 2014-12-01 16:27:22.290362 7f8b37df4700 0 -- 172.1.2.6:6806/1005210 >> >> 172.1.2.4:6804/16770 pipe(0x2c2ef8c0 sd=75 :44216 s=2 pgs=632 cs=1 l=0 >> c=0x1bce6940).fault with nothing to send, going to standby >> 2014-12-01 16:27:22.866677 7f8b56662700 0 -- 172.1.2.6:6806/1005210 >> >> 172.1.2.2:6808/9013936 pipe(0x2f5e4840 sd=55 :41925 s=2 pgs=15111 cs=1 >> l=0 c=0x30828580).fault with nothing to send, going to stan >> dby >> 2014-12-01 16:27:24.854642 7f8b2ed6b700 0 -- 172.1.2.6:6806/1005210 >> >> 172.1.2.6:6816/62000664 pipe(0x2c3c78c0 sd=206 :6806 s=0 pgs=0 cs=0 l=0 >> c=0x2ccb3e40).accept connect_seq 0 vs existing 0 state con >> necting >> 2014-12-01 16:27:25.265306 7f8b2af2d700 0 -- 172.1.1.6:6806/5210 >> >> 172.1.1.54:0/586219983 pipe(0x2d46bc80 sd=246 :6806 s=0 pgs=0 cs=0 l=0 >> c=0x1bdacdc0).accept peer addr is really 172.1.1.54:0/58621998 >> 3 (socket is 172.1.1.54:36423/0) >> 2014-12-01 16:28:45.732468 7f8b368df700 -1 osd.35 79691 **** Got signal >> Terminated **** >> 2014-12-01 16:28:45.732591 7f8b368df700 0 osd.35 79691 prepare_to_stop >> telling mon we are shutting down >> 2014-12-01 16:28:46.586316 7f8b2236a700 0 -- 172.1.2.6:6806/1005210 >> >> 172.1.2.1:6807/91014386 pipe(0x1cabb700 sd=32 :53651 s=2 pgs=37459 cs=1 >> l=0 c=0x1bce5e40).fault with nothing to send, going to sta >> ndby >> 2014-12-01 16:28:46.593615 7f8b4c220700 0 osd.35 79691 got_stop_ack >> starting shutdown >> 2014-12-01 16:28:46.593662 7f8b368df700 0 osd.35 79691 prepare_to_stop >> starting shutdown >> 2014-12-01 16:28:46.593682 7f8b368df700 -1 osd.35 79691 shutdown >> >> >> >> Another example I found shows this same osd crashing with "hit suicide >> timeout". 
>> >> -4> 2014-12-01 15:50:52.333350 7fecda368700 10 -- >> 172.1.2.6:6805/1031451 >> 172.1.2.1:0/32541 pipe(0x2b70a680 sd=104 :6805 >> s=2 pgs=2692 cs=1 l=1 c=0x2d754680).writer: state = open policy.server=1 >> -3> 2014-12-01 15:50:52.333348 7fecd9d62700 10 -- >> 172.1.1.6:6819/1031451 >> 172.1.2.1:0/32541 pipe(0x4065c80 sd=94 :6819 >> s=2 pgs=2689 cs=1 l=1 c=0x2d7538c0).writer: state = open policy.server=1 >> -2> 2014-12-01 15:50:52.333369 7fecd9d62700 10 -- >> 172.1.1.6:6819/1031451 >> 172.1.2.1:0/32541 pipe(0x4065c80 sd=94 :6819 >> s=2 pgs=2689 cs=1 l=1 c=0x2d7538c0).write_ack 10 >> -1> 2014-12-01 15:50:52.333386 7fecd9d62700 10 -- >> 172.1.1.6:6819/1031451 >> 172.1.2.1:0/32541 pipe(0x4065c80 sd=94 :6819 >> s=2 pgs=2689 cs=1 l=1 c=0x2d7538c0).writer: state = open policy.server=1 >> 0> 2014-12-01 15:50:52.531714 7fed13221700 -1 >> common/HeartbeatMap.cc: In function 'bool >> ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, const char*, time_t)' >> thread 7fed13221700 time 2014-12- >> 01 15:50:52.508265 >> common/HeartbeatMap.cc: 79: FAILED assert(0 == "hit suicide timeout") >> >> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) >> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x8b) [0xb8231b] >> 2: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, >> long)+0x2a9) [0xac0e19] >> 3: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xac16a6] >> 4: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xac1d87] >> 5: (CephContextServiceThread::entry()+0x154) [0xb96844] >> 6: (()+0x8182) [0x7fed175c7182] >> 7: (clone()+0x6d) [0x7fed15b31fbd] >> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed >> to interpret this. >> >> --- logging levels --- >> 0/ 5 none >> 0/ 0 lockdep >> 0/ 0 context >> 0/ 0 crush >> 1/ 5 mds >> 1/ 5 mds_balancer >> 1/ 5 mds_locker >> 1/ 5 mds_log >> 1/ 5 mds_log_expire >> 1/ 5 mds_migrator >> 0/ 0 buffer >> 0/ 0 timer >> 0/ 1 filer >> 0/ 1 striper >> 0/ 1 objecter >> 0/ 5 rados >> 0/ 5 rbd >> 0/ 5 rbd_replay >> 0/ 0 journaler >> 0/ 5 objectcacher >> 0/ 5 client >> 0/ 0 osd >> 0/ 0 optracker >> 0/ 0 objclass >> 0/ 0 filestore >> 1/ 3 keyvaluestore >> 0/ 0 journal >> 10/10 ms >> 1/ 5 mon >> 0/ 0 monc >> 1/ 5 paxos >> 0/ 0 tp >> 0/ 0 auth >> 1/ 5 crypto >> 0/ 0 finisher >> 0/ 0 heartbeatmap >> 0/ 0 perfcounter >> 1/ 5 rgw >> 1/10 civetweb >> 1/ 5 javaclient >> 0/ 0 asok >> 0/ 0 throttle >> 0/ 0 refs >> -2/-2 (syslog threshold) >> -1/-1 (stderr threshold) >> max_recent 10000 >> max_new 1000 >> log_file /var/log/ceph/ceph-osd.35.log >> --- end dump of recent events --- >> 2014-12-01 15:50:52.627789 7fed13221700 -1 *** Caught signal (Aborted) ** >> in thread 7fed13221700 >> >> ceph version 0.87 (c51c8f9d80fa4e0168aa52685b8de40e42758578) >> 1: /usr/bin/ceph-osd() [0xa9767a] >> 2: (()+0x10340) [0x7fed175cf340] >> 3: (gsignal()+0x39) [0x7fed15a6dbb9] >> 4: (abort()+0x148) [0x7fed15a70fc8] >> 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7fed163796b5] >> 6: (()+0x5e836) [0x7fed16377836] >> 7: (()+0x5e863) [0x7fed16377863] >> 8: (()+0x5eaa2) [0x7fed16377aa2] >> 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char >> const*)+0x278) [0xb82508] >> 10: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d*, char const*, >> long)+0x2a9) [0xac0e19] >> 11: (ceph::HeartbeatMap::is_healthy()+0xd6) [0xac16a6] >> 12: (ceph::HeartbeatMap::check_touch_file()+0x17) [0xac1d87] >> 13: (CephContextServiceThread::entry()+0x154) [0xb96844] >> 14: (()+0x8182) [0x7fed175c7182] >> 15: (clone()+0x6d) 
[0x7fed15b31fbd]
>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>
>> --- logging levels ---
>> 0/ 5 none
>> 0/ 0 lockdep
>> 0/ 0 context
>> 0/ 0 crush
>> 1/ 5 mds
>> 1/ 5 mds_balancer
>> 1/ 5 mds_locker
>> 1/ 5 mds_log
>> 1/ 5 mds_log_expire
>> 1/ 5 mds_migrator
>> 0/ 0 buffer
>> 0/ 0 timer
>> 0/ 1 filer
>> 0/ 1 striper
>> 0/ 1 objecter
>> 0/ 5 rados
>> 0/ 5 rbd
>> 0/ 5 rbd_replay
>> 0/ 0 journaler
>> 0/ 5 objectcacher
>> 0/ 5 client
>> 0/ 0 osd
>> 0/ 0 optracker
>> 0/ 0 objclass
>> 0/ 0 filestore
>> 1/ 3 keyvaluestore
>> 0/ 0 journal
>> 10/10 ms
>> 1/ 5 mon
>> 0/ 0 monc
>> 1/ 5 paxos
>> 0/ 0 tp
>> 0/ 0 auth
>> 1/ 5 crypto
>> 0/ 0 finisher
>> 0/ 0 heartbeatmap
>> 0/ 0 perfcounter
>> 1/ 5 rgw
>> 1/10 civetweb
>> 1/ 5 javaclient
>> 0/ 0 asok
>> 0/ 0 throttle
>> 0/ 0 refs
>> -2/-2 (syslog threshold)
>> -1/-1 (stderr threshold)
>> max_recent 10000
>> max_new 1000
>> log_file /var/log/ceph/ceph-osd.35.log
>> --- end dump of recent events ---
>>
>> The recurring theme here is that there is a communication issue between the osds.
>>
>> I looked carefully at my network hardware configuration (UCS C240s with 40Gbps Cisco VICs connected to a pair of Nexus 5672 switches using an A-FEX Port Profile configuration) and couldn't find any dropped packets or errors.
>>
>> I ran "ss -s" for the first time on my osds and was a bit surprised to see how many open TCP connections they all have:
>>
>> ceph@osd6:/var/log/ceph$ ss -s
>> Total: 1492 (kernel 0)
>> TCP: 1411 (estab 1334, closed 40, orphaned 0, synrecv 0, timewait 0/0), ports 0
>>
>> Transport  Total    IP     IPv6
>> *          0        -      -
>> RAW        0        0      0
>> UDP        10       4      6
>> TCP        *1371*   1369   2
>> INET       1381     1373   8
>> FRAG       0        0      0
>>
>> While researching whether additional kernel tuning would be required to handle so many connections, I eventually realized that I had forgotten to copy my customized /etc/sysctl.conf file to the two new nodes. I'm not sure if the large number of TCP connections is part of the performance enhancements between Giant and Firefly, or if Firefly uses a similar number of connections.
>>
>> I highlighted the tuning parameters that I suspect are required.
>>
>> ceph@osd6:/var/log/ceph$ cat /etc/sysctl.conf
>> #
>> # /etc/sysctl.conf - Configuration file for setting system variables
>> # See /etc/sysctl.d/ for additional system variables
>> # See sysctl.conf (5) for information.
>> #
>>
>> # Increase Linux autotuning TCP buffer limits
>> # Set max to 16MB for 1GE and 32M (33554432) or 54M (56623104) for 10GE
>> # Don't set tcp_mem itself! Let the kernel scale it based on RAM.
>> net.core.rmem_max = 56623104
>> net.core.wmem_max = 56623104
>> net.core.rmem_default = 56623104
>> net.core.wmem_default = 56623104
>> net.core.optmem_max = 40960
>> net.ipv4.tcp_rmem = 4096 87380 56623104
>> net.ipv4.tcp_wmem = 4096 65536 56623104
>>
>> # Make room for more TIME_WAIT sockets due to more clients,
>> # and allow them to be reused if we run out of sockets
>> # Also increase the max packet backlog
>> net.core.somaxconn = 1024
>> *net.core.netdev_max_backlog = 50000*
>> *net.ipv4.tcp_max_syn_backlog = 30000*
>> *net.ipv4.tcp_max_tw_buckets = 2000000*
>> *net.ipv4.tcp_tw_reuse = 1*
>> *net.ipv4.tcp_fin_timeout = 10*
>>
>> # Disable TCP slow start on idle connections
>> net.ipv4.tcp_slow_start_after_idle = 0
>>
>> # If your servers talk UDP, also up these limits
>> net.ipv4.udp_rmem_min = 8192
>> net.ipv4.udp_wmem_min = 8192
>>
>> # Disable source routing and redirects
>> net.ipv4.conf.all.send_redirects = 0
>> net.ipv4.conf.all.accept_redirects = 0
>> net.ipv4.conf.all.accept_source_route = 0
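>> As an aside, a file like this can be loaded without a reboot; a minimal sketch, assuming the stock /etc/sysctl.conf path shown above:
>>
>> sudo sysctl -p /etc/sysctl.conf
>> # spot-check a few of the values afterwards
>> sysctl net.core.somaxconn net.core.netdev_max_backlog net.ipv4.tcp_rmem
>>
>> As noted below, the osds still needed a restart after the tuning was applied before the cluster stabilized.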
>> I added the net.core.somaxconn setting after this experience, since the default is 128. It controls the allowed socket listen backlog in the kernel, which should help when I reboot an osd node and 1300 connections need to be re-established quickly.
>>
>> I found that I needed to restart my osds after applying the kernel tuning above for my cluster to stabilize.
>>
>> My system is now stable again and performs very well.
>>
>> I hope this helps someone out. It took me a few days of troubleshooting to get to the bottom of this.
>>
>> Jake
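One quick way to gauge how many TCP connections each osd node actually holds, and therefore how much headroom settings like somaxconn and the backlogs need to provide, is to count the sockets owned by the ceph-osd processes. A rough sketch, assuming ss from iproute2 and root access to see process names:

sudo ss -tnp | grep -c ceph-osd

For comparison, the ss -s output above showed roughly 1,300 established connections on a single node of the 42-osd cluster.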