-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

OK, that
was a fail. Within one week we have now had 3 outages on one machine. I really have no idea anymore.

My current kernel settings:

# disable TIME_WAIT.. wait..
net.ipv4.tcp_tw_recycle=1
net.ipv4.tcp_tw_reuse=1
# disable syn cookies
net.ipv4.tcp_syncookies = 0
# double amount of allowed conntrack
net.ipv4.netfilter.ip_conntrack_max = 524288
net.netfilter.nf_conntrack_max = 524288
net.ipv4.ip_local_port_range = 7000 65535
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait = 1
net.netfilter.nf_conntrack_tcp_timeout_established=600
net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
net.ipv4.tcp_fin_timeout=15
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=5

# Disk I/O
vm.dirty_background_ratio = 3
vm.dirty_ratio = 6

# Reduce TIME_WAIT connections when there are many short-lived connections
# tcp_fin_timeout (default: 60), our Swift system: 15
# In the TIME_WAIT state it costs less to answer the host than to set up a new connection;
# on the other hand, resources are freed sooner so that more connections can be processed.
# A suitable compromise has to be found here; as a rule, though, this should be twice the packet TTL.
net.ipv4.tcp_fin_timeout = 20

# Network buffers
net.core.rmem_max = 13421568
net.core.wmem_max = 13421568

# TCP buffers
net.ipv4.tcp_rmem = 4096 87380 6710272
net.ipv4.tcp_wmem = 4096 87380 6710272

# Input queue
net.core.netdev_max_backlog = 25000

# H-TCP congestion control
#net.ipv4.tcp_congestion_control=htcp
# list the available modules via net.ipv4.tcp_available_congestion_control
# load the htcp module if necessary

# Recommended for hosts with jumbo frames (default: 0, off); Swift: 1
net.ipv4.tcp_mtu_probing=1

# Tweaks for many connections through a proxy,
# e.g. for "possible SYN flooding on port 80. Sending cookies" messages,
# which appear when the port range is exhausted.
# (a proxy always needs 2 connections per request: client<->proxy<->upstream)
# Default: 7000 65535
net.ipv4.ip_local_port_range = 1024 65535

Is there anything wrong here that could cause this kind of issue?

Thanks and cheers,
Heiko

On 09.11.2015 21:53, Mark Kirkwood wrote:
> Right,
>
> I agree that the dirty size of the cache could be the issue (particularly with SATA drives in a JBOD array without the help of any writeback caches etc).
>
> You might even want to wind those settings down a bit more (3 and 6 perhaps which means with 32G of ram your dirty cache size is between 1 and 2G), should stop the massive IO stall.
>
> regards
>
> Mark
>
>
> On 09/11/15 23:39, Heiko Krämer wrote:
>>
>> Hi Mark,
>>
>> nothing can be logged because all disks are stale.
>> I only see output on the IPMI:
>>
>> blocked for more than 120 seconds
>> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
>> this message.
>>
>> I found there could be a problem with the write caching of the disks.
>> So I have reduced the available dirty cache:
>>
>> vm.dirty_background_ratio = 5
>> vm.dirty_ratio = 10
>>
>>
>> I hope this will solve the issue.
>>
>> You're right, only a hard reboot can solve the problem, because SSH
>> login or other commands can't be executed while the whole system is
>> frozen.
>> That's the problem: I can't get any deeper information about what happened.
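
As a rough check of the "between 1 and 2G" figure quoted above, here is a minimal sketch of the arithmetic. It assumes the ratios are applied to the full 32G of RAM; the kernel applies them to available (free plus reclaimable) memory, so the real thresholds should be somewhat lower. The function name dirty_threshold_gib is only for illustration.

```python
def dirty_threshold_gib(ram_gib: float, ratio_percent: float) -> float:
    """Upper bound on dirty page cache for a vm.dirty_*_ratio value.

    Assumption: the percentage is taken of total RAM, which overestimates
    the real threshold (the kernel uses available memory).
    """
    return ram_gib * ratio_percent / 100.0


RAM_GIB = 32  # from the thread: "32G of ram"

# (vm.dirty_background_ratio, vm.dirty_ratio) pairs mentioned in the thread
for background, foreground in [(5, 10), (3, 6)]:
    bg = dirty_threshold_gib(RAM_GIB, background)
    fg = dirty_threshold_gib(RAM_GIB, foreground)
    print(f"ratios {background}/{foreground}: "
          f"background writeback from ~{bg:.2f}G, hard limit ~{fg:.2f}G")
```

With 3 and 6 this gives roughly 0.96G and 1.92G, which matches the range Mark mentions.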
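
Some keys in the kernel settings listed in this message are assigned more than once (net.ipv4.tcp_fin_timeout and net.ipv4.ip_local_port_range, for example). Assuming the file is applied top to bottom, e.g. by `sysctl -p`, the last assignment should be the one in effect. The following is a minimal sketch for spotting such repeats; find_repeated_keys and the SAMPLE text are illustrative only and not part of any existing tool.

```python
from collections import defaultdict
from typing import Dict, List


def find_repeated_keys(conf_text: str) -> Dict[str, List[str]]:
    """Return {key: [values in file order]} for keys assigned more than once."""
    values = defaultdict(list)
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments (# or ;) and anything that is not key = value.
        if not line or line[0] in "#;" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Normalise whitespace so "7000  65535" and "7000 65535" compare equal.
        values[key.strip()].append(" ".join(value.split()))
    return {k: v for k, v in values.items() if len(v) > 1}


# Shortened excerpt of the settings from this thread.
SAMPLE = """
net.ipv4.ip_local_port_range = 7000 65535
net.ipv4.tcp_fin_timeout=15
# tcp_fin_timeout (default: 60)
net.ipv4.tcp_fin_timeout = 20
vm.dirty_ratio = 6
net.ipv4.ip_local_port_range = 1024 65535
"""

for key, vals in find_repeated_keys(SAMPLE).items():
    print(f"{key}: set {len(vals)} times, last value is {vals[-1]}")
```

Running `sysctl <key>` on the affected machine is the more reliable way to confirm which value is actually active.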
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack@lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack

- --
anynines.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)

iQEcBAEBAgAGBQJWTDZNAAoJELxFogM4ixOFl7wIANX/X/IxCeuIZ1P9RzyRwFji
VgPOUE28obdjQPVkUs1rv0YM9CTWwUv/53pVmOiNNQ5K7LFT00HQdTRIpY2eMsyZ
NXc14jiGjj3ulVQtfxyucY4m50Tvs1Ljazx1/SBX+cOYVsfMtEmKp8koBxzVIPq6
c7xKg2kPaT5sX3fWAhxWW7ZVjjIRoAbO7hwLBkQcHSd4n0H4UMKj9SWRyPYwdxwP
8qb8dIRVeilt+qgpWbuZeXVW0p5MYfj1cSigCWCRfidVNrqLhYJANMlDTWX99zhO
IQ+F6O9/2dOxAWyT2FSjKzKSeK7x++nr0qL1JVHmyuLOdxiJuSfVAcyV19cdgGs=
=BlB5
-----END PGP SIGNATURE-----