Hi all, I was hoping someone might have an idea here. I have a number of nginx instances doing load balancing, sitting behind AWS network load balancers (TCP), which seem to support only TCP health checks.
Recently a few have stopped working or frozen. They still seem to accept a TCP connection from the NLB, so the health check does not fail, but they cannot process the request internally and you cannot even ssh into the machine. A reboot is required, and it takes longer than normal. I think the failure is related to a disk issue, since the only errors in the entire logs were regarding the disk (error logs below).

Ideally, if nginx or the OS fails, it would be better if the port just closed. I've considered writing a small daemon that monitors nginx via HTTP locally and keeps a port open only while everything is OK. These machines had been running for months without any issues until now.

Anyone have an idea? Thanks!

----
[4161960.544106] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4161960.551035] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4161960.556118] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4161960.562846] INFO: task monit:13224 blocked for more than 120 seconds.
[4161960.567394] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4161960.571120] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162080.576076] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162080.579596] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162080.582355] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162080.586470] INFO: task monit:13224 blocked for more than 120 seconds.
[4162080.589847] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162080.592654] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.596100] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4162200.599646] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.602422] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.606423] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162200.610118] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.613093] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162200.617889] INFO: task monit:13224 blocked for more than 120 seconds.
[4162200.621641] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162200.624506] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162244.551431] systemd[1]: Failed to start Journal Service.
[4162320.628099] INFO: task jbd2/xvda1-8:271 blocked for more than 120 seconds.
[4162320.631942] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.635012] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162320.639647] INFO: task dhclient:696 blocked for more than 120 seconds.
[4162320.643241] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.646233] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162320.650712] INFO: task monit:13224 blocked for more than 120 seconds.
[4162320.654190] Not tainted 4.4.0-1022-aws #31-Ubuntu
[4162320.657183] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[4162334.801390] systemd[1]: Failed to start Journal Service.
[4162425.051503] systemd[1]: Failed to start Journal Service.
[4162515.301393] systemd[1]: Failed to start Journal Service.
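
For what it's worth, the small daemon idea from the post could be sketched roughly as below, in Python. This is only a minimal sketch under assumptions: the names `http_ok`, `HealthGate`, the check URL `http://127.0.0.1/`, and port 8081 are all made up for illustration, and it assumes the NLB health check can be pointed at that separate port. One caveat: the kernel keeps completing TCP handshakes on a listening socket even when the owning process is stuck, so this only helps while the daemon itself keeps getting scheduled.

```python
import socket
import time
import urllib.request

CHECK_URL = "http://127.0.0.1/"  # assumption: nginx answers 2xx here when healthy


def http_ok(url, timeout=2.0):
    """Return True if a local HTTP GET answers with a 2xx status in time."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except Exception:
        # Refused connection, timeout, non-2xx status, malformed reply, etc.
        return False


class HealthGate:
    """Listen on a TCP port only while the supplied check keeps passing."""

    def __init__(self, check, host="0.0.0.0", port=8081):
        self.check = check      # callable returning True when healthy
        self.host = host
        self.port = port        # hypothetical health-check port
        self.listener = None

    def open(self):
        """Start listening if the port is not already open."""
        if self.listener is None:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((self.host, self.port))
            s.listen(16)
            s.setblocking(False)
            self.listener = s

    def close(self):
        """Stop listening so that new TCP checks are refused."""
        if self.listener is not None:
            self.listener.close()
            self.listener = None

    def drain(self):
        """Accept and close pending connections; TCP checks send no data."""
        while self.listener is not None:
            try:
                conn, _ = self.listener.accept()
            except BlockingIOError:
                return
            conn.close()

    def step(self):
        """Run one check and open or close the port to match the result."""
        if self.check():
            self.open()
            self.drain()
        else:
            self.close()
        return self.listener is not None

    def run(self, interval=1.0):
        """Loop forever, re-checking every `interval` seconds."""
        while True:
            self.step()
            time.sleep(interval)


# To run as a daemon: HealthGate(lambda: http_ok(CHECK_URL)).run()
```

The check has its own timeout, so a hung nginx closes the port within a few seconds. It will not catch the case where the whole machine, including this process, is blocked on disk I/O.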
_______________________________________________ nginx mailing list nginx@nginx.org http://mailman.nginx.org/mailman/listinfo/nginx