Issue Type: Bug
Assignee: Unassigned
Attachments: Dockerfile, launch.sh, stacktrace.txt
Components: remoting
Created: 12/Feb/15 10:36 PM
Description:

I found a way to trigger a remoting problem using TCP fault injection with netem. With it, I can make a build block in this wait call at hudson.remoting.Request.call(Request.java:146):

{code:java}
while(response==null && !channel.isInClosed())
    // I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
    // but in production I've observed that in rare occasion it can block forever, even after a channel
    // is gone. So be defensive against that.
    wait(30*1000);
{code}

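When this wait is triggered, the running build gets stuck and keeps an executor occupied. {{wait(30*1000)}} only times out, and the loop then re-checks the same condition, so as long as no response arrives and the channel is not marked closed, the thread wakes every 30 seconds and goes straight back to waiting, indefinitely.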
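To make the looping behaviour concrete, here is a hypothetical, simplified Java model of the quoted loop; it is not the actual remoting code. It assumes the only two ways out are a response being set or the channel being flagged closed, exactly as in the loop condition. PendingRequest, complete(), markInClosed() and the maxWakeups guard are invented for illustration (the guard exists only so the demo terminates; the real loop has no such bound).

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the waiting loop in hudson.remoting.Request.call().
// PendingRequest, complete(), markInClosed() and maxWakeups are illustrative
// names only; they are not part of the remoting API.
public class StuckRequestDemo {

    static final class PendingRequest {
        private Object response;   // set when the reply arrives from the remote side
        private boolean inClosed;  // stands in for channel.isInClosed()
        final AtomicInteger wakeups = new AtomicInteger(); // returns from wait(), for the demo

        synchronized void complete(Object r) {
            response = r;
            notifyAll();
        }

        synchronized void markInClosed() {
            inClosed = true;
            notifyAll();
        }

        // Same loop shape as Request.call(): block in timeoutMs slices until either a
        // response arrives or the channel is flagged closed. maxWakeups is a demo-only
        // escape hatch; the loop in the bug report has no such bound.
        synchronized Object call(long timeoutMs, int maxWakeups) {
            try {
                while (response == null && !inClosed) {
                    if (wakeups.get() >= maxWakeups) {
                        return null;
                    }
                    wait(timeoutMs);
                    wakeups.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            }
            return response;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Lost reply: nothing calls complete() or markInClosed(), so each timed wait
        // expires and the loop goes straight back to waiting.
        PendingRequest lost = new PendingRequest();
        Object r = lost.call(10, 5);
        System.out.println("lost reply: response=" + r + ", wakeups=" + lost.wakeups.get());

        // Normal path: another thread delivers the reply and the loop exits.
        PendingRequest answered = new PendingRequest();
        Thread replier = new Thread(() -> answered.complete("ok"));
        replier.start();
        System.out.println("answered: response=" + answered.call(10_000, Integer.MAX_VALUE));
        replier.join();

        // Closed channel: the loop exits even though no reply ever arrived.
        PendingRequest closed = new PendingRequest();
        Thread closer = new Thread(closed::markInClosed);
        closer.start();
        System.out.println("closed: response=" + closed.call(10_000, Integer.MAX_VALUE));
        closer.join();
    }
}
```

Running main should print "lost reply: response=null, wakeups=5", then "answered: response=ok" and "closed: response=null". In this model the timeout alone never ends the loop; only a change to one of the two conditions does.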

To reproduce, set up an SSH slave using the attached Dockerfile, then configure netem on the docker0 bridge to corrupt 1% of packets:

{code}
tc qdisc add dev docker0 root netem
tc qdisc change dev docker0 root netem corrupt 1
{code}

Run the job once before configuring netem: the netem settings apply to all traffic going through docker0, so the build could otherwise fail while downloading Maven dependencies. To trigger the problem, I simply launched a Maven build of an example project, so the problem might be Maven-specific.

To remove the netem settings, run {{tc qdisc del dev docker0 root}}.

I've attached the Dockerfile, the command I used to launch it, and a thread dump of the stuck Jenkins master.

Environment: Linux
Project: Jenkins
Priority: Minor
Reporter: Yoann Dubreuil
