A haproxy reload happened at that time: 2019-04-30 16:16:29. Could you give it another try?
On Tue Apr 30 12:40:42 2019, vrpo...@cisco.com wrote:
> > I increased idle timeout from 10min to 60min.
>
> Was it around the time this [2] job failed recently?
>
> 16:14:44 ++ sleep 184s
> 16:16:29 FATAL: command execution failed
>
> Vratko.
>
> [2] https://jenkins.fd.io/job/csit-vpp-perf-verify-master-3n-hsw/335/console
>
> -----Original Message-----
> From: csit-...@lists.fd.io <csit-...@lists.fd.io> On Behalf Of Kenny Paul via RT
> Sent: Tuesday, 2019-April-30 18:22
> To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) <jgel...@cisco.com>
> Cc: csit-...@lists.fd.io; vpp-dev@lists.fd.io
> Subject: [csit-dev] [FD.io Helpdesk #73486] Jenkins.fd.io network issues
>
> I increased idle timeout from 10min to 60min. Let's see if that makes
> any difference.
>
> Regards,
>
> --
> Anton Baranov
> Sr. System Operations Engineer
> The Linux Foundation
>
> On Tue Apr 30 10:03:46 2019, vrpo...@cisco.com wrote:
> > >> interleaved by quick periods of activity
> >
> > >>> 09:26:36 ++ sleep 197s
> >
> > > send any keepalive packages
> >
> > I always assumed the console outputs are enough to keep jnlp
> > connection alive.
> >
> > Also, I believe this failure over weekend has hit multiple jobs at
> > once.
> >
> > For example https://jenkins.fd.io/job/csit-vpp-perf-verify-master-3n-hsw/333/console
> > 09:32:54 ++ sleep 184s
> > 09:33:09 FATAL: command execution failed
> >
> > Vratko.
> >
> > -----Original Message-----
> > From: csit-...@lists.fd.io <csit-...@lists.fd.io> On Behalf Of Kenny Paul via RT
> > Sent: Tuesday, 2019-April-30 15:57
> > To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) <jgel...@cisco.com>
> > Cc: csit-...@lists.fd.io; vpp-dev@lists.fd.io
> > Subject: [csit-dev] [FD.io Helpdesk #73486] Jenkins.fd.io network issues
> >
> > Hello Vratko,
> >
> > Thank you for explanation. I'm wondering within that period of time
> > when reservation was unsuccessful (~40min) does the job keep jnlp
> > connection alive (send any keepalive packages)?
> >
> > I checked the haproxy node where jnlp is running and I don't see any
> > DOWN notification for it
> >
> > Thanks,
> > --
> > Anton Baranov
> > Sr. System Operations Engineer
> > The Linux Foundation
> >
> > On Tue Apr 30 09:27:56 2019, vrpo...@cisco.com wrote:
> > > > 05:26:36 mkdir: cannot create directory '/tmp/reservation_dir': File exists
> > >
> > > That error is expected, it just means the testbed is currently used
> > > by another job, so this job should sleep a while and try again.
> > >
> > > > the job was waiting (sleep) from 04:45:12 til 05:26:36
> > >
> > > I believe my browser is showing me UTC timestamps, which show values
> > > larger by 4 hours.
> > >
> > > > we have 10m idle timeout
> > >
> > > The ~3m period of sleeps are interleaved by quick periods of
> > > activity, so we usually do not hit the timeout.
> > >
> > > But the final sleep probably took longer for some reason
> > >
> > > 09:26:36 ++ sleep 197s
> > > 09:32:20 FATAL: command execution failed
> > >
> > > and something bad has happened in less than 6 minutes.
> > > So it does not look like the 10m timeout.
> > >
> > > Vratko.
> > >
> > > -----Original Message-----
> > > From: csit-...@lists.fd.io <csit-...@lists.fd.io> On Behalf Of Kenny Paul via RT
> > > Sent: Tuesday, 2019-April-30 15:09
> > > To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) <jgel...@cisco.com>
> > > Cc: csit-...@lists.fd.io; vpp-dev@lists.fd.io
> > > Subject: [csit-dev] [FD.io Helpdesk #73486] Jenkins.fd.io network issues
> > >
> > > Hello Jan
> > >
> > > From logs I see that the job was waiting (sleep) from 04:45:12 til
> > > 05:26:36 which could cause jnlp session to time out as we have 10m
> > > idle timeout (client and server side) set on jenkins.fd.io
> > >
> > > Could you check that error:
> > >
> > > 05:26:36 Reservation unsuccessful:
> > > 05:26:36 mkdir: cannot create directory '/tmp/reservation_dir': File exists
> > >
> > > Cheers,
> > >
> > > --
> > > Anton Baranov
> > > Sr. System Operations Engineer
> > > The Linux Foundation
> > >
> > > On Mon Apr 29 02:58:28 2019, jgel...@cisco.com wrote:
> > > > Hello,
> > > >
> > > > We are experiencing quite a lot of network issues when running
> > > > CSIT tests for 19.04 report:
> > > >
> > > > Caused: hudson.remoting.ChannelClosedException: Channel "unknown":
> > > > Remote call on JNLP4-connect connection from
> > > > vex-yul-rot-ingress-1.ci.codeaurora.org/10.30.48.3:41068 failed.
> > > > The channel is closing down or has closed down
> > > >
> > > > https://jenkins.fd.io/job/csit-vpp-perf-verify-1904-3n-hsw/13/console
> > > >
> > > > Could you, please, have a look on it?
> > > >
> > > > Thank you very much.
> > > >
> > > > Regards,
> > > > Jan

--
Anton Baranov
Sr. System Operations Engineer
The Linux Foundation
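
A minimal sketch of the reservation loop described in the thread: `mkdir` acts as an atomic lock on the testbed, and a failed attempt is followed by a sleep and a retry. This is not the CSIT script itself. The function name `reserve` and the parameters `lock_dir`, `retry_s`, `heartbeat_s` and `max_wait_s` are hypothetical. The sketch splits each wait into short slices and logs after every slice, on the unconfirmed assumption raised in the thread that console output counts as activity on the jnlp connection. It would not protect a job against a proxy reload.

```python
import os
import time


def reserve(lock_dir, retry_s=180.0, heartbeat_s=30.0, max_wait_s=3600.0,
            sleep=time.sleep, log=print):
    """Try to take a directory lock, logging a heartbeat while waiting.

    All names and default values here are illustrative assumptions.
    Returns True once the lock directory is created, False on timeout.
    """
    waited = 0.0
    while True:
        try:
            # mkdir either creates the directory or fails if it exists,
            # so only one job can hold the reservation at a time.
            os.mkdir(lock_dir)
            return True
        except FileExistsError:
            log(f"Reservation unsuccessful: {lock_dir} exists")
        if waited >= max_wait_s:
            return False
        # Sleep in short slices so the console is never silent for long.
        remaining = retry_s
        while remaining > 0:
            chunk = min(heartbeat_s, remaining)
            sleep(chunk)
            remaining -= chunk
            waited += chunk
            log(f"still waiting for testbed ({waited:.0f}s elapsed)")
```

A job would call something like `reserve("/tmp/reservation_dir")` before running its tests and remove the directory afterwards. The `sleep` and `log` arguments are injectable only so the loop can be exercised without real delays.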