Hi Ali,

There is actually a minor bug/race condition in the gem5 ListenSocket::listen function (src/base/socket.cc). I think Hao might be hitting this; I just haven't had time to upload the patch for it to the mainline. I hit this when launching hundreds of simulations at the same time (on the same cluster that Hao is using).
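For anyone who wants to see the pattern outside of gem5 before reading the details below, here is a rough standalone sketch of "treat EADDRINUSE from either bind or listen as a reason to try the next port". This is not the gem5 code: tryListen and listenOnFreePort are hypothetical names I made up for illustration, and the loopback address and SO_REUSEADDR setting are assumptions of the sketch.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper (not gem5 code): try to bind and listen on one port.
// Returns the listening fd on success, -1 if the port is in use (the caller
// should try another port), and -2 on any other error.
int tryListen(uint16_t port)
{
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -2;

    // Assumption of this sketch: set SO_REUSEADDR, as servers commonly do.
    int on = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
        ::close(fd);
        return -2;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);

    // Check BOTH calls: either one may report that the port is taken.
    if (::bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == -1 ||
        ::listen(fd, 1) == -1) {
        int err = errno;  // save errno before close() can overwrite it
        ::close(fd);
        return err == EADDRINUSE ? -1 : -2;
    }
    return fd;
}

// Walk upward from firstPort until a port can be both bound and listened on.
// On success returns the fd and stores the port in 'chosen'; otherwise -1.
int listenOnFreePort(uint16_t firstPort, int maxTries, uint16_t &chosen)
{
    assert(maxTries > 0);
    for (int i = 0; i < maxTries; ++i) {
        int port = firstPort + i;
        if (port > 65535)
            break;  // ran out of valid port numbers
        int fd = tryListen(static_cast<uint16_t>(port));
        if (fd >= 0) {
            chosen = static_cast<uint16_t>(port);
            return fd;
        }
        if (fd == -2)
            return -1;  // a real error, not a busy port: give up
    }
    return -1;
}
```

Calling listenOnFreePort twice with the same starting port should give two different ports, since the second call skips the port the first one is already listening on.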
gem5 makes the incorrect assumption that by "binding" a socket, it has effectively allocated a port. Linux only allocates ports once you call listen on the given socket, not when you call bind. So even if the port was free when bind was called, another process (gem5 instance) could race in between the bind and listen calls and steal the port. It's a small race condition, but it is there.

In the current code, if the call to bind fails because the port is in use (EADDRINUSE), gem5 retries with a different port. However, if listen fails, gem5 just panics. The fix is to test the return value of listen and retry if the failure was due to EADDRINUSE.

Here is my file's diff:

diff -r a5943fcb8b22 src/base/socket.cc
--- a/src/base/socket.cc Sun May 05 16:38:11 2013 -0500
+++ b/src/base/socket.cc Sun Aug 04 21:48:46 2013 -0500
@@ -103,11 +103,13 @@
         return false;
     }

-    if (::listen(fd, 1) == -1)
-        panic("ListenSocket(listen): listen() failed!");
+    if (::listen(fd, 1) == -1) {
+        if (errno != EADDRINUSE)
+            panic("ListenSocket(listen): listen() failed!");
+        return false;
+    }

     listening = true;
-    anyListening = true;
     return true;
 }

On Sun, Aug 4, 2013 at 9:26 PM, Ali Saidi <sa...@umich.edu> wrote:
> gem5 opens up a number of ports when it starts for the terminal,
> debugging, etc. However if a number of gem5 instances startup at the same
> time they can conflict and you'll see the issue below.
>
> If you add m5.disableAllListeners() to the python script your problem will
> go away.
>
> Ali
>
> On Aug 4, 2013, at 8:45 PM, Hao Wang <pkuwa...@gmail.com> wrote:
>
> > Hi.
> >
> > I get the following error:
> > panic: ListenSocket(listen): listen() failed!
> > @ cycle 0
> > [listen:build/ALPHA/base/socket.cc, line 107]
> >
> > when I tried to run hundreds of simulations on a cluster.
> >
> > Each one is a single-core simulation in SE mode.
> > And I tried to limit the number of simulations on one node/machine to 4,
> > but this error still happens randomly.
> >
> > Any suggestions?
> >
> > Hao
> >
> > _______________________________________________
> > gem5-users mailing list
> > gem5-users@gem5.org
> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users