Hi Ali,

There is actually a minor race condition in gem5's
ListenSocket::listen function (src/base/socket.cc).  I think Hao might be
hitting it; I just haven't had time to upload the patch for it to
the mainline.  I hit this when launching hundreds of simulations at the
same time (on the same cluster that Hao is using).

gem5 incorrectly assumes that "binding" a socket effectively allocates
a port.  Linux only allocates the port once you call listen on the socket,
not when you call bind.  So even if the port was free when bind was called,
another process (e.g. another gem5 instance) can slip in between the bind
and listen calls and steal the port.  It's a small race window, but it is
there.  In the current code, if the call to bind fails because the port is
in use (EADDRINUSE), gem5 retries with a different port.  However, if
listen fails, gem5 just panics.  The fix is to test the return value of
listen and retry if the failure was due to EADDRINUSE.

Here is my file's diff:

diff -r a5943fcb8b22 src/base/socket.cc
--- a/src/base/socket.cc Sun May 05 16:38:11 2013 -0500
+++ b/src/base/socket.cc Sun Aug 04 21:48:46 2013 -0500
@@ -103,11 +103,13 @@
         return false;
     }

-    if (::listen(fd, 1) == -1)
-        panic("ListenSocket(listen): listen() failed!");
+    if (::listen(fd, 1) == -1) {
+        if (errno != EADDRINUSE)
+            panic("ListenSocket(listen): listen() failed!");
+        return false;
+    }

     listening = true;
-
     anyListening = true;
     return true;
 }
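
For illustration, the retry-on-EADDRINUSE idea can be sketched as a
standalone program fragment.  This is not gem5 code: tryListen,
listenOnFreePort, and the kPortInUse/kFatal sentinels are hypothetical
names, and binding to loopback with SO_REUSEADDR set is an assumption made
for the sketch.  The point is only that bind and listen failures are
treated the same way: EADDRINUSE means "try the next port", anything else
is fatal.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#include <cerrno>
#include <cstdint>

// Hypothetical sentinels for this sketch (not gem5 identifiers).
constexpr int kPortInUse = -1;  // port taken, caller may try another one
constexpr int kFatal = -2;      // any other failure

// Try to bind AND listen on one port.  Returns the listening fd on
// success, kPortInUse if either call fails with EADDRINUSE, else kFatal.
int tryListen(int port)
{
    int fd = ::socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return kFatal;

    // Assumption: SO_REUSEADDR is set.  With it, Linux may let two
    // sockets bind the same port while neither is listening yet, so the
    // conflict only surfaces at listen().
    int on = 1;
    if (::setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1) {
        ::close(fd);
        return kFatal;
    }

    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(static_cast<uint16_t>(port));

    if (::bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) == -1 ||
        ::listen(fd, 1) == -1) {
        int err = errno;  // save errno before close() can overwrite it
        ::close(fd);
        return err == EADDRINUSE ? kPortInUse : kFatal;
    }
    return fd;
}

// Walk upward from first_port until a port can be bound and listened on.
// On success, stores the chosen port in port_out and returns the fd.
int listenOnFreePort(int first_port, int max_tries, int &port_out)
{
    for (int i = 0; i < max_tries; ++i) {
        int fd = tryListen(first_port + i);
        if (fd >= 0) {
            port_out = first_port + i;
            return fd;
        }
        if (fd == kFatal)
            return kFatal;
    }
    return kPortInUse;
}
```

Calling listenOnFreePort twice with the same starting port should hand
back two different ports, whichever of bind or listen reports the clash.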




On Sun, Aug 4, 2013 at 9:26 PM, Ali Saidi <sa...@umich.edu> wrote:

> gem5 opens up a number of ports when it starts for the terminal,
> debugging, etc. However, if a number of gem5 instances start up at the same
> time they can conflict and you'll see the issue below.
>
> If you add m5.disableAllListeners() to the python script your problem will
> go away.
>
> Ali
>
> On Aug 4, 2013, at 8:45 PM, Hao Wang <pkuwa...@gmail.com> wrote:
>
> > Hi.
> >
> > I get the following error:
> >     panic: ListenSocket(listen): listen() failed!
> >     @ cycle 0
> >     [listen:build/ALPHA/base/socket.cc, line 107]
> >
> > when I tried to run hundreds of simulations on a cluster.
> >
> > Each one is a single-core simulation in SE mode.
> > And I tried to limit the number of simulations on one node/machine to 4,
> > but this error still happens randomly.
> >
> > Any suggestions?
> >
> > Hao
> >
> > _______________________________________________
> > gem5-users mailing list
> > gem5-users@gem5.org
> > http://m5sim.org/cgi-bin/mailman/listinfo/gem5-users