On 6/30/2020 2:58 AM, Reto Gähwiler wrote:
Hello Everyone,
I am facing the following problem working with NuttX and Ethernet
connections. A TCP socket is set up as blocking and connected to the server.
The connection is handled in one thread, which blocks in the recv() call and
processes data when it arrives. In case of an error the connection is
closed.
Now, if close() is called on that particular TCP connection from a
different thread, it terminates the connection, and the recv() call fails
and returns.
If we now connect to a new IP, it first seems to be fine, but shortly after
the whole network disappears. There are no more ICMP responses (therefore no
ping), and all other connections opened in different threads are no longer
reachable. Besides, any of the still-open connections starts to consume all
CPU time. With the debugger attached, it can be seen that in
net/devif/devif_callback.c the for-loop looking for the callback in the
device event list cycles without end.
The Wireshark capture below shows the traffic between my client and the
server around the termination, that is, just before we reconnect and fail.
No.   Time     Source         Destination    Protocol Length Src.MacAddress    Info
43178 0.000451 195.65.177.171 10.62.64.110   TCP      75     Fortinet_09:00:06 29500 → 1026 [PSH, ACK] Seq=30001 Ack=759475 Win=1758 Len=21
43179 0.000102 10.62.64.110   195.65.177.171 TCP      60     xxxx_0c:70:04     1026 → 29500 [ACK] Seq=759475 Ack=30022 Win=5954 Len=0
43182 0.001144 10.62.64.110   195.65.177.171 TCP      586    xxxx_0c:70:04     1026 → 29500 [PSH, ACK] Seq=759475 Ack=30022 Win=6150 Len=532
43183 0.000437 10.62.64.110   195.65.177.171 TCP      60     xxxx_0c:70:04     [TCP Out-Of-Order] 1026 → 29500 [FIN, ACK] Seq=759475 Ack=30022 Win=6150 Len=0
43184 0.000049 195.65.177.171 10.62.64.110   TCP      75     Fortinet_09:00:06 29500 → 1026 [PSH, ACK] Seq=30022 Ack=760007 Win=1758 Len=21
43185 0.000090 10.62.64.110   195.65.177.171 TCP      60     xxxx_0c:70:04     1026 → 29500 [RST, ACK] Seq=760007 Ack=30043 Win=1758 Len=0
43186 0.000012 195.65.177.171 10.62.64.110   TCP      60     Fortinet_09:00:06 [TCP Dup ACK 43184#1] 29500 → 1026 [ACK] Seq=30043 Ack=760007 Win=1758 Len=0
43187 0.000096 10.62.64.110   195.65.177.171 TCP      60     xxxx_0c:70:04     1026 → 29500 [RST, ACK] Seq=760007 Ack=30043 Win=1758 Len=0
As can be seen, the client's (the device I am working on) sequence number is
not advanced after the last data transmission (seq=759475, len=532 -->
nextseq=760007): the FIN,ACK sent by the device also carries seq=759475!
Therefore, it looks like closing a connection this way is not
thread safe!
With an idle connection the sequence numbers look just fine, but
the next connection still triggers the same error.
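The sequence-number arithmetic referred to above can be checked with a small helper. This is only an illustrative sketch: the function names tcp_next_seq() and tcp_seq_in_order() are hypothetical and are not part of the NuttX TCP stack. It relies on the standard TCP rule that the next sequence number is the current one plus the payload length, with SYN and FIN each consuming one extra sequence number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not a NuttX API): sequence number expected after a
 * segment.  Payload bytes advance the sequence number by len; SYN and FIN
 * each consume one additional sequence number.  Unsigned arithmetic wraps
 * modulo 2^32, as TCP sequence numbers do.
 */

uint32_t tcp_next_seq(uint32_t seq, uint32_t len, bool syn, bool fin)
{
  return seq + len + (syn ? 1u : 0u) + (fin ? 1u : 0u);
}

/* Hypothetical helper: true if a segment with sequence number seq directly
 * follows a previous data segment (prev_seq, prev_len) from the same sender.
 */

bool tcp_seq_in_order(uint32_t prev_seq, uint32_t prev_len, uint32_t seq)
{
  return seq == tcp_next_seq(prev_seq, prev_len, false, false);
}
```

Applied to the capture, packet 43182 (seq=759475, len=532) should be followed by seq=760007, so the FIN,ACK in packet 43183 with seq=759475 is not in order.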
I then also tried to use shutdown(), calling it from the thread I
used to call close(), but shutdown.c is just a dummy API, as already noticed
by seanshpark
<https://nuttx.yahoogroups.narkive.com/YjaUuARV/socket-shutdown> 5 years ago.
The platform the code is executed on is based on an STM32H743ZI. Since
things seem to happen in the libraries, it could affect other platforms as
well.
I was wondering if anyone else has run into this issue: calling close() on a
socket from a different thread than the one handling recv/send, after which
the next connection kills the entire Ethernet. Please let me know if you
know a fix for blocking sockets, or whether it would be better to go with
non-blocking sockets and work with select/poll instead.
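For reference, the select/poll alternative mentioned above could look roughly like the following sketch. It is not taken from NuttX or from the application in question: the function name recv_or_stop() and the idea of signalling the receive thread through the read end of a pipe (wakefd) are assumptions made for illustration. It also assumes poll() and pipes are enabled in the build. The point of the pattern is that the thread that calls recv() is also the only one that calls close(); other threads merely request the stop by writing a byte to the pipe.

```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: wait until either data is available on sockfd or a
 * stop request arrives on wakefd (read end of a pipe).
 *
 * Returns: > 0  number of bytes received
 *            0  stop requested, or the peer closed the connection
 *          < 0  error
 *
 * On a return value <= 0 the calling thread is expected to close sockfd
 * itself, so close() never races with a recv() blocked in another thread.
 */

ssize_t recv_or_stop(int sockfd, int wakefd, void *buf, size_t len)
{
  struct pollfd fds[2];

  fds[0].fd      = sockfd;
  fds[0].events  = POLLIN;
  fds[0].revents = 0;
  fds[1].fd      = wakefd;
  fds[1].events  = POLLIN;
  fds[1].revents = 0;

  for (; ; )
    {
      int ret = poll(fds, 2, -1);
      if (ret < 0)
        {
          if (errno == EINTR)
            {
              continue;           /* Interrupted by a signal: retry */
            }

          return -1;
        }

      if (fds[1].revents != 0)
        {
          return 0;               /* Another thread asked us to stop */
        }

      if ((fds[0].revents & (POLLIN | POLLHUP | POLLERR)) != 0)
        {
          return recv(sockfd, buf, len, 0);
        }
    }
}
```

A thread that wants the connection closed would then write a single byte to the write end of the pipe instead of calling close() on the socket directly.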
Thanks for your input and help,
best regards, Reto
I am a little confused by what you mean by a thread. If you are talking
about a pthread, then yes, it should be able to close any socket opened
by the task group.
The main thread and its child pthreads are members of the same task
group:
https://cwiki.apache.org/confluence/display/NUTTX/Tasks+vs.+Threads+FAQ
Any member of the task group can close a socket. The socket is a private
resource of the task group.
If you are trying to close a socket in Task B that was opened in Task A,
that will not work. The socket is a private resource of Task A and
cannot be closed by Task B.
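
The same-task-group case described above can be sketched with plain POSIX calls. This is a minimal illustration only: the helper names closer_thread() and close_from_pthread() are hypothetical, and the sketch does not cover the case from the original question where another thread is blocked in recv() on the socket at the time of the close().

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical thread entry: close the descriptor passed in arg and hand
 * the result of close() back through the thread's exit value.
 */

static void *closer_thread(void *arg)
{
  int sockfd = (int)(intptr_t)arg;
  return (void *)(intptr_t)close(sockfd);
}

/* Hypothetical helper: close sockfd from a pthread created by the caller.
 * The new pthread belongs to the caller's task group, so the descriptor is
 * valid there.  Returns the result of close(), or -1 if the thread could
 * not be created or joined.
 */

int close_from_pthread(int sockfd)
{
  pthread_t tid;
  void *result;

  if (pthread_create(&tid, NULL, closer_thread,
                     (void *)(intptr_t)sockfd) != 0)
    {
      return -1;
    }

  if (pthread_join(tid, &result) != 0)
    {
      return -1;
    }

  return (int)(intptr_t)result;
}
```

A socket opened in the main thread and passed to close_from_pthread() is closed successfully because both threads share the task group's descriptors. A separate task (Task B in the example above) would not see that descriptor at all.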