On 11/16/2011 09:58 AM, Amaury FRANCOIS wrote:
> Hi,
>
> I've configured iscsi on RHEL 5.7 (last kernel) with a netapp disk
> array and I'm seeing connection problems every 20 minutes (the
> connection is dropped and comes back a few minutes or seconds later).
>
> Here are the messages in /var/log/messages :
>
> Nov 6 04:29:01 bdd4a kernel: connection1:0: ping timeout of 5 secs
> expired, recv timeout 5, last rx 4356175664, last ping 4356180664, now
> 4356185664

This means that we did not see any data/PDUs from the target for 5 seconds (node.conn[0].timeo.noop_out_interval), so we sent an iSCSI ping to make sure the target was still up. The target did not respond to the iSCSI ping within 5 seconds (node.conn[0].timeo.noop_out_timeout), so we assumed there was a connection problem. The initiator then drops the TCP/IP connection and tries to reconnect, log back in, and restart IO.

> Nov 6 04:29:01 bdd4a kernel: connection1:0: detected conn error
> (1011)
> Nov 6 04:29:02 bdd4a iscsid: Kernel reported iSCSI connection 1:0
> error (1011) state (3)
> Nov 6 04:29:07 bdd4a kernel: session1: session recovery timed out
> after 5 secs

We tried to relogin/reconnect for 5 seconds (node.session.timeo.replacement_timeout) but could not, so the initiator was instructed to fail IO.

It then looks like we were able to log back in around 30 seconds later. When this happens it is normally a temporary issue with the network. For some reason we just cannot reconnect to the target for a longish (30 secs) time.

>
> I've also traced the network with tcpdump and analyzed it with
> wireshark :
>
> ARP request about the server from Netapp
> ARP response from the server -> the problem always seem to happen
> after an ARP request from the netapp array
> RST sent from the server on the iscsi connection -> !!!!!
>
> All NOP in, NOP out seemed to be OK just before, any idea why this
> problem happen ?

So in the network trace you can see the NOP get sent and its response before the TCP disconnection? If so, then that might be a bug in the iSCSI layer not processing it right. From the log above it looks like we did not get a response.

Can you run the initiator in debug mode? It will produce a lot of log output.

echo 1 > /sys/module/libiscsi2/parameters/debug_libiscsi

And could you also send your net trace if you have it?
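The two 5-second timers can be checked against the jiffies counters in the ping timeout message. The following is a minimal sketch, assuming the kernel runs with HZ=1000 (one jiffy per millisecond; this is an assumption, not something stated in this thread). The variable names are only illustrative.

```shell
#!/bin/bash
# Sample line copied from the /var/log/messages excerpt quoted above.
line='Nov 6 04:29:01 bdd4a kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4356175664, last ping 4356180664, now 4356185664'

# Assumption: HZ=1000, so 1000 jiffies equal one second.
HZ=1000

# Extract the three jiffies counters from the message.
last_rx=$(printf '%s\n' "$line" | sed -n 's/.*last rx \([0-9]*\).*/\1/p')
last_ping=$(printf '%s\n' "$line" | sed -n 's/.*last ping \([0-9]*\).*/\1/p')
now=$(printf '%s\n' "$line" | sed -n 's/.*now \([0-9]*\).*/\1/p')

# Time with no data from the target before the ping was sent
# (compare with node.conn[0].timeo.noop_out_interval).
idle_secs=$(( (last_ping - last_rx) / HZ ))

# Time spent waiting for the ping response before giving up
# (compare with node.conn[0].timeo.noop_out_timeout).
wait_secs=$(( (now - last_ping) / HZ ))

echo "idle before ping: ${idle_secs}s"
echo "waited for ping reply: ${wait_secs}s"
```

Under that assumption both values come out as 5 seconds for this line, which matches the noop_out_interval and noop_out_timeout settings described above.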
