Hi all,

I have a few applications that use the shared memory API. I’m running these on 
CentOS 7.4, and starting VPP using systemd. If VPP happens to crash or be 
intentionally restarted, those applications never seem to recover their API 
connection. They notice that the original VPP process died and try to call 
vl_client_disconnect_from_vlib(). That call tries to send API messages to 
cleanly shut down its connection. The application will time out waiting for a 
response, write a message like:

vl_client_disconnect:301: peer unresponsive, give up

and eventually consider itself disconnected. When it tries to reconnect, it 
hangs for a while (100 seconds the last time I checked) and then prints 
messages like:

vl_map_shmem:619: region init fail
connect_to_vlib_internal:394: vl_client_api map rv -2

The client keeps on trying and continues seeing those same errors. If the 
client is restarted, it sees the same errors after restart. It doesn’t recover 
until VPP is restarted with the client stopped. Once that happens, the client 
can be started again and successfully connect.

The VPP systemd service file that is installed with RPMs built via 
‘make pkg-rpm’ has the following:

[Service]
ExecStartPre=-/bin/rm -f /dev/shm/db /dev/shm/global_vm /dev/shm/vpe-api

When systemd starts VPP, it removes these files, which the still-running client 
applications have already opened with shm_open() and mapped with mmap(). I am 
guessing that when those clients try to disconnect with 
vl_client_disconnect_from_vlib(), they are stomping on something in shared 
memory that subsequently keeps them from being able to connect. If I comment 
that command out of the systemd service definition, the problem behavior I 
described above disappears. The applications write one ‘peer unresponsive’ 
message and then they reconnect to the API successfully and all is 
(relatively) well. This is also the case if I don’t start VPP with 
systemd/systemctl and just run /usr/bin/vpp directly.

Does anyone have any thoughts on whether it would be ok to remove that command 
from the systemd service file? Or is there some other better way to deal with 
VPP crashing from the perspective of a client to the shared memory API?

Thanks!
-Matt

_______________________________________________
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev
