Hi,

After looking into this a lot more, I’m still not closer to a solution or a root cause. For a while, I thought enabling debug logging for pppd (via the NM_PPP_DEBUG environment variable for NetworkManager) resolved the issue. However, it didn’t reliably. For the debug and non-debug case, I captured straces. In this particular instance, the debug-case didn’t show any problems re-establishing a connection after the connection and pppd was terminated while the non-debug case did. I have attached both straces. When looking at the straces, a couple of things seemed odd to me:

1. After the modem hangs up, the TTY is in a state where ioctls to the TTY return I/O errors. How is pppd supposed restore the termios settings when it’s hung up?
2. NetworkManager doesn’t wait to signal pppd with SIGTERM until pppd is in PHASE_DEAD, even though this should be happening. Ref https://github.com/paulusmack/ppp/issues/6#issuecomment-51176255
3. The termios flags present when pppd starts up differ for the debug/non-debug cases: c_iflags=0x5 vs 0x4 and c_lflags=0 vs 0x8a21. Why would that be the case? Doesn’t ModemManager set this up deterministically? Does it matter?

Besides that, my current working theory is that the problem is somehow caused by the periodic connection status check. If that detects a lost connection (+CGACT: 1,0) before pppd has terminated, it will emit the MM_PORT_CONNECTED notification which calls back to port_connected(). There, we try to re-acquire exclusive access to the TTY which will fail because pppd is still running. Afterwards, data_watch_enable starts to watch the TTY which immediately triggers with the modem hangup.

Sven

Attachment: hhe0-pppd-strace-debug.gz
Description: GNU Zip compressed data

Attachment: hhe0-pppd-strace-nodebug.gz
Description: GNU Zip compressed data





On 24 Oct 2018, at 16:46, Sven Schwermer <s...@svenschwermer.de> wrote:

It looks like I celebrated prematurely. As soon as I disabled the debug-level logging for NetworkManager and ModemManager, the issue started to appear again. This tells me that there is some sort of race condition. I’ll look into this more but I’d appreciate if one of the maintainers could explain the desired/expected series of events when the peer hangs up and pppd terminates.

Thank you!

Sven

On 23 Oct 2018, at 17:18, Sven Schwermer <s...@svenschwermer.de> wrote:

Hi again,

I think, I have found a fix for this. With the following two patches applied to NetworkManager 1.12.2, the issue seems to be gone (final confirmation pending):

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/1913a4d2590d42413b04c0cf86321286bb2e2dbd
https://gitlab.freedesktop.org/NetworkManager/NetworkManager/commit/3392c439582b4ba5da19423fbaeed5843aeae9a9

I suspect, the latter made the difference because pppd basically hangs when the bus method call doesn’t return.

Sven

On 23 Oct 2018, at 11:37, Sven Schwermer <s...@svenschwermer.de> wrote:

Hi Aleksander, Matthew,

I looked into this and the issue that Aleksander referred to in the other thread (https://bugzilla.gnome.org/show_bug.cgi?id=734347). In the issue, Aleksander mentions that MM needs to be informed about the broken connection after pppd has terminated so that the CLOCAL setting is restored when MM interacts with the TTY again. However, if I interpret my logs correctly, MM detects itself that the connection is broken based on the modem’s response to AT+CGACT? (+CGACT: 1,0). This then triggers the notification signal MM_PORT_CONNECTED (with connected=false) which makes port_connected try to acquire the serial port lock which then fails.

Sven

On 22 Oct 2018, at 16:52, Matthew Starr <mst...@hedonline.com> wrote:

Aleksander,

This is the exact same issue I reported a couple months ago:
https://lists.freedesktop.org/archives/modemmanager-devel/2018-August/006593.html

I am currently working around it by adding a NetworkManager dispatcher script that on ppp0 going down, I restart ModemManager.

I never did find a solution to the issue even after trying some modifications to the serial force close code (src/mm-port-serial.c) in ModemManager.  It looks like the force_closed boolean gets set and is never reset, which causes the serial port to never retry communications over the serial port again.

Regards,
Matthew Starr

Date: Mon, 22 Oct 2018 11:39:37 +0200
From: Sven Schwermer <s...@svenschwermer.de>
To: Aleksander Morgado <aleksan...@aleksander.es>
Cc: modemmanager-devel@lists.freedesktop.org
Subject: Re: Infinite connection loop
Message-ID: <41689DBE-1755-4F85-BFA7-
97cf6e588...@svenschwermer.de>
Content-Type: text/plain; charset="utf-8"

I looked a little closer into this now.

Here’s what I think is happening:
1. My cell operator terminates the LCP after some time of inactivity. This is
confirmed by my operator.
2. pppd receives the termination request, acknowledges it and the modem
hangs up the connection.
3. ModemManager fails to re-acquire serial port lock because of the hangup.
4. ModemManager sets the connection state to ‘registered’.
5. ModemManager force-closes the serial port.
6. NetworkManager registers the lost connection and tries to re-establish it.
7. The ModemManager fails to re-open serial port because it had been force-
closed.

pppd doesn’t appear to be re-launched. How is this all supposed to work?
I’m sure the software stack should be able to handle a terminated
connection and re-dial it?

Please find attached the debug-level log. The LCP termination happens at Oct
21 14:53:51.

Best regards,
Sven

-------------- next part --------------
A non-text attachment was scrubbed...
Name: excerpt.long.gz
Type: application/x-gzip
Size: 53662 bytes
Desc: not available
URL: <https://lists.freedesktop.org/archives/modemmanager-
devel/attachments/20181022/1e9e348d/attachment.bin>
-------------- next part --------------


On 20 Oct 2018, at 16:26, Sven Schwermer <s...@svenschwermer.de>
wrote:

Hi again,

I now managed to capture the event in the journal. I limited the auto
connect retried to 4 in order to keep the log size manageable. I didn’t have
debug-level logging active for NetworkManager at the time. I’ll enable that
now and try to reproduce. Please find the log attached. The error seems to
be happening at Oct 20 07:44:26.

Best regards,
Sven

<ed6e56cf2ac5b6bdb2e3ca97276ebd6d.journal.log>

On 20 Oct 2018, at 13:47, Aleksander Morgado
<aleksan...@aleksander.es> wrote:

Hey,

I am running an embedded system with a u-blox SARA-U270 modem and
NetworkManager 1.12.2 / ModemManager 1.8.0. For some reason, the
modem connection seems to break over time. I am not quite sure why that’s
the case, but when I look into the journal, I see NetworkManager and
ModemManager spewing out log messages at a very high rate. This actually
causes significant system load (mostly by dbus-daemon). Here’s an excerpt
of the messages:

Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(4/8): Wait to get fully enabled
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(5/8): Register
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(6/8): Bearer
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(4/8): Wait to get fully enabled
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(5/8): Register
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(6/8): Bearer
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
[1539780869.7776] modem-broadband[ttyACM0]: failed to connect modem:
Could not open serial device ttyACM0: it has been forced close
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.7778] device (ttyACM0): state change: prepare -> failed (reason
'unknown', sys-iface-state: 'managed')
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.7807] manager: NetworkManager state is now
CONNECTED_LOCAL
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
[1539780869.7840] device (ttyACM0): Activation: failed for connection
'gprsconnection'
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.7860] device (ttyACM0): state change: failed -> disconnected
(reason 'none', sys-iface-state: 'managed')
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.7923] policy: auto-activating connection 'gprsconnection'
(14e02637-0ffb-3a28-806f-ad60904afe7f)
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.8005] device (ttyACM0): Activation: starting connection
'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f)
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.8015] device (ttyACM0): state change: disconnected -> prepare
(reason 'none', sys-iface-state: 'managed')
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.8043] manager: NetworkManager state is now CONNECTING
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(4/8): Wait to get fully enabled
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(5/8): Register
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(6/8): Bearer
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect started...
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(4/8): Wait to get fully enabled
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(5/8): Register
Oct 17 12:54:29 cf352c1 ModemManager[1385]: Simple connect state
(6/8): Bearer
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
[1539780869.9321] modem-broadband[ttyACM0]: failed to connect modem:
Could not open serial device ttyACM0: it has been forced close
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9323] device (ttyACM0): state change: prepare -> failed (reason
'unknown', sys-iface-state: 'managed')
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9350] manager: NetworkManager state is now
CONNECTED_LOCAL
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <warn>
[1539780869.9399] device (ttyACM0): Activation: failed for connection
'gprsconnection'
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9430] device (ttyACM0): state change: failed -> disconnected
(reason 'none', sys-iface-state: 'managed')
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9490] policy: auto-activating connection 'gprsconnection'
(14e02637-0ffb-3a28-806f-ad60904afe7f)
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9569] device (ttyACM0): Activation: starting connection
'gprsconnection' (14e02637-0ffb-3a28-806f-ad60904afe7f)
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9585] device (ttyACM0): state change: disconnected -> prepare
(reason 'none', sys-iface-state: 'managed')
Oct 17 12:54:29 cf352c1 NetworkManager[677]: <info>
[1539780869.9653] manager: NetworkManager state is now CONNECTING

It continues like that forever. My modem network connection config is
as follows:

[connection]
id=gprsconnection
type=gsm
autoconnect=true
autoconnect-retries=0
[gsm]
number=*99#
[ipv4]
dns-search=
method=auto
may-fail=false

I have been looking through the code and what I don’t understand is
why the ttyACM0 would ever be forcefully closed. What could be an
explanation for this?

Any insight is appreciated.


Do you have a full log of when this issue happens? If you can
reproduce this more or less frequently, could you setup both
ModemManager and NetworkManager to dump debug logs and try to get
it
reproduced? This issue could be to a wrong synchronization between MM
and NM/pppd, in particular related to the CLOCAL settings in each case
(i.e. MM shouldn't get HUPs in the serial port unless for the cases
where the port is going away).

--
Aleksander
https://aleksander.es




_______________________________________________
ModemManager-devel mailing list
ModemManager-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/modemmanager-devel

Reply via email to