desktop/source/app/officeipcthread.cxx | 69 ++++++++++++++++----------------- 1 file changed, 34 insertions(+), 35 deletions(-)
New commits: commit 38081c0884b64ed1132047973b4dccc42d548c89 Author: Stephan Bergmann <sberg...@redhat.com> Date: Tue Oct 24 17:25:47 2017 +0200 Avoid race between DbusIpcThread::close and DbusIpcThread::execute ...that caused Flatpak'ed LO to sometimes not terminate properly after 243d743dae127fe11ecd7f807121da8815af8f01 "solenv/flatpak-manifest.in: incorporate upstream sandboxing improvements" had removed --socket=session-bus and thus introduced flatpak-dbus-proxy into the mix: > Oct 24 15:25:16 <sberg_> I'm not sure who's at fault there; on the LO side I > have a thread processing incoming dbus requests (DbusIpcThread::execute, > <https://opengrok.libreoffice.org/xref/core/desktop/source/app/officeipcthread.cxx#552>); > that thread is typically waiting within dbus_connection_read_write_dispatch > for new messages; if LO wants to close, it needs to get that thread out of > dbus_connection_read_write_dispatch, and the only way I found back then is > what's in DbusIpcThread::close (just following DbusIpcThread::execute), with > a big "this apparently needs a more DBus-y design anyway" testament to my > cluelesness re dbus; it calls dbus_bus_get_private to connect to itself, and > send a Close message that'll cause DbusIpcThread::execute to fall out of the > loop; the dbus_bus_get_private leads to a flatpak_proxy_incoming in the > proxy, which returns TRUE, and after that I see no further activity in the > proxy (but not sure what functions I should all put breakpoints on), and the > LO side is hung in dbus_bus_release_name a few lines futher down > Oct 24 15:26:20 <alexlarsson> sberg_: i ran this: > https://github.com/sgh/dbus-examples/blob/master/dbus-signal.c > Oct 24 15:26:27 <alexlarsson> sberg_: which calls dbus_bus_release_name > Oct 24 15:26:30 <alexlarsson> and it seems to work for me > Oct 24 15:26:39 <alexlarsson> If i --own-name=org.DBusTest.SignalTest > Oct 24 15:27:50 <sberg_> so maybe it's related to my unorthodox way of trying > to quit the dbus loop there; if anybody with a clue about dbus would be > willing to help me with the above, I'd be very grateful :) > Oct 24 15:28:06 <alexlarsson> I don't see how the proxy should affect it tho > Oct 24 15:28:11 <alexlarsson> other than maybe timing? > Oct 24 15:29:06 <sberg_> maybe; I think I've seen things actually succeed once > when I stepped through the proxy somewhat manually > Oct 24 15:31:01 <alexlarsson> sberg_: eeeentresting > Oct 24 15:31:07 <alexlarsson> sberg_: so, it does an own-call > Oct 24 15:31:13 <alexlarsson> sberg_: maybe that is what breaks? > Oct 24 15:31:33 <sberg_> yeah, that's my dumb speculation > Oct 24 15:34:38 <alexlarsson> So, you send the close from another thread, not > waiting for the reply > Oct 24 15:34:56 <alexlarsson> then you start working on the main connection > immediately > Oct 24 15:35:02 <alexlarsson> shouldn > Oct 24 15:35:25 <alexlarsson> shouldn't you grab the mutex after that? > Oct 24 15:36:35 <alexlarsson> This whole thing smells racy > Oct 24 15:38:18 <alexlarsson> My guess is that with regular dbus, the flush > call will schedule dbus daemon which sends back the message, waking up the > other thread before continuing > Oct 24 15:38:22 <sberg_> alexlarsson, ah, yeah, I see; apparently didn't > occur to me back then that I mustn't call dbus_bus_release_name when the > execute loop may still be busy > Oct 24 15:38:25 <alexlarsson> But now you have two context switches > Oct 24 15:38:37 <alexlarsson> so we continue before we get the reply > Oct 24 15:38:44 <alexlarsson> eh, not reply > Oct 24 15:38:46 <alexlarsson> the close message > Oct 24 15:38:52 <alexlarsson> and then you race > Oct 24 15:39:11 <alexlarsson> Honestly i don't remember the exact rules for > dbus threadedness > Oct 24 15:39:58 <sberg_> ...so looks like something that can be corrected on > the LO side after all; I'll give it a try > Oct 24 15:40:17 <alexlarsson> you're calling dbus_threads_init_default(), so > it should be nominally threadsafe > Oct 24 15:40:29 <alexlarsson> still, I imagine it can still deadlock > Oct 24 15:43:12 <alexlarsson> sberg_: i still don't see the exact deadlock > though > Oct 24 15:43:32 <alexlarsson> i imagine working on the separate private > connection is threadsafe > Oct 24 15:43:35 <alexlarsson> and we flush that > Oct 24 15:44:08 <alexlarsson> then we call dbus_bus_release_name, which i > imagine would block if the dbus thread is in read_write_dispatch > Oct 24 15:44:35 <alexlarsson> The question is, why is the close message not > delivered? > Oct 24 15:44:44 <alexlarsson> I mean, we flushed it... > Oct 24 15:45:48 <alexlarsson> sberg_: oh, i think i know > Oct 24 15:45:53 <alexlarsson> sberg_: release_name probably sent some message, > but then the other thread pop:ed the reply > Oct 24 15:46:04 <alexlarsson> sberg_: queue very long wait > Oct 24 15:51:53 <sberg_> alexlarsson, yeah, sounds plausible; thanks! > Oct 24 15:52:14 <sberg_> (now that I want to reproduce it, closing succeeds > cleanly every time I try, of course...) > Oct 24 15:54:48 <alexlarsson> sberg_: obviously > Oct 24 15:55:30 <alexlarsson> sberg_: that just makes the race theory more > valid tho > Oct 24 15:57:31 <sberg_> alexlarsson, yup, after stopping the unrelated > background LO build that had put the machine under full load, I see it hang > again in release_name, while the execute loop is still happily running too; > that nicely confirms your theory > Oct 24 15:58:28 <alexlarsson> sberg_: alternatively you could tell your users > to always build LO in the background > Oct 24 15:59:04 <sberg_> I'll try to put that into the release notes; lets > see... Change-Id: I2a8a58f9259d2854f42f4aa3db5bb232cf70845d diff --git a/desktop/source/app/officeipcthread.cxx b/desktop/source/app/officeipcthread.cxx index bb597e665ff0..b2380c0f9c43 100644 --- a/desktop/source/app/officeipcthread.cxx +++ b/desktop/source/app/officeipcthread.cxx @@ -612,43 +612,8 @@ void DbusIpcThread::execute() } dbus_connection_flush(connection_.connection); } -} - -void DbusIpcThread::close() { - assert(connection_.connection != nullptr); DBusError e; dbus_error_init(&e); - { - // Let DbusIpcThread::execute return from dbus_connection_read_write; - // for now, just abort on failure (the process would otherwise block, - // with DbusIpcThread::execute hanging in dbus_connection_read_write); - // this apparently needs a more DBus-y design anyway: - DbusConnectionHolder con(dbus_bus_get_private(DBUS_BUS_SESSION, &e)); - assert((con.connection == nullptr) == bool(dbus_error_is_set(&e))); - if (con.connection == nullptr) { - SAL_WARN( - "desktop.app", - "dbus_bus_get_private failed with: " << e.name << ": " - << e.message); - dbus_error_free(&e); - std::abort(); - } - DbusMessageHolder msg( - dbus_message_new_method_call( - "org.libreoffice.LibreOfficeIpc0", - "/org/libreoffice/LibreOfficeIpc0", - "org.libreoffice.LibreOfficeIpcIfc0", "Close")); - if (msg.message == nullptr) { - SAL_WARN("desktop.app", "dbus_message_new_method_call failed"); - std::abort(); - } - if (!dbus_connection_send(con.connection, msg.message, nullptr)) - { - SAL_WARN("desktop.app", "dbus_connection_send failed"); - std::abort(); - } - dbus_connection_flush(con.connection); - } int n = dbus_bus_release_name( connection_.connection, "org.libreoffice.LibreOfficeIpc0", &e); assert((n == -1) == bool(dbus_error_is_set(&e))); @@ -673,6 +638,40 @@ void DbusIpcThread::close() { } } +void DbusIpcThread::close() { + assert(connection_.connection != nullptr); + DBusError e; + dbus_error_init(&e); + // Let DbusIpcThread::execute return from dbus_connection_read_write; for + // now, just abort on failure (the process would otherwise block, with + // DbusIpcThread::execute hanging in dbus_connection_read_write); this + // apparently needs a more DBus-y design anyway: + DbusConnectionHolder con(dbus_bus_get_private(DBUS_BUS_SESSION, &e)); + assert((con.connection == nullptr) == bool(dbus_error_is_set(&e))); + if (con.connection == nullptr) { + SAL_WARN( + "desktop.app", + "dbus_bus_get_private failed with: " << e.name << ": " + << e.message); + dbus_error_free(&e); + std::abort(); + } + DbusMessageHolder msg( + dbus_message_new_method_call( + "org.libreoffice.LibreOfficeIpc0", + "/org/libreoffice/LibreOfficeIpc0", + "org.libreoffice.LibreOfficeIpcIfc0", "Close")); + if (msg.message == nullptr) { + SAL_WARN("desktop.app", "dbus_message_new_method_call failed"); + std::abort(); + } + if (!dbus_connection_send(con.connection, msg.message, nullptr)) { + SAL_WARN("desktop.app", "dbus_connection_send failed"); + std::abort(); + } + dbus_connection_flush(con.connection); +} + #endif namespace _______________________________________________ Libreoffice-commits mailing list libreoffice-comm...@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/libreoffice-commits