On 09/24/2011 12:48 PM, Michael Meeks wrote:
I'm poking at an endless hang in the smoketest:

#12  0xb7d24aec in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libc.so.6
#3  0xb7f1b6c0 in osl_waitCondition ()
from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
#4  0xb72db42a in osl::Condition::wait (this=0xbfffb8c4, pTimeout=0x0)
at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/osl/conditn.hxx:84
#5  0xb72d9024 in (anonymous namespace)::Test::test (this=0xb7c16008)
at /data/opt/libreoffice/core/smoketestoo_native/smoketest.cxx:200
#6  0xb72d9e2e in CppUnit::TestCaller<<unnamed>::Test>::runTest(void)
(this=0xb73ac0a8)
at /data/opt/libreoffice/core/solver/unxlngi6.pro/inc/cppunit/TestCaller.h:166

        If I were a betting man I'd say this is down to us waiting on a
condition, and not spinning the main-loop; but (to be honest) this
remote-control nonsense is somewhat opaque to me. I see no live
soffice.bin process being controlled. I was slightly amazed to read:

toolkit/source/awt/AsyncCallback::addCallback()

        which seems to do nothing / not fire an exception if
Application::IsInMain() is not true - which is in itself odd.

        I have another quiescent thread:

#2  0xb7d24b44 in pthread_cond_timedwait@@GLIBC_2.3.2 ()
from /lib/libc.so.6
#3  0xb7f3f18e in ?? ()
from /data/opt/libreoffice/core/solver/unxlngi6.pro/lib/libuno_sal.so.3
#4  0xb7c28b05 in start_thread (arg=0xb7c0fb70) at pthread_create.c:297
#5  0xb7d16d5e in clone () from /lib/libc.so.6

        So - I'm tempted to say:

     Result result;
     // Shifted to main thread to work around potential deadlocks (i112867):
     com::sun::star::awt::AsyncCallback::create(
         connection_.getComponentContext())->addCallback(
             new Callback(
                 disp, url, css::uno::Sequence< css::beans::PropertyValue >(),
                 new Listener(&result)),
             css::uno::Any());
     result.condition.wait();
     CPPUNIT_ASSERT(result.success);

        should be a timed wait - but only if we fail when the timeout is
triggered (i.e. not on the common path). I've committed that at 30
seconds - possibly this needs tweaking to be infinite when under the
debugger.

A timed wait is no solution here. (Timeouts in this kind of code pose at least two problems. For one, a human who comes back to a hung "make check" after a while no longer gets a clue where it hung, as the build has unhelpfully been forced to move forward. For another, what is typically also needed is proper cleanup, like killing abandoned sub-processes, so manual intervention is needed anyway.) The real solution, instead, is to wait not only on the Result object, but also on the OfficeConnection. Fixed as <http://cgit.freedesktop.org/libreoffice/core/commit/?id=c09b966f94f5a50fe537916398451339f008947d>.
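
To illustrate the idea of waiting on both things at once, here is a minimal sketch in plain standard C++. It is not the code from the commit above and does not use the osl or UNO APIs; Result, Connection, isStillAlive, WaitOutcome and waitForResult are all hypothetical stand-ins invented for this example. The waiter blocks without any overall deadline, but wakes up at short intervals to check whether the controlled process is still alive, so it gives up only when the peer is actually gone:

```cpp
#include <atomic>
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the smoketest's Result object.
struct Result {
    std::mutex mutex;
    std::condition_variable condition;
    bool done = false;     // set by the listener once the dispatch has finished
    bool success = false;  // outcome reported by the listener
};

// Hypothetical stand-in for OfficeConnection; it only tracks peer liveness.
struct Connection {
    std::atomic<bool> alive{true};
    bool isStillAlive() const { return alive.load(); }
};

enum class WaitOutcome { Completed, PeerDied };

// Wait until the result is signalled or the peer process has gone away.
// The poll interval only bounds how quickly a dead peer is noticed;
// it is not a deadline, so a live but slow peer is waited on indefinitely.
WaitOutcome waitForResult(
    Result& result, const Connection& connection,
    std::chrono::milliseconds poll = std::chrono::milliseconds(100))
{
    std::unique_lock<std::mutex> lock(result.mutex);
    for (;;) {
        // wait_for with a predicate returns the predicate's value on wake-up.
        if (result.condition.wait_for(lock, poll, [&] { return result.done; })) {
            return WaitOutcome::Completed;
        }
        if (!connection.isStillAlive()) {
            return WaitOutcome::PeerDied;
        }
    }
}
```

A test would then assert that waitForResult returned WaitOutcome::Completed before looking at result.success. A genuine hang in a live soffice.bin still blocks "make check" where it can be inspected, while a vanished process turns into an ordinary test failure.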

-Stephan