Testing hangs with Ninja build method
Platform: AMD Bulldozer (yes, that old)
OS: Gentoo X86_64
gcc: 11.3.0 (Gentoo 11.3.0-p4)
clang: 13.0.1
Xerces-C: v3.2.3
Xalan-C: v1.12.0

Xerces-C was installed via Gentoo ebuild. No errors or warnings were encountered during install.
Xalan-C was cloned from the GitHub repository. I followed the steps for a Ninja build. A few odd warnings were encountered during the process (e.g. -Wdeprecated-copy, -Wextra). No errors were reported during the build.

commands:
$ cd [cloned repo]
$ cmake -H. -Bbuild -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/local/Xalan-C -DCMAKE_BUILD_TYPE=Release
...
-- Xalan-C++ configuration summary
-- ---
--
-- Version: 1.12.0
-- Library major version: 112
-- Library minor version: 0
--
-- Installation directory: /usr/local/Xalan-C
-- C compiler: /usr/bin/cc
-- C++ compiler: /usr/bin/c++
--
-- Build shared libraries: ON
-- Thread implementation: standard
-- Transcoder: icu
-- Message Loader: inmemory
-- Message Loader Locale: en_US
-- Configuring done
-- Generating done
-- Build files have been written to: /home/[path]/[cloned repo]/build
$ ninja -C build

When executing testing ($ cd build; ctest -V -j8;), test #21 will hang indefinitely with no timeout encountered. Numerous `21: Waiting for active threads to finish...` messages were reported, and that message was also the last information shown in the terminal. After ~15 minutes of waiting, the testing process was manually aborted.

I cleaned and reset the cloned repository to start again. I repeated the above except using a clang/clang++ build. I received the same results with the compile. Testing would still hang indefinitely waiting for active threads. Testing had to be manually aborted.

After cleaning and resetting the cloned repository again, the above steps were repeated using `... -G "Unix Makefiles" ...` in the CMake command instead. Configure and build completed successfully, similar to above.

Unlike the above results, testing with the command `make -C build test` ran to successful completion and exited correctly. I repeated testing with ctest on the build. Again, testing completed and exited successfully.

I suspect there is some deviation between the makefile and ninja build setup. A ninja setup, utilizing either gcc or clang, results in testing hanging indefinitely.

Please advise if there is some other step that should be taken to diagnose.
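A hang like the one reported for test #21 can be isolated by rerunning only that test with an explicit time limit, so that a stall is reported as a timeout instead of blocking the terminal. The following is a minimal sketch, not part of the Xalan-C build instructions: the function name `rerun_single_test`, the `CTEST` override variable, and the 120-second default are assumptions made for illustration. The options `-V`, `-I` and `--timeout` are standard ctest options.

```shell
# Rerun one CTest test by number with a time limit (illustrative helper).
# Usage: rerun_single_test <build-dir> <test-number> [timeout-seconds]
rerun_single_test() (
    build_dir="$1"
    test_number="$2"
    limit="${3:-120}"    # assumed default; pick a value that suits the machine
    cd "$build_dir" || exit 1
    # -I start,end restricts the run to a range of test numbers;
    # --timeout makes ctest stop and report a test that exceeds the limit.
    "${CTEST:-ctest}" -V -I "${test_number},${test_number}" --timeout "$limit"
)
```

With the build tree from the report, `rerun_single_test build 21` would run only test #21 and give up on it after two minutes.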
RE: Testing hangs with Ninja build method
Hi Scott,

Hard to diagnose without more information. Can you build with verbose logging, so we can see what ninja is waiting on? Does "ps" or "pstree" show any children which have become stuck? Does it happen if you build with no parallelisation?

It's possible there is a broken rule being emitted. It's not been picked up by the CI though. Also possible you've encountered a ninja bug--I've seen them in the past though I would think it unlikely. Or possibly a bug in the CMake version you are using if it's generating a broken rule.

If you can identify where it's getting stuck, it might be worth looking through the generated Ninja file to see if there is any obviously broken rule in there. Also worth running ninja under strace so you can follow what's going on. If you trace all threads and child processes, that might give you an indication about what it's waiting on.

Kind regards,
Roger

> -----Original Message-----
> From: Scott Furry
> Sent: 16 October 2022 16:25
> To: c-users@xalan.apache.org
> Subject: Testing hangs with Ninja build method
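The "ps"/"pstree" check mentioned above amounts to asking which child processes a stalled ninja or ctest process still has. A minimal sketch of that check follows; the helper name `list_children` is hypothetical, and it relies only on POSIX `ps` output columns and `awk`.

```shell
# List the direct children of a process (illustrative helper).
# Prints: PID, parent PID, elapsed time, command line.
# Usage: list_children <pid>
list_children() {
    ps -e -o pid= -o ppid= -o etime= -o args= |
        awk -v parent="$1" '$2 == parent { print }'
}
```

While the test run is stalled, something like `list_children "$(pgrep -o ctest)"` from a second terminal would show which test executable is still running and for how long. `pstree -p <pid>` presents the same relationship as a tree, and `strace -f -p <pid>` attaches to a running process and follows its threads and children.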
Re: Testing hangs with Ninja build method
Roger,

From what I have encountered in the past, Ninja appears notorious for rushing to compile leaving "holes", gaps or otherwise in its wake on occasion. It's easy to blame Ninja but it is also possible there is misconfiguration involved. I'm not personally a fan of Ninja but I haven't made a point of avoiding it outright. I thought it abnormal for Ninja and Makefiles to produce different results when testing.

After a Ninja build, I had used `ctest -V ...`. It was giving me what I thought was a verbose output. What was shown appeared unhelpful.
Output given:
-----
$ ninja -C build test
... (snip) ...
21: Started thread number 46, using pre-parsed documents.
21: Started thread number 47, using unparsed documents.
21: Started thread number 48, using pre-parsed documents.
21: Started thread number 49, using unparsed documents.
16/21 Test #14: TraceListen-3 Passed 0.15 sec
17/21 Test #17: UseStylesheetParam ... Passed 0.12 sec
18/21 Test #18: XalanTransform ... Passed 0.11 sec
19/21 Test #19: XalanTransformerCallback . Passed 0.11 sec
20/21 Test #20: SimpleXPathCAPI .. Passed 0.09 sec
21: Started thread number 50, using pre-parsed documents.
21: Started thread number 51, using unparsed documents.
21: Started thread number 52, using pre-parsed documents.
21: Started thread number 53, using unparsed documents.
21: Started thread number 54, using pre-parsed documents.
21: Started thread number 55, using unparsed documents.
21: Started thread number 56, using pre-parsed documents.
21: Started thread number 57, using unparsed documents.
21: Started thread number 58, using pre-parsed documents.
21: Started thread number 59, using unparsed documents.
21: Waiting for active threads to finish... # <-- indefinite hang
^C # <-- manually aborting process
-----

No other details were produced. Again, this is after a successful configure with CMake and compile with both gcc/clang. It's just testing being difficult.

I am unsure how to proceed with "build with verbose logging". I am assuming a compiler/debugging switch (-O3 or similar?). Please confirm. I am following the basic information given in the build instructions (https://apache.github.io/xalan-c/build.html) and I am rather rusty on compiling. The references to "ps" or "pstree" are foreign to me. Guidance would be appreciated.

Thank You,
Scott Furry

On 2022-10-16 09:43, Roger Leigh wrote:
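One plausible reading of "build with verbose logging" is the build tool's own verbosity rather than a compiler switch such as -O3: Ninja's `-v` option prints every command line it runs, and `-j1` disables parallel jobs. The sketch below captures such a run in a log file. The helper name `verbose_build`, the `NINJA` override variable, and the default log file name are assumptions for illustration only.

```shell
# Capture a verbose, serial Ninja build in a log file (illustrative helper).
# Usage: verbose_build <build-dir> [log-file]
verbose_build() {
    # -C selects the build directory, -v shows all command lines,
    # -j1 runs one job at a time so the log reads in execution order.
    "${NINJA:-ninja}" -C "$1" -v -j1 > "${2:-ninja-verbose.log}" 2>&1
}
```

For example, `verbose_build build` would leave the full command log in `ninja-verbose.log`. On the test side, `ctest -VV` is the extra-verbose counterpart of the `ctest -V` run quoted above.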
RE: Testing hangs with Ninja build method
Hi Scott,

This isn't ninja at fault. Ninja itself should have a complete dependency graph, so its behaviour should not be materially different than traditional make--but it's usually much faster due to the lack of pattern rules etc since they have to be expanded up front at generation time (by CMake).

I would firstly suggest splitting up the build and test steps. That might be one cause of problems--the test rules might not depend upon the build products. That's likely due to the "ctest" test framework being a bit separated from the build--it might not know which targets to depend upon, so it expects you to build everything needed first.

Secondly, try running "ctest" rather than using the "test" target. This way you'll invoke the test runner directly. This will let you control the parallelisation of the test execution. Use "ctest -j nnn" to run with 1 or multiple parallel tests. If Xalan-C has issues with the tests interfering with each other then this will identify if that's a problem, separate from any issues with the build system.

Kind regards,
Roger

> -----Original Message-----
> From: Scott Furry
> Sent: 16 October 2022 17:09
> To: c-users@xalan.apache.org
> Subject: Re: Testing hangs with Ninja build method
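The advice above (build first, then run ctest directly at different `-j` values) can be sketched as a small loop that tries several parallelism levels and records which ones pass. This is only an illustration under stated assumptions: the helper name `sweep_parallelism`, the `CTEST` override variable, the 120-second timeout, and the log file names are not taken from the thread. `-j`, `--timeout` and `--output-on-failure` are standard ctest options.

```shell
# Run the test suite at several parallelism levels (illustrative helper).
# Usage: sweep_parallelism <build-dir> <jobs>...
sweep_parallelism() (
    cd "$1" || exit 1
    shift
    for jobs in "$@"; do
        # Each level gets its own log so a failing or timed-out run can be inspected.
        if "${CTEST:-ctest}" -j "$jobs" --timeout 120 --output-on-failure \
                > "ctest-j${jobs}.log" 2>&1; then
            echo "jobs=$jobs: passed"
        else
            echo "jobs=$jobs: failed or timed out"
        fi
    done
)
```

After a separate `ninja -C build` step, `sweep_parallelism build 1 2 4 8` would show whether the hang depends on how many tests run at once, independently of the build system.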
Re: Testing hangs with Ninja build method
I had been careful to document and follow the same steps in ensuring I was not imagining things. The problem of testing hanging right at the end was persistent even after cleaning/resetting repository and switching between CMake build methods (i.e. "Unix Makefiles" or Ninja). For some reason, CMake plus Ninja was resulting in my testing observation.

I played with the testing parallelism value (`ctest -jx`) and 'poof' - problem was gone. Testing finished almost instantaneously. Test #21 would finish in almost a quarter of a second. Weird.

After repeated clean/reset/build/test iterations, including switching between gcc and clang compilers, this problem no longer presents itself. I'm going to file this one under 'build gremlins'.

Roger,
Very much appreciate the assistance.

SF

On 2022-10-16 10:55, Roger Leigh wrote:
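A hang that disappears between iterations is easier to pin down if the same command is repeated under a hard time limit and the bad runs are counted. The sketch below does that with the coreutils `timeout` command; the helper name `repeat_bounded` and its output format are assumptions for illustration, not something used in the thread.

```shell
# Repeat a command under a hard time limit and count bad runs (illustrative).
# Usage: repeat_bounded <runs> <limit-seconds> <command> [args...]
repeat_bounded() (
    runs="$1"
    limit="$2"
    shift 2
    bad=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # timeout(1) stops the command and exits with 124 when the limit is hit.
        timeout "$limit" "$@" > /dev/null 2>&1 || bad=$((bad + 1))
        i=$((i + 1))
    done
    echo "$bad of $runs runs failed or timed out"
    [ "$bad" -eq 0 ]
)
```

For instance, `repeat_bounded 10 300 sh -c 'cd build && ctest -j8'` would run the suite ten times with a five-minute cap per run and report how many runs did not finish cleanly.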