Testing hangs with Ninja build method

2022-10-16 Thread Scott Furry

Platform: AMD Bulldozer (yes, that old)
OS:   Gentoo X86_64
gcc:  11.3.0 (Gentoo 11.3.0-p4)
clang:    13.0.1
Xerces-C: v3.2.3
Xalan-C:  v1.12.0

Xerces-C was installed via Gentoo ebuild. No errors or warnings were 
encountered during install.
Xalan-C was cloned from the GitHub repository. I followed the steps for a 
Ninja build. A few odd warnings were encountered during the process (e.g. 
-Wdeprecated-copy, -Wextra). No errors were reported during the build.


commands:
$ cd [cloned repo]
$ cmake -H. -Bbuild -G Ninja -DCMAKE_INSTALL_PREFIX=/usr/local/Xalan-C \
    -DCMAKE_BUILD_TYPE=Release

...
-- Xalan-C++ configuration summary
-- ---
--
--   Version: 1.12.0
--   Library major version:   112
--   Library minor version:   0
--
--   Installation directory:  /usr/local/Xalan-C
--   C compiler:  /usr/bin/cc
--   C++ compiler:    /usr/bin/c++
--
--   Build shared libraries:  ON
--   Thread implementation:   standard
--   Transcoder:  icu
--   Message Loader:  inmemory
--   Message Loader Locale:   en_US
-- Configuring done
-- Generating done
-- Build files have been written to: /home/[path]/[cloned repo]/build
$ ninja -C build

When running the tests ($ cd build; ctest -V -j8;), test #21 hangs 
indefinitely and no timeout is triggered. Numerous `21: Waiting for 
active threads to finish...` messages were reported, and that message 
was also the last output shown in the terminal. After ~15 minutes of 
waiting, I aborted the test run manually.
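
For readers who hit a similar hang, here is a minimal sketch of one way to put a hard wall-clock limit on a run instead of aborting by hand. It assumes GNU coreutils `timeout` is installed, which exits with status 124 when the limit is hit. The helper name `bounded` is made up for this illustration, and `sleep` only stands in for the stuck test.

```shell
#!/bin/sh
# Run a command under a wall-clock limit (in seconds) and say so on stderr
# if the limit was reached. "bounded" is a hypothetical helper name.
bounded() {
    limit=$1; shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Stand-in for a hung test: sleep runs longer than the 1-second limit.
bounded 1 sleep 5 || echo "command did not finish cleanly"

# Against the build from this report, the call might look like:
#   (cd build && bounded 300 ctest -V -j8)
```

ctest also accepts a `--timeout <seconds>` option that applies a default limit to each test, which may be the simpler choice when only one test misbehaves.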


I cleaned and reset the cloned repository to start again. I repeated the 
above steps, this time building with clang/clang++. The compile results 
were the same, and testing still hung indefinitely waiting for active 
threads. Testing had to be manually aborted.


After cleaning and resetting the cloned repository again, I repeated the 
above steps using `... -G "Unix Makefiles" ...` in the CMake command 
instead. Configure and build completed successfully, as before.


Unlike the above results, testing with `make -C build test` ran to 
successful completion and exited correctly. I repeated the testing with 
ctest on this build. Again, testing completed and exited successfully.


I suspect there is some deviation between the Makefile and Ninja build 
setups. A Ninja setup, using either gcc or clang, results in testing 
hanging indefinitely.


Please advise if there is some other step that should be taken to diagnose 
this.



RE: Testing hangs with Ninja build method

2022-10-16 Thread Roger Leigh
Hi Scott,

Hard to diagnose without more information.  Can you build with verbose logging, 
so we can see what ninja is waiting on?  Does "ps" or "pstree" show any 
children which have become stuck?  Does it happen if you build with no 
parallelisation?

It's possible there is a broken rule being emitted.  It's not been picked up by 
the CI though.  Also possible you've encountered a ninja bug--I've seen them in 
the past though I would think it unlikely.  Or possibly a bug in the CMake 
version you are using if it's generating a broken rule.  If you can identify 
where it's getting stuck, it might be worth looking through the generated Ninja 
file to see if there is any obviously broken rule in there.

Also worth running ninja under strace so you can follow what's going on.  If 
you trace all threads and child processes, that might give you an indication 
about what it's waiting on.
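
As a rough sketch of how those suggestions might be run in practice: the script below is a dry run by default and only executes the commands when RUN=1 is set. The `step` helper and the trace file name `ctest-trace.log` are made up for this illustration. The build directory and test number are taken from the original report.

```shell
#!/bin/sh
# Dry run by default: each command is printed, and only executed when RUN=1.
step() {
    echo "+ $*"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    fi
}

BUILD_DIR=${BUILD_DIR:-build}   # build directory from the original report
TEST_NUM=${TEST_NUM:-21}        # the test that was reported to hang

# Rebuild serially (-j1), showing every command line ninja runs (-v).
step ninja -C "$BUILD_DIR" -v -j1

# Run only the suspect test, without parallelism, under strace.
# -f follows all threads and child processes; -o writes the trace to a file.
step cd "$BUILD_DIR"
step strace -f -o ctest-trace.log ctest -V -j1 -I "$TEST_NUM,$TEST_NUM"

# While the test is hung, from a second terminal, list ctest's children:
#   pstree -p "$(pgrep -o -x ctest)"
```

The tail of the trace file should show which system call each thread is blocked in, which is usually enough to tell a lock wait from a child process that never exits.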

Kind regards,
Roger




Re: Testing hangs with Ninja build method

2022-10-16 Thread Scott Furry

Roger,

From what I have encountered in the past, Ninja appears notorious for 
rushing to compile leaving "holes", gaps or otherwise in its wake on 
occasion. It's easy to blame Ninja but it is also possible there is 
misconfiguration involved. I'm not personally a fan of Ninja but I 
haven't made a point of avoiding it outright. I thought it abnormal for 
Ninja and Makefiles to produce different results when testing.



After a Ninja build, I had used `ctest -V ...`. It was giving me what I 
thought was a verbose output. What was shown appeared unhelpful.

Output given:
-
$ ninja -C build test
... (snip) ...
21: Started thread number 46, using pre-parsed documents.
21: Started thread number 47, using unparsed documents.
21: Started thread number 48, using pre-parsed documents.
21: Started thread number 49, using unparsed documents.
16/21 Test #14: TraceListen-3    Passed 0.15 sec
17/21 Test #17: UseStylesheetParam ...   Passed 0.12 sec
18/21 Test #18: XalanTransform ...   Passed 0.11 sec
19/21 Test #19: XalanTransformerCallback .   Passed 0.11 sec
20/21 Test #20: SimpleXPathCAPI ..   Passed 0.09 sec
21: Started thread number 50, using pre-parsed documents.
21: Started thread number 51, using unparsed documents.
21: Started thread number 52, using pre-parsed documents.
21: Started thread number 53, using unparsed documents.
21: Started thread number 54, using pre-parsed documents.
21: Started thread number 55, using unparsed documents.
21: Started thread number 56, using pre-parsed documents.
21: Started thread number 57, using unparsed documents.
21: Started thread number 58, using pre-parsed documents.
21: Started thread number 59, using unparsed documents.
21: Waiting for active threads to finish...    # <-- indefinite hang
^C # <-- manually aborting process

-

No other details were produced. Again, this is after successful 
configure with CMake and compile with both gcc/clang. It's just testing 
being difficult.


I am unsure how to proceed with "build with verbose logging". I am 
assuming a compiler/debugging switch (-O3 or similar?). Please confirm.


I am following the basic information given on build instructions 
(https://apache.github.io/xalan-c/build.html) and I am rather rusty on 
compiling. The references to "ps" or "pstree" are foreign to me. 
Guidance would be appreciated.


Thank You,
Scott Furry



RE: Testing hangs with Ninja build method

2022-10-16 Thread Roger Leigh
Hi Scott,

This isn't ninja at fault.  Ninja itself should have a complete dependency 
graph, so its behaviour should not be materially different than traditional 
make--but it's usually much faster due to the lack of pattern rules etc since 
they have to be expanded up front at generation time (by CMake).

I would firstly suggest splitting up the build and test steps.  That might be 
one cause of problems--the test rules might not depend upon the build products.  
That's likely due to the "ctest" test framework being a bit separated from the 
build--it might not know which targets to depend upon, so it expects you to 
build everything needed first.

Secondly, try running "ctest" rather than using the "test" target.  This way 
you'll invoke the test runner directly.  This will let you control the 
parallelisation of the test execution.  Use "ctest -j nnn" to run with 1 or 
multiple parallel tests.  If Xalan-C has issues with the tests interfering with 
each other then this will identify if that's a problem, separate from any 
issues with the build system.
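
A minimal sketch of that approach, assuming the build has already been completed as a separate step with `ninja -C build`: run ctest directly at several parallelism levels and compare the results. The helper name `sweep_jobs`, the job counts, and the 120-second timeout are arbitrary choices for this illustration.

```shell
#!/bin/sh
# Run the given command once per job count, appending "-j N" each time,
# and print one PASS/FAIL line per count. "sweep_jobs" is a hypothetical name.
sweep_jobs() {
    counts=$1; shift
    for n in $counts; do
        if "$@" -j "$n" >/dev/null 2>&1; then
            echo "-j $n: PASS"
        else
            echo "-j $n: FAIL"
        fi
    done
}

# Only touch the real test suite if ctest and a configured build tree exist.
if command -v ctest >/dev/null 2>&1 && [ -f build/CTestTestfile.cmake ]; then
    # --timeout turns a hung test into a FAIL line instead of an endless wait.
    (cd build && sweep_jobs "1 2 4 8" ctest --timeout 120)
else
    # Stand-in command so the sketch still runs without a build tree.
    sweep_jobs "1 2" true
fi
```

If the serial run passes and only the higher job counts fail, that points at tests interfering with each other rather than at the generated build rules.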

Kind regards,
Roger




Re: Testing hangs with Ninja build method

2022-10-16 Thread Scott Furry
I had been careful to document and follow the same steps in ensuring I 
was not imagining things. The problem of testing hanging right at the 
end was persistent even after cleaning/resetting repository and 
switching between CMake build methods (i.e. "Unix Makefiles" or Ninja). 
For some reason, only CMake plus Ninja produced the hang I observed.


I played with the testing parallelism value (`ctest -jx`) and 'poof' - 
problem was gone. Testing finished almost instantaneously. Test #21 
would finish in almost a quarter of a second. Weird.


After repeated clean/reset/build/test iterations, including switching 
between gcc and clang compilers, this problem no longer presents itself.
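
Since the hang came and went, repeating the test run many times may help show whether it is intermittent. What follows is a sketch only: the helper name `repeat_until_fail` and the run count are made up for this illustration, and `true` stands in for the real test command.

```shell
#!/bin/sh
# Run a command up to N times; stop at the first failure and return its status.
repeat_until_fail() {
    max=$1; shift
    i=1
    while [ "$i" -le "$max" ]; do
        "$@" || {
            rc=$?
            echo "run $i of $max failed with status $rc" >&2
            return "$rc"
        }
        i=$((i + 1))
    done
    echo "all $max runs passed"
}

# Stand-in demonstration; "true" always succeeds.
repeat_until_fail 3 true

# From inside the build directory, a real check might look like:
#   repeat_until_fail 20 ctest -j8 --timeout 300
```

A failure that shows up only once in many runs would suggest a race in the threaded test rather than a difference between the Ninja and Makefile generators.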


I'm going to file this one under 'build gremlins'.

Roger,
Very much appreciate the assistance.

SF
