I've progressed much further, and openoffice-fbsd-nightly,
openoffice-linux32-nightly, openoffice-linux64-nightly, and
openoffice-linux64-rat are now building, while openoffice-linux32-snapshot
is only temporarily breaking due to SourceForge issues. I've also made some
interesting discoveries about the Windows buildbots.

I had to revert r1736692 (in r1753709), setting main/extensions.lst back to
generic https:// SourceForge URLs, as the mirror-specific http:// URLs are
now broken, which was causing all buildbots that use
--enable-bundled-dictionaries to fail. Enough buildbots support https now
to make this a net benefit.

I also had to upload openssl-0.9.8zg to ooo-extras for the
openoffice-linux32-snapshot buildbot, but its most recent build failed
because it couldn't download some dictionaries/languages from SourceForge,
which I am generally finding to be too flaky.

I think we should either run ./bootstrap multiple times on the buildbots,
or list SourceForge URLs several times in external_deps.lst and
extensions.lst, to compensate for SourceForge's unreliability. Buildbots
should immediately fail the build if ./bootstrap fails, as there is no
hope: ./bootstrap only downloads dependencies needed for the build, and if
any are missing, the build will definitely fail, and burning CPU for up to
5 hours is pure waste. Alternatively we should find a more reliable
ooo-extras hosting solution than SourceForge. We could also cache
dependencies offline between builds, but I am guessing that has licensing
issues?
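
To make the "retry ./bootstrap, then fail fast" idea concrete, here is a
minimal sketch in Python. It is only an illustration: the function name
run_with_retries, the attempt count, and the delay are assumptions, not part
of the existing buildbot scripts.

```python
import subprocess
import time


def run_with_retries(cmd, attempts=3, delay=30.0,
                     run=subprocess.call, sleep=time.sleep):
    """Run cmd until it exits with 0 or the attempts are used up.

    Returns the exit code of the last run. The attempt count and delay
    are illustrative defaults, not values taken from any real buildbot.
    """
    code = 1
    for attempt in range(1, attempts + 1):
        code = run(cmd)
        if code == 0:
            return 0
        # Wait before retrying, but not after the final attempt.
        if attempt < attempts:
            sleep(delay)
    return code
```

A build step could call run_with_retries(["./bootstrap"]) and abort the whole
build as soon as the result is non-zero, instead of carrying on into a build
that cannot succeed.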

That leaves the Windows bots.
aoo-w7snap is still missing LWP::Protocol::https.
aoo-win7 was failing to delete the old build files (rm: cannot remove
`build/ext_libraries/apr/wntmsci12.pro/misc/build/apr-1.4.5/Makefile.win':
Device or resource busy).
Something seems to be keeping that file open even after the previous builds
are over.
According to ext_libraries/apr/makefile.mk, the BUILD_ACTION on Windows is:
INCLUDE="$(INCLUDE);./include"  nmake -f Makefile.win buildall
so I suspected nmake hangs.

I patched the build script to run "ps -W" (results at
https://ci.apache.org/builders/aoo-win7/builds/348/steps/ps/logs/stdio)
which showed 4 nmake processes, 2 from March 30, 2 from April 4, part of
orphaned trees also containing perl, sh, and dmake.
Killing nmake (through hacks to the buildbot script, as I still don't have
remote access) had no effect - deleting apr-1.4.5/Makefile.win was still
giving the same error.
Even killing dmake, sh and perl (which had to be done in creative ways on
that dodgy OS - some through taskkill, some through Cygwin's kill) still
had no effect.
After all Cygwin processes were dead, that error was still coming up!

So I started looking at non-Cygwin processes. DEVENV.EXE and DEVENV.COM had
the same March 30 / April 4 start times as the hung process trees, and
killing them *finally* allowed apr-1.4.5/Makefile.win to be deleted.
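
The manual hunt above could in principle be automated before the cleanup
step. The following Python sketch only shows the filtering logic; the
(pid, name, start time) record layout, the list of suspect names, and the
function name find_stale are assumptions for illustration. Getting real
process data (for example by parsing "ps -W" output) is deliberately left
out.

```python
from datetime import datetime

# Process names seen in the hung trees described above, lowercased.
# Treating exactly these as suspects is an assumption of this sketch.
SUSPECT_NAMES = {"nmake.exe", "dmake.exe", "sh.exe", "perl.exe",
                 "devenv.exe", "devenv.com"}


def find_stale(processes, build_start, names=SUSPECT_NAMES):
    """Return the suspect processes that started before this build.

    processes is an iterable of (pid, name, started) tuples, where
    started is a datetime. This record layout is hypothetical.
    """
    stale = []
    for pid, name, started in processes:
        if name.lower() in names and started < build_start:
            stale.append((pid, name, started))
    return stale
```

Anything this returns is a leftover from an earlier build and a candidate
for killing before the old build directory is deleted.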

I'll experiment a lot more over the weekend, but right now I think the
problem could be that the buildbot runs VSVARS.BAT to set up the Visual
Studio environment, which (presumably) contains the evil DEVENV that breaks
the build and locks files while hung. On my own Windows VM I don't run
VSVARS.BAT, and I can't reproduce the problem. Do the rest of you that
build on Windows use it?

That buildbot is currently running out of disk space, and it doesn't help
that we "svn co" and then "svn export" a second copy. Originally the
buildbot script used other tricks, like "svn switch", or keeping an SVN
checkout across builds that was just updated and then exported from for
each build, but some time ago I switched to a full "svn co" because those
tricks were too unreliable (e.g. files can get locked and need "svn
cleanup"). With a full checkout there is no need to export, as we get a
fresh copy each time. I'll overhaul that buildbot script and try to make it
simpler and cleaner.

On Tue, Jul 19, 2016 at 8:17 PM, Damjan Jovanovic <dam...@apache.org> wrote:

> Hi
>
> I contacted Infra on HipChat and asked them to fix the buildbots I could
> find with the Perl LWP::Protocol::https problem (aoo-w7snap,
> openoffice-fbsd-nightly, and openoffice-linux32-nightly) or give me access
> to do it myself, and @pono fixed at least the openoffice-linux32-nightly
> bot.
>
> The other buildbots are either failing earlier or failing for other
> reasons. For example openoffice-linux64-nightly was failing to download
> openssl ("500 Can't connect to www.openssl.org:443"), but I've uploaded
> it to ooo-extras and it's gotten further in the build I am forcing now.
>
> Damjan
>
>
