Hi,

On Wed, Apr 13, 2022 at 05:21:03PM +0000, build...@builder.wildebeest.org wrote:
> A new failure has been detected on builder elfutils-centos-x86_64 while 
> building elfutils.
> 
> Full details are available at:
>     https://builder.wildebeest.org/buildbot/#builders/1/builds/932
> 
> Build state: failed test (failure)
> Revision: 399b55a75830f1854c8da9f29282810e82f270b6
> Worker: centos-x86_64
> Build Reason: (unknown)
> Blamelist: Mark Wielaard <m...@klomp.org>
> 
> Steps:
> [...] 
> - 8: make check ( failure )
>     Logs:
>         - stdio: 
> https://builder.wildebeest.org/buildbot/#builders/1/builds/932/steps/8/logs/stdio
>         - test-suite.log: 
> https://builder.wildebeest.org/buildbot/#builders/1/builds/932/steps/8/logs/test-suite_log

Hmmm, this seems a little random. The change was just adding some
(unused) constants to dwarf.h. The log says:

command timed out: 1200 seconds without output running ['make', 'check', '-j4'], attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=1925.591587

It looks like run-debuginfod-federation-sqlite.sh is missing from the
results, so that is presumably the test that hung.

Looking at the buildbot worker, the end of
run-debuginfod-federation-sqlite.sh.log is:

+ mvalue=2
+ '[' -z 2 ']'
+ echo 'metric thread_work_total{role="groom"}: 2'
metric thread_work_total{role="groom"}: 2
+ '[' 2 -eq 2 ']'
+ break
+ '[' 18 -eq 0 ']'
+ curl -s http://127.0.0.1:9112/buildid/beefbeefbeefd00dd00d/debuginfo
+ curl -s http://127.0.0.1:9112/metrics
+ grep 'error_count.*sqlite'
error_count{sqlite3="database disk image is malformed"} 6
error_count{sqlite3="file is encrypted or is not a database"} 1
+ kill -INT 28184 28371
+ wait 28184 28371

This seems to correspond to this part of run-debuginfod-federation-sqlite.sh:

########################################################################
# Corrupt the sqlite database and get debuginfod to trip across its errors
curl -s http://127.0.0.1:$PORT1/metrics | grep 'sqlite3.*reset'
dd if=/dev/zero of=$DB bs=1 count=1

# trigger some random activity that's Sure to get sqlite3 upset
kill -USR1 $PID1
wait_ready $PORT1 'thread_work_total{role="traverse"}' 2
wait_ready $PORT1 'thread_work_pending{role="scan"}' 0
wait_ready $PORT1 'thread_busy{role="scan"}' 0
kill -USR2 $PID1
wait_ready $PORT1 'thread_work_total{role="groom"}' 2
curl -s http://127.0.0.1:$PORT1/buildid/beefbeefbeefd00dd00d/debuginfo > /dev/null || true
curl -s http://127.0.0.1:$PORT1/metrics | grep 'error_count.*sqlite'
# Run the tests again without the servers running. The target file should
# be found in the cache.

kill -INT $PID1 $PID2
wait $PID1 $PID2
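
A side note on the dd line in the excerpt above: without conv=notrunc, dd
truncates its output file before writing, so the database is not just
damaged in one byte but cut down to a single zero byte. A minimal sketch
on a scratch file (the temp file and its contents are made up here and
stand in for $DB):

```shell
#!/bin/sh
# Stand-in for $DB; the file and its contents are invented for this demo.
db=$(mktemp)
printf 'pretend database contents\n' > "$db"
# $(( ... )) strips any padding that some wc implementations add.
size_before=$(($(wc -c < "$db")))

# Same dd invocation as in the test script, with dd's statistics silenced.
dd if=/dev/zero of="$db" bs=1 count=1 2>/dev/null
size_after=$(($(wc -c < "$db")))

echo "before: $size_before bytes, after: $size_after bytes"
rm -f "$db"
```

That would fit the "file is encrypted or is not a database" error in the
metrics output, although it does not by itself explain the hang.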

So maybe corrupting the sqlite database prevents a proper shutdown of
the debuginfod process?
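
If that is what happens, the bare "wait $PID1 $PID2" blocks until the
buildbot kills the whole run. A bounded wait would at least turn the hang
into a quick test failure. This is only a sketch: wait_with_timeout is a
hypothetical helper, not something that exists in the test suite, and the
time limit is a single budget shared by all the PIDs passed in.

```shell
#!/bin/sh
# Hypothetical helper, not part of the elfutils test suite.
# Usage: wait_with_timeout SECONDS PID...
# Returns 0 if every PID exited within SECONDS (a total budget shared by
# all PIDs), 1 if any of them had to be killed.
wait_with_timeout() {
  limit=$1; shift
  elapsed=0
  rc=0
  for pid in "$@"; do
    # kill -0 sends no signal; it only tests whether the process exists.
    while kill -0 "$pid" 2>/dev/null; do
      if [ "$elapsed" -ge "$limit" ]; then
        echo "pid $pid still running after ${limit}s, sending SIGKILL" >&2
        kill -KILL "$pid" 2>/dev/null || true
        rc=1
        break
      fi
      sleep 1
      elapsed=$((elapsed + 1))
    done
    # Reap the child so no zombie is left behind.
    wait "$pid" 2>/dev/null || true
  done
  return "$rc"
}
```

In the script, the shutdown could then look something like
"kill -INT $PID1 $PID2; wait_with_timeout 60 $PID1 $PID2 || exit 1",
where the 60 second limit is an arbitrary choice.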

Cheers,

Mark
