Thanks everybody for the extra info, and Cyril for the research.
I have killed the spider jobs.
I have modified the crontab on tye.debian.org to disable the cron.hourly job
(spiderbts) for now.
I have removed the status files [1] and launched a job spiderinit [2] to
re-create them.
[1] in /srv/i18n.debian.org/dl10n/data/spiderbts/data/status.??
[2] sudo -u debian-i18n /srv/i18n.debian.org/dl10n/git/cron/spiderinit &
Tomorrow I'll have a look at the logs of the spiderinit job [3] and
launch the cron.hourly job once.
[3] /srv/i18n.debian.org/log/spiderinit/spiderinit.20230512-2134.[err|log]
Then I'll see how long it takes and whether there are any issues.
If everything goes well, the webpages should show correct data. Then I'll
set the "hourly" job to run 6 times a day and will keep an eye on it over the coming days.
I agree that a lockfile is needed; I'll try to work on that too. Once it's
in place and the issue is fixed, I'll update the cron job to run hourly again.
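As a rough sketch of the lockfile idea, a wrapper around the cron entry could use flock(1) so that a new run exits immediately if the previous one is still going. The paths and lockfile name below are illustrative, not the actual spiderbts setup:

```shell
#!/bin/sh
# Hypothetical wrapper for the spiderbts cron job.
# flock -n fails immediately instead of queueing if another
# instance already holds the lock, so runs never pile up.
LOCKFILE=/tmp/spiderbts.lock

(
    flock -n 9 || { echo "spiderbts already running, skipping" >&2; exit 1; }
    # The real invocation would go here, e.g. the dl10n spider run.
    echo "spider would run here"
) 9>"$LOCKFILE"
```

The lock is tied to file descriptor 9 and is released automatically when the subshell exits, even if the job crashes, so no stale-lock cleanup is needed.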
Kind regards
On 12/5/23 at 12:04, Cyril Brulebois wrote:
Cyril Brulebois <k...@debian.org> (2023-05-12):
I'll keep looking at what's supposed to happen on tye, but I'm not
sure I'll be able to get to the bottom of it on my own.
At least there's a HUGE red flag on tye. Load to the roof, RAM/swap
almost full, lots of dl10n-spider processes running for the same
language, some of them started May 9th.
kibi@tye:~$ uptime
10:02:58 up 12 days, 21:47, 2 users, load average: 63.24, 64.57, 66.51
kibi@tye:~$ free -h
               total        used        free      shared  buff/cache   available
Mem:           1.9Gi       1.7Gi        69Mi       1.0Mi       125Mi        57Mi
Swap:          511Mi       511Mi       0.0Ki
kibi@tye:~$ ps faux|grep dl10n-spider|grep -o -- '--check-bts ..'|sort|uniq -c
4 --check-bts ca
1 --check-bts cs
1 --check-bts da
51 --check-bts de
7 --check-bts es
2 --check-bts fr
kibi@tye:~$ ps faux|awk '/CRON/ {print $9}'|sort|uniq -c
11 May09
23 May10
23 May11
1 00:15
1 02:15
1 03:15
1 04:15
1 05:15
1 06:15
1 07:15
1 08:13
1 08:15
1 09:15
2 10:00
1 10:01
Note that many de.po occurrences appear in the status files for other
languages; it looks like the processes are heavily stomping on each other's feet.
It looks to me like there should be some locking, at the very least, to avoid
that amount of concurrency. And it would probably be best to start
afresh, killing all those processes, maybe disabling the cron jobs,
cleaning temporary and maybe corrupted data files, and triggering a
single run manually to see if it works.
But then, I have 0 knowledge about the spider, and I'll leave that up to
someone else: I don't want to risk making the matter worse!
Cheers,
--
Laura Arjona Reina
https://wiki.debian.org/LauraArjona