Hello,

> that suggests that we try to write something to a closed file.
>
> To be investigated :)

Ok, so I have a better grasp on what's going on. The Cuirass web server
was receiving requests such as "/builds/1234)" that were not rejected
and, worse, caused SQL queries such as "select * from Builds".

As the table is quite large, these queries caused some of the DB
workers to hang. Once all DB workers were hanging, new queries kept
accumulating until the open file descriptor limit (1024) was reached.

I consolidated the validation of HTTP requests, and the Cuirass web
server has now been running for 48 hours, which I think has not
happened in months.
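To illustrate the kind of validation that stops "/builds/1234)" before
it reaches the database, here is a minimal sketch in Python. Cuirass
itself is written in Guile Scheme, so this is not its actual code; the
pattern BUILD_PATH and the function parse_build_id are hypothetical
names, assuming a route of the form "/builds/<id>" with a numeric ID.

```python
import re
from typing import Optional

# Hypothetical route pattern: "/builds/<id>" where <id> is ASCII digits only.
# [0-9] is used instead of \d because \d also matches non-ASCII digits.
BUILD_PATH = re.compile(r"/builds/([0-9]+)")


def parse_build_id(path: str) -> Optional[int]:
    """Return the build ID of a well-formed "/builds/<id>" path, else None."""
    # fullmatch requires the whole path to match, so a trailing ")" is rejected.
    match = BUILD_PATH.fullmatch(path)
    if match is None:
        # Malformed request: the caller should reply with 404 (or 400)
        # without issuing any SQL query at all.
        return None
    return int(match.group(1))
```

The point of the design is that a malformed path never degrades into a
broader query: either a valid integer ID is extracted and used in a
parameterized lookup, or the request is refused up front.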

I also added some warnings to detect DB workers that hang for more than
5 seconds. The next step is to log all SQL queries using the
guile-sqlite3 pull request [1]. This should make this kind of issue
easier to spot. Logging the duration of each query should also help us
optimize the slow ones.
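The two mechanisms above (a warning for a worker stuck for more than 5
seconds, and a log entry with the duration of every query) can be
sketched together as follows. This uses Python's standard sqlite3
module purely for illustration; it is not the Cuirass or guile-sqlite3
implementation, and timed_query and HANG_WARNING_SECONDS are
hypothetical names.

```python
import logging
import sqlite3
import threading
import time

log = logging.getLogger("db")

# Hypothetical threshold, mirroring the 5-second warning described above.
HANG_WARNING_SECONDS = 5.0


def timed_query(conn: sqlite3.Connection, sql: str, params=()):
    """Run SQL, warn if it is still running after the threshold, log its duration."""
    # The timer fires on a separate thread only if the query has not
    # finished (and cancelled it) within HANG_WARNING_SECONDS.
    watchdog = threading.Timer(
        HANG_WARNING_SECONDS,
        log.warning,
        args=("query still running after %.0f s: %s", HANG_WARNING_SECONDS, sql),
    )
    watchdog.daemon = True  # never keep the process alive just for the warning
    start = time.monotonic()
    watchdog.start()
    try:
        rows = conn.execute(sql, params).fetchall()
    finally:
        # Runs on success and on error, so every query gets a duration entry.
        watchdog.cancel()
        elapsed_ms = (time.monotonic() - start) * 1000
        log.info("%.3f ms: %s", elapsed_ms, sql)
    return rows
```

For example, `timed_query(conn, "select id from Builds where id = ?", (1234,))`
returns the matching rows and logs how long the lookup took. Python's
sqlite3 also offers `Connection.set_trace_callback` to log every
statement, but it does not report durations, hence the explicit timing.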

I'll wait a few more days before closing this issue.

Thanks,

Mathieu

[1]: https://notabug.org/guile-sqlite3/guile-sqlite3/pulls/16 