Hi,

On Thu, Jan 16, 2025 at 10:19:49AM +0800, 赵庭海(庭章) wrote:
> Hi all,
> Recently, I have noticed a potential problem in the pg_stat_statements
> extension. When the garbage collection is triggered within this
> extension, if there is a significant amount of data to be written to
> the pgss_query_texts.stat file (for example, when tracking very long
> SQL queries), the corresponding backend process holds the pgss->lock
> for an extended period. This causes all other backend processes to
> wait for this lock in the extension's hook during all SQL executions,
> and these processes do not respond to interrupts during this time.
> To temporarily address this issue, I have written a patch that
> introduces a parameter to control the maximum length of tracked SQL
> queries.
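FWIW, while that garbage collection is running, the stall should be
visible from another session in pg_stat_activity as backends waiting
on the extension's LWLock.  A rough check (assuming the tranche is
named "pg_stat_statements", which is what the extension registers):

  SELECT pid, state, wait_event_type, wait_event
    FROM pg_stat_activity
   WHERE wait_event_type = 'LWLock'
     AND wait_event = 'pg_stat_statements';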
I don't think that it would move the needle much.  Deallocating
entries is very expensive, even when the query text file isn't being
cleaned up, as it needs to sort all entries by usage to remove the
least recently used ones, all while holding an exclusive pgss->lock.
The real solution is probably to rely on the new pluggable statistics
architecture rather than on the hash table / query text file.

> It seems like this is not an ideal solution, at least, I think it is
> necessary to log messages at the log level before and after the
> garbage collection process. This would help us diagnose similar
> issues in the future. I spent a considerable amount of time
> investigating this issue due to the lack of relevant logs.
> I believe adding these logs would be beneficial for troubleshooting.
> Thanks for your attention to this issue.

Isn't the pg_stat_statements_info.dealloc counter enough to figure out
the root issue?
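For example, sampling it periodically shows whether entry evictions
are happening at all (pg_stat_statements_info and its dealloc counter
are available since v14):

  SELECT dealloc, stats_reset FROM pg_stat_statements_info;

A dealloc value that keeps climbing means entries are being evicted,
which is what ultimately makes the query text file garbage collection
necessary in the first place.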