On Tue, Nov 22, 2016 at 12:54 PM, Tom Lane <t...@sss.pgh.pa.us> wrote: > Robert Haas <robertmh...@gmail.com> writes: >> But that's not what is at issue here. The issue is whether, when >> asked to exit immediately, all processes should exit immediately, or >> whether it would be better for all processes except one to exit >> immediately and the last one exit non-immediately. In other words, >> when the user asks for an immediate shutdown, do they really mean it, >> or are they OK taking time to do some other stuff first? > > Peter already gave the response to that, which is that users do not > expect an immediate shutdown to have permanent harmful effects. > It doesn't have such effects so far as the SQL data goes; why is it > okay to blow away statistical data?
You're doing that anyway. The backends aren't going to send any accumulated but unsent statistics to the stats collector before exiting; they're just going to exit. >> You're arguing that preserving the stats file is more important than >> timely shutdown, but I don't buy it. > > Yes, I am, and I disagree with you. The current decision on this point > was made ages ago, before autovacuum even existed let alone relied on > the stats for proper functioning. The tradeoff you're saying you're > okay with is "we'll shut down a few seconds faster, but you're going > to have table bloat problems later because autovacuum won't know it > needs to do anything". I wonder how many of the complaints we get > about table bloat are a consequence of people not realizing that > "pg_ctl stop -m immediate" is going to cost them. That would be useful information to have, but I bet the answer is "not that many". Most people don't shut down their database very often; they're looking for continuous uptime. It looks to me like autovacuum activity causes the statistics files to get refreshed at least once per autovacuum_naptime, which defaults to once a minute, so on the average we're talking about the loss of perhaps 30 seconds worth of statistics. What percentage of database activity occurs within 30 second of an immediate shutdown? For a hypothetical installation where the DBA does an immediate shutdown at four randomly-chosen times each day, the answer is "less than 0.2%". In real installations, the time between immediate shutdown is likely to be weeks or months, and so the answer is perhaps two or three orders of magnitude less than that. The loss of a few (or even a few dozen) parts-per-million of statistics data shouldn't make any material difference. To make this add up to a significant loss, you have to suppose that there was a particularly large transaction in flight just before the crash. But it can't have been in flight at the moment of the crash, because then the statistics message would not then have been sent. So it has to be the case that some really big transaction ended and then just after that the DBA did an immediate shutdown. I'm not arguing that it couldn't happen, but it's a pretty narrow target to be aiming at. I also think that you're wildly overestimating the likelihood that writing the stats file will be fast, because (1) anything that involves writing to the disk can be very slow, either because there's a lot of other write activity or because the disk is going bad, the latter being actually a pretty common cause of emergency database shutdowns, (2) the stats files can be quite large if the database system contains hundreds of thousands or even millions of objects, which is not all that infrequent, and (3) pgstat wait timeouts are pretty common, which would not be the case if writing the file was invariably fast (c.f. 75b48e1fff8a4dedd3ddd7b76f6360b5cc9bb741). >> ... Yeah, it's not good, but neither are the things that prompt >> people to perform an immediate shutdown in the first place. > > Really? I think many users think an immediate shutdown is just fine. Why would anybody ever perform an immediate shutdown rather than a fast shutdown if a fast shutdown were fast? Now you've incurred the overhead of a crash recovery for no gain. In my experience, people perform immediate shutdowns mostly when they try something else and it doesn't work. (Other reasons: They are Oracle users who think that immediate will do what it does on Oracle, namely what we call a fast shutdown; or they don't want to wait for the shutdown checkpoint.) >> The patch that >> you claim moots this one was inspired by immediate shutdowns taking >> too long, and that patch enjoyed pretty broad support IIRC. > > I think that's historical revisionism. The commit message for 82233ce7e > says "(This might happen, for example, if a backend gets tangled trying > to malloc() due to gettext(), as in an example illustrated by MauMau.)". > There is absolutely nothing in it about people being dissatisfied with > the shutdown timing in normal cases; rather, it was done to prevent > cases where shutdown failed to happen at all. OK, you're probably right about that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers