On Sun, Feb 3, 2013 at 4:51 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> LOG: last_statwrite 11133-08-28 19:22:31.711744+02 is later than
> collector's time 2013-02-04 00:54:21.113439+01 for db 19093
> WARNING: pgstat wait timeout
> LOG: last_statwrite 39681-12-23 18:48:48.9093+01 is later than
> collector's time 2013-02-04 00:54:31.424681+01 for db 46494
> FATAL: could not find block containing chunk 0x2af4a60
> LOG: statistics collector process (PID 10063) exited with exit code 1
> WARNING: pgstat wait timeout
> WARNING: pgstat wait timeout
>
> I'm not entirely sure where the FATAL came from, but it seems it was
> somehow related to the issue - it was quite reproducible, although I
> don't see how exactly this could happen. The relevant block of code
> looks like this:
>
>     char *writetime;
>     char *mytime;
>
>     /* Copy because timestamptz_to_str returns a static buffer */
>     writetime = pstrdup(timestamptz_to_str(dbentry->stats_timestamp));
>     mytime = pstrdup(timestamptz_to_str(cur_ts));
>     elog(LOG, "last_statwrite %s is later than collector's time %s for "
>          "db %d", writetime, mytime, dbentry->databaseid);
>     pfree(writetime);
>     pfree(mytime);
>
> which seems quite fine to me. I'm not sure how one of the pfree calls
> could fail?
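The pstrdup()/pfree() pair is there because, as the comment says, timestamptz_to_str() returns a pointer to a static buffer, so calling it twice inside a single elog() would clobber the first result. Here is a standalone toy showing the hazard the copies protect against -- fmt_ts() below is just a stand-in for timestamptz_to_str(), not PostgreSQL code:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    /* Stand-in for timestamptz_to_str(): returns a pointer to a static
     * buffer, so every call overwrites the previous result. */
    static const char *
    fmt_ts(time_t t)
    {
        static char buf[64];

        strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", gmtime(&t));
        return buf;
    }

    int
    main(void)
    {
        time_t then = 0;                /* 1970-01-01 */
        time_t now = 1000000000;        /* 2001-09-09 */

        /* Broken: both %s arguments end up pointing at the same static
         * buffer, so the same timestamp is printed twice. */
        printf("broken:  %s vs %s\n", fmt_ts(then), fmt_ts(now));

        /* Correct: copy the first result before making the second call,
         * which is what the pstrdup() calls in the quoted snippet do. */
        char *first = strdup(fmt_ts(then));
        printf("correct: %s vs %s\n", first, fmt_ts(now));
        free(first);

        return 0;
    }

Compiled as-is, the first printf prints the same timestamp twice (whichever call happened to run last), while the second prints the two distinct values.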
I don't recall seeing the FATAL errors myself, but I didn't keep the logfile around. (I do recall seeing the pgstat wait timeout.) Are you using Windows? pstrdup seems to be different there.

I'm afraid I don't have much to say on the code. Indeed, I never even looked at it (other than grepping for pstrdup just now). I am taking a purely experimental approach, since Alvaro and others have looked at the code.

> Anyway, attached is a patch that fixes all three issues, i.e.
>
> 1) the un-initialized timestamp
> 2) the "void" omitted from the signature
> 3) rename to "pg_stat" instead of just "stat"

Thanks. If I shut down the server and blow away the stats with "rm data/pg_stat/*", it recovers gracefully when I start it back up. If I do "rm -r data/pg_stat" then it has problems the next time I shut it down, but I have no right to do that in the first place.

If I initdb a database without this patch, then shut it down and restart with binaries that include this patch, I need to manually make the pg_stat directory. Does that mean it needs a catalog bump in order to force an initdb?
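For anyone following the thread without reading the patch: it replaces the single permanent statistics file with one file per database, written under that new pg_stat directory. A minimal sketch of how such a per-database path might be built -- the helper name and file-name scheme here are my guesses, not taken from the patch:

    #include <stdio.h>

    /* Guessed layout: one permanent stats file per database under
     * pg_stat/, named after the database OID.  The patch may well use a
     * different naming scheme. */
    #define PGSTAT_STAT_PERMANENT_DIRECTORY "pg_stat"

    static void
    get_dbstat_filename(unsigned int databaseid, char *path, size_t len)
    {
        snprintf(path, len, "%s/db_%u.stat",
                 PGSTAT_STAT_PERMANENT_DIRECTORY, databaseid);
    }

    int
    main(void)
    {
        char path[64];

        get_dbstat_filename(19093, path, sizeof(path));  /* OID from the log above */
        printf("%s\n", path);   /* prints "pg_stat/db_19093.stat" */
        return 0;
    }

The point, as I understand it, is that a backend or autovacuum worker interested in one database then only has to read that database's file rather than the statistics for the entire cluster.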
A review:

It applies cleanly (some offsets, no fuzz), builds without warnings, and passes make check, including with cassert. The final test done in "make check" inherently tests this code, and it passes. If I intentionally break the patch by making pgstat_read_db_statsfile add one to the oid it opens, then the test fails. So the existing test is at least plausible as a test.

doc/src/sgml/monitoring.sgml needs to be changed: "a permanent copy of the statistics data is stored in the global subdirectory". I'm not aware of any other needed changes to the docs.

The big question is whether we want this. I think we do. While having hundreds of databases in a cluster is not recommended, that is no reason not to handle it better than we do. I don't see any downsides, other than possibly some code uglification. Some file systems might not deal well with having lots of small stats files being rapidly written and rewritten, but it is hard to see how the current behavior would be more favorable for those systems.

We do not already have this. There is no relevant spec. I can't see how this could need pg_dump support (but what about pg_upgrade?). I am not aware of any dangers.

I have a question about its completeness. When I first start up the cluster and have not yet touched it, there is very little stats collector activity, either with or without this patch. When I kick the cluster sufficiently (I've been using vacuumdb -a to do that) then there is a lot of stats collector activity. Even once the vacuumdb has long finished, this high level of activity continues even though the database is otherwise completely idle, and this seems to go on forever. This patch makes that high level of activity much more efficient, but it does not reduce the activity. I don't understand why an idle database cannot get back into the state it was in right after start-up. I do not think that the patch needs to solve this problem in order to be accepted, but if it can be addressed while the author and reviewers are paying attention to this part of the system, that would be ideal. And if not, then we should at least remember that there is future work that could be done here.

I created 1000 databases, each with 200 single-column tables (x serial primary key). After vacuumdb -a, I let it idle for a long time to see what steady state was reached.

without the patch:
    vacuumdb -a: real 11m2.624s
    idle steady state: 48.17% user, 39.24% system, 11.78% iowait, 0.81% idle

with the patch:
    vacuumdb -a: real 6m41.306s
    idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle

I also ran pgbench on a scale that fits in memory, with fsync=off, on a single-CPU machine, with the same above-mentioned 1000 databases as unused decoys to bloat the stats file. pgbench_tellers and pgbench_branches undergo enough turnover that they should get vacuumed every minute (nap time). Without the patch, they only get vacuumed every 40 minutes or so, as the autovac workers are so distracted by reading the bloated stats file, and the TPS is ~680. With the patch, they get vacuumed every 1 to 2 minutes and TPS is ~940.

So this seems to be effective at its intended goal. I have not done a review of the code itself, only the performance.

Cheers,

Jeff