On Tue, Feb 5, 2013 at 2:31 PM, Tomas Vondra <t...@fuzzy.cz> wrote:
> On 5.2.2013 19:23, Jeff Janes wrote:
>
>> If I shut down the server and blow away the stats with "rm
>> data/pg_stat/*", it recovers gracefully when I start it back up. If I
>> do "rm -r data/pg_stat" then it has problems the next time I shut it
>> down, but I have no right to do that in the first place. If I initdb
>> a database without this patch, then shut it down and restart with
>> binaries that include this patch, I need to manually create the
>> pg_stat directory. Does that mean it needs a catalog bump in order
>> to force an initdb?
>
> Ummmm, what do you mean by "catalog bump"?

There is a catalog number in src/include/catalog/catversion.h which,
when changed, forces one to redo initdb. Formally I guess it is only
for system catalog changes, but I thought it was used for any on-disk
format changes during development cycles.
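In case it isn't obvious where that lives, the define in question can
be inspected like this (a trivial sketch; the value is date-based and
changes routinely, so don't read anything into the exact number):

    # The catalog version is a date-based integer, e.g. 201302051;
    # bumping it is what forces everyone to re-initdb:
    grep CATALOG_VERSION_NO src/include/catalog/catversion.h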
I like it the way it is, since I can use the same data directory for
both versions of the binary (patched and unpatched), and just manually
create or remove the pg_stat directory when changing modes. That is
ideal for testing this patch, but probably not ideal for being
committed into the tree along with all the other ongoing devel work.
But I think this is something the committer has to worry about.

>> I have a question about its completeness. When I first start up the
>> cluster and have not yet touched it, there is very little stats
>> collector activity, either with or without this patch. When I kick
>> the cluster sufficiently (I've been using vacuumdb -a to do that),
>> there is a lot of stats collector activity. Even once the vacuumdb
>> has long finished, this high level of activity continues even though
>> the database is otherwise completely idle, and it seems to continue
>> forever. This patch makes that high level of activity much more
>> efficient, but it does not reduce the amount of activity. I don't
>> understand why an idle database cannot get back into the state it
>> was in right after start-up.
>
> What do you mean by "stats collector activity"? Is it reading/writing
> a lot of data, or is it just using a lot of CPU?

Basically, the launching of new autovac workers and the work that that
entails. Your patch reduces the size of the data that needs to be
written, read, and parsed for every launch, but not the number of
times that that happens.

> Isn't that just natural and expected behavior, because the database
> needs to actually perform ANALYZE to collect the data? Although the
> tables are empty, it costs some CPU / IO, and there's a lot of them
> (1000 dbs, each with 200 tables).

It isn't touching the tables at all, just the stats files.

I was wrong about the cluster starting up quiet. It only does that if,
while the cluster was shut down, you remove the statistics files,
which I was doing as I switched back and forth between patched and
unpatched.

When the cluster starts, any databases that don't have statistics in
the stats file(s) will not get an autovacuum worker process spawned.
They only start getting spawned once someone asks for statistics for
that database. But once that happens, that database gets a worker
spawned for it every naptime (or at least as close to that as the
server can keep up with) for eternity, even if that database is never
used again. The only way to stop this is the unsupported way of
blowing away the permanent stats files.
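To make that concrete, asking for statistics is all it takes to wake a
database up for good (a sketch; "decoy1" stands in for one of my test
databases, and I expect any access to the stats views would do):

    # One query against a stats view is enough; from then on an
    # autovacuum worker is spawned for this database every naptime:
    psql -d decoy1 -c "SELECT count(*) FROM pg_stat_user_tables;"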
> I don't think there's a way around this. You may increase the
> autovacuum naptime, but that's about all.
>
>> I do not think that the patch needs to solve this problem in order
>> to be accepted, but if it can be addressed while the author and
>> reviewers are paying attention to this part of the system, that
>> would be ideal. And if not, then we should at least remember that
>> there is future work that could be done here.
>
> If I understand that correctly, you see the same behaviour even
> without the patch, right? In that case I'd vote not to make the patch
> more complex, and try to improve that separately (if it's even
> possible).

OK. I just thought that while digging through the code, you might have
a good idea for fixing this part as well. If so, it would be a shame
for that idea to be lost when you move on to other things.

>> I created 1000 databases, each with 200 single-column tables
>> (x serial primary key).
>>
>> After vacuumdb -a, I let it idle for a long time to see what steady
>> state was reached.
>>
>> without the patch:
>> vacuumdb -a:  real 11m2.624s
>> idle steady state: 48.17% user, 39.24% system, 11.78% iowait,
>> 0.81% idle
>>
>> with the patch:
>> vacuumdb -a:  real 6m41.306s
>> idle steady state: 7.86% user, 5.00% system, 0.09% iowait, 87% idle
>
> Nice. Other interesting numbers would be device utilization, average
> I/O speed

I didn't gather that data, as I never figured out how to interpret
those numbers and so don't have much faith in them. (But I am pretty
impressed with the numbers I do understand.)

> and required space (which should be ~2x the pgstat.stat size
> without the patch).

I didn't study this in depth, but the patch seems to do what it should
(that is, take less space, not more). If I fill the device up so that
there is less than 3x the size of the stats file available for use
(i.e. space for the file itself and for one temp copy of it, but not
for a complete second temp copy), I occasionally get out-of-space
warnings without the patch, but never with it. Indeed, with the patch
I never get warnings even with only 1.04 times the aggregate size of
the stats files available for use (that is, room for all the files
plus just 1/25 of that amount to spare; obviously this limit is
specific to having 1000 databases of equal size).

>> I also ran pgbench on a scale that fits in memory, with fsync=off,
>> on a single-CPU machine, with the same above-mentioned 1000
>> databases as unused decoys to bloat the stats file.
>>
>> pgbench_tellers and pgbench_branches undergo enough turnover that
>> they should get vacuumed every minute (naptime).
>>
>> Without the patch, they only get vacuumed every 40 minutes or so, as
>> the autovac workers are so distracted by reading the bloated stats
>> file, and the TPS is ~680.
>>
>> With the patch, they get vacuumed every 1 to 2 minutes, and TPS is
>> ~940.
>
> Great. I haven't really aimed to improve pgbench results, but it
> seems natural that the decreased CPU utilization can go somewhere
> else.

Not bad. My goal there was to prove to myself that the correct tables
were getting vacuumed. The TPS measurements were just a by-product of
that, but since I had them I figured I'd post them.

> Have you moved the stats somewhere to tmpfs, or have you used the
> default location (on disk)?

All the specific work I reported was with the stats on disk, except
the part about running out of space, which was done on /dev/shm. But
even with the data theoretically going to disk, the kernel caches it
well enough that I wouldn't expect things to change very much.

Two more questions I've come up with: if I invoke pg_stat_reset() from
a database, the corresponding file does not get removed from the
pg_stat_tmp directory. And when the cluster is shut down, a file for
the reset database does get created in pg_stat. Is this OK?

Cheers,

Jeff
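P.S. In case it helps with reproducing those last two observations,
this is roughly what I do (a sketch; "decoy1" is again a stand-in
database name, and the paths assume the default data directory layout
with the patch's pg_stat / pg_stat_tmp split):

    # Reset the stats for one database, then watch its per-db file:
    psql -d decoy1 -c "SELECT pg_stat_reset();"
    ls data/pg_stat_tmp/    # the file for decoy1 is still here
    pg_ctl -D data stop -m fast
    ls data/pg_stat/        # a file for the reset db shows up here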