Re: [PERFORM] Performance Benchmarking for data-warehousing instance?
Hi, On 8.2.2014 07:41, Huy Nguyen wrote: > Do you happen to have a link to it? Though I think different machine > specs should yield different optimal postgresql.conf. An optimal configuration is not just about machine specs, it's about the workload and application configuration too. So there's no benchmark that would give you the best config for your application. > I'm looking for a hand-crafted set of data + queries tailored for OLAP > so that I can try to manually tweak one config at a time and run against > the benchmark. I think using pgtune is the best starting point you can get, and you may tweak it based on your actual workload. If you can prepare a sample of the workload (i.e. a representative amount of data) and run a set of actual queries (generated by the application), that'd be an excellent situation. > I might considering creating one if no one has done it before. So how exactly is that going to work? There's an benchmark for this, called TPC-H [1], but again - this is just a model of how a DWH/DSS application may look like. I've spent a lot of time working with it a while ago (see [2]), and IMHO the values recommended by pgtune are quite fine. [1] http://www.tpc.org/tpch/default.asp [2] http://www.fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/ regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
On 7.2.2014 19:47, Claudio Freire wrote: > > Question is... why isn't all that free space being used? The table > grows in size even though there's plenty (65%) of free space. > > I've got autovacuum severely crippled and that could be a reason, but > I do perform regular vacuum runs weekly that always run to completion. > I also do routine reindexing to stop index bloat on its tracks, yet > freshly-reindexed indexes get considerably reduced in size with vacuum > full. Are you logging autovacuum actions? I.e. what is log_autovacuum_min_duration set to? It it's set to -1 you won't get any messages because of conflicting locks or stuff like that, which might be the culprit here. Also, when you're running the weekly VACUUM, do VACUUM (VERBOSE) and post it here. That might at least help us eliminate some of the usual suspects. regards Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
On Sun, Feb 9, 2014 at 12:50 PM, Tomas Vondra wrote: > On 7.2.2014 19:47, Claudio Freire wrote: >> >> Question is... why isn't all that free space being used? The table >> grows in size even though there's plenty (65%) of free space. >> >> I've got autovacuum severely crippled and that could be a reason, but >> I do perform regular vacuum runs weekly that always run to completion. >> I also do routine reindexing to stop index bloat on its tracks, yet >> freshly-reindexed indexes get considerably reduced in size with vacuum >> full. > > Are you logging autovacuum actions? I.e. what is > > log_autovacuum_min_duration > > set to? It it's set to -1 you won't get any messages because of > conflicting locks or stuff like that, which might be the culprit here. It was set to -1. I set it to 5000 and I'll be keeping an eye on the logs. > Also, when you're running the weekly VACUUM, do VACUUM (VERBOSE) and > post it here. That might at least help us eliminate some of the usual > suspects. I'm using a cron job for this, I'll see about dumping the results to a log file and post when it's done. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
Claudio Freire writes: >>> I also do routine reindexing to stop index bloat on its tracks, yet >>> freshly-reindexed indexes get considerably reduced in size with vacuum >>> full. AFAIK there's no reason for vacuum full to produce a different result from reindex. Did you mean to say that the indexes get smaller than what they had been after some normal operation? If so it's worth noting this comment from the btree index building code (nbtsort.c): * It is not wise to pack the pages entirely full, since then *any* * insertion would cause a split (and not only of the leaf page; the need * for a split would cascade right up the tree). The steady-state load * factor for btrees is usually estimated at 70%. We choose to pack leaf * pages to the user-controllable fill factor (default 90%) while upper pages * are always packed to 70%. This gives us reasonable density (there aren't * many upper pages if the keys are reasonable-size) without risking a lot of * cascading splits during early insertions. As the comment notes, the initial state of a freshly-built index is packed more densely than what you can expect after a lot of insertions/updates have occurred. That's not a bug, it's just a fact of life. Also, there are certain usage patterns that can result in btree indexes having densities much lower than the conventional-wisdom 70%. The main one I've seen in practice is "decimation", where you delete say 99 out of every 100 entries in index order. This leaves just a few live entries in each leaf page --- but our btree code doesn't reclaim an index page for recycling until it's totally empty. So you can end up with a very low load factor after doing something like that, and a reindex is the only good way to fix it. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
On Sun, Feb 9, 2014 at 4:40 PM, Tom Lane wrote: > Claudio Freire writes: I also do routine reindexing to stop index bloat on its tracks, yet freshly-reindexed indexes get considerably reduced in size with vacuum full. > > AFAIK there's no reason for vacuum full to produce a different result > from reindex. Did you mean to say that the indexes get smaller than > what they had been after some normal operation? If so it's worth noting > this comment from the btree index building code (nbtsort.c): Smaller than after reindex. It was a surprise to me too. > Also, there are certain usage patterns that can result in btree indexes > having densities much lower than the conventional-wisdom 70%. The main > one I've seen in practice is "decimation", where you delete say 99 out > of every 100 entries in index order. This leaves just a few live entries > in each leaf page --- but our btree code doesn't reclaim an index page > for recycling until it's totally empty. So you can end up with a very > low load factor after doing something like that, and a reindex is the > only good way to fix it. That's exactly the kind of pattern the "archival" step results in, that's why I do routine reindexing. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
On Fri, Feb 7, 2014 at 10:47 AM, Claudio Freire wrote: > What I'm seeing, though, is not that, but bloat proportional to table > size (always stuck at about 65% bloat). What's weird, is that vacuum > full does the trick of reducing table size and bloat back to 0%. I > haven't had time yet to verify whether it goes back to 65% after > vacuum full (that will take time, maybe a month). Try pgcompact, it was designed particularily for such cases like yours https://github.com/grayhemp/pgtoolkit. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray...@gmail.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
On Sun, Feb 9, 2014 at 7:32 PM, Sergey Konoplev wrote: > On Fri, Feb 7, 2014 at 10:47 AM, Claudio Freire > wrote: >> What I'm seeing, though, is not that, but bloat proportional to table >> size (always stuck at about 65% bloat). What's weird, is that vacuum >> full does the trick of reducing table size and bloat back to 0%. I >> haven't had time yet to verify whether it goes back to 65% after >> vacuum full (that will take time, maybe a month). > > Try pgcompact, it was designed particularily for such cases like yours > https://github.com/grayhemp/pgtoolkit. It's a pity that that requires several sequential scans of the tables. For my case, that's probably as intrusive as the exclusive locks. I noticed I didn't mention, but the tables involved are around 20-50GB in size. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bloated tables and why is vacuum full the only option
On Sun, Feb 9, 2014 at 2:58 PM, Claudio Freire wrote: > On Sun, Feb 9, 2014 at 7:32 PM, Sergey Konoplev wrote: >> Try pgcompact, it was designed particularily for such cases like yours >> https://github.com/grayhemp/pgtoolkit. > > It's a pity that that requires several sequential scans of the tables. > For my case, that's probably as intrusive as the exclusive locks. Probably you should run it with --no-pgstattuple if you are talking about these seq scans. If your tables are not TOASTed then the approximation method of gathering statistics would work pretty good for you. > I noticed I didn't mention, but the tables involved are around 20-50GB in > size. It is not the thing I would worry about. I regularly use it with even bigger tables. -- Kind regards, Sergey Konoplev PostgreSQL Consultant and DBA http://www.linkedin.com/in/grayhemp +1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979 gray...@gmail.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance