Re: [PERFORM] Performance Benchmarking for data-warehousing instance?

2014-02-09 Thread Tomas Vondra
Hi,

On 8.2.2014 07:41, Huy Nguyen wrote:
> Do you happen to have a link to it? Though I think different machine
> specs should yield different optimal postgresql.conf.

An optimal configuration is not just about machine specs - it depends on
the workload and application configuration too. So there's no benchmark
that would give you the best config for your application.

> I'm looking for a hand-crafted set of data + queries tailored for OLAP
> so that I can try to manually tweak one config at a time and run against
> the benchmark.

I think pgtune gives you the best starting point you can get, and you can
tweak it based on your actual workload. If you can prepare a sample of
the workload (i.e. a representative amount of data) and run a set of
actual queries generated by the application, that would be ideal.
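
As a rough illustration, a pgtune-style starting point for a dedicated DW
box might look like the sketch below. Every value here is hypothetical
(assuming a made-up 16 GB dedicated machine); pgtune derives the real
numbers from your actual hardware and workload type.

```
# postgresql.conf sketch -- illustrative values only, for a
# hypothetical dedicated 16 GB data-warehouse machine
shared_buffers = 4GB              # ~25% of RAM
effective_cache_size = 12GB       # ~75% of RAM (planner hint, not an allocation)
work_mem = 256MB                  # DW queries sort/hash large sets
maintenance_work_mem = 1GB        # faster VACUUM and index builds
checkpoint_segments = 64          # pre-9.5 WAL sizing knob
default_statistics_target = 500   # better estimates for complex plans
```

Whatever the starting values, the point above stands: only running your
own queries against a representative sample will tell you if they fit.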

> I might considering creating one if no one has done it before.

So how exactly is that going to work? There's a benchmark for this,
called TPC-H [1], but again - this is just a model of what a DWH/DSS
application may look like.

I spent a lot of time working with it a while ago (see [2]), and IMHO
the values recommended by pgtune are quite reasonable.

[1] http://www.tpc.org/tpch/default.asp
[2] http://www.fuzzy.cz/en/articles/dss-tpc-h-benchmark-with-postgresql/

regards
Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Tomas Vondra
On 7.2.2014 19:47, Claudio Freire wrote:
>
> Question is... why isn't all that free space being used? The table
> grows in size even though there's plenty (65%) of free space.
> 
> I've got autovacuum severely crippled and that could be a reason, but
> I do perform regular vacuum runs weekly that always run to completion.
> I also do routine reindexing to stop index bloat on its tracks, yet
> freshly-reindexed indexes get considerably reduced in size with vacuum
> full.

Are you logging autovacuum actions? I.e. what is

log_autovacuum_min_duration

set to? If it's set to -1, you won't get any messages about conflicting
locks and the like, which might be the culprit here.

Also, when you're running the weekly VACUUM, run VACUUM (VERBOSE) and
post the output here. That might at least help us eliminate some of the
usual suspects.
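
Concretely, something like this (the table name is just a placeholder):

```sql
-- In postgresql.conf (followed by a config reload):
--   log_autovacuum_min_duration = 5000   -- ms; -1 disables this logging
-- Then, for the weekly run, capture per-table details:
VACUUM (VERBOSE) my_bloated_table;
```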

regards
Tomas




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Claudio Freire
On Sun, Feb 9, 2014 at 12:50 PM, Tomas Vondra  wrote:
> On 7.2.2014 19:47, Claudio Freire wrote:
>>
>> Question is... why isn't all that free space being used? The table
>> grows in size even though there's plenty (65%) of free space.
>>
>> I've got autovacuum severely crippled and that could be a reason, but
>> I do perform regular vacuum runs weekly that always run to completion.
>> I also do routine reindexing to stop index bloat on its tracks, yet
>> freshly-reindexed indexes get considerably reduced in size with vacuum
>> full.
>
> Are you logging autovacuum actions? I.e. what is
>
> log_autovacuum_min_duration
>
> set to? If it's set to -1, you won't get any messages about conflicting
> locks and the like, which might be the culprit here.

It was set to -1. I set it to 5000 and I'll be keeping an eye on the logs.

> Also, when you're running the weekly VACUUM, do VACUUM (VERBOSE) and
> post it here. That might at least help us eliminate some of the usual
> suspects.

I'm using a cron job for this; I'll see about dumping the results to a
log file and will post them when it's done.




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Tom Lane
Claudio Freire  writes:
>>> I also do routine reindexing to stop index bloat on its tracks, yet
>>> freshly-reindexed indexes get considerably reduced in size with vacuum
>>> full.

AFAIK there's no reason for vacuum full to produce a different result
from reindex.  Did you mean to say that the indexes get smaller than
what they had been after some normal operation?  If so it's worth noting
this comment from the btree index building code (nbtsort.c):

 * It is not wise to pack the pages entirely full, since then *any*
 * insertion would cause a split (and not only of the leaf page; the need
 * for a split would cascade right up the tree).  The steady-state load
 * factor for btrees is usually estimated at 70%.  We choose to pack leaf
 * pages to the user-controllable fill factor (default 90%) while upper pages
 * are always packed to 70%.  This gives us reasonable density (there aren't
 * many upper pages if the keys are reasonable-size) without risking a lot of
 * cascading splits during early insertions.

As the comment notes, the initial state of a freshly-built index is packed
more densely than what you can expect after a lot of insertions/updates
have occurred.  That's not a bug, it's just a fact of life.

Also, there are certain usage patterns that can result in btree indexes
having densities much lower than the conventional-wisdom 70%.  The main
one I've seen in practice is "decimation", where you delete say 99 out
of every 100 entries in index order.  This leaves just a few live entries
in each leaf page --- but our btree code doesn't reclaim an index page
for recycling until it's totally empty.  So you can end up with a very
low load factor after doing something like that, and a reindex is the
only good way to fix it.
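
A toy simulation of that decimation pattern (the page capacity and entry
counts are made up, not real btree geometry) shows how extreme the
resulting density can get:

```python
# Hypothetical model: leaf pages hold up to 100 entries; we then
# delete 99 out of every 100 entries in index order ("decimation").
ENTRIES_PER_PAGE = 100
TOTAL_ENTRIES = 10_000

# Freshly-reindexed state: pages packed completely full.
pages = [list(range(i, i + ENTRIES_PER_PAGE))
         for i in range(0, TOTAL_ENTRIES, ENTRIES_PER_PAGE)]

# Decimation: keep only every 100th entry, in index order.
survivors = set(range(0, TOTAL_ENTRIES, 100))
pages = [[e for e in page if e in survivors] for page in pages]

# A page is only reclaimed for reuse once it is completely empty,
# so every page keeping even one live entry stays allocated.
live_pages = [p for p in pages if p]
load_factor = sum(len(p) for p in live_pages) / (len(live_pages) * ENTRIES_PER_PAGE)
print(f"{len(live_pages)} pages still allocated, load factor {load_factor:.0%}")
# → 100 pages still allocated, load factor 1%
```

Every page retains exactly one live entry, so none are reclaimed and the
load factor collapses to 1% - far below the 70% steady state, and exactly
the situation only a reindex can repair.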

regards, tom lane




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Claudio Freire
On Sun, Feb 9, 2014 at 4:40 PM, Tom Lane  wrote:
> Claudio Freire  writes:
>>>> I also do routine reindexing to stop index bloat on its tracks, yet
>>>> freshly-reindexed indexes get considerably reduced in size with vacuum
>>>> full.
>
> AFAIK there's no reason for vacuum full to produce a different result
> from reindex.  Did you mean to say that the indexes get smaller than
> what they had been after some normal operation?  If so it's worth noting
> this comment from the btree index building code (nbtsort.c):

Smaller than after reindex. It was a surprise to me too.

> Also, there are certain usage patterns that can result in btree indexes
> having densities much lower than the conventional-wisdom 70%.  The main
> one I've seen in practice is "decimation", where you delete say 99 out
> of every 100 entries in index order.  This leaves just a few live entries
> in each leaf page --- but our btree code doesn't reclaim an index page
> for recycling until it's totally empty.  So you can end up with a very
> low load factor after doing something like that, and a reindex is the
> only good way to fix it.

That's exactly the kind of pattern the "archival" step results in, which
is why I do routine reindexing.




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Sergey Konoplev
On Fri, Feb 7, 2014 at 10:47 AM, Claudio Freire  wrote:
> What I'm seeing, though, is not that, but bloat proportional to table
> size (always stuck at about 65% bloat). What's weird, is that vacuum
> full does the trick of reducing table size and bloat back to 0%. I
> haven't had time yet to verify whether it goes back to 65% after
> vacuum full (that will take time, maybe a month).

Try pgcompact, it was designed particularly for cases like yours:
https://github.com/grayhemp/pgtoolkit.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Claudio Freire
On Sun, Feb 9, 2014 at 7:32 PM, Sergey Konoplev  wrote:
> On Fri, Feb 7, 2014 at 10:47 AM, Claudio Freire  
> wrote:
>> What I'm seeing, though, is not that, but bloat proportional to table
>> size (always stuck at about 65% bloat). What's weird, is that vacuum
>> full does the trick of reducing table size and bloat back to 0%. I
>> haven't had time yet to verify whether it goes back to 65% after
>> vacuum full (that will take time, maybe a month).
>
> Try pgcompact, it was designed particularily for such cases like yours
> https://github.com/grayhemp/pgtoolkit.

It's a pity that it requires several sequential scans of the tables.
For my case, that's probably as intrusive as the exclusive locks.

I notice I didn't mention it before, but the tables involved are around
20-50 GB in size.




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Sergey Konoplev
On Sun, Feb 9, 2014 at 2:58 PM, Claudio Freire  wrote:
> On Sun, Feb 9, 2014 at 7:32 PM, Sergey Konoplev  wrote:
>> Try pgcompact, it was designed particularily for such cases like yours
>> https://github.com/grayhemp/pgtoolkit.
>
> It's a pity that that requires several sequential scans of the tables.
> For my case, that's probably as intrusive as the exclusive locks.

You should probably run it with --no-pgstattuple if those seq scans are
the concern. If your tables are not TOASTed, then the approximate method
of gathering statistics should work pretty well for you.
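
For reference, the exact (but scanning) checks that the flag skips look
something like this (object names are placeholders):

```sql
-- pgstattuple reads the whole relation, hence the sequential scans:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT * FROM pgstattuple('my_bloated_table');       -- exact dead/free space
SELECT * FROM pgstatindex('my_bloated_table_pkey');  -- btree avg_leaf_density
```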

> I noticed I didn't mention, but the tables involved are around 20-50GB in 
> size.

That's not something I would worry about. I regularly use it with even
bigger tables.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com

