[PERFORM] Inserts or Updates

2012-02-07 Thread Ofer Israeli
Hi all,

We are currently "stuck" with a performance bottleneck in our server using PG 
and we are thinking of two potential solutions which I would be happy to hear 
your opinion about.

Our system has a couple of tables that hold client generated information.  The 
clients communicate every minute with the server and thus we perform an update 
on these two tables every minute.  We are talking about ~50K clients (and 
therefore records).

These constant updates have caused the tables to grow drastically and the 
indexes to bloat.  So the two solutions that we are talking about are:

 1.  Configure autovacuum to work more intensively in both time and cost 
parameters.
Pros:
Not a major architectural change.
Cons:
Autovacuum does not handle index bloating and thus we will need to periodically 
reindex the tables.
Perhaps we will also need to run VACUUM FULL periodically, if the autovacuum 
cleaning cannot keep pace and the tables therefore need defragmentation?


 2.  Creating a new table every minute and inserting the data into this new 
temporary table (only inserts).  Note that in this process we will also need 
to copy missing data (clients that didn't communicate) from the older table.
Pros:
Tables are always compact.
We will not be limited by autovacuum's pace.
Cons:
Major architectural change.

So to sum it up: we would be happy to refrain from making a major change to 
the system (solution #2), but we are not certain what the correct way to work 
is in our situation of constant record updates.  Is it to configure an 
aggressive autovacuum, or is the "known methodology" to work with temporary 
tables that are only ever inserted into?
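
For reference, a minimal sketch of the per-table autovacuum tuning that 
solution #1 would involve (the table name and values are hypothetical, just 
to illustrate the knobs):

ALTER TABLE client_status SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum after ~1% of rows change
    autovacuum_vacuum_threshold = 500,      -- plus this base number of changed rows
    autovacuum_vacuum_cost_delay = 10       -- shorter delay = faster cleaning
);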


Thank you,
Ofer


Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Kevin Grittner
Ofer Israeli  wrote:
 
> Our system has a couple of tables that hold client generated
> information. The clients communicate every minute with the server
> and thus we perform an update on these two tables every minute. We
> are talking about ~50K clients (and therefore records).
> 
> These constant updates have caused the tables to grow drastically and
> the indexes to bloat. So the two solutions that we are talking about
> are:
> 
> 1. Configure autovacuum to work more intensively in both time and
> cost parameters.
> Pros:
> Not a major architectural change.
> Cons:
> Autovacuum does not handle index bloating and thus we will need to
> periodically reindex the tables.
 
Done aggressively enough, autovacuum should prevent index bloat, too.
 
> Perhaps we will also need to run VACUUM FULL periodically, if the
> autovacuum cleaning cannot keep pace and the tables therefore need
> defragmentation?
 
The other thing that can cause bloat in this situation is a
long-running transaction.  To correct occasional bloat due to that on
small frequently-updated tables we run CLUSTER on them daily during
off-peak hours.  If you are on version 9.0 or later, VACUUM FULL
instead would be fine.  While this locks the table against other
action while it runs, on a small table it is a small enough fraction
of a second that nobody notices.
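
A minimal sketch of that daily maintenance, with hypothetical table and index 
names:

CLUSTER client_status USING client_status_pkey;  -- rewrites the table in index order
-- or, on 9.0+:
VACUUM FULL client_status;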
 
> 2. Creating a new table every minute and inserting the data into
> this new temporary table (only inserts). Note that in this process
> we will also need to copy missing data (clients that didn't
> communicate) from the older table.
> Pros:
> Tables are always compact.
> We will not be limited by autovacuum's pace.
> Cons:
> Major architectural change.
 
I would try the other alternative first.
 
-Kevin



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Andy Colson

On 2/7/2012 4:18 AM, Ofer Israeli wrote:

Hi all,

We are currently “stuck” with a performance bottleneck in our server
using PG and we are thinking of two potential solutions which I would be
happy to hear your opinion about.

Our system has a couple of tables that hold client generated
information. The clients communicate *every* minute with the server and
thus we perform an update on these two tables every minute. We are
talking about ~50K clients (and therefore records).

These constant updates have caused the tables to grow drastically and the
indexes to bloat. So the two solutions that we are talking about are:



You don't give any table details, so I'll have to guess.  Maybe you have 
too many indexes on your table?  Or you don't have a good primary index, 
which means your updates are changing the primary key?


If you only have a primary index, and you are not changing it, Pg should 
be able to do HOT updates.
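
A quick way to check whether updates are actually HOT, with a hypothetical 
table name:

select relname, n_tup_upd, n_tup_hot_upd
from pg_stat_user_tables
where relname = 'your_table';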


If you have lots of indexes, you should review them; you probably don't 
need half of them.



And like Kevin said, try the simple one first.  Won't hurt anything, and 
if it works, great!


-Andy



Re: [PERFORM] Index with all necessary columns - Postgres vs MSSQL

2012-02-07 Thread Igor Neyman
From: Gudmundur Johannesson [mailto:gudmundur.johannes...@gmail.com] 
Sent: Thursday, February 02, 2012 11:42 AM
To: Merlin Moncure
Cc: pgsql-performance@postgresql.org
Subject: Re: Index with all necessary columns - Postgres vs MSSQL

Hi,

I want to start by thanking you guys for a quick response and I will try to 
provide all the information you request.  

1) What version am I running:
"PostgreSQL 9.1.2, compiled by Visual C++ build 1500, 64-bit"

2) Schema:
CREATE TABLE test (
    id integer,
    dtstamp timestamp without time zone,
    rating real
) WITH (OIDS=FALSE);
CREATE INDEX test_all ON test USING btree (id, dtstamp, rating);
200M rows
Table size 9833 MB
Index size 7653 MB

3) Difference between the first and the second run time?
The statement executed is:
SELECT count(1) FROM test
WHERE id IN (58,83,88,98,124,141,170,195,202,252,265,293,305,331,348)
  AND dtstamp BETWEEN cast('2011-10-19 08:00:00' as timestamp)
              AND cast('2011-10-19 16:00:00' as timestamp)
a) 1st run = 26 seconds
b) 2nd run = 0.234 seconds
c) 3rd-6th run = 0.06 seconds

If I perform the query above for another day then I get 26 seconds for the 1st 
query.

4) What was the execution plan of it
"Aggregate  (cost=151950.75..151950.76 rows=1 width=0)"
"  ->  Bitmap Heap Scan on data_cbm_reading cbm  (cost=1503.69..151840.82 
rows=43974 width=0)"
"    Recheck Cond: ((virtual_id = ANY 
('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND 
("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND 
("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"
"    ->  Bitmap Index Scan on data_cbm_reading_all  (cost=0.00..1492.70 
rows=43974 width=0)"
"  Index Cond: ((virtual_id = ANY 
('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND 
("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND 
("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"

5) In this case, I shut down the mssql server/machine and restarted it.  To be 
on the safe side, I ensured the cache was empty using dbcc freeproccache and 
dbcc dropcleanbuffers.
Then I tried the same statement as above:
a) 1st run = 0.8 seconds
b) 2nd, 3rd, ... run = 0.04 seconds
c) change the select statement to any other day and run it again: 1st run = 
0.5 seconds
d) 2nd, 3rd, ... run = 0.04 seconds

6) You wrote "I doubt covering indexes is going to make that query 23x faster."
I decided to check how mssql performs if it cannot use a covering index.  To 
do that, I dropped my current index and recreated it on id, dtstamp only.  
That forces mssql to look into the data file, since the index is no longer 
sufficient.
Running the following statement forces the "rating" column to be accessed:
select sum(rating)
FROM test
WHERE id in (58,83,88,98,124,141,170,195,202,252,265,293,305,331,348)
  AND dtstamp >= '2011-10-19 08:00:00' AND dtstamp <= '2011-10-19 16:00:00'
a) 1st run = 20 seconds
b) 2nd run = 0.6 seconds
c) 3rd, ... run = 0.3 seconds
As you can see, the response time gets just as bad as in Postgres.
Now let's recreate the mssql index with all the columns and double check the 
response time:
a) 1st run = 2 seconds
b) 2nd run = 0.12 seconds
c) 3rd, ... run = 0.3 seconds


Therefore, I must conclude that in the case of mssql the "covering" index is 
making a huge impact.

I have spent the whole day providing this data (takes a while to shuffle 200M 
rows) and tomorrow I will try your suggestion regarding two indexes.

Do you think I should try using the latest build of the source for 9.2 since 
index-only-scan is "ready" according to 
http://www.depesz.com/index.php/2011/10/08/waiting-for-9-2-index-only-scans/
?

Thanks,
   - Gummi


Gudmundur,

Just for clarification purposes:

This schema:

CREATE TABLE test (
    id integer,
    dtstamp timestamp without time zone,
    rating real
) WITH (OIDS=FALSE);
CREATE INDEX test_all ON test USING btree (id, dtstamp, rating);

and this query plan:

"Aggregate  (cost=151950.75..151950.76 rows=1 width=0)"
"  ->  Bitmap Heap Scan on data_cbm_reading cbm  (cost=1503.69..151840.82 
rows=43974 width=0)"
"Recheck Cond: ((virtual_id = ANY 
('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND 
("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND 
("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"
"->  Bitmap Index Scan on data_cbm_reading_all  (cost=0.00..1492.70 
rows=43974 width=0)"
"  Index Cond: ((virtual_id = ANY 
('{58,83,88,98,124,141,170,195,202,252,265,293,305,331,348}'::integer[])) AND 
("timestamp" >= '2011-10-19 08:00:00'::timestamp without time zone) AND 
("timestamp" <= '2011-10-19 16:00:00'::timestamp without time zone))"

reference different table and index names.
Also, EXPLAIN ANALYZE would provide additional info compared to just EXPLAIN.

One option you could try is to cluster your table based on the "test_all" 
index.
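
A minimal sketch of that, using the table and index names from the schema 
above:

CLUSTER test USING test_all;
ANALYZE test;  -- refresh planner statistics after the rewrite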

Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Ofer Israeli
Thanks Kevin for the ideas.  Now that you have corrected our misconception 
about autovacuum not handling index bloat, we are looking into running 
autovacuum frequently enough to make sure we don't have a significant increase 
in table or index size.  We intend to keep our transactions short enough not 
to reach the situation where VACUUM FULL or CLUSTER is needed.

Thanks,
Ofer



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Claudio Freire
On Tue, Feb 7, 2012 at 2:27 PM, Ofer Israeli  wrote:
> Thanks Kevin for the ideas.  Now that you have corrected our misconception
> about autovacuum not handling index bloat, we are looking into running
> autovacuum frequently enough to make sure we don't have a significant
> increase in table or index size.  We intend to keep our transactions short
> enough not to reach the situation where VACUUM FULL or CLUSTER is needed.

Also, rather than going overboard with autovacuum settings, do make it
more aggressive, but also set up a regular, manual vacuum of either
the whole database or whatever tables you need to vacuum at
known-low-load hours.



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Ofer Israeli
Hi Andy,

The two tables I am referring to have the following specs:
Table 1:
46 columns
23 indexes on fields of the following types:
INTEGER - 7
TIMESTAMP - 2
VARCHAR - 12
UUID - 2

Table 2:
23 columns
12 indexes on fields of the following types:
INTEGER - 3
TIMESTAMP - 1
VARCHAR - 6
UUID - 2

All indexes are default (btree) indexes.

The primary index is INTEGER and is not updated.

The indexes are used for sorting and filtering purposes in our UI.


I will be happy to hear your thoughts on this.

Thanks,
Ofer



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Ofer Israeli
Hi Claudio,

You mean running a VACUUM statement manually?  I would basically try to avoid 
such a situation: the way I see it, the database should be configured in such 
a manner that it can handle the load at any given moment, and so I wouldn't 
want to manually intervene here.  If you think differently, I'll be happy to 
stand corrected.


Thanks,
Ofer




Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Claudio Freire
On Tue, Feb 7, 2012 at 2:43 PM, Ofer Israeli  wrote:
> You mean running a VACUUM statement manually?  I would basically try to
> avoid such a situation: the way I see it, the database should be configured
> in such a manner that it can handle the load at any given moment, and so I
> wouldn't want to manually intervene here.  If you think differently, I'll
> be happy to stand corrected.

I do think differently.

Autovacuum isn't perfect, and you shouldn't make it too aggressive
since it does generate a lot of I/O activity.  If you can pick a time
when it can run without interfering too much and run vacuum "manually"
(where manually could easily be a cron task, i.e., automatically but
coming from outside the database software itself), you'll be able to
dial down autovacuum and have a more predictable load overall.
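
A minimal sketch of such a scheduled job (database name, table name, and 
schedule are hypothetical):

-- run from cron, e.g.: 0 3 * * *  psql -d mydb -c "VACUUM ANALYZE client_status;"
VACUUM ANALYZE client_status;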



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Ofer Israeli
>> You mean running a VACUUM statement manually?  I would basically try to
>> avoid such a situation: the way I see it, the database should be
>> configured in such a manner that it can handle the load at any given
>> moment, and so I wouldn't want to manually intervene here.  If you think
>> differently, I'll be happy to stand corrected.
> 
> I do think differently.
> 
> Autovacuum isn't perfect, and you shouldn't make it too aggressive
> since it does generate a lot of I/O activity.  If you can pick a time
> when it can run without interfering too much and run vacuum "manually"
> (where manually could easily be a cron task, i.e., automatically but
> coming from outside the database software itself), you'll be able to
> dial down autovacuum and have a more predictable load overall.
> 


Is there something specific you are referring to in autovacuum's 
imperfection, that is, what types of issues are you aware of?

As for the I/O - it is indeed true that it can generate much activity, but 
the way I see it, if you run performance tests and the tests succeed on all 
parameters even with heavy I/O, then you are good to go.  That is, I don't 
mind the server doing lots of I/O as long as it's not causing lags in 
processing the messages that it handles.


Thanks,
Ofer




Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Claudio Freire
On Tue, Feb 7, 2012 at 4:12 PM, Ofer Israeli  wrote:
> Is there something specific you are referring to in autovacuum's
> imperfection, that is, what types of issues are you aware of?

I refer to its criteria for when to perform vacuum/analyze. Especially
analyze. It usually fails to detect the need to analyze a table -
sometimes value distributions change without triggering an autoanalyze.
That's expected, as autoanalyze works on the number of tuples
updated/inserted relative to table size, which is too generic a measure
to catch business-specific conditions.

As with everything, it depends on your business: the usage pattern, the
kinds of updates performed, how the data varies over time... but in
essence, I've found that forcing a periodic vacuum/analyze of tables
beyond what autovacuum does improves stability. You know a lot more
about the business and access/update patterns than autovacuum does, so
you can schedule them where they are needed and autovacuum wouldn't.
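
A minimal sketch of both approaches, with a hypothetical table name:

-- make autoanalyze trigger sooner for this one table...
ALTER TABLE client_status SET (autovacuum_analyze_scale_factor = 0.02);
-- ...and/or refresh statistics explicitly from a scheduled job:
ANALYZE client_status;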

> As for the I/O - it is indeed true that it can generate much activity, but
> the way I see it, if you run performance tests and the tests succeed on all
> parameters even with heavy I/O, then you are good to go.  That is, I don't
> mind the server doing lots of I/O as long as it's not causing lags in
> processing the messages that it handles.

If you don't mind the I/O, by all means, crank it up.



Re: [PERFORM] Index with all necessary columns - Postgres vs MSSQL

2012-02-07 Thread Merlin Moncure
On Tue, Feb 7, 2012 at 11:59 AM, Gudmundur Johannesson
 wrote:
> [quoted text snipped]

Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Andy Colson

On 2/7/2012 11:40 AM, Ofer Israeli wrote:
> Hi Andy,
>
> The two tables I am referring to have the following specs:
> Table 1:
> 46 columns
> 23 indexes on fields of the following types:
> INTEGER - 7
> TIMESTAMP - 2
> VARCHAR - 12
> UUID - 2
>
> Table 2:
> 23 columns
> 12 indexes on fields of the following types:
> INTEGER - 3
> TIMESTAMP - 1
> VARCHAR - 6
> UUID - 2
>
> All indexes are default (btree) indexes.
>
> The primary index is INTEGER and is not updated.
>
> The indexes are used for sorting and filtering purposes in our UI.
>
>
> I will be happy to hear your thoughts on this.
>
> Thanks,
> Ofer
>

Fixed that top post for ya.

Wow, so out of 46 columns, half of them have indexes?  That's a lot. 
I'd bet you could drop a bunch of them.  You should review them and see 
if they are actually helping you.  You already found out that maintaining 
all those indexes is painful.  If they are not speeding up your SELECTs 
by a huge amount, you should drop them.


Sounds like you went through your SQL statements and added an index for 
any field that appeared in a WHERE or ORDER BY clause?


You need to find the columns that are the most selective.  An index 
should be useful at cutting the number of rows down.  Once you have it 
cut down, an index on another field won't really help that much.  And 
after a result set has been collected, an index may or may not help for 
sorting.


Running some queries with EXPLAIN ANALYZE would be helpful.  Give it a 
run, drop an index, then try it again to see if it's about the same, or if 
that index made a difference.
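
A minimal sketch of that workflow (query, table, and index names are 
hypothetical; wrapping it in a transaction makes the drop reversible):

BEGIN;
EXPLAIN ANALYZE SELECT * FROM client_status
    WHERE status = 'active' ORDER BY last_seen;
DROP INDEX client_status_status_idx;
EXPLAIN ANALYZE SELECT * FROM client_status
    WHERE status = 'active' ORDER BY last_seen;  -- compare the timings
ROLLBACK;  -- the index comes back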


-Andy



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Kevin Grittner
Andy Colson  wrote:
 
> Wow, so out of 46 columns, half of them have indexes?  That's a
> lot.  I'd bet you could drop a bunch of them.  You should review
> them and see if they are actually helping you.  You already found
> out that maintaining all those indexes is painful.  If they are not
> speeding up your SELECTs by a huge amount, you should drop them.
 
You might want to review usage counts in pg_stat_user_indexes.
 
-Kevin



Re: [PERFORM] Inserts or Updates

2012-02-07 Thread Andy Colson

Oh, I knew I'd seen index usage stats someplace.

give this a run:

select * from pg_stat_user_indexes where relname = 'SuperBigTable';
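-- note: relname here is the table's name (case-sensitive match), not the index's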

http://www.postgresql.org/docs/current/static/monitoring-stats.html

-Andy



[PERFORM] random_page_cost = 2.0 on Heroku Postgres

2012-02-07 Thread Peter van Hardenberg
Per the thread from last month, I've updated the default random_page_cost on 
Heroku Postgres to reduce the expected cost of a random page read on all new 
databases.
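
For anyone who wants the same default on a self-managed database, a sketch 
(database name is hypothetical):

ALTER DATABASE mydb SET random_page_cost = 2.0;
-- or just for the current session:
SET random_page_cost = 2.0;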

Thanks to everyone who helped come to this conclusion!

Peter

-- 
Peter van Hardenberg
San Francisco, California
"Everything was beautiful, and nothing hurt." -- Kurt Vonnegut



Re: [PERFORM] how to demonstrate the effect of direct I/O ?

2012-02-07 Thread Greg Smith

On 02/04/2012 06:25 PM, Tomas Vondra wrote:

What should I do to see the effect of direct I/O?


Test something other than a mainstream Linux filesystem.  The only two times 
I've measured an improvement myself for direct I/O were a) Veritas VxFS on 
Linux, which has some documented acceleration here, and b) on Solaris.  You 
won't find a compelling performance improvement listed at 
https://ext4.wiki.kernel.org/articles/c/l/a/Clarifying_Direct_IO%27s_Semantics_fd79.html 
and Linux has generally ignored direct I/O as something important to 
optimize for.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




[PERFORM] index scan forward vs backward = speed difference of 357X slower!

2012-02-07 Thread Kevin Traster
PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.1.2
20080704 (Red Hat 4.1.2-51), 64-bit

Dedicated DB server

4GB ram

Shared_Buffers = 1 GB

Effective_cache_size = 3GB

Work_mem = 32GB

Analyze done

Queries ran multiple times, same differences/results

Default Statistics = 1000


Query (5366 ms):

explain analyze
select initcap(fullname), initcap(issuer), upper(rsymbol), initcap(industry),
       activity, to_char(shareschange, 'FM9,999,999,999,999,999'),
       sharespchange || E'\%'
from changes
where activity in (4, 5)
  and mfiled >= (select max(mfiled) from changes)
order by shareschange asc
limit 15


Slow ascending EXPLAIN ANALYZE:

http://explain.depesz.com/s/zFz


Query (15 ms):

explain analyze
select initcap(fullname), initcap(issuer), upper(rsymbol), initcap(industry),
       activity, to_char(shareschange, 'FM9,999,999,999,999,999'),
       sharespchange || E'\%'
from changes
where activity in (4, 5)
  and mfiled >= (select max(mfiled) from changes)
order by shareschange desc
limit 15


Fast descending EXPLAIN ANALYZE:

http://explain.depesz.com/s/OP7



The index changes_shareschange is a btree index created with the default 
ascending order.
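
For reference, that definition presumably looks something like this (a sketch, 
not quoted from the thread):

CREATE INDEX changes_shareschange ON changes (shareschange);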


The query plans and estimates are exactly the same, except that the desc 
version uses an index scan backward on changes_shareschange instead of a 
forward index scan.


Yet actual runtime is 357x slower for the ascending version than for the 
descending one.


Why and how do I fix it?


Re: [PERFORM] Index with all necessary columns - Postgres vs MSSQL

2012-02-07 Thread Gudmundur Johannesson
On Tue, Feb 7, 2012 at 3:11 PM, Igor Neyman  wrote:

> [quoted text snipped]