[PERFORM] choosing the right RAID level for PostgreSQL database

2011-02-13 Thread sergey
Hello,

I got a disk array appliance of 8 disks, 1T each (UltraStor RS8IP4). It will
be used solely by a PostgreSQL database and I am trying to choose the best
RAID level for it.

The top priority is read performance, since we operate on large data sets
(tables, indexes) and do lots of searches/scans, joins and nested queries.
With the old disks that we have now, most slowdowns happen on SELECTs.

Fault tolerance is less important; surviving the loss of 1 or 2 disks is enough.

Space is the least important factor. Even 1T will be enough.

Which RAID level would you recommend in this situation? The current options
are 60, 50 and 10, but other options could be even better.

Thank you!


[PERFORM] postgres: 100% CPU utilization

2007-04-22 Thread Sergey Tsukinovsky
Hi,

 

I'm currently dealing with performance issues of postgres and looking
for some advice.

 

Platform

Postgres: 7.0.2

OS: FreeBSD 4.4

DB: size about 50M; the most frequently updated tables average 1000-2000 rows
each, and there are not many of them, about 15 in total

 

Description

My current system load keeps the postgres CPU utilization at the level
of 90-100%. 

'vacuumdb' results in a sharp drop of CPU usage down to 25-30%, but not for
long - it gets back to 100% within 30 minutes.

Disk I/O during the test stays at about 0.5 MB/s

 

Questions:

1. When reading the 'vacuum analyze' output, how can I identify which of the
actions had the most effect on reducing CPU usage: garbage cleaning or
statistics recalculation for the analyzer?

2. What would be the recommended set of parameters to tune in order to
improve performance over time, instead of resorting to vacuuming every 30
minutes or so?

3. Is it safe to run 'vacuum' as frequently as every 15-30 minutes?

4. Suggestions?

 

I know that 7.0.2 is an old version, so I ran the same test on 7.3.18 - the
performance behavior was similar.

 

Thank you in advance,

Sergey


 



Re: [PERFORM] postgres: 100% CPU utilization

2007-04-26 Thread Sergey Tsukinovsky
Thanks for this reply, Ron.
This is almost what I was looking for.

While upgrading to the latest version is out of the question (which
unfortunately for me became the subject of this discussion), I was still
looking for ways to improve the performance of the 7.0.2 version.

Extensive use of vacuum was almost obvious, though I was hoping to get
some more tips from postgres gurus (or dinosaurs, if you want).

Anyway, 8.2.4 was not performing so well without autovacuum: it ramped up to
50% CPU usage in 2 hours under the load. With autovacuum on I got what I
really need, and thus I know what to do next.
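
For reference, a minimal sketch of what enabling autovacuum on an 8.2-era
installation involves (postgresql.conf; the parameter names are the 8.2 ones
and the values are only illustrative, not the ones used in the test above):

# the row-level statistics collector must be on for autovacuum to work
stats_start_collector = on
stats_row_level = on
autovacuum = on
autovacuum_naptime = 60                 # seconds between checks of each database
autovacuum_vacuum_scale_factor = 0.2    # vacuum after ~20% of a table changes
autovacuum_analyze_scale_factor = 0.1   # analyze after ~10% changes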

Just for the record - the hardware that was used for the test has the
following parameters:
AMD Opteron 2 GHz
2GB RAM
LSI Logic SCSI

Thanks everyone for your assistance!
Sergey




-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Ron
Sent: Monday, April 23, 2007 11:07 AM
To: Mario Weilguni
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] postgres: 100% CPU utilization

At 04:53 AM 4/23/2007, Mario Weilguni wrote:
>Am Donnerstag, 19. April 2007 schrieb Sergey Tsukinovsky:
> > 2. What would be the recommended set of parameters to tune up in
order
> > to improve the performance over the time, instead of considering an
> > option to vacuum every 30 minutes or so?
> >
> > 3. Is it safe to run 'vacuum' as frequently as every 15-30 minutes?
>No problem.
>
> >
> > 4. Suggestions?
>Do yourself a favor and upgrade at least to 8.1.x and use autovacuum.
In fact, I'll go one step further and say that pg improves so much 
from release to release that everyone should make superhuman efforts 
to always be running the latest stable release.

Even the differences between 8.1.x and 8.2.x are worth it.

(and the fewer and more modern the releases "out in the wild", the 
easier community support is)
Cheers,
Ron Peacetree 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] cached entities

2007-06-20 Thread Sergey Konoplev

Hi

I'd like to know how to get information about which PG entities are in
kernel cache, if possible.

--
Regards,
Sergey Konoplev

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


[PERFORM] Re: [PERFORM] How to diagnose a “context-switching” storm problem?

2010-04-12 Thread Sergey Konoplev
2010/4/12 RD黄永卫 :
> I don't know what causes the "context-switching" storm.
>
> How should I investigate the real reason?
>
> Could you please give me some advice?

It might be because of cascading locks so try to monitor them when it happens.

You may find this query useful:

SELECT
granted,
count(1) AS locks,
pid,
now() - xact_start AS xact_age,
now() - query_start AS query_age,
current_query
FROM
pg_locks AS l
LEFT JOIN pg_stat_activity AS a ON
pid = procpid
GROUP BY 1, 3, 4, 5, 6
ORDER BY 1 DESC, 2 DESC
-- ORDER BY 4 DESC
LIMIT 100;


-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com /
Linkedin: http://ru.linkedin.com/in/grayhemp /
JID/GTalk: gray...@gmail.com / Skype: gray-hemp / ICQ: 29353802

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] What is the best way to optimize the query.

2010-07-18 Thread Sergey Konoplev
Hello,

On 17 July 2010 12:50, Srikanth  wrote:
> I am sending u the query along with execution plan. Please help
>

It would be better if you start with these:

http://www.postgresql.org/docs/8.4/interactive/indexes.html
http://www.mohawksoft.org/?q=node/56


-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com /
Linkedin: http://ru.linkedin.com/in/grayhemp /
JID/GTalk: gray...@gmail.com / Skype: gray-hemp / ICQ: 29353802

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Copy performance issues

2010-08-20 Thread Sergey Konoplev
Hi,

Try to split your data into small batches. It helped me in a similar
situation recently. I was loading about a million rows into a table heavily
populated with indexes and various triggers, and the batch size was 100 rows
(using COPY). The only DDL change I made was dropping the FKs and recreating
them afterwards. A rough shell sketch of the approach is below.
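
A rough shell sketch of that approach (the table and file names here are made
up; adjust the COPY options to your data format):

split -l 100 data.csv chunk_
for f in chunk_*; do
    psql -d mydb -c "\copy mytable FROM '$f' WITH CSV"
done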

BTW, a question for the gurus: why, and in what cases, could loading in small
batches theoretically be faster than one huge batch, if there is no other
load on the database but this?


On 18 August 2010 20:25, Saadat Anwar  wrote:
> I am having severe COPY performance issues after adding indices. What used
> to take a few minutes (without indices) now takes several hours (with
> indices). I've tried to tweak the database configuration (based on Postgres
> documentation and forums), but it hasn't helped as yet. Perhaps, I haven't
> increased the limits sufficiently. Dropping and recreating indices may not
> be an option due to a long time it takes to rebuild all indices.
>
> I'll appreciate someone looking at my configuration and giving me a few
> ideas on how to increase the copy performance.
>
> Thanks.
> Saadat.
>
> Table structure:
> ===
> table C:
>    Table "public.C"
>   Column  |   Type   | Modifiers
> --+--+---
>  sclk     | double precision | not null
>  chan     | smallint | not null
>  det  | smallint | not null
>  x    | real | not null
>  y    | real | not null
>  z        | real | not null
>  r        | real |
>  t    | real |
>  lat      | real |
>  lon      | real |
>  a        | real |
>  b        | real |
>  c        | real |
>  time | real |
>  qa       | smallint | not null
>  qb       | smallint | not null
>  qc       | smallint | not null
> Indexes:
>     "C_pkey" PRIMARY KEY, btree (sclk, chan, det)
>
>
> partitioned into 19 sub-tables covering lat bands. For example:
>
> sub-table C0:
>    Inherits: C
>    Check constraints:
>        "C0_lat_check" CHECK (lat >= (-10::real) AND lat < 0::real)
>    Indexes:
>    "C0_pkey" PRIMARY KEY, btree (sclk, chan, det)
>    "C0_lat" btree (lat)
>    "C0_time" btree (time)
>    "C0_lon" btree (lon)
>
> sub-table C1:
>    Inherits: C
>    Check constraints:
>        "C1_lat_check" CHECK (lat >= (-20::real) AND lat < -10::real)
>    Indexes:
>    "C1_pkey" PRIMARY KEY, btree (sclk, chan, det)
>    "C1_lat" btree (lat)
>    "C1_time" btree (time)
>    "C1_lon" btree (lon)
>
> The partitions C?s are ~30G (328,000,000 rows) each except one, which is
> ~65G (909,000,000 rows). There are no rows in umbrella table C from which
> C1, C2, ..., C19 inherit. The data is partitioned in C1, C2, ..., C19 in
> order to promote better access. Most people will access the data in C by
> specifying a lat range. Also, C?s can become quite large over time.
>
> The COPY operation copies one file per partition, for each of the 19
> partitions. Each file is between 300,000 - 600,000 records.
>
>
> System configuration:
> 
> 1. RHEL5 x86_64
> 2. 32G RAM
> 3. 8T RAID5 partition for database on a Dell PERC 5/E controller
>    (I understand that I'll never get fast inserts/updates on it based on
>     http://wiki.postgresql.org/wiki/SlowQueryQuestions but cannot change
>     to a RAID0+1 for now).
>     Database's filesystem is ext4 on LVM on RAID5.
> 4. Postgres 8.4.2
>     shared_buffers = 10GB
>     temp_buffers = 16MB
>     work_mem = 2GB
>     maintenance_work_mem = 256MB
>     max_files_per_process = 1000
>     effective_io_concurrency = 3
>     wal_buffers = 8MB
>     checkpoint_segments = 40
>     enable_seqscan = off
>     effective_cache_size = 16GB
> 5. analyze verbose; ran on the database before copy operation
>
> Bonnie++ output:
> =
> Version  1.03   --Sequential Output-- --Sequential Input- --Random-
>     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine    Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> dbtest    64240M 78829  99 266172  42 47904   6 58410  72 116247   9 767.9   1
>     --Sequential Create-- Random Create
>     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
>   files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP

[PERFORM] longest prefix match querries

2006-07-07 Thread Hripchenko Sergey
Hi, all.

I'm trying to tune an application which makes a lot of queries with the
semantics "find the LONGEST PREFIX MATCH in a string", like:

SELECT cost
FROM tarif
WHERE $1 LIKE prefix
ORDER BY length(prefix) DESC
LIMIT 1

from table like:

CREATE TABLE tarif (
id bigint NOT NULL,
prefix varchar(55) NOT NULL,
cost numeric(x, x) not null
) WITHOUT OIDS;

where $1 is a phone number, for example. It's a common task for voice
billing applications.


So, generally, I can do it in these ways:

WHERE $1 LIKE prefix
WHERE $1 SIMILAR TO prefix
WHERE $1 ~ prefix
WHERE position(prefix in $1) = 0

(
Surely I must adapt the prefix to the matching rules,
e.g. LIKE prefix || '%',
and the best option is to create a trigger which modifies the prefix at
insert/update time.
)

BUT! These methods don't use indexes!!
This is the main disadvantage.

voip3a=# EXPLAIN ANALYZE SELECT cost FROM tarif WHERE '78123319060' like prefix 
ORDER BY length(prefix) LIMIT 1;
QUERY PLAN
--
 Limit  (cost=3028.90..3028.90 rows=1 width=22) (actual time=162.189..162.192 
rows=1 loops=1)
   ->  Sort  (cost=3028.90..3030.43 rows=612 width=22) (actual 
time=162.181..162.181 rows=1 loops=1)
 Sort Key: length((prefix)::text)
 ->  Seq Scan on tarif  (cost=0.00..3000.57 rows=612 width=22) (actual 
time=4.132..161.715 rows=39 loops=1)
   Filter: ('78123319060'::text ~~ (prefix)::text)
 Total runtime: 162.340 ms
(6 rows)

voip3a=# SELECT count(*) from tarif;
 count

 122323
(1 row)




And there are many more effective algorithms for finding the LONGEST PREFIX
MATCH in a string, like
http://search.cpan.org/~avif/Tree-Trie-1.1/Trie.pm
for example.
for example




Are there any ways to improve performance?
Maybe implement indexes using a Trie algorithm?
(If so, can you please show me some URLs to start with...)
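
(A sketch of one index-friendly workaround: enumerate every prefix of the
dialled number and match by plain equality, so an ordinary btree index on
prefix can be used. The 15 below assumes prefixes are at most 15 digits.)

CREATE INDEX tarif_prefix_idx ON tarif (prefix);

SELECT cost
FROM tarif
WHERE prefix IN (
    SELECT substr('78123319060', 1, n)
    FROM generate_series(1, 15) AS n
)
ORDER BY length(prefix) DESC
LIMIT 1;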


Thanks, Sergey



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] pgsql do not handle NULL constants in the view

2008-08-20 Thread Sergey Hripchenko
Forgot to add

asteriskpilot=> SELECT version();
version

 PostgreSQL 8.2.9 on i386-redhat-linux-gnu, compiled by GCC gcc (GCC) 4.1.2 
20070925 (Red Hat 4.1.2-27)
(1 row)

asteriskpilot=> \q
[EMAIL PROTECTED] data]# uname -a
Linux ast-sql.intermedia.net 2.6.23.1-21.fc7 #1 SMP Thu Nov 1 21:09:24 EDT 2007 
i686 i686 i386 GNU/Linux
 [EMAIL PROTECTED] data]# cat /etc/redhat-release
Fedora release 7 (Moonshine)
[EMAIL PROTECTED] data]# rpm -qa | grep postgres
postgresql-8.2.9-1.fc7
postgresql-libs-8.2.9-1.fc7
postgresql-server-8.2.9-1.fc7
postgresql-contrib-8.2.9-1.fc7
postgresql-devel-8.2.9-1.fc7


From: Sergey Hripchenko
Sent: Wednesday, August 20, 2008 1:17 PM
To: 'pgsql-performance@postgresql.org'
Subject: pgsql do not handle NULL constants in the view

Hi all,

Looks like I found a bug in view optimization:

For example, create a test view:

CREATE OR REPLACE VIEW bar AS
SELECT *
FROM (
(
SELECT calldate, duration, billsec, 
get_asterisk_cdr_caller_id(accountcode) AS caller_id
FROM asterisk_cdr
) UNION ALL (
SELECT start_time, get_interval_seconds(completed_time-start_time), 
get_interval_seconds(answered_time-start_time), NULL
FROM asterisk_huntgroups_calls
)
) AS foo;

And perform select on it:

EXPLAIN SELECT * FROM bar WHERE caller_id = 1007;

Theoretically the second UNION arm shouldn't be executed at all (because 1007
!= NULL)... but postgres performs seq scans on both UNION parts.

asteriskpilot=> EXPLAIN ANALYZE SELECT * FROM bar WHERE caller_id = 1007;
  QUERY PLAN
--
 Subquery Scan foo  (cost=0.00..94509.49 rows=7303 width=28) (actual 
time=12249.473..14841.648 rows=25 loops=1)
   Filter: (caller_id = 1007)
   ->  Append  (cost=0.00..76252.26 rows=1460578 width=24) (actual 
time=0.065..13681.814 rows=1460405 loops=1)
 ->  Subquery Scan "*SELECT* 1"  (cost=0.00..57301.22 rows=1120410 
width=20) (actual time=0.064..10427.353 rows=1120237 loops=1)
   ->  Seq Scan on asterisk_cdr  (cost=0.00..46097.12 rows=1120410 
width=20) (actual time=0.059..8326.974 rows=1120237 loops=1)
 ->  Subquery Scan "*SELECT* 2"  (cost=0.00..18951.04 rows=340168 
width=24) (actual time=0.034..1382.653 rows=340168 loops=1)
   ->  Seq Scan on asterisk_huntgroups_calls  (cost=0.00..15549.36 
rows=340168 width=24) (actual time=0.031..863.529 rows=340168 loops=1)
 Total runtime: 14841.739 ms
(8 rows)


But if we wrap this NULL value in an _IMMUTABLE RETURNS NULL ON NULL INPUT_
function, postgres handles this view properly:

asteriskpilot=> EXPLAIN SELECT * FROM bar WHERE caller_id = 1007;
 QUERY PLAN
-
 Append  (cost=20.21..15663.02 rows=1015 width=24)
   ->  Subquery Scan "*SELECT* 1"  (cost=20.21..3515.32 rows=1014 width=20)
 ->  Bitmap Heap Scan on asterisk_cdr  (cost=20.21..3505.18 rows=1014 
width=20)
   Recheck Cond: (get_asterisk_cdr_caller_id(accountcode) = 1007)
   ->  Bitmap Index Scan on asterisk_cdr_caller_id  
(cost=0.00..19.96 rows=1014 width=0)
 Index Cond: (get_asterisk_cdr_caller_id(accountcode) = 
1007)
   ->  Result  (cost=0.00..12147.69 rows=1 width=24)
 One-Time Filter: NULL::boolean
 ->  Seq Scan on asterisk_huntgroups_calls  (cost=0.00..12147.68 rows=1 
width=24)









Re: [PERFORM] pgsql do not handle NULL constants in the view

2008-08-20 Thread Sergey Hripchenko
Thanks, it helps.

Shame on me %) I forgot that NULL itself has no type, and thought that each
constant in the view is cast to the resulting type at creation time.

-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 20, 2008 4:54 PM
To: Sergey Hripchenko
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] pgsql do not handle NULL constants in the view

Sergey Hripchenko <[EMAIL PROTECTED]> writes:
> CREATE OR REPLACE VIEW bar AS
> SELECT *
> FROM (
> (
> SELECT calldate, duration, billsec, 
> get_asterisk_cdr_caller_id(accountcode) AS caller_id
> FROM asterisk_cdr
> ) UNION ALL (
> SELECT start_time, get_interval_seconds(completed_time-start_time), 
> get_interval_seconds(answered_time-start_time), NULL
> FROM asterisk_huntgroups_calls
> )
> ) AS foo;

Try casting the NULL to integer (or whatever the datatype of the other
union arm is) explicitly.

regards, tom lane
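
A sketch of Tom's suggestion applied to the view above (this assumes the
first union arm returns an integer caller_id; cast to whatever the real
column type is):

CREATE OR REPLACE VIEW bar AS
SELECT *
FROM (
(
SELECT calldate, duration, billsec,
get_asterisk_cdr_caller_id(accountcode) AS caller_id
FROM asterisk_cdr
) UNION ALL (
SELECT start_time, get_interval_seconds(completed_time-start_time),
get_interval_seconds(answered_time-start_time), NULL::integer
FROM asterisk_huntgroups_calls
)
) AS foo;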


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Why is creating a GIN table index so much slower than inserting data into an empty table with the same index?

2009-03-23 Thread Sergey Burladyan

example:

select version();
  version

 PostgreSQL 8.3.6 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 
4.3.3-3) 4.3.3

show maintenance_work_mem ;
 maintenance_work_mem
--
 128MB

create table a (i1 int, i2 int, i3 int, i4 int, i5 int, i6 int);

insert into a select n, n, n, n, n, n from generate_series(1, 10) as n;
INSERT 0 10
Time: 570.110 ms

create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
CREATE INDEX
Time: 203068.314 ms

truncate a;
drop index arr_gin ;

create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
CREATE INDEX
Time: 3.246 ms

insert into a select n, n, n, n, n, n from generate_series(1, 10) as n;
INSERT 0 10
Time: 2405.481 ms

select pg_size_pretty(pg_total_relation_size('a')) as total,
   pg_size_pretty(pg_relation_size('a')) as table;
  total  |  table
-+-
 9792 kB | 5096 kB


203068.314 ms vs 2405.481 ms - is this behaviour normal?
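
For reference, the setting that most directly affects a large GIN build is
maintenance_work_mem, so a session-level bump before CREATE INDEX is a common
first experiment - a sketch, assuming the machine has RAM to spare:

SET maintenance_work_mem = '1GB';
create index arr_gin on a using gin ( (array[i1, i2, i3, i4, i5, i6]) );
RESET maintenance_work_mem;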

Thanks !

-- 
Sergey Burladyan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] regression? 8.4 does not apply One-Time Filter to subquery

2009-07-01 Thread Sergey Burladyan

8.4 from CVS HEAD:
EXPLAIN ANALYZE select * from (select n, 1 as r from generate_series(1, 10) 
as n union all select n, 2 from generate_series(1, 10) as n) as x where r = 
3;
   QUERY PLAN   
 

 Result  (cost=0.00..30.00 rows=10 width=36) (actual time=90.723..90.723 rows=0 
loops=1)
   ->  Append  (cost=0.00..30.00 rows=10 width=36) (actual time=90.720..90.720 
rows=0 loops=1)
 ->  Function Scan on generate_series n  (cost=0.00..15.00 rows=5 
width=36) (actual time=45.191..45.191 rows=0 loops=1)
   Filter: (1 = 3)
 ->  Function Scan on generate_series n  (cost=0.00..15.00 rows=5 
width=36) (actual time=45.522..45.522 rows=0 loops=1)
   Filter: (2 = 3)
 Total runtime: 118.709 ms
(7 rows)

8.3.7:
EXPLAIN ANALYZE select * from (select n, 1 as r from generate_series(1, 10) 
as n union all select n, 2 from generate_series(1, 10) as n) as x where r = 
3;
QUERY PLAN
--
 Result  (cost=0.00..25.02 rows=2 width=8) (actual time=0.005..0.005 rows=0 
loops=1)
   ->  Append  (cost=0.00..25.02 rows=2 width=8) (actual time=0.004..0.004 
rows=0 loops=1)
 ->  Result  (cost=0.00..12.50 rows=1 width=4) (actual 
time=0.001..0.001 rows=0 loops=1)
   One-Time Filter: false
   ->  Function Scan on generate_series n  (cost=0.00..12.50 rows=1 
width=4) (never executed)
 ->  Result  (cost=0.00..12.50 rows=1 width=4) (actual 
time=0.000..0.000 rows=0 loops=1)
   One-Time Filter: false
   ->  Function Scan on generate_series n  (cost=0.00..12.50 rows=1 
width=4) (never executed)
 Total runtime: 0.053 ms
(9 rows)

Is this right?

-- 
Sergey Burladyan

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] sub-select makes query take too long - unusable

2009-11-22 Thread Sergey Aleynikov
Hello,

SubPlan 2
->  Seq Scan on item_price  (cost=0.00..423.30 rows=1 width=8)
(actual time=1.914..1.914 rows=0 loops=10669)
  Filter: ((item_id = $1) AND (zone_id =
'OUsEaRcAA3jQrg42WHUm8A'::bpchar) AND (price_type = 0) AND
((size_name)::text = ($2)::text))

This means that, for every one of the 10669 output rows, the DB scanned the
whole item_price table, spending 20.4 of the 20.8 seconds there. Do you have
any indexes on it? Especially on the item_id column. If there are none, a
sketch of one possible index is below.
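
A sketch of an index matching that filter (the column names are taken from
the plan above, so verify them against the real schema):

CREATE INDEX item_price_item_id_idx
ON item_price (item_id, zone_id, price_type);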

Best regards,
Sergey Aleynikov

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-22 Thread Sergey Aleynikov
Hello,

> * Is there any way I can nail the query planner to a particular query plan,
> rather than have it keep changing its mind?

All these settings lead to choosing different plans. If you have a small
number of complex, sensitive queries, you can run EXPLAIN on them with the
correct settings, then re-order the query (joins, subselects) according to
the given query plan, and, before running it, call

set local join_collapse_limit = 1;
set local from_collapse_limit = 1;

This will prevent join/subselect reordering inside the current transaction
block, leading to consistent plans. But that gives no 100% guarantee of
choosing, for example, a hash join over a nested loop.

You can, as noted in the previous message, experiment with the geqo_*
constants - especially, lower geqo_threshold to catch better plans (but this
can take many runs). Or, for production, set geqo = off - this can
dramatically increase query planning time, but the results will be more
consistent.

> Is it normal to keep having to tune the query-planner's settings, or should
> it be possible to set it once, and leave it?

I have collapse limits set for some complex reporting queries, and think it's
an adequate solution.

> Worse still, doing a cluster of most of the tables and vacuum full analyze
> made most of the queries respond much better, but the vox query became very
> slow again, until I set it to A (which, a few days ago, did not work well).

Is your autovacuuming tuned correctly? For large tables, I set it running
much more aggressively than in the default install.

Best regards,
Sergey Aleynikov

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Sergey Aleynikov
Hello,

2009/11/25 Richard Neill :

>It's a simple query, but using a complex view. So I can't really re-order it.
The view is inserted directly into your query by PG, and then reordered
according to from_collapse_limit. Probably the problem lies in the view?
How well does it perform on its own? Or from_collapse_limit is _too low_, so
the view isn't expanded right?

> Are you saying that this means that the query planner frequently makes the
> wrong choice here?
Look at EXPLAIN ANALYZE. If at some step the planner's estimate differs by
(for a start) two orders of magnitude from what's really retrieved, then the
statistics are wrong. But if, at every step, the estimate is not too far from
reality, you suffer from what I've described: the planner can't reorder the
query efficiently enough. Because it only happens sometimes, I suspect geqo.
Or wrong statistics.

>I hadn't changed it from the defaults; now I've changed it to:
> autovacuum_max_workers = 6
> autovacuum_vacuum_scale_factor = 0.002
> autovacuum_analyze_scale_factor = 0.001

If your tables are not >100 million rows, that's aggressive enough. At 100
million rows, this would analyze the table every 100k changed
(inserted/updated/deleted) rows. Is this enough for you? The defaults on
large tables are definitely too low. If you now get consistent times, then
you were being hit by wrong statistics.
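
Besides the global settings, the scale factors can also be set per table,
which is often enough for just the few biggest tables; a sketch, assuming
8.4 or later and a made-up table name:

ALTER TABLE big_table SET (
    autovacuum_vacuum_scale_factor = 0.002,
    autovacuum_analyze_scale_factor = 0.001
);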

Best regards,
Sergey Aleynikov

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query times change by orders of magnitude as DB ages

2009-11-26 Thread Sergey Aleynikov
Hello,

2009/11/25 Richard Neill :

Also, if you find odd statistics on a freshly analyzed table, try increasing
the statistics target using
ALTER TABLE ... ALTER COLUMN ... SET STATISTICS ...

If you're using the default, it's again too low for large tables. Start with
200, for example.

Best regards,
Sergey Aleynikov

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Sergey Konoplev
Hi,

On 12 November 2011 00:18, Stephen Frost  wrote:
> In a crash, unlogged tables are automatically truncated.

BTW, I wonder why they are truncated in the crash case?

-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



Re: [PERFORM] unlogged tables

2011-11-14 Thread Sergey Konoplev
On 14 November 2011 12:58, Richard Huxton  wrote:
> Because they bypass the transaction-log (WAL), hence unlogged.
> There's no way to know whether there were partial updates applied when the
> system restarts.

I probably did not understand the meaning of "truncate" correctly. Does it
truncate all the records of the table, or only several recent records?

>
> --
>  Richard Huxton
>  Archonet Ltd
>



-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] unlogged tables

2011-11-14 Thread Sergey Konoplev
On 14 November 2011 14:17, Richard Huxton  wrote:
> On 14/11/11 10:08, Sergey Konoplev wrote:
>>
>> On 14 November 2011 12:58, Richard Huxton  wrote:
> Let's say you were doing something like "UPDATE unlogged_table SET x=1 WHERE
> y=2". If a crash occurs during this command, there's no guarantee that the
> affected disk pages were all updated. Worse, a single page might be
> partially updated or even have rubbish in it (depending on the nature of the
> crash).
>
> Without the WAL there's no way to check whether the table is good or not, or
> even to know what the last updates were. So - the only safe thing to do is
> truncate the unlogged tables.
>
> In the event of a normal shutdown, we can flush all the writes to disk so we
> know all the data has been written, so there is no need to truncate.

Thank you for the explanation. Now I understand it.

>
> --
>  Richard Huxton
>  Archonet Ltd
>



-- 
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com
LinkedIn: http://ru.linkedin.com/in/grayhemp
JID/GTalk: gray...@gmail.com Skype: gray-hemp

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why is a hash join being used?

2012-06-20 Thread Sergey Konoplev
On Wed, Jun 20, 2012 at 1:34 AM, Tim Jacobs  wrote:
> The nested loop join performs very quickly, whereas the hash join is 
> incredibly slow. If I disable the hash join temporarily then a nested loop 
> join is used in the second case and is the query runs much more quickly. How 
> can I change my configuration to favor the nested join in this case? Is this 
> a bad idea?

First, ANALYZE the tables and try the tests again.

If it helped, check your autovacuum configuration. Look at
http://www.postgresql.org/docs/9.1/static/routine-vacuuming.html#AUTOVACUUM
and at the pg_stat_user_tables view (the last_* and *_count fields).

If it still produces a wrong plan, then try to increase the statistics
entries with ALTER TABLE ... SET STATISTICS (do not forget to ANALYZE after
doing it) or with the default_statistics_target configuration parameter; a
sketch is below. Read more about it here:
http://www.postgresql.org/docs/9.1/static/planner-stats.html.
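
A sketch of both knobs (the table and column names here are taken from your
description, so double-check them; the statistics target value is just a
starting point):

ALTER TABLE residue_atom_coords ALTER COLUMN struct_id SET STATISTICS 500;
ANALYZE residue_atom_coords;
-- or, globally, in postgresql.conf: default_statistics_target = 500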

> Alternatively, since I will be doing selections like this many times, what 
> indexes can be put in place to expedite the query without mucking with the 
> query optimizer? I've already created an index on the struct_id field of 
> residue_atom_coords (each unique struct_id should only have a small number of 
> rows for the residue_atom_coords table).

As far as I can see, everything is okay with the indexes.

>
> Thanks in advance,
> Tim
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any tool/script available which can be used to measure scalability of an application's database.

2012-07-13 Thread Sergey Konoplev
On Tue, Jul 10, 2012 at 12:21 PM, Sreejith Balakrishnan
 wrote:
> Dear @,
>
> Is there any tool or some sort of script available, for PostgreSQL, which
> can be used to measure scalability of an application's database. Or is there
> any guideline on how to do this.

"scalability of an application's database" can be understood either
like a relation of transactions per second to database size or like an
ability of database to be sharded/partitioned or may be like something
else.

Could you please explain more specifically the original task?
What is the goal of it?

> I am a bit confused about the concept of measuring scalability of an
> application's database.
>
> How is the scalability measured?
>
> Is it like loading the DB with a bulk data volume and then do performance
> testing by using tools like JMeter?
>
> Could any one kindly help me on this..
>
> Thanks,
> Sreejith.
>



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Any tool/script available which can be used to measure scalability of an application's database.

2012-07-14 Thread Sergey Konoplev
> We have around 15 to 18 separate products. What we are told to do is to check
> the scalability of the underlying DB of each product (application).
>
>> Sounds like your client / boss has a case of buzz-word-itis. "Scalability"
>> means lots of different things:

Yes, it is still not clear what exactly they want from you, but here is what
I would do...

I would take the metrics Craig described. These ones:

> - How well it copes with growth of data sizes
> - How well it copes with growth of query rates / activity
> - How well it copes with larger user counts (may not be the same as prior)
- Also hard drive activity, CPU, etc.

And started to collect this statistics using monitoring tools like
http://www.cacti.net/, for example.

After a week/month/quarter, as time passes and the database activity and
size change, you will see how the measurements change (usually degrade). So
you will be able to draw conclusions about whether your environment meets
current requirements, and to forecast critical points.

As Craig mentioned, you may also try to simulate your database activity with
pgbench; a sketch is below. I would also like to show you this article
http://www.westnet.com/~gsmith/content/postgresql/pgbench-scaling.htm
where you will find some hints for your case.
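
A sketch of a basic pgbench run (the database name is made up and the scale
and client counts are arbitrary; pick them to roughly match your expected
data size and concurrency):

pgbench -i -s 100 scalability_test            # initializes ~10M rows in pgbench_accounts
pgbench -c 16 -j 4 -T 600 scalability_test    # 16 clients, 4 threads, 10 minutes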

Also look at the playback tools
http://wiki.postgresql.org/wiki/Statement_Playback.

-- 
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Process 11812 still waiting for ExclusiveLock on extension of relation

2012-07-18 Thread Sergey Konoplev
Hi,

On Tue, Jul 17, 2012 at 7:57 PM, David Kerr  wrote:
> I suspect that this is related to a sustained heavy load that would stop 
> autovacuum from
> getting at this table... Does that sound plausible?

Well, not sure. Let us look at the table's statistics first.

\x
select * from pg_stat_user_tables where relname = 'yourtablename';

> I'm wondering what options I have to smooth over these episodes / speed up 
> the extensions.
> I'm thinking of something like, CLUSTER or VACUUM FULL (those take quite a 
> run so I'd like
> some direction on it before i TiaS =) )

Instead of CLUSTER I would suggest using one of the tools below. They do not
block the table the way CLUSTER does.

pg_reorg http://reorg.projects.postgresql.org/pg_reorg.html
Faster, but requires a lot of IO and additional disk space; it also needs a
PK on the table.

pgcompactor http://code.google.com/p/pgtoolkit/
Allows smoothing the IO, auto-determines whether tables and indexes need
reorganizing, and has no PK restriction.

> I suspect that Partitioning would help.  Any other ideas?

Partitioning is a good thing to think about when you deal with big tables.

>
>
> Jul 17 08:11:52 perf: [3-1] user=test,db=perf LOG:  process 11812 still 
> waiting for ExclusiveLock
> on extension of relation 60777 of database 16387 after 1000.270 ms
>
> System resouces were fine:
>
> PGDATA
> --
> 07/17/12 08:11:48
> Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> dm-2  1.20  3085.20   77.20 3994.20 15363.20 56680.0017.69
> 15.573.82   0.06  26.22
>
> 07/17/12 08:11:53
> Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> dm-2  0.40  2097.20   51.80 2610.20 10344.00 37659.2018.03
>  5.231.96   0.05  14.28
>
>
> PGXLOG
> --
> 07/17/12 08:11:48
> Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> dm-4  0.00  3958.200.00  600.40 0.00 36449.6060.71
>  0.440.74   0.73  43.54
>
> 07/17/12 08:11:53
> Device: rrqm/s   wrqm/s r/s w/s   rsec/s   wsec/s avgrq-sz 
> avgqu-sz   await  svctm  %util
> dm-4  0.00  2905.200.00  403.40 0.00 26492.8065.67
>  0.320.80   0.79  31.96
>
> CPU
> --
> CPU %user %nice   %system   %iowait%steal 
> %idle
> 08:11:48all 24.49  0.00  3.19  1.17  0.00 
> 71.15
> 08:11:53all 17.53  0.00  3.13  0.68  0.00 
> 78.65
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



-- 
Sergey Konoplev

a database architect, software developer at PostgreSQL-Consulting.com
http://www.postgresql-consulting.com

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +79160686204

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash aggregation

2012-10-10 Thread Sergey Konoplev
On Wed, Oct 10, 2012 at 9:09 AM, Korisk  wrote:
> Hello! Is it possible to speed up the plan?
>  Sort  (cost=573977.88..573978.38 rows=200 width=32) (actual 
> time=10351.280..10351.551 rows=4000 loops=1)
>Output: name, (count(name))
>Sort Key: hashcheck.name
>Sort Method: quicksort  Memory: 315kB
>->  HashAggregate  (cost=573968.24..573970.24 rows=200 width=32) (actual 
> time=10340.507..10341.288 rows=4000 loops=1)
>  Output: name, count(name)
>  ->  Seq Scan on public.hashcheck  (cost=0.00..447669.16 
> rows=25259816 width=32) (actual time=0.019..2798.058 rows=25259817 loops=1)
>Output: id, name, value
>  Total runtime: 10351.989 ms

AFAIU there is no query optimization solution for this.

It may be worth creating a table hashcheck_stat (name, cnt) and
incrementing/decrementing the cnt values with triggers if you need to get
the counts fast. A rough sketch of the idea is below.
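
A rough sketch of that idea (untested here; note that under heavy concurrent
writes the per-name rows can become update hot spots):

CREATE TABLE hashcheck_stat (
    name text PRIMARY KEY,
    cnt bigint NOT NULL DEFAULT 0
);

CREATE OR REPLACE FUNCTION hashcheck_stat_maintain() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE hashcheck_stat SET cnt = cnt + 1 WHERE name = NEW.name;
        IF NOT FOUND THEN
            INSERT INTO hashcheck_stat (name, cnt) VALUES (NEW.name, 1);
        END IF;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE hashcheck_stat SET cnt = cnt - 1 WHERE name = OLD.name;
    END IF;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER hashcheck_stat_trg
AFTER INSERT OR DELETE ON hashcheck
FOR EACH ROW EXECUTE PROCEDURE hashcheck_stat_maintain();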

-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash aggregation

2012-10-10 Thread Sergey Konoplev
On Wed, Oct 10, 2012 at 9:13 PM, Korisk  wrote:
>->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck  
> (cost=100.00..1398674.92 rows=25986792 width=32)

It seems odd.

Is it possible to look at the non-default configuration?

SELECT name, setting, reset_val FROM pg_settings WHERE setting <> reset_val;

>  (actual time=0.104..3785.767 rows=25990002 loops=1)
>  Output: name
>  Heap Fetches: 0
>  Total runtime: 7452.509 ms
> (6 rows)
>
> Thanks to the shamanic tricks described at:
> http://www.sql.ru/forum/actualthread.aspx?tid=974484
>
> 11.10.2012, 01:30, "Sergey Konoplev" :
>> On Wed, Oct 10, 2012 at 9:09 AM, Korisk  wrote:
>>
>>>  Hello! Is it possible to speed up the plan?
>>>   Sort  (cost=573977.88..573978.38 rows=200 width=32) (actual 
>>> time=10351.280..10351.551 rows=4000 loops=1)
>>> Output: name, (count(name))
>>> Sort Key: hashcheck.name
>>> Sort Method: quicksort  Memory: 315kB
>>> ->  HashAggregate  (cost=573968.24..573970.24 rows=200 width=32) 
>>> (actual time=10340.507..10341.288 rows=4000 loops=1)
>>>   Output: name, count(name)
>>>   ->  Seq Scan on public.hashcheck  (cost=0.00..447669.16 
>>> rows=25259816 width=32) (actual time=0.019..2798.058 rows=25259817 loops=1)
>>> Output: id, name, value
>>>   Total runtime: 10351.989 ms
>>
>> AFAIU there are no query optimization solution for this.
>>
>> It may be worth to create a table hashcheck_stat (name, cnt) and
>> increment/decrement the cnt values with triggers if you need to get
>> counts fast.
>>
>> --
>> Sergey Konoplev
>>
>> a database and software architect
>> http://www.linkedin.com/in/grayhemp
>>
>> Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash aggregation

2012-10-11 Thread Sergey Konoplev
On Thu, Oct 11, 2012 at 8:15 AM, Korisk  wrote:
> What's your seq_page_cost and random_page_cost?
> hashes=# SELECT name, setting, reset_val FROM pg_settings WHERE setting <> 
> reset_val;
>   name   |setting | reset_val
> -++---
>  archive_command | (disabled) |
>  enable_bitmapscan   | off| on
>  enable_indexscan| off| on
>  enable_seqscan  | off| on
>  log_file_mode   | 0600   | 384
>  random_page_cost| 0.1| 4
>  seq_page_cost   | 0.1| 1
>  transaction_isolation   | read committed | default
>  unix_socket_permissions | 0777   | 511

Could you please try to set *_page_cost to 1 and then EXPLAIN ANALYZE it again?

>->  Index Only Scan Backward using hashcheck_name_idx on public.hashcheck
>  (cost=100.00..1398674.92 rows=25986792 width=32)
>  (actual time=0.104..3785.767 rows=25990002 loops=1)

I am just guessing, but it might be some kind of precision bug, and I would
like to check this.

> (9 rows)
>
> Postgresql 9.2.1 was configured and built with default settings.
>
> Thank you.



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Drawbacks of create index where is not null ?

2012-10-11 Thread Sergey Konoplev
On Wed, Oct 10, 2012 at 10:42 PM, Scott Marlowe  wrote:
> I think the query planner has gotten a little smarter of late:
>
> smarlowe=# create index on a (i) where i is not null;
> CREATE INDEX
> smarlowe=# explain select * from a where i =10;
>QUERY PLAN
> 
>  Bitmap Heap Scan on a  (cost=4.28..78.00 rows=100 width=4)
>Recheck Cond: (i = 10)
>->  Bitmap Index Scan on a_i_idx  (cost=0.00..4.26 rows=100 width=0)
>  Index Cond: (i = 10)
> (4 rows)

It is even a little bit smarter than that:

[local]:5432 grayhemp@grayhemp=# create index h_idx1 on h (n) where v
is not null;
CREATE INDEX

[local]:5432 grayhemp@grayhemp=# explain analyze select * from h where
v = '0.5';
  QUERY PLAN
--
 Bitmap Heap Scan on h  (cost=1616.10..8494.68 rows=1 width=30)
(actual time=111.735..111.735 rows=0 loops=1)
   Recheck Cond: (v IS NOT NULL)
   Filter: (v = '0.5'::text)
   ->  Bitmap Index Scan on h_idx1  (cost=0.00..1616.10 rows=102367
width=0) (actual time=19.027..19.027 rows=100271 loops=1)
(5 rows)


-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash aggregation

2012-10-11 Thread Sergey Konoplev
On Thu, Oct 11, 2012 at 8:55 PM, Korisk  wrote:
> hashes=# explain analyse verbose select name, count(name) as cnt from  
> hashcheck group by name order by name desc;

Now set enable_bitmapscan and enable_indexscan to on and try it again.

Then set enable_seqscan to on and run it one more time.

>   
>   QUERY PLAN
>
> --
> 
>  GroupAggregate  (cost=100.00..1596612.97 rows=200 width=32) 
> (actual time=0.136..7272.240 rows=4001 loops=1)
>Output: name, count(name)
>->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  
> (cost=100.00..140.96 rows=25990002 width=32) (act
> ual time=0.121..3624.624 rows=25990002 loops=1)
>  Output: name
>  Heap Fetches: 0
>  Total runtime: 7272.735 ms
> (6 rows)
>
>
>
>
>
>
> 11.10.2012, 21:55, "Sergey Konoplev" :
>> On Thu, Oct 11, 2012 at 8:15 AM, Korisk  wrote:
>>
>>>  What's your seq_page_cost and random_page_cost?
>>>  hashes=# SELECT name, setting, reset_val FROM pg_settings WHERE setting <> 
>>> reset_val;
>>>name   |setting | reset_val
>>>  -++---
>>>   archive_command | (disabled) |
>>>   enable_bitmapscan   | off| on
>>>   enable_indexscan| off| on
>>>   enable_seqscan  | off| on
>>>   log_file_mode   | 0600   | 384
>>>   random_page_cost| 0.1| 4
>>>   seq_page_cost   | 0.1| 1
>>>   transaction_isolation   | read committed | default
>>>   unix_socket_permissions | 0777   | 511
>>
>> Could you please try to set *_page_cost to 1 and then EXPLAIN ANALYZE it 
>> again?
>>
>>> ->  Index Only Scan Backward using hashcheck_name_idx on 
>>> public.hashcheck
>>>   (cost=100.00..1398674.92 rows=25986792 width=32)
>>>   (actual time=0.104..3785.767 rows=25990002 loops=1)
>>
>> I am just guessing but it might probably be some kind of a precision
>> bug, and I would like to check this.
>>
>>>  (9 rows)
>>>
>>>  Postgresql 9.2.1 was configured and built with default settings.
>>>
>>>  Thank you.
>>
>> --
>> Sergey Konoplev
>>
>> a database and software architect
>> http://www.linkedin.com/in/grayhemp
>>
>> Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] hash aggregation

2012-10-12 Thread Sergey Konoplev
On Thu, Oct 11, 2012 at 9:14 PM, Korisk  wrote:
> Strange situation.
> After indexscan enabling the cost is seriously decreased.

AFAIK when the planner has to choose between index scans and seq scans and
both of these options are off, it uses one of these strategies anyway but
puts 100.00 as a lower cost for it (thanks to Maxim Boguk
for the explanation in chat).

>->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  
> (cost=100.00..140.96 rows=25990002 width=32) (act
> ual time=0.166..3698.776 rows=25990002 loops=1)

So when you enabled one of these options it started using it as usual.

> hashes=# set enable_indexscan=on;
> SET
> hashes=# explain  analyse verbose select name, count(name) as cnt from  
> hashcheck group by name order by name desc;

[cut]

>->  Index Only Scan using hashcheck_name_rev_idx on public.hashcheck  
> (cost=0.00..40.96 rows=25990002 width=32) (actual time=0.129.
> .3653.848 rows=25990002 loops=1)

What I cannot understand is why the seq scan's estimated cost is better than
the index scan's. It depends on the number of pages in the index/relation.
Maybe the index is heavily bloated?

Let's see the sizes:

select pg_total_relation_size('hashcheck');
select pg_total_relation_size('hashcheck_name_rev_idx');


> hashes=# set enable_seqscan=on;
> SET
> hashes=# explain  analyse verbose select name, count(name) as cnt from  
> hashcheck group by name order by name desc;
>   QUERY PLAN
>
> --
> -
>  Sort  (cost=565411.67..565412.17 rows=200 width=32) (actual 
> time=21746.799..21747.026 rows=4001 loops=1)
>Output: name, (count(name))
>Sort Key: hashcheck.name
>Sort Method: quicksort  Memory: 315kB
>->  HashAggregate  (cost=565402.03..565404.03 rows=200 width=32) (actual 
> time=21731.551..21733.277 rows=4001 loops=1)
>  Output: name, count(name)
>  ->  Seq Scan on public.hashcheck  (cost=0.00..435452.02 
> rows=25990002 width=32) (actual time=29.431..13383.812 rows=25990002 loop
> s=1)
>Output: id, name, value
>  Total runtime: 21747.356 ms
> (9 rows)
>
>
>
>
>



-- 
Sergey Konoplev

a database and software architect
http://www.linkedin.com/in/grayhemp

Jabber: gray...@gmail.com Skype: gray-hemp Phone: +14158679984


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow query: bitmap scan troubles

2012-12-04 Thread Sergey Konoplev
On Tue, Dec 4, 2012 at 9:47 AM,   wrote:
> eagerly awaiting 6pm when I can bring the DB down and start tweaking. The
> effective_work_mem setting is going from 6Gb->88Gb which I think will make
> quite a difference.

I also wonder if increasing default_statistics_target (say 10x) or just
doing ALTER TABLE ... SET STATISTICS for particular tables will help. It
will make the planner produce more precise estimates. Do not forget to
ANALYZE after changing it.

>
> I still can't quite wrap around my head why accessing an index is expected
> to use more disk access than doing a bitmap scan of the table itself, but I
> guess it does make a bit of sense if postgres assumes the table is more
> likely to be cached.
>
> It's all quite, quite fascinating :)
>
> I'll let you know how it goes.
>
> - Phil
>
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Occasional timeouts on TRUNCATE and simple INSERTs

2012-12-11 Thread Sergey Konoplev
On Tue, Dec 11, 2012 at 1:19 PM, Osborn, Jeff  wrote:
> I am seeing some strange performance on a new pg9.1 instance.  We are seeing 
> occasional statement timeouts on some TRUNCATEs and INSERTs.  In both cases, 
> the statements are quite simple:
>- TRUNCATE schema.table;
>   -  INSERT  INTO schema.table VALUES  ($1,2,$2,'');
>
> Sometimes these will succeed.  Occasionally I see timeouts.  The 
> statement_timeout is set to 60 seconds.  These tables are not particularly 
> large; in the case of the insert, the table only has three rows.

The most common case is when a backup (pg_dump*) is running: TRUNCATE has to
wait for it, because the TRUNCATE acquires an access exclusive lock on the
table, and all other queries, including the INSERT, then have to wait for
the TRUNCATE. Check the backup case first; a query to see what the blocked
statements are waiting behind is sketched below.
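
A sketch of such a query (this uses the 9.1 catalog column names
procpid/current_query):

SELECT w.pid AS waiting_pid, w.mode AS wanted_mode,
       a.current_query, now() - a.query_start AS query_age
FROM pg_locks AS w
JOIN pg_stat_activity AS a ON a.procpid = w.pid
WHERE NOT w.granted
ORDER BY query_age DESC;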

> Our previous Postgresql 8.2 instance did not have this problem.

This is strange for me.

-- 
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Read rows deleted

2012-12-12 Thread Sergey Konoplev
Hi,

On Wed, Dec 12, 2012 at 8:26 AM, Alejandro Carrillo  wrote:
> Does anybody know of JDBC or other multiplatform code that lets one read the
> deleted rows of a table without writing to the table file?
> Does anybody know how to create a table using a table file?

I am not sure what you mean, but maybe one of these links will help you:

- http://www.postgresql.org/docs/9.2/static/file-fdw.html
- http://pgxn.org/dist/odbc_fdw/.

>
> thanks



--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Sergey Konoplev
Hi,

On Wed, Dec 19, 2012 at 1:13 PM, Richard Neill  wrote:
>  Index Scan using tbl_tracker_performance_1_idx on tbl_tracker
> (cost=0.00..5440.83 rows=1 width=174) (actual time=0.052..0.052 rows=0
> loops=1)
>Index Cond: (parcel_id_code = 53030)

It looks like your index is bloated. Have you had a lot of updates/deletes
on rows where exit_state is null?

Try to reindex tbl_tracker_performance_1_idx.

To reindex it without locks, create a new index with a temporary name using
CREATE INDEX CONCURRENTLY, drop the old one, and rename the new one to the
old name. A sketch is below.
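
A sketch of that dance (the index definition here is inferred from the plan
above - copy the real definition from \d tbl_tracker before doing this):

CREATE INDEX CONCURRENTLY tbl_tracker_performance_1_idx_new
ON tbl_tracker (parcel_id_code) WHERE exit_state IS NULL;
DROP INDEX tbl_tracker_performance_1_idx;
ALTER INDEX tbl_tracker_performance_1_idx_new
RENAME TO tbl_tracker_performance_1_idx;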

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Why does the query planner use two full indexes, when a dedicated partial index exists?

2012-12-19 Thread Sergey Konoplev
On Wed, Dec 19, 2012 at 3:49 PM, Richard Neill  wrote:
> * The reindex solution doesn't work. I just tried it, and the query planner
> is still using the wrong indexes.

Can you show the explain analyze with tbl_tracker_performance_1_idx straight
after the reindex (i.e. before it has been bloated again)?

> * If the tbl_tracker_performance_1_idx had indeed become bloated, wouldn't
> that have meant that when the query planner was forced to use it (by
> deleting the alternative indexes), it would have been slow?

It is hard to say. There might be a bloating threshold after which it
becomes slow. It also depends on the index column values.

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Poor performance after update from SLES11 SP1 to SP2

2013-02-21 Thread Sergey Konoplev
On Thu, Feb 21, 2013 at 1:59 AM, Mark Smith  wrote:
> Software: SLES 11 SP2 3.0.58-0.6.2-default x86_64, PostgreSQL 9.0.4.

[skipped]

> Problem: We have been running PostgreSQL 9.0.4 on SLES11 SP1, last kernel in
> use was 2.6.32-43-0.4, performance has always been great. Since updating
> from SLES11 SP1 to SP2 we now experience many database 'stalls' (e.g.
> normally 'instant' queries taking many seconds, any query will be slow, just
> connecting to the database will be slow).

It reminds me of a transparent huge pages defragmentation issue that
was found in recent kernels.

Transparent huge pages defragmentation could lead to unpredictable
database stalls on some Linux kernels. The recommended settings for
this are below.

db1: ~ # echo always > /sys/kernel/mm/transparent_hugepage/enabled
db1: ~ # echo madvise > /sys/kernel/mm/transparent_hugepage/defrag

I am collecting recommendations for DB server configuration at the
link below. Take a look at it as well if the above doesn't help.

http://code.google.com/p/pgcookbook/wiki/Database_Server_Configuration

> We have trialled PostgreSQL 9.2.3
> under SLES11 SP2 with the exact same results. During these periods the
> machine is completely responsive but anything accessing the database is
> extremely slow.
>
> I have tried increasing sched_migration_cost from 50 to 500 and also
> tried setting sched_compat_yield to 1, neither of these appeared to make a
> difference. I don't have the parameter 'sched_autogroup_enabled'. Nothing
> jumps out from top/iostat/sar/pg_stat_activity however I am very far from
> expert in interpreting their output
>
> We have work underway to reduce our number of connections as although it has
> always worked ok, perhaps it makes us particularly vulnerable to
> kernel/scheduler changes.
>
> I would be very grateful for any suggestions as to the best way to diagnose
> the source of this problem and/or general recommendations?



--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] hardware upgrade, performance degrade?

2013-03-04 Thread Sergey Konoplev
On Fri, Mar 1, 2013 at 1:52 AM, Steven Crandell
 wrote:
> As far as we were able to gather in the frantic moments of downtime,
> hundreds of queries were hanging up while trying to COMMIT.  This in turn
> caused new queries backup as they waited for locks and so on.
>
> Given that we're dealing with new hardware and the fact that this still acts
> a lot like a NUMA issue, are there other settings we should be adjusting to
> deal with possible performance problems associated with NUMA?
>
> Does this sound like something else entirely?

It does. I have collected a number of kernel (and other) tuning issues,
with short explanations, that can badly affect database behavior. Try
to follow the recommendations:

https://code.google.com/p/pgcookbook/wiki/Database_Server_Configuration

--
Sergey Konoplev
Database and Software Architect
http://www.linkedin.com/in/grayhemp

Phones:
USA +1 415 867 9984
Russia, Moscow +7 901 903 0499
Russia, Krasnodar +7 988 888 1979

Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] Slow CTE Query

2013-05-18 Thread Sergey Konoplev
On Sat, May 18, 2013 at 12:54 PM, Stefan Keller  wrote:
> I'm experiencing a very slow CTE query (see below).
>
> When I split the three aggregations into three separate views, its' decent
> fast. So I think it's due to the planner.
>
> Any ideas like reformulating the query?

Rewrite it without the CTE; the planner will have more freedom in this
case. I would also try the LEFT JOIN ... IS NULL technique instead of
NOT EXISTS.
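
For illustration only, the anti-join part of aggregation2 could be
expressed with a LEFT JOIN roughly like this (a sketch; the
complete/valid conditions from the original NOT EXISTS still have to be
carried over into the join condition):

SELECT e.error_id, t.type
FROM all_errors e
JOIN error_type t ON t.error_type_id = e.error_type_id
LEFT JOIN fix f
       ON f.error_id = e.error_id
      AND f.osm_id = e.osm_id
      AND f.schemaid::text = e.schemaid::text
WHERE f.error_id IS NULL;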

>
> These are the tables and views involved:
> * Table promotion with start/end date and a region, and table
> promo2mission (each 1 to dozen tupels).
> * View  all_errors (more than 20'000 tubles, based on table errors
> without tupels from table fix)
> * Table error_type (7 tupels)
>
> Here's the EXPLAIN ANALYZE log: http://explain.depesz.com/s/tbF
>
> Yours, Stefan
>
>
> CTE Query:
>
> WITH aggregation1
>  AS (SELECT p.id   AS promo_id,
> p.startdate,
> p.enddate,
> p.geom AS promogeom,
> pm.error_type,
> pm.mission_extra_coins AS extra_coins
>  FROM   (promotion p
>  join promo2mission pm
>ON (( p.id = pm.promo_id )))
>  WHERE  ( ( p.startdate <= Now() )
>   AND ( p.enddate >= Now() ) )),
>  aggregation2
>  AS (SELECT e.error_id AS missionid,
> e.schemaid,
> t.TYPE,
> e.osm_id,
> e.osm_type,
> t.description  AS title,
> t.view_type,
> t.answer_placeholder,
> t.bug_question AS description,
> t.fix_koin_count,
> t.vote_koin_count,
> e.latitude,
> e.longitude,
> e.geom AS missiongeom,
> e.txt1,
> e.txt2,
> e.txt3,
> e.txt4,
> e.txt5
>  FROM   all_errors e,
> error_type t
>  WHERE  ( ( e.error_type_id = t.error_type_id )
>   AND ( NOT ( EXISTS (SELECT 1
>   FROM   fix f
>   WHERE  ( ( ( ( f.error_id = e.error_id )
>AND ( f.osm_id =
> e.osm_id ) )
>  AND ( ( f.schemaid ) :: text 
> =
>  ( e.schemaid ) :: text ) 
> )
>AND ( ( f.complete
>AND f.valid )
>   OR ( NOT
> f.complete ) ) )) ) ) )),
>  aggregation3
>  AS (SELECT ag2.missionid AS missionidtemp,
> ag1.promo_id,
> ag1.extra_coins
>  FROM   (aggregation2 ag2
>  join aggregation1 ag1
>ON (( ( ag2.TYPE ) :: text = ( ag1.error_type ) :: text )))
>  WHERE  public._st_contains(ag1.promogeom, ag2.missiongeom))
> SELECT ag2.missionid AS id,
>ag2.schemaid,
>ag2.TYPE,
>ag2.osm_id,
>ag2.osm_type,
>ag2.title,
>ag2.description,
>ag2.latitude,
>ag2.longitude,
>ag2.view_type,
>ag2.answer_placeholder,
>ag2.fix_koin_count,
>ag2.missiongeom,
>ag2.txt1,
>ag2.txt2,
>ag2.txt3,
>ag2.txt4,
>ag2.txt5,
>ag3.promo_id,
>ag3.extra_coins
> FROM   (aggregation2 ag2
> left join aggregation3 ag3
>ON (( ag2.missionid = ag3.missionidtemp )));
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance



--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] Advice on tuning slow query

2013-05-21 Thread Sergey Konoplev
On Tue, May 21, 2013 at 4:16 PM, Samuel Stearns
 wrote:
> Is there anything I can do to get the SELECT to run a little quicker.

Please carefully follow these instructions first:
http://wiki.postgresql.org/wiki/Slow_Query_Questions.

I would also suggest upgrading Postgres to the latest version, as it
has a lot of performance improvements.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] Query performance

2013-06-13 Thread Sergey Konoplev
On Thu, Jun 13, 2013 at 12:49 AM, K P Manoj  wrote:
> One of my query treating performance issue on my production server.
> Once i run  query on my parent table with specific condition(hard coded
> value) its uses  only proper child table and its index on explain plan  ,
> but once i am using table conditions (instead of hard coded value), query
> planner is going all the child tables, Can i know where i am worng

>From the docs:

"Constraint exclusion only works when the query's WHERE clause
contains constants (or externally supplied parameters). For example, a
comparison against a non-immutable function such as CURRENT_TIMESTAMP
cannot be optimized, since the planner cannot know which partition the
function value might fall into at run time."

http://www.postgresql.org/docs/9.2/static/ddl-partitioning.html#DDL-PARTITIONING-CAVEATS
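
To illustrate the caveat (the table and column names here are
hypothetical), the first query can be pruned to one child table at plan
time, while the second cannot:

-- constant: the planner can exclude non-matching child tables
SELECT * FROM measurement WHERE logdate = DATE '2013-06-01';

-- value known only at run time: all child tables are scanned
SELECT * FROM measurement
WHERE logdate = (SELECT max(batch_date) FROM batches);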

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] seqscan for 100 out of 3M rows, index present

2013-06-26 Thread Sergey Konoplev
On Wed, Jun 26, 2013 at 12:18 PM, Willy-Bas Loos  wrote:
> plan with enable_seqscan off:
>
> Aggregate  (cost=253892.48..253892.49 rows=1 width=0) (actual
> time=208.681..208.681 rows=1 loops=1)
>   ->  Nested Loop  (cost=5.87..253889.49 rows=1198 width=0) (actual
> time=69.403..208.647 rows=17 loops=1)
> ->  Index Scan using geo_blok_idx on geo g  (cost=0.00..1314.43
> rows=500 width=8) (actual time=45.776..46.147 rows=121 loops=1)
>   Index Cond: (blok = 1942)
> ->  Bitmap Heap Scan on bmp_data d  (cost=5.87..502.91 rows=179
> width=8) (actual time=1.340..1.341 rows=0 loops=121)
>   Recheck Cond: (geo_id = g.geo_id)
>   ->  Bitmap Index Scan on bmp_data_geo_idx  (cost=0.00..5.82
> rows=179 width=0) (actual time=1.206..1.206 rows=0 loops=121)
> Index Cond: (geo_id = g.geo_id)
> Total runtime: 208.850 ms
>
> On Wed, Jun 26, 2013 at 9:08 PM, Igor Neyman  wrote:
>> Aggregate  (cost=60836.71..60836.72 rows=1 width=0) (actual
>> time=481.526..481.526 rows=1 loops=1)
>>   ->  Hash Join  (cost=1296.42..60833.75 rows=1184 width=0) (actual
>> time=317.403..481.513 rows=17 loops=1)
>> Hash Cond: (d2.gid = g2.gid)
>> ->  Seq Scan on d2  (cost=0.00..47872.54 rows=3107454 width=8)
>> (actual time=0.013..231.707 rows=3107454 loops=1)
>> ->  Hash  (cost=1290.24..1290.24 rows=494 width=8) (actual
>> time=0.207..0.207 rows=121 loops=1)
>>   Buckets: 1024  Batches: 1  Memory Usage: 5kB
>>   ->  Index Scan using g_blok on g2  (cost=0.00..1290.24
>> rows=494 width=8) (actual time=0.102..0.156 rows=121 loops=1)
>> Index Cond: (k = 1942)
>> Total runtime: 481.600 ms

These are plans of two different queries. Please show the second one
(the one with d2, g2, etc.) with seqscans off.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] how to speed up the index creation in GP?

2013-07-14 Thread Sergey Konoplev
Hi Suya,

I think you should start with this:
http://www.postgresql.org/docs/9.2/static/indexes.html.

On Wed, Jul 10, 2013 at 7:03 PM, Huang, Suya  wrote:
> Hi Guys,
>
>
>
> I’m new to Postgresql, we have a Greenplum cluster and need to create many
> indexes on the database. So my question is:
>
>
>
> Is there any performance tips for creating index on Postgres?
>
> how to monitor the progress  the creation process?
>
>
>
> Thanks and best regards,
>
> Suya Huang



-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




[PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Hello, I have a problem with planning time, and I do not understand why
this can happen.

PostgreSQL 9.2.2 on x86_64-unknown-linux-gnu, compiled by gcc-4.4.real
(Debian 4.4.5-8) 4.4.5, 64-bit

# explain
# select i.item_id, u.user_id from items i
# left join users u on u.user_id = i.user_id
# where item_id = 169946840;
  QUERY PLAN

--
 Nested Loop Left Join  (cost=0.00..397.14 rows=1 width=16)
   ->  Index Scan using items_item_ux on items i  (cost=0.00..358.84 rows=1
width=16)
 Index Cond: (item_id = 169946840)
   ->  Index Only Scan using users_user_id_pkey on users u
 (cost=0.00..38.30 rows=1 width=8)
 Index Cond: (user_id = i.user_id)

time: 55919.910 ms

# set enable_mergejoin to off;

# explain
select i.item_id, u.user_id from items i
left join users u on u.user_id = i.user_id
where item_id = 169946840;
  QUERY PLAN

--
 Nested Loop Left Join  (cost=0.00..397.14 rows=1 width=16)
   ->  Index Scan using items_item_ux on items i  (cost=0.00..358.84 rows=1
width=16)
 Index Cond: (item_id = 169946840)
   ->  Index Only Scan using users_user_id_pkey on users u
 (cost=0.00..38.30 rows=1 width=8)
 Index Cond: (user_id = i.user_id)

time: 28.874 ms

-- 
Sergey Burladyan


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
On 01.08.2013 14:05, "Thomas Reiss"  wrote:
>
> If you leave enable_mergejoin to on, what happens if you run the explain
> two time in a row ? Do you get the same planning time ?

Yes, I get the same planning time.


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
On Thu, Aug 1, 2013 at 2:04 PM, Thomas Reiss wrote:

> On 01/08/2013 11:55, Sergey Burladyan wrote:
> At first look, this reminds me some catalog bloat issue. Can you provide
> the result of these queries :
> SELECT pg_size_pretty(pg_table_size('pg_class')) AS size_pg_class;
> SELECT pg_size_pretty(pg_table_size('pg_attribute')) AS size_pg_attribute;
>

SELECT pg_size_pretty(pg_table_size('pg_class')) AS size_pg_class; --- '16
MB'
SELECT pg_size_pretty(pg_table_size('pg_attribute')) AS size_pg_attribute;
--- '63 MB'

-- 
Sergey Burladyan


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
I found another query with big planning time:
explain select * from xview.user_items_v v where ( v.item_id = 132358330 );
QUERY PLAN

---
 Nested Loop Left Join  (cost=0.00..363.28 rows=1 width=44)
   Join Filter: (ief.item_id = ix.item_id)
   ->  Index Scan using items_item_ux on items ix  (cost=0.00..359.20
rows=1 width=36)
 Index Cond: (item_id = 132358330)
 Filter: ((xa_txtime IS NULL) AND (user_id > 0) AND (status_id <
20))
   ->  Index Scan using item_enabled_flags_item_id_idx on
item_enabled_flags ief  (cost=0.00..4.06 rows=1 width=8)
 Index Cond: (item_id = 132358330)
(7 rows)

Time: 44037.758 ms

Looks like the planning algorithm hangs on the 'items' table
statistics. Setting enable_mergejoin to off does not help with this
query.

-- 
Sergey Burladyan


Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Sergey Burladyan  writes:

> # explain
> # select i.item_id, u.user_id from items i
> # left join users u on u.user_id = i.user_id
> # where item_id = 169946840;
> QUERY PLAN 
> --
> Nested Loop Left Join (cost=0.00..397.14 rows=1 width=16)
> -> Index Scan using items_item_ux on items i (cost=0.00..358.84 rows=1 
> width=16)
> Index Cond: (item_id = 169946840)
> -> Index Only Scan using users_user_id_pkey on users u (cost=0.00..38.30 
> rows=1 width=8)
> Index Cond: (user_id = i.user_id)
>
> time: 55919.910 ms

While running this EXPLAIN the backend used the disk for a long time:
 TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IO>COMMAND   

 
21638 be/4 postgres2.10 M/s9.45 M/s  0.00 % 69.04 % postgres: postgres 
x xxx.xxx.xxx.xxx(50987) EXPLAIN

Why does it read and write 10 megabytes per second to disk for an
EXPLAIN query? I cannot understand what is going on here :(





Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Jeff Janes  writes:

> I'd use strace to find what file handle is being read and written, and
> lsof to figure out what file that is.

I used strace; there are more reads than writes:
$ cut -d '(' -f 1 /var/tmp/pg.trace | sort | uniq -c | sort -n
 49 select
708 close
   1021 open
   7356 write
 212744 read
 219650 lseek

top reads:
7859 read(150 open("base/16444/17685.129", O_RDWR|O_CREAT, 0600) = 150
9513 read(149 open("base/16444/17685.128", O_RDWR|O_CREAT, 0600) = 149
10529 read(151 open("base/16444/17685.130", O_RDWR|O_CREAT, 0600) = 151
12155 read(152 open("base/16444/17685.131", O_RDWR|O_CREAT, 0600) = 152
12768 read(154 open("base/16444/17685.133", O_RDWR|O_CREAT, 0600) = 154
16210 read(153 open("base/16444/17685.132", O_RDWR|O_CREAT, 0600) = 153

it is the 'items' table:
select relname from pg_class where relfilenode = 17685;
 relname 
-
 items

Each read is 8192 bytes, so for an EXPLAIN query with two simple index
scans, *without* ANALYZE, postgres read
(7859 + 9513 + 10529 + 12155 + 12768 + 16210) * 8192 = 565,526,528 bytes
from it.

> It looks like it is more write than read, which does seem strange.

Why does it read anything for a simple EXPLAIN, without actually executing the query? :-)

> Any chance you can create a self-contained test case?

I do not think I can do this; it is a ~1 TB heavily loaded database. This is on a
standby server.

PS: two strace for quick and slow explain:

explain
select i.item_id from items i
where item_id = 169946840

$ cut -d '(' -f 1 /var/tmp/pg-all-normal.trace | sort | uniq -c
313 lseek
308 open
  2 read
 13 recvfrom
  6 sendto

explain
select i.item_id, u.user_id from items i
left join users u on u.user_id = i.user_id
where item_id = 169946840

$ cut -d '(' -f 1 /var/tmp/pg-all-slow.trace | sort | uniq -c
963 close
  1 fsync
5093393 lseek
925 open
6004995 read
 14 recvfrom
  1 rt_sigreturn
  9 select
   4361 semop
  7 sendto
  1 --- SIGUSR1 
 685605 write





Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
Jeff Janes  writes:

> I think the next step would be to run gdb -p  (but don't start
> gdb until backend is in the middle of a slow explain), then:

Sorry, I lack debug symbols, so the call trace is incomplete:

explain select i.item_id, u.user_id from items i left join users u on u.user_id 
= i.user_id where item_id = 169946840

#0  0x7ff766967620 in read () from /lib/libc.so.6
#1  0x7ff7689cfc25 in FileRead ()
#2  0x7ff7689ea2f6 in mdread ()
#3  0x7ff7689cc473 in ?? ()
#4  0x7ff7689ccf54 in ReadBufferExtended ()
#5  0x7ff7688050ca in index_fetch_heap ()
#6  0x7ff76880523e in index_getnext ()
#7  0x7ff768a63306 in ?? ()
#8  0x7ff768a67624 in ?? ()
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a68376 in mergejoinscansel ()
#11 0x7ff76896faa6 in initial_cost_mergejoin ()
#12 0x7ff768977695 in ?? ()
#13 0x7ff76897816c in add_paths_to_joinrel ()
#14 0x7ff76897981b in make_join_rel ()
#15 0x7ff768979ac9 in join_search_one_level ()
#16 0x7ff76896a3ab in standard_join_search ()
#17 0x7ff7689837c1 in query_planner ()
#18 0x7ff768985260 in ?? ()
#19 0x7ff7689870a9 in subquery_planner ()
#20 0x7ff76898736e in standard_planner ()
#21 0x7ff7689ef3ce in pg_plan_query ()
#22 0x7ff7688c94a3 in ?? ()
#23 0x7ff7688c9809 in ExplainQuery ()
#24 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#25 0x7ff7689f1f27 in ?? ()
#26 0x7ff7689f3295 in ?? ()
#27 0x7ff7689f388f in PortalRun ()
#28 0x7ff7689ef96d in ?? ()
#29 0x7ff7689f0950 in PostgresMain ()
#30 0x7ff7689aa7a3 in ?? ()
#31 0x7ff7689ad73c in PostmasterMain ()
#32 0x7ff768948e4b in main ()

#0  0x7ff766973950 in lseek64 () from /lib/libc.so.6
#1  0x7ff7689cf88d in FileSeek ()
#2  0x7ff7689ea09c in mdwrite ()
#3  0x7ff7689cb12f in ?? ()
#4  0x7ff7689cca43 in ?? ()
#5  0x7ff7689ccf54 in ReadBufferExtended ()
#6  0x7ff7688050ca in index_fetch_heap ()
#7  0x7ff76880523e in index_getnext ()
#8  0x7ff768a63306 in ?? ()
#9  0x7ff768a67624 in ?? ()
#10 0x7ff768a67d9c in ?? ()
#11 0x7ff768a68376 in mergejoinscansel ()
#12 0x7ff76896faa6 in initial_cost_mergejoin ()
#13 0x7ff768977695 in ?? ()
#14 0x7ff76897816c in add_paths_to_joinrel ()
#15 0x7ff76897981b in make_join_rel ()
#16 0x7ff768979ac9 in join_search_one_level ()
#17 0x7ff76896a3ab in standard_join_search ()
#18 0x7ff7689837c1 in query_planner ()
#19 0x7ff768985260 in ?? ()
#20 0x7ff7689870a9 in subquery_planner ()
#21 0x7ff76898736e in standard_planner ()
#22 0x7ff7689ef3ce in pg_plan_query ()
#23 0x7ff7688c94a3 in ?? ()
#24 0x7ff7688c9809 in ExplainQuery ()
#25 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#26 0x7ff7689f1f27 in ?? ()
#27 0x7ff7689f3295 in ?? ()
#28 0x7ff7689f388f in PortalRun ()
#29 0x7ff7689ef96d in ?? ()
#30 0x7ff7689f0950 in PostgresMain ()
#31 0x7ff7689aa7a3 in ?? ()
#32 0x7ff7689ad73c in PostmasterMain ()
#33 0x7ff768948e4b in main ()

#0  0x7ff766973950 in lseek64 () from /lib/libc.so.6
#1  0x7ff7689cf88d in FileSeek ()
#2  0x7ff7689ea2b9 in mdread ()
#3  0x7ff7689cc473 in ?? ()
#4  0x7ff7689ccf54 in ReadBufferExtended ()
#5  0x7ff7688050ca in index_fetch_heap ()
#6  0x7ff76880523e in index_getnext ()
#7  0x7ff768a63306 in ?? ()
#8  0x7ff768a67624 in ?? ()
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a68376 in mergejoinscansel ()
#11 0x7ff76896faa6 in initial_cost_mergejoin ()
#12 0x7ff768977695 in ?? ()
#13 0x7ff76897816c in add_paths_to_joinrel ()
#14 0x7ff76897981b in make_join_rel ()
#15 0x7ff768979ac9 in join_search_one_level ()
#16 0x7ff76896a3ab in standard_join_search ()
#17 0x7ff7689837c1 in query_planner ()
#18 0x7ff768985260 in ?? ()
#19 0x7ff7689870a9 in subquery_planner ()
#20 0x7ff76898736e in standard_planner ()
#21 0x7ff7689ef3ce in pg_plan_query ()
#22 0x7ff7688c94a3 in ?? ()
#23 0x7ff7688c9809 in ExplainQuery ()
#24 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#25 0x7ff7689f1f27 in ?? ()
#26 0x7ff7689f3295 in ?? ()
#27 0x7ff7689f388f in PortalRun ()
#28 0x7ff7689ef96d in ?? ()
#29 0x7ff7689f0950 in PostgresMain ()
#30 0x7ff7689aa7a3 in ?? ()
#31 0x7ff7689ad73c in PostmasterMain ()
#32 0x7ff768948e4b in main ()




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-01 Thread Sergey Burladyan
I also find this trace for other query:
explain select * from xview.user_items_v v where ( v.item_id = 132358330 );

#0  0x7ff766967620 in read () from /lib/libc.so.6
#1  0x7ff7689cfc25 in FileRead ()
#2  0x7ff7689ea2f6 in mdread ()
#3  0x7ff7689cc473 in ?? ()
#4  0x7ff7689ccf54 in ReadBufferExtended ()
#5  0x7ff7688050ca in index_fetch_heap ()
#6  0x7ff76880523e in index_getnext ()
#7  0x7ff768a63306 in ?? ()
#8  0x7ff768a67624 in ?? ()
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a688fc in scalargtsel ()
#11 0x7ff768ac5211 in OidFunctionCall4Coll ()
#12 0x7ff768998ce5 in restriction_selectivity ()
#13 0x7ff76896c71e in clause_selectivity ()
#14 0x7ff76896bf60 in clauselist_selectivity ()
#15 0x7ff76896ddfd in set_baserel_size_estimates ()
#16 0x7ff76896abf2 in ?? ()
#17 0x7ff76896bc97 in make_one_rel ()
#18 0x7ff7689837c1 in query_planner ()
#19 0x7ff768985260 in ?? ()
#20 0x7ff7689870a9 in subquery_planner ()
#21 0x7ff76898736e in standard_planner ()
#22 0x7ff7689ef3ce in pg_plan_query ()
#23 0x7ff7688c94a3 in ?? ()
#24 0x7ff7688c9809 in ExplainQuery ()
#25 0x7ff7648095e2 in ?? () from 
/usr/lib/postgresql/9.2/lib/pg_stat_statements.so
#26 0x7ff7689f1f27 in ?? ()
#27 0x7ff7689f3295 in ?? ()
#28 0x7ff7689f388f in PortalRun ()
#29 0x7ff7689ef96d in ?? ()
#30 0x7ff7689f0950 in PostgresMain ()
#31 0x7ff7689aa7a3 in ?? ()
#32 0x7ff7689ad73c in PostmasterMain ()
#33 0x7ff768948e4b in main ()

I see two code paths:
#6  0x7ff76880523e in index_getnext ()
...
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a688fc in scalargtsel ()
...

and 

#6  0x7ff76880523e in index_getnext ()
...
#9  0x7ff768a67d9c in ?? ()
#10 0x7ff768a68376 in mergejoinscansel ()
...

If I am not mistaken, the two code paths may look like this:
(1) mergejoinscansel -> scalarineqsel-> ineq_histogram_selectivity -> 
get_actual_variable_range -> index_getnext
(2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> 
get_actual_variable_range -> index_getnext

And maybe the get_actual_variable_range() function is too expensive to
call with my bloated items table and the bloated index items_user_id_idx on it?





Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Sergey Burladyan
Tom Lane  writes:

> Jeff Janes  writes:
> > On Thu, Aug 1, 2013 at 5:16 PM, Sergey Burladyan  
> > wrote:
> >> If I not mistaken, may be two code paths like this here:
> >> (1) mergejoinscansel -> scalarineqsel-> ineq_histogram_selectivity -> 
> >> get_actual_variable_range -> index_getnext
> >> (2) scalargtsel -> scalarineqsel -> ineq_histogram_selectivity -> 
> >> get_actual_variable_range -> index_getnext
>
> > Yeah, I think you are correct.
>
> mergejoinscansel does *not* call scalarineqsel, nor get_actual_variable_range.
> It calls get_variable_range, which only looks at the pg_statistic
> entries.

Hmm, I am speaking about 9.2.2, but in current HEAD this call still exists;
please see: http://doxygen.postgresql.org/selfuncs_8c_source.html#l02976

> I think we need to see the actual stack traces, not incomplete versions.
> It's possible that the situation here involves bloat in pg_statistic, but
> we're just leaping to conclusions if we assume that that's where the index
> fetches are occurring.

I found debug symbols and sent the stack trace to the mailing list, but it was
blocked because of its size; trying again with zip.



gdb.log.gz
Description: GNU Zip compressed data



Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Sergey Burladyan
Jeff Janes  writes:

> On Fri, Aug 2, 2013 at 2:58 AM, Sergey Burladyan  wrote:
> >
> > PS: I think my main problem is here:
> > select min(user_id) from items;
> >  min
> > -
> >1
> > (1 row)
> >
> > Time: 504.520 ms
>
> That is a long time, but still 100 fold less than the planner is taking.
>
> What about max(user_id)?

max is good; only rows with user_id = 0 were updated:

select max(user_id) from items;
Time: 59.646 ms

> > also, i cannot reindex it concurrently now, because it run autovacuum: 
> > VACUUM ANALYZE public.items (to prevent wraparound)
>
> That is going to take a long time if you have the cost settings at
> their defaults.

Yes, I have custom, slower settings; it will take about a week.

> But why is it bloated in this way? 

I don't know. Many items were updated last week, ~10% of the table.

> It must be visiting many thousands of dead/invisible rows before
> finding the first visible one.  But, Btree index have a mechanism to
> remove dead tuples from indexes, so it doesn't follow them over and
> over again (see "kill_prior_tuple").  So is that mechanism not
> working, or are the tuples not dead but just invisible (i.e. inserted
> by a still open transaction)?

They are deleted, but VACUUM has not completed yet.

BTW, it is a standby server, and its query plan (blocks read) is very
different from the master's:

Hot standby:

explain (analyze,verbose,buffers) select min(user_id) from items;

'Result  (cost=0.12..0.13 rows=1 width=0) (actual time=56064.514..56064.514 
rows=1 loops=1)'
'  Output: $0'
'  Buffers: shared hit=3694164 read=6591224 written=121652'
'  InitPlan 1 (returns $0)'
'->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual 
time=56064.502..56064.503 rows=1 loops=1)'
'  Output: public.items.user_id'
'  Buffers: shared hit=3694164 read=6591224 written=121652'
'  ->  Index Only Scan using items_user_id_idx on public.items  
(cost=0.00..24165743.48 rows=200673143 width=8) (actual 
time=56064.499..56064.499 rows=1 loops=1)'
'Output: public.items.user_id'
'Index Cond: (public.items.user_id IS NOT NULL)'
'Heap Fetches: 8256426'
'Buffers: shared hit=3694164 read=6591224 written=121652'
'Total runtime: 56064.571 ms'

Master:

'Result  (cost=0.12..0.13 rows=1 width=0) (actual time=202.759..202.759 rows=1 
loops=1)'
'  Output: $0'
'  Buffers: shared hit=153577 read=1'
'  InitPlan 1 (returns $0)'
'->  Limit  (cost=0.00..0.12 rows=1 width=8) (actual time=202.756..202.757 
rows=1 loops=1)'
'  Output: public.items.user_id'
'  Buffers: shared hit=153577 read=1'
'  ->  Index Only Scan using items_user_id_idx on public.items  
(cost=0.00..24166856.02 rows=200680528 width=8) (actual time=202.756..202.756 
rows=1 loops=1)'
'Output: public.items.user_id'
'Index Cond: (public.items.user_id IS NOT NULL)'
'Heap Fetches: 0'
'Buffers: shared hit=153577 read=1'
'Total runtime: 202.786 ms'

And from a backup, before the index/heap became bloated :)

 Result  (cost=0.87..0.88 rows=1 width=0) (actual time=16.002..16.003 rows=1 
loops=1)
   Output: $0
   Buffers: shared hit=3 read=4
   InitPlan 1 (returns $0)
 ->  Limit  (cost=0.00..0.87 rows=1 width=8) (actual time=15.993..15.995 
rows=1 loops=1)
   Output: public.items.user_id
   Buffers: shared hit=3 read=4
   ->  Index Only Scan using items_user_id_idx on public.items  
(cost=0.00..169143085.72 rows=193309210 width=8) (actual time=15.987..15.987 
rows=1 loops=1)
 Output: public.items.user_id
 Index Cond: (public.items.user_id IS NOT NULL)
 Heap Fetches: 1
 Buffers: shared hit=3 read=4
 Total runtime: 16.057 ms




Re: [PERFORM] Looks like merge join planning time is too big, 55 seconds

2013-08-02 Thread Sergey Burladyan
Sergey Burladyan  writes:

> Hot standby:
...
> '  ->  Index Only Scan using items_user_id_idx on public.items  
> (cost=0.00..24165743.48 rows=200673143 width=8) (actual 
> time=56064.499..56064.499 rows=1 loops=1)'
> 'Output: public.items.user_id'
> 'Index Cond: (public.items.user_id IS NOT NULL)'
> 'Heap Fetches: 8256426'
> 'Buffers: shared hit=3694164 read=6591224 written=121652'
> 'Total runtime: 56064.571 ms'
>
> Master:
>
...
> '  ->  Index Only Scan using items_user_id_idx on public.items  
> (cost=0.00..24166856.02 rows=200680528 width=8) (actual time=202.756..202.756 
> rows=1 loops=1)'
> 'Output: public.items.user_id'
> 'Index Cond: (public.items.user_id IS NOT NULL)'
> 'Heap Fetches: 0'
> 'Buffers: shared hit=153577 read=1'
> 'Total runtime: 202.786 ms'

Looks like the visibility map is somehow not replicated to the slave?

If it matters, Master was restarted yesterday, Standby was not.





Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-05 Thread Sergey Konoplev
On Mon, Aug 5, 2013 at 6:22 PM, Josh Berkus  wrote:
>> Or, more generally, is there some set of circumstances under which the
>> catastrophic scenario will happen?
>
> Yes:
>
> SELECT * FROM table ORDER BY id DESC LIMIT 10 OFFSET 10
>
> This is the "high offset" problem, and affects all databases which
> support applications with paginated results, including non-relational
> ones like SOLR.  The basic problem is that you can't figure out what is
> OFFSET 10 without first sorting the first 10 results.
>
> The easiest solution is to limit the number of pages your users can
> "flip through".  Generally anyone asking for page 10,000 is a bot
> screen-scraping your site, anyway.

In addition to Josh's answer I would like to mention that it might be
worth using a partial index like this

CREATE INDEX i_table_id_active ON table (id) WHERE active

in this particular case

SELECT * FROM table
WHERE active
ORDER BY id DESC
LIMIT 10 OFFSET 10

so it will prevent long filtering through tons of rows in case of long
"NOT active" gaps at the beginning of the scanning sequence.

As an alternative solution to the pagination (OFFSET) problem you might
also use the "prev/next" technique, like

SELECT * FROM table
WHERE id > :current_last_id
ORDER BY id LIMIT 10

for "next", and

SELECT * FROM (
SELECT * FROM table
WHERE id < :current_first_id
ORDER BY id DESC
LIMIT 10
) AS sq ORDER BY id

for "prev". It will be very fast.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-06 Thread Sergey Konoplev
On Mon, Aug 5, 2013 at 6:54 PM, David Johnston  wrote:
> Curious how much slower/faster these queries would run if you added:
>
> SELECT *, first_value(id) OVER (...), last_value(id) OVER (...)
> --note the window specifications need to overcome the "ORDER BY" limitation
> noted in the documentation.

To be honest, I cannot understand how you are going to specify the partition here.

Or you are talking about wrapping the original query like this

SELECT *, first_value(id) OVER (), last_value(id) OVER () FROM (
SELECT * FROM table
WHERE id > :current_last_id
ORDER BY id LIMIT 10
) AS sq2;

?

However, in this case using min()/max() instead of
first_value()/last_value() will be faster, as it does not require an
additional scan over the subquery results.
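
A sketch of that variant, reusing the wrapped "next page" query from
above:

SELECT *, min(id) OVER () AS first_id, max(id) OVER () AS last_id
FROM (
    SELECT * FROM table
    WHERE id > :current_last_id
    ORDER BY id LIMIT 10
) AS sq2;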

In general I do not think it would be much slower if we are not
talking about thousands of results on one page.


-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] ORDER BY, LIMIT and indexes

2013-08-06 Thread Sergey Konoplev
On Tue, Aug 6, 2013 at 3:46 AM, Ivan Voras  wrote:
> Here are two more unexpected results. Same test table (1 mil. records,
> "id" is SERIAL PRIMARY KEY, PostgreSQL 9.1, VACUUM ANALYZE performed
> before the experiments):
>
> ivoras=# explain analyze select * from lt where id > 90 limit 10;
>QUERY PLAN
> 
>  Limit  (cost=0.00..1.71 rows=10 width=9) (actual
> time=142.669..142.680 rows=10 loops=1)
>->  Seq Scan on lt  (cost=0.00..17402.00 rows=101630 width=9)
> (actual time=142.665..142.672 rows=10 loops=1)
>  Filter: (id > 90)
>  Total runtime: 142.735 ms
> (4 rows)

[skipped]

> ivoras=# set enable_seqscan to off;
> SET
> ivoras=# explain analyze select * from lt where id > 90 limit 10;
>  QUERY PLAN
> 
>  Limit  (cost=0.00..1.74 rows=10 width=9) (actual time=0.081..0.112
> rows=10 loops=1)
>->  Index Scan using lt_pkey on lt  (cost=0.00..17644.17
> rows=101630 width=9) (actual time=0.078..0.100 rows=10 loops=1)
>  Index Cond: (id > 90)
>  Total runtime: 0.175 ms
> (4 rows)
>
> It looks like the problem is in the difference between what the
> planner expects and what the Filter or Index operations deliver:
> (cost=0.00..17402.00 rows=101630 width=9) (actual
> time=142.665..142.672 rows=10 loops=1).

This might be caused by an inaccurate random_page_cost setting. This
parameter gives the planner a hint of how much it costs to perform a
random page read, which is what index scans do. It looks like you need
to decrease random_page_cost.
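
One low-risk way to test this is to change it for the current session
only and re-run the query (1.5 is just an illustrative value; tune it
to your storage):

SET random_page_cost = 1.5;
EXPLAIN ANALYZE SELECT * FROM lt WHERE id > 90 LIMIT 10;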

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

Profile: http://www.linkedin.com/in/grayhemp
Phone: USA +1 (415) 867-9984, Russia +7 (901) 903-0499, +7 (988) 888-1979
Skype: gray-hemp
Jabber: gray...@gmail.com




Re: [PERFORM] How to investiage slow insert problem

2013-08-19 Thread Sergey Konoplev
On Mon, Aug 19, 2013 at 6:44 PM, Rural Hunter  wrote:
> I'm on 9.2.4 with Ubuntu server. There are usually hundereds of connections
> doing the same insert with different data from different networks every
> minute, through pgbouncer in the same network of the database server. The
> database has been running for about one year without problem. Yesterday I
> got a problem that the connection count limit of the database server is
> reached. I checked the connections and found that there are many inserts
> hanging there. I checked the load(cpu,memory,io) of the db server but seems
> everything is fine. I also checked pg log and I only found there are one
> "incomplete message from client" error message every several minute. The I
> recycled pgbouncer and kept monitoring the connections. I found the majority
> of the inserts finish quickly but every minute there are several inserts
> left and seems hanging there . So after a while, the connection limit is
> reached again. Besides those inserts, there are no other long run queries
> and auto vacuums. I also checked the locks of the inserts and found they
> were all granted. The insert statement itself is very simple and it only
> inserts one row but there are some triggers involved. They might impact the
> performance but I have never experience any since the majority of the
> inserts are fine. The problem persisted about 1-2 hours. I didn't do
> anything except recycling pgbouncer a few times. After that period,
> everything goes back to normal. It's has been 24 hours and it didn't happen
> again.
>
> From the error message in pg log, I supect it might be the network problem
> from some clients. Could anyone point out if there are other possible
> causes? I'm also wondering what those inserts are doing actually when they
> are hanging there, such as if they are in the trigger or not. Anything I can
> get similar with the connection snapshots in db2?

What do you mean by recycling pgbouncer?

Did you notice what was in the state column of the pg_stat_activity
view? In 9.2 the query column in this view shows the last statement
that was executed in the connection; it does not mean that this
statement is still running at the moment of monitoring. If the state is
active, then it was running; however, my assumption is that it was idle
in transaction. You mentioned the "incomplete message from client"
error, so it might somehow be a network problem that led to a hanging
connection to pgbouncer, which made pgbouncer keep a connection to
postgres open after a transaction was started.
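
A simple check you could run during the next incident (a sketch for
9.2):

SELECT pid, state, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;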

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] How to investiage slow insert problem

2013-08-19 Thread Sergey Konoplev
On Mon, Aug 19, 2013 at 7:45 PM, Rural Hunter  wrote:
>> You mentioned the "incomplete message from client" error, so it might
>> somehow be a network problem that led to a hunging connection to pgbouncer,
>> that made pgbouncer kept a connection to postgres after transaction was
>> started.
>
> pgbouncer and the db server are in the same local network and there
> shouldn't be any network problem between them. I also ran ping from
> pgbouncer server to the db server and there was no problem.

Next time you face this, set log_min_duration_statement to a value less
than the age of the hanging inserts, and set debug_print_parse,
debug_print_rewritten, debug_print_plan and debug_pretty_print to 'on'.
It will allow you to log what is happening with these inserts and what
takes so much time.
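
In postgresql.conf that could look like the snippet below (the 1000 ms
threshold is just an example; pick a value below the observed hang
time), followed by a configuration reload:

log_min_duration_statement = 1000   # log statements running longer than 1 s
debug_print_parse = on
debug_print_rewritten = on
debug_print_plan = on
debug_pretty_print = on

These parameters can also be changed with SET in a single session for a
more targeted test.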

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] postgresql recommendation memory

2013-11-10 Thread Sergey Konoplev
On Sat, Nov 2, 2013 at 11:54 AM, Евгений Селявка  wrote:
> DB size is about 20GB. There is no high write activity on DB. But
> periodically in postgresql log i see for example: "select 1" duration is
> about 500-1000 ms.
>
> In this period of time response time from db terribly. This period of time
> not bound with high traffic. It is not other app on the server. There is not
> specific cron job on server.

Have you shown all of the modified kernel settings? Are you using huge
pages by any chance? It might be a transparent huge pages
defragmentation issue; the symptoms look similar.

Another thing that might cause it is the network. Try to monitor it at
the time of these stalls.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] postgresql recommendation memory

2013-11-11 Thread Sergey Konoplev
On Sun, Nov 10, 2013 at 11:48 PM, Евгений Селявка
 wrote:
> Sergey, yes this is all of my kernel setting. I don't use THP intentionally. 
> I think that i need a special library to use THP with postgresql like this 
> http://code.google.com/p/pgcookbook/wiki/Database_Server_Configuration. This 
> is my values for this kernel settings:

Then it is definitely not THP.

P.S. BTW, pgcookbook was moved to GitHub several weeks ago:
https://github.com/grayhemp/pgcookbook.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] postgresql recommendation memory

2013-11-11 Thread Sergey Konoplev
On Mon, Nov 11, 2013 at 8:14 AM, Scott Marlowe  wrote:
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.

+1.

Try to monitor your connections, for example like this

while true; do
echo -n "$(date): "
psql -XAt -c "select count(1) from pg_stat_activity"
sleep 1
done > activity.log

and its correlation with slowdowns.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Query in cache

2013-11-18 Thread Sergey Konoplev
2013/11/18 Rogerio Pereira 
> I am need help, about subject "Query cache in  Postgres".
> how is it possible to put sql statements cached in postgres ?
> I did some tests and I can not get even with improved tuning
> parameters in the postgresql.conf.

Are you talking about prepared statements or about query result caching?

If the former, then you need to look at the PREPARE statement [1],
though it is probably implemented in your data adapter, for example
like it is in DBD::Pg [2]. Also take a look at the pre_prepare module
[3], which can conveniently be used with pgbouncer.

If the latter, then there is an extension named pgmemcache [4] that
will allow you to interact with memcached directly from postgres, so
you could implement caching in stored functions, for example. However,
my advice is to use application-level caching with memcached in this
case, not database-level caching.

[1] http://www.postgresql.org/docs/9.3/static/sql-prepare.html
[2] http://search.cpan.org/dist/DBD-Pg/Pg.pm#prepare
[3] https://github.com/dimitri/preprepare
[4] https://github.com/ohmu/pgmemcache/
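
For reference, a minimal PREPARE/EXECUTE round trip looks like this
(the table and parameter are made up for illustration):

PREPARE get_user (integer) AS
    SELECT * FROM users WHERE user_id = $1;

EXECUTE get_user(42);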

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] slow query - will CLUSTER help?

2013-12-19 Thread Sergey Konoplev
On Thu, Dec 12, 2013 at 9:30 AM, Sev Zaslavsky  wrote:
[...]

> Table rt_h_nbbo contains several hundred million rows.  All rows for a given
> entry_date are appended to this table in an overnight process every night -
> on the order of several million rows per day.

[...]

> I perceive an inefficiency here and I'd like your input as to how to deal
> with it: The end result of the query is 1631 rows which is on the order of
> about a couple hundred Kb of data.  Compare that to the amount of I/O that
> was done: 1634 buffers were loaded, 16Mb per page - that's about 24 Gb of
> data!  Query completed in 21 sec.  I'd like to be able to physically
> re-organize the data on disk so that the data for a given product_id on a
> entry_date is concentrated on a few pages instead of being scattered like I
> see here.

Do you perform regular cleaning of the table with DELETEs, or maybe you
use UPDATEs for some other reason?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] slow query - will CLUSTER help?

2013-12-19 Thread Sergey Konoplev
On Thu, Dec 19, 2013 at 12:54 PM, Sev Zaslavsky  wrote:
> On 12/19/2013 3:34 PM, Sergey Konoplev wrote:
>> On Thu, Dec 12, 2013 at 9:30 AM, Sev Zaslavsky  wrote:
>>> Table rt_h_nbbo contains several hundred million rows.  All rows for a
>>> given
>>> entry_date are appended to this table in an overnight process every night
>>> -
>>> on the order of several million rows per day.
>>
>> Do you perform a regular cleaning of the table with DELETEs or may be
>> you use UPDATEs for some another reason?
>
> At this point we're neither deleting nor updating the data once written to
> the db.

Then I can see two possible reasons for the problem:

1. The indexed data is too big and the index search is getting worse day by day.

I would try to create a partial index for one day and repeat the
EXPLAIN ANALYZE for that day (see the sketch after this list). If there
are significant improvements, then I would start creating partial
indexes for every new day before it starts and drop them after some
time, when they become obsolete.

2. You are limited by IO.

I would suggest upgrading your storage in this case.
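
A sketch of such a per-day partial index, using the table and column
names mentioned in this thread (adjust the literal to the actual type
of entry_date):

CREATE INDEX CONCURRENTLY rt_h_nbbo_20131220_idx
    ON rt_h_nbbo (product_id)
    WHERE entry_date = DATE '2013-12-20';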

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Recommendations for partitioning?

2013-12-30 Thread Sergey Konoplev
On Fri, Dec 20, 2013 at 7:59 AM, Alvaro Herrera
 wrote:
> Dave Johansen escribió:
>> On Thu, Dec 19, 2013 at 10:27 AM, Scott Marlowe 
>> wrote:
>
>> > That's pretty much it. What I did was to create the new month table
>> > and day tables, alter my triggers to reflect this, then move the data
>> > with insert into / select from query for each old day partition. Then
>> > once their data is moved you can just drop them. Since you changed the
>> > triggers first those tables are no long taking input so it's usually
>> > safe to drop them now.
>>
>> It would be nice if there was just a "move command", but that seems like
>> the type of model that we want and we'll probably move to that.
>
> Eh.  Why can't you just do something like
>
> WITH moved AS (
> DELETE FROM src WHERE ..
> RETURNING *
> ) INSERT INTO dst SELECT * FROM moved;

Alvaro, I think it could be cheaper to do it as shown below, correct?

psql dbname -c 'copy src to stdout' | \
psql dbname -c 'copy dst from stdin; truncate src;'

Dave, in case you need to archive old partitions to compressed files
outside of your database, you can use this tool [1]. Consult the
configuration example [2] and look at the ARCHIVE_* parameters.

[1] https://github.com/grayhemp/pgcookbook/blob/master/bin/archive_tables.sh
[2] https://github.com/grayhemp/pgcookbook/blob/master/bin/config.sh.example

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] trick the query optimiser to skip some optimisations

2014-01-30 Thread Sergey Konoplev
On Wed, Jan 29, 2014 at 3:38 PM, Дмитрий Шалашов  wrote:
> I have a table called 'feed'. It's a big table accessed by many types of
> queries, so I have quite a lot of indices on it.
>
> Those that are relevant looks like this:
>
> "feed_user_id_active_id_added_idx" btree (user_id, active_id, added)
> "feed_user_id_added_idx" btree (user_id, added DESC)
> "feed_user_id_added_idx2" btree (user_id, added DESC) WHERE active_id =
> user_id AND type = 1
>
> last one is very small and tailored for the specific query.
> "added" field is timestamp, everything else is integers.
[..]
>  Limit  (cost=0.00..463.18 rows=31 width=50)
>->  Index Scan Backward using feed_user_id_active_id_added_idx on
> user_feed  (cost=0.00..851.66 rows=57 width=50)
>  Index Cond: ((user_id = 7) AND (active_id = 7))
>  Filter: (type = 1)
[...]
> Can I do something here so optimiser would use the feed_user_id_added_idx2
> index? It's around ten times smaller than the 'generic'
> feed_user_id_active_id_added_idx index.
>
> I have PostgreSQL 9.2.6 on Debian.

Could you please show EXPLAIN ANALYZE for both cases, the current one
and with feed_user_id_active_id_added_idx dropped?

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Sergey Konoplev
On Fri, Feb 7, 2014 at 10:47 AM, Claudio Freire  wrote:
> What I'm seeing, though, is not that, but bloat proportional to table
> size (always stuck at about 65% bloat). What's weird, is that vacuum
> full does the trick of reducing table size and bloat back to 0%. I
> haven't had time yet to verify whether it goes back to 65% after
> vacuum full (that will take time, maybe a month).

Try pgcompact; it was designed particularly for cases like yours:
https://github.com/grayhemp/pgtoolkit.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Bloated tables and why is vacuum full the only option

2014-02-09 Thread Sergey Konoplev
On Sun, Feb 9, 2014 at 2:58 PM, Claudio Freire  wrote:
> On Sun, Feb 9, 2014 at 7:32 PM, Sergey Konoplev  wrote:
>> Try pgcompact, it was designed particularily for such cases like yours
>> https://github.com/grayhemp/pgtoolkit.
>
> It's a pity that that requires several sequential scans of the tables.
> For my case, that's probably as intrusive as the exclusive locks.

You should probably run it with --no-pgstattuple if you are talking
about these seq scans. If your tables are not TOASTed then the
approximation method of gathering statistics will work pretty well
for you.

> I noticed I didn't mention, but the tables involved are around 20-50GB in 
> size.

That is not something I would worry about. I regularly use it with even
bigger tables.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] [BUGS] Very slow query in PostgreSQL 9.3.3

2014-03-13 Thread Sergey Konoplev
On Thu, Mar 13, 2014 at 12:26 PM,   wrote:
> *** Problem Query ***
>
> explain (analyze on, buffers on) Select * from measurement this_
>   where this_.logdate between '2007-12-19
> 23:38:41.22'::timestamp and '2007-12-20 08:01:04.22'::timestamp
> and this_.city_id=25183 order by
> this_.logdate asc, this_.peaktemp asc, this_.unitsales asc limit 1;
>
[...]
>  Total runtime: 51717.639 ms   <--- *** unacceptable ***

Try to create a multi-column index on the partition on (city_id,
logdate). Then run the original query and the query without peaktemp
and unitsales in the ORDER BY. Compare the results, and if the first
one is not satisfactory, try to add these two columns to the end of
the column list of your multi-column index, in the order they appear
in your query. It should do the trick. If it won't, please show the
plans.
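
A sketch of the suggested indexes; the partition name
measurement_y2007m12 is made up for illustration:

CREATE INDEX measurement_y2007m12_city_logdate_idx
    ON measurement_y2007m12 (city_id, logdate);

-- the extended variant, if the first one is not enough:
CREATE INDEX measurement_y2007m12_city_logdate_peak_sales_idx
    ON measurement_y2007m12 (city_id, logdate, peaktemp, unitsales);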

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Sudden crazy high CPU usage

2014-03-31 Thread Sergey Konoplev
On Mon, Mar 31, 2014 at 3:25 AM, Niels Kristian Schjødt
 wrote:
> I'm running postgresql 9.3 on a production server. An hour ago, out of the 
> "blue", I ran into an issue I have never encountered before: my server 
> started to use CPU as crazy. The server is a standard ubuntu 12.04 LTE 
> installation running only Postgres and Redis.
>
> The incident can be seen on the in numbers below:
>
> https://s3-eu-west-1.amazonaws.com/autouncle-public/other/cpu.png

The increase doesn't look so sudden. My guess is that the server got
some new activity. My advice is to set up the statistics collecting
script at the link [1] and review the results for a period of an hour
or so. It shows charts of statements by CPU/IO/calls with aggregated
stats, so you can probably find out more than with pure
pg_stat_statements.

[1] 
https://github.com/grayhemp/pgcookbook/blob/master/statement_statistics_collecting_and_reporting.md

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com




Re: [PERFORM] Hot standby 9.2.1 PANIC: WAL contains references to invalid pages

2014-04-19 Thread Sergey Konoplev
On Fri, Apr 18, 2014 at 1:23 AM, Vishalakshi Navaneethakrishnan
 wrote:
> if i change the wal level as archive, then this problem will go..? We are
> just using warm stand by. so shall we change the wal_level as archive..? Can
> you please reply this mail as soon as possible?

AFAIK, the problem appears when hot_standby is set on, so you need to
turn it off. Also, take a look at the link below:

http://www.databasesoup.com/2013/12/why-you-need-to-apply-todays-update.html
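
For reference, a sketch of the relevant settings for a plain warm
standby (my assumption about your setup; both changes need a restart of
the corresponding server):

# postgresql.conf on the standby: stop accepting read-only queries
hot_standby = off

# postgresql.conf on the master: archive is enough for a warm standby;
# the hot_standby wal_level is only needed if the standby serves queries
wal_level = archive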

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Best practice question

2014-04-21 Thread Sergey Konoplev
On Mon, Apr 21, 2014 at 6:19 PM, Tory M Blue  wrote:
> I am going to add a new column to a table for modify_date that needs to be
> updated every time the table is updated. Is it better to just update
> application code to set the modify_date to current_time, or create a
> Before-Update trigger on the table that will update the modify_date column
> to current_timestamp when the table is updated? I also have slony in place,
> so the trigger will need to be on master and slave. Slony will take care of
> suppressing it on the slave and enabling in the event of a switchover, but
> it is additional overhead and validation to make sure nothing failed on
> switchover.
>
> So considering that we have slony, is it better to use application code to
> update the modify_date or use a trigger? Is a trigger essentially 2 updates
> to the table? Are there any other risks in using the trigger?

In addition to David's answer I would like to add the following.

AFAIK Slony does not make any difference here. And no, a trigger does
not mean 2 updates: it supplies its function with a NEW row variable in
which you can change the necessary columns and return the modified row
as the result. Another risk is the case when you need to update 2
tables on different servers and keep their modified_timestamp fields in
sync; in that case you need to determine the new value of the column in
the application.
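
For illustration, a minimal sketch of such a trigger (the table name
your_table is a placeholder):

CREATE OR REPLACE FUNCTION set_modify_date() RETURNS trigger AS $$
BEGIN
    -- modify the row that is about to be written; no second UPDATE is issued
    NEW.modify_date := current_timestamp;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER set_modify_date
    BEFORE UPDATE ON your_table
    FOR EACH ROW EXECUTE PROCEDURE set_modify_date();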

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] tsearch2, large data and indexes

2014-04-23 Thread Sergey Konoplev
On Wed, Apr 23, 2014 at 4:08 AM, Ivan Voras  wrote:
> Ok, I found out what is happening, quoting from the documentation:
>
> "GIN indexes are not lossy for standard queries, but their performance
> depends logarithmically on the number of unique words. (However, GIN
> indexes store only the words (lexemes) oftsvector values, and not
> their weight labels. Thus a table row recheck is needed when using a
> query that involves weights.)"
>
> My query doesn't have weights but the tsvector in the table has them -
> I take it this is what is meant by "involves weights."
>
> So... there's really no way for tsearch2 to produce results based on
> the index alone, without recheck? This is... limiting.

My guess is that you could use the strip() function [1] to get rid of
the weights in your table or, which would probably be better, in your
index only, by using the expression both in the index and in the query,
e.g.

...USING gin (strip(fts_data))

and

... WHERE strip(fts_data) @@ q

[1] http://www.postgresql.org/docs/9.3/static/textsearch-features.html
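
Spelled out a bit more, a sketch of the expression-index variant (the
table name documents and the search text are made up for illustration;
fts_data is from your schema):

CREATE INDEX documents_fts_stripped_idx
    ON documents USING gin (strip(fts_data));

-- the WHERE clause must use the same expression for the planner to pick the index
SELECT *
FROM documents
WHERE strip(fts_data) @@ to_tsquery('english', 'example & query');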

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] tsearch2, large data and indexes

2014-04-24 Thread Sergey Konoplev
On Thu, Apr 24, 2014 at 4:34 AM, Heikki Linnakangas
 wrote:
> On 04/24/2014 01:56 AM, Sergey Konoplev wrote:
>> My guess is that you could use strip() function [1] to get rid of
>> weights in your table or, that would probably be better, in your index
>> only by using expressions in it and in the query, eg.
>
> As the docs say, the GIN index does not store the weights. As such, there is
> no need to strip them. A recheck would be necessary if your query needs the
> weights, precisely because the weights are not included in the index.
>
> (In the OP's query, it's the ranking that was causing the detoasting.)

strip() is needed in the index because without it the index expression
won't match the one in the WHERE clause, and the index won't be used.
This way we could probably get rid of the "involves weights" thing that
causes the recheck condition, if I interpret the docs correctly.

ts_rank(), in turn, is supposed to be used in a higher node of the
plan, so there is no way for it to affect the query itself.

But, again, it is just my guess, and it requires testing.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] tsearch2, large data and indexes

2014-04-24 Thread Sergey Konoplev
On Thu, Apr 24, 2014 at 5:34 AM, Ivan Voras  wrote:
> On 24 April 2014 13:34, Heikki Linnakangas  wrote:
>
>> As the docs say, the GIN index does not store the weights. As such, there is
>> no need to strip them. A recheck would be necessary if your query needs the
>> weights, precisely because the weights are not included in the index.
>>
>> (In the OP's query, it's the ranking that was causing the detoasting.)
>
> Thanks!
>
> My problem is that I actually need the ranking. My queries can return
> a large number of documents (tens of thousands) but I usually need
> only the first couple of pages of most relevant results (e.g. 50-100
> records). With PostgreSQL and tsearch2, this means that the tens of
> thousands of documents found via the index are then detoasted and
> ranked.

Heikki, what about the "GIN improvements part 3: ordering in index"
patch, was it committed?

http://www.postgresql.org/message-id/flat/capphfduwvqv5b0xz1dzuqaw29erdculzp2wotfjzdbs7bhp...@mail.gmail.com

Ivan, with this patch there is hope that we could get a more effective
FTS solution than any other I have heard about.

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help: function failing

2014-10-07 Thread Sergey Konoplev
On Thu, Oct 2, 2014 at 4:00 PM, George Neuner  wrote:
> --- code 
> CREATE OR REPLACE FUNCTION gen_random()
>   RETURNS double precision AS
> $BODY$
> DECLARE
>num   float8 := 0;
>den   float8 := 281474976710655; -- 0x
>bytes bytea[6];
> BEGIN
>-- get random bytes from crypto module
>bytes := ext.gen_random_bytes(6);
>
>-- assemble a double precision value
>num := num + get_byte( bytes, 0 );
>FOR i IN 1..5 LOOP
>   num := num * 256;
>   num := num + get_byte( bytes, i );
>   END LOOP;
>
>   -- normalize value to range 0.0 .. 1.0
>   RETURN num / den;
> END;
> $BODY$
>   LANGUAGE plpgsql VOLATILE;
> --- code 
>
> The error is:
> ERROR: array value must start with "{" or dimension information
> SQL state: 22P02
> Context: PL/pgSQL function gen_random() line 8 at assignment
>
> which, if I'm counting correctly, is
> bytes := ext.gen_random_bytes(6);

Judging by the name of ext.gen_random_bytes(6), it seems to return a
value that is, from time to time, incompatible with the bytea[] array
representation, so take a closer look at ext.gen_random_bytes() first.
You can test the case using a DO block, for example as sketched below.
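
A sketch of such a test, assuming ext.gen_random_bytes() wraps
pgcrypto's gen_random_bytes(), which returns a single bytea rather than
a bytea[] array:

DO $$
DECLARE
    b bytea;  -- note: plain bytea, not bytea[]
BEGIN
    FOR i IN 1..1000 LOOP
        b := ext.gen_random_bytes(6);
    END LOOP;
    RAISE NOTICE 'last value: %, first byte: %', b, get_byte(b, 0);
END;
$$;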

> If I comment out that line, it then tells me  get_byte()   is undefined,
> which should be impossible because it's built in.

It feels like somewhere inside ext.gen_random_bytes() you set a
search_path that allows get_byte() to be seen, while the search_path
that is in effect before the gen_random() call does not.
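
A quick way to check that hypothesis (just a sketch): compare the
current search_path with a schema-qualified call, which always resolves
because get_byte() lives in pg_catalog:

SHOW search_path;

-- should succeed regardless of search_path
SELECT pg_catalog.get_byte('\x0102'::bytea, 0);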

-- 
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (499) 346-7196, +7 (988) 888-1979
gray...@gmail.com


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Re: [GENERAL] Re: [pgadmin-support] Issue with a hanging apply process on the replica db after vacuum works on primary

2015-03-19 Thread Sergey Shchukin

17.03.2015 13:22, Sergey Shchukin wrote:

05.03.2015 11:25, Jim Nasby wrote:

On 2/27/15 5:11 AM, Sergey Shchukin wrote:


show max_standby_streaming_delay;
 max_standby_streaming_delay
-----------------------------
 30s


We both need to be more clear about which server we're talking about 
(master or replica).


What are max_standby_streaming_delay and max_standby_archive_delay 
set to *on the replica*?


My hope is that one or both of those is set to somewhere around 8 
minutes on the replica. That would explain everything.


If that's not the case then I suspect what's happening is there's 
something running on the replica that isn't checking for interrupts 
frequently enough. That would also explain it.


When replication hangs, is the replication process using a lot of 
CPU? Or is it just sitting there? What does the process status for the 
replay process show?


Can you get a trace of the replay process on the replica when this is 
happening to see where it's spending all its time?


How are you generating these log lines?
 Tue Feb 24 15:05:07 MSK 2015 Stream: MASTER-masterdb:79607161592048 
SLAVE:79607161550576 Replay:79607160986064 :: REPLAY 592 KBytes 
(00:00:00.398376 seconds)


Do you see the confl_* fields in pg_stat_database_conflicts on the 
*replica* increasing?


Hi Jim,

max_standby_streaming_delay and max_standby_archive_delay are both 
30s on the master and the replica DBs.


I don't see any specific or heavy workload during this issue with a 
hanging apply process, just normal queries as usual.


But I do see increased disk activity while the apply issue is ongoing:


DSK | busy | reads | writes | KiB/r | KiB/w | MBr/s | MBw/s |  avq | avio
sdc |  61% | 11511 |   4534 |    46 |     4 | 52.78 |  1.88 | 1.45 | 0.38 ms
sde |  60% | 11457 |   4398 |    46 |     4 | 51.97 |  1.83 | 1.47 | 0.38 ms
sdd |  60% |  9673 |   4538 |    61 |     4 | 58.24 |  1.88 | 1.47 | 0.42 ms
sdj |  59% |  9576 |   4177 |    63 |     4 | 59.30 |  1.75 | 1.48 | 0.43 ms
sdh |  59% |  9615 |   4305 |    63 |     4 | 59.23 |  1.80 | 1.48 | 0.42 ms
sdf |  59% |  9483 |   4404 |    63 |     4 | 59.11 |  1.83 | 1.47 | 0.42 ms
sdi |  59% | 11273 |   4173 |    46 |     4 | 51.50 |  1.75 | 1.43 | 0.38 ms
sdg |  59% | 11406 |   4297 |    46 |     4 | 51.66 |  1.80 | 1.46 | 0.37 ms


Although it does not seem to be an upper IO limit.

Normally the disks are busy at 20-45%:

DSK | busy | reads | writes | KiB/r | KiB/w | MBr/s | MBw/s |   avq | avio
sde |  29% |  6524 |  14426 |    26 |     5 | 17.08 |  7.78 | 10.46 | 0.14 ms
sdi |  29% |  6590 |  14391 |    26 |     5 | 17.19 |  7.76 |  8.75 | 0.14 ms
sdg |  29% |  6547 |  14401 |    26 |     5 | 16.94 |  7.60 |  7.28 | 0.14 ms
sdc |  29% |  6835 |  14283 |    27 |     5 | 18.08 |  7.74 |  8.77 | 0.14 ms
sdf |  23% |  3808 |  14391 |    36 |     5 | 13.49 |  7.78 | 12.88 | 0.13 ms
DSK |  sdd  |  | busy 23%  | read 3747 
|   | write  14229 | Ki