[PERFORM] How to determine which indexes are unused or seldom used in the database

2007-04-02 Thread Denis Lishtovny
Hello All.

I have a lot of tables and indexes in my database, and I need to determine
which indexes are unused or seldom used. I enabled all possible statistics
in the config, but I can't understand how to do this.
Thanks.

P.S. For example, I need this to reduce the database size in order to
increase backup and restore speed.



Re: [PERFORM] How to determine which indexes are unused or seldom used in the database

2007-04-02 Thread Thomas Pundt
Hi,

On Monday 02 April 2007 10:12, Denis Lishtovny wrote:
| I have a lot of tables and indexes in my database, and I need to determine
| which indexes are unused or seldom used. I enabled all possible statistics
| in the config, but I can't understand how to do this.
| Thanks.

Try "select * from pg_stat_user_indexes;" - that should give you a good
starting point.
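
As a concrete (hedged) illustration, a query along these lines lists the indexes that have never been used since the statistics were last reset; the column names are those of the 8.x `pg_stat_user_indexes` view, so verify against your server version:

```sql
-- Indexes with zero scans since the statistics were last reset.
-- idx_scan counts how often each index was used in an index scan.
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY schemaname, relname;
```

Raise the `idx_scan = 0` threshold to a small number to catch "seldom used" indexes as well; just remember the counters only reflect activity since the last stats reset.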

| P.S. For example, I need this to reduce the database size in order to
| increase backup and restore speed.

Deleting indexes won't reduce backup size noticeably if you use pg_dump
for backup (though it does have an impact on restore speed).

Ciao,
Thomas

-- 
Thomas Pundt <[EMAIL PROTECTED]>  http://rp-online.de/ 

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] How to determine which indexes are unused or seldom used in the database

2007-04-02 Thread Andreas Kostyrka
* Denis  Lishtovny <[EMAIL PROTECTED]> [070402 09:20]:
>Hello All.
> 
>I have a lot of tables and indexes in my database, and I need to determine
>which indexes are unused or seldom used. I enabled all possible
>statistics in the config, but I can't understand how to do this.
>Thanks.
> 
>P.S. For example, I need this to reduce the database size in order to
>increase backup and restore speed.
Indexes are not backed up, and you can increase restore speed by
temporarily dropping them. Current pg_dump versions should be fine in this
respect.
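
A hedged sketch of that approach (the index, table, and column names here are made up for illustration):

```sql
-- Capture the definition before dropping, e.g. from pg_indexes:
SELECT indexdef FROM pg_indexes WHERE indexname = 'my_big_index';

DROP INDEX my_big_index;   -- before the bulk restore/load

-- ... run the restore or bulk load ...

CREATE INDEX my_big_index ON my_table (my_column);  -- recreate afterwards
```

Building the index once at the end is generally much faster than maintaining it row by row during the load, which is also why pg_dump's restore scripts create indexes after the data is loaded.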

Discovering which tables are unused via the database suggests more of
a software engineering problem, IMHO. And it is bound to be imprecise and
dangerous; tables might get read from:

*) triggers. That means some tables might be consulted only if user X
is doing something. Or if there's a full moon. Or if the DBA's chi barked
three times that day.

*) during application startup only (easy to solve by forcing all clients
to restart)

*) during a cron job (daily, weekly, monthly, bi-monthly)

*) only during human-originated processes.

It is not a good idea to drop tables just because nothing has
accessed them for half an hour. Or even a week.

Worse, some tables might have relationships that are missing from the
database (a forgotten foreign key constraint, or relationships that are
hard to express with SQL constraints).

OTOH, if you just want to get a feel for which parts of the database are
active, you can start by enabling SQL statement logging and analyzing
some of that output.
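
For reference, a minimal postgresql.conf fragment for that kind of logging might look like this (parameter names as of the 8.x series; adjust to taste):

```
# postgresql.conf -- log every statement, with its duration
log_statement = 'all'
log_min_duration_statement = 0   # 0 = log the duration of every completed statement
```

Beware that logging every statement on a busy server produces a lot of output; raising log_min_duration_statement to, say, 100 (ms) is a common compromise.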

Andreas




Re: [PERFORM] Scaling SELECT:s with the number of disks on a stripe

2007-04-02 Thread Peter Schuller
Hello,

> If you are dealing with timed data or similar, you may consider to
> partition your table(s).

Unfortunately this is not the case; the insertion is more or less
random (not quite, but for the purpose of this problem it is).

Thanks for the pointers though. That is sure to be useful in some
other context down the road.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org





[PERFORM] Re: Wrong plan sequential scan instead of an index one [8.2 solved it]

2007-04-02 Thread Gaetano Mendola

Tom Lane wrote:
> Richard Huxton  writes:
>> Ah, but it's got no way of knowing what matches you'll get for 
>> '%anything%'. There's no easy way to get statistics for matching substrings.
> 
> 8.2 actually tries the match on the most-common-values list, if said
> list is big enough (I think the threshold is stats target = 100).
> Not sure if that will help here, though.

I didn't change the stats target, and on an 8.2 engine I obtain the result
I was expecting.


test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in 
(select id from l_pvcp where value ilike '%pi%');
 QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=163228.76..163228.77 rows=1 width=8) (actual 
time=23.398..23.398 rows=1 loops=1)
   ->  Nested Loop  (cost=74.71..163020.31 rows=83380 width=8) (actual 
time=2.237..18.580 rows=7801 loops=1)
 ->  HashAggregate  (cost=2.22..2.41 rows=19 width=4) (actual 
time=0.043..0.045 rows=1 loops=1)
   ->  Seq Scan on l_pvcp  (cost=0.00..2.17 rows=19 width=4) 
(actual time=0.028..0.037 rows=1 loops=1)
 Filter: (value ~~* '%pi%'::text)
 ->  Bitmap Heap Scan on t_oa_2_00_card  (cost=72.49..8525.04 rows=4388 
width=12) (actual time=2.188..9.204 rows=7801 loops=1)
   Recheck Cond: (t_oa_2_00_card.pvcp = l_pvcp.id)
   ->  Bitmap Index Scan on i3_t_oa_2_00_card  (cost=0.00..71.39 
rows=4388 width=0) (actual time=1.768..1.768 rows=7801 loops=1)
 Index Cond: (t_oa_2_00_card.pvcp = l_pvcp.id)
 Total runtime: 23.503 ms
(10 rows)

test=# explain analyze SELECT COUNT(id) FROM t_oa_2_00_card WHERE pvcp in 
(select id from l_pvcp where value ilike 'pi');
 QUERY PLAN
--------------------------------------------------------------------------------
 Aggregate  (cost=38343.44..38343.45 rows=1 width=8) (actual 
time=23.386..23.387 rows=1 loops=1)
   ->  Nested Loop  (cost=76.52..38299.55 rows=17554 width=8) (actual 
time=2.246..18.576 rows=7801 loops=1)
 ->  HashAggregate  (cost=2.18..2.22 rows=4 width=4) (actual 
time=0.041..0.043 rows=1 loops=1)
   ->  Seq Scan on l_pvcp  (cost=0.00..2.17 rows=4 width=4) (actual 
time=0.026..0.035 rows=1 loops=1)
 Filter: (value ~~* 'pi'::text)
 ->  Bitmap Heap Scan on t_oa_2_00_card  (cost=74.33..9519.48 rows=4388 
width=12) (actual time=2.198..9.161 rows=7801 loops=1)
   Recheck Cond: (t_oa_2_00_card.pvcp = l_pvcp.id)
   ->  Bitmap Index Scan on i3_t_oa_2_00_card  (cost=0.00..73.24 
rows=4388 width=0) (actual time=1.779..1.779 rows=7801 loops=1)
 Index Cond: (t_oa_2_00_card.pvcp = l_pvcp.id)
 Total runtime: 23.491 ms
(10 rows)


I had to lower random_page_cost to 2.5 in order to avoid the sequential scan
on the big table t_oa_2_00_card.
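
For anyone wanting to try the same experiment, the setting can be changed per session before re-running the plan (2.5 is just the value that worked for this data set; the default is 4.0):

```sql
SET random_page_cost = 2.5;   -- per-session; default is 4.0

EXPLAIN ANALYZE SELECT COUNT(id) FROM t_oa_2_00_card
WHERE pvcp IN (SELECT id FROM l_pvcp WHERE value ILIKE '%pi%');
```

Lowering random_page_cost tells the planner that random index-driven page fetches are cheaper relative to sequential reads, which is often realistic when most of the working set is cached.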

This is a +1 for updating our engines to 8.2.


Regards
Gaetano Mendola



Re: [PERFORM] Scaling SELECT:s with the number of disks on a stripe

2007-04-02 Thread Peter Schuller
Hello,

> > SELECT * FROM test WHERE value = 'xxx' LIMIT 1000;
> 
> I tested this on a 14-way software raid10 on freebsd, using pg 8.1.6, and
> couldn't reproduce anything like it. With one client I get about 200 disk
> requests per second, scaling almost exactly linearly for the first 5 or so
> clients, as expected. At 14 clients it was down to about 150 reqs/sec per
> client, but the total throughput continued to increase with additional
> concurrency up to about 60 clients, giving about 3600 reqs/sec (260 per
> disk, which is about right for 10krpm scsi disks under highly concurrent
> random loads).

Ok. That is very interesting; so there is definitely nothing
fundamental in PG that prevents the scaling (even if on FreeBSD).

> A good question. Have you tried testing the disks directly? e.g. create
> some huge files, and run a few concurrent random readers on them? That
> would test the array and the filesystem without involving postgres.

I have confirmed that I am seeing expected performance for random
short and highly concurrent reads in one large (> 200 GB) file. The
I/O is done using libaio however, so depending on implementation I
suppose the I/O scheduling behavior of the fs/raid driver might be
affected compared to having a number of concurrent threads doing
synchronous reads. I will try to confirm performance in a way that
will more closely match PostgreSQL's behavior.
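
For comparison, here is a rough sketch of the kind of concurrent synchronous random-read test described above, in shell. The file path, sizes, and client counts are arbitrary illustrations; a real test would use a file much larger than RAM (and O_DIRECT or a raw device) to defeat the OS cache.

```shell
# Create a scratch file and hit it with several concurrent random readers.
# Sizes are deliberately tiny here; scale FILE_MB/READERS up for a real test.
FILE=/tmp/iotest.dat
FILE_MB=8
READERS=4
READS_PER_CLIENT=50

dd if=/dev/zero of="$FILE" bs=1048576 count="$FILE_MB" 2>/dev/null

for i in $(seq 1 "$READERS"); do
  (
    j=0
    while [ "$j" -lt "$READS_PER_CLIENT" ]; do
      # Deterministic pseudo-random 8 kB block offsets (PostgreSQL's page size)
      block=$(( (i * 7919 + j * 104729) % (FILE_MB * 128) ))
      dd if="$FILE" bs=8192 count=1 skip="$block" 2>/dev/null >/dev/null
      j=$((j + 1))
    done
  ) &
done
wait
echo "all readers finished"
```

Each subshell performs synchronous reads in a loop, so the total requests-per-second across readers approximates what a set of PostgreSQL backends would generate, unlike a libaio-based test where the kernel can reorder a deep queue of outstanding requests.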

I have to say though that I will be pretty surprised if the
performance is not matched in that test.

Is there any chance there is some operating-system-conditional code in
pg itself that might affect this behavior? Some kind of purposeful
serialization of I/O, for example (even if that sounds like an
extremely strange thing to do)?

> This is entirely expected. With the larger row count, it is more likely
> (or so the planner estimates) that rows will need to be fetched from
> adjacent or at least nearby blocks, thus a plan which fetches rows in
> physical table order rather than index order would be expected to be
> superior. The planner takes into account the estimated startup cost and
> per-row cost when planning LIMIT queries; therefore it is no surprise
> that for larger limits, it switches to a plan with a higher startup cost
> but lower per-row cost.

Roger that, makes sense. I had misunderstood the meaning of the heap
scan.

> Most likely your index is small enough that large parts of it will be
> cached in RAM, so that the scan of the index to build the bitmap does
> not need to hit the disk much if at all.

Even so however, several seconds of CPU activity to scan the index for
a few tens of thousands of entries sounds a bit excessive. Or does it
not? Because at that level, the CPU bound period alone is approaching
the time it would take to seek for each entry instead. But then I
presume the amount of work is similar/the same for the other case,
except it's being done at the beginning of the query instead of before
each seek.

-- 
/ Peter Schuller

PGP userID: 0xE9758B7D or 'Peter Schuller <[EMAIL PROTECTED]>'
Key retrieval: Send an E-Mail to [EMAIL PROTECTED]
E-Mail: [EMAIL PROTECTED] Web: http://www.scode.org





[PERFORM] Providing user-based privileges to Postgres DB

2007-04-02 Thread ramachandra.bhaskaram

Hi All,

  Currently, in one of our projects, we want to restrict unauthorized
users' access to the Postgres DB. We are using Postgres version 8.2.0.

  Can anybody tell me how I can provide user-based privileges on the
Postgres DB so that we can restrict unauthorized users, as well as provide
access control to users based on the privileges set by the
administrator.


Thanks and Regards, 
Ramachandra B.S.




Re: [PERFORM] Providing user-based privileges to Postgres DB

2007-04-02 Thread Carlos Moreno

[EMAIL PROTECTED] wrote:


Hi All,

  Currently, in one of our projects, we want to restrict
unauthorized users' access to the Postgres DB. We are using Postgres
version 8.2.0.


  Can anybody tell me how I can provide user-based privileges
on the Postgres DB so that we can restrict unauthorized users, as
well as provide access control to users based on the
privileges set by the administrator.



The pgsql-general list might be more appropriate for this type of
question...  Still:

Are you talking about restrictions based on database users?   If so, look
up GRANT and REVOKE in the PG documentation  (under SQL commands).

If you're talking about restricting system users from even attempting to
use psql  (which really would not be much of a restriction), then perhaps
you could assign a group owner to the psql binary and grant execute
permission to that group only (well, and to the owner).
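
A minimal example of the database-side approach (the role and table names are invented for illustration):

```sql
-- Roles with different privilege levels
CREATE ROLE app_reader LOGIN PASSWORD 'secret1';
CREATE ROLE app_writer LOGIN PASSWORD 'secret2';

REVOKE ALL ON accounts FROM PUBLIC;      -- nobody gets access by default
GRANT SELECT ON accounts TO app_reader;  -- read-only access
GRANT SELECT, INSERT, UPDATE ON accounts TO app_writer;
```

Combined with appropriate pg_hba.conf entries (which control who may connect at all, and how they authenticate), this gives per-user access control entirely within PostgreSQL.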

HTH,

Carlos
--





[PERFORM] postgres 7.4 vs. 8.x redux

2007-04-02 Thread Alex Deucher

Sorry if anyone receives this twice; it didn't seem to go through the
first time.  I'll attach the query plans to another email in case they
were causing a size limit problem.  Also, here's the table
description:

  Table "public.t1"
Column   |  Type  | Modifiers
---++---
num   | character varying(30)  | not null
c1| character varying(500) |
c12   | character varying(50)  |
c2| date   |
c3| date   |
c11   | character varying(20)  |
c4| integer|
c5| integer|
c6| character varying(300) |
c7| character varying(300) |
c8| date   |
c9| character varying(100) |
c10   | character varying(50)  |
c13   | integer|
Indexes:
  "t1_pkey" primary key, btree (num)
Check constraints:
  "t1_c13" CHECK (c13 > 0 AND c13 < 6)

---

I had some problems a few weeks back when I tried to rebuild my
database on a SAN volume using postgres 8.1.  The back story is as
follows:

I had a large postgres 7.4 database (about 16 GB) that was originally
on an old sun box on scsi disks.  I rebuilt the database from scratch
on a new sun box on a SAN volume.  The database performed poorly, and
at the time I assumed it was due to the SAN.  Well, after building a
new server with a fast scsi RAID array and rebuilding the DB, I've
come to find that it's only marginally faster than the SAN
based DB.  The old 7.4 database is still significantly faster than both
new DBs, and I'm not sure why.  The databases were created from scratch
using the same table structure on each server.

Hardware:

Old server:
Sun v880 (4x1.2 Ghz CPUs, 8GB RAM, non-RAID scsi JBOD volume, postgres
7.4, SQL_ASCII DB)
Solaris 8
~45/50MBps W/R

New server (with SAN storage): sun x4100 (4x opteron cores, 8GB ram,
SAN volume, postgres 8.1, UNICODE DB)
debian etch
~65/150MBps  W/R

New server (with local scsi RAID): sun x4100 (4x opteron cores, 8GB
ram,  RAID scsi volume, postgres 8.2 tried both UNICODE and SQL_ASCII
DBs)
debian etch
~160/185 MBps W/R

Most of the queries we do are significantly slower on the new servers.
Thinking that the UTF8 format might be slowing things down, I also
tried SQL_ASCII, but the change in performance was negligible.  I've
tweaked just about every option in the config file, but nothing seems
to make a difference.  The only thing I can see that would make a
difference is the query plans.  The old 7.4 server seems to use index
scans for just about every query we throw at it.  The new servers seem
to prefer bitmap heap scans and sequential scans.  I tried adjusting
the planner options, but no matter what I did, it seems to like the
sequential and bitmap scans.  I've analyzed and vacuumed.

Does anyone have any ideas what might be going wrong?  I suppose the
next thing to try is 7.4 on the new servers, but I'd really like to
stick to the 8.x series if possible.

I've included some sample query plans below.

Thanks,

Alex



Re: [PERFORM] Scaling SELECT:s with the number of disks on a stripe

2007-04-02 Thread Andrew - Supernews
On 2007-04-02, Peter Schuller <[EMAIL PROTECTED]> wrote:
> I have confirmed that I am seeing expected performance for random
> short and highly concurrent reads in one large (> 200 GB) file. The
> I/O is done using libaio however, so depending on implementation I
> suppose the I/O scheduling behavior of the fs/raid driver might be
> affected compared to having a number of concurrent threads doing
> synchronous reads. I will try to confirm performance in a way that
> will more closely match PostgreSQL's behavior.
>
> I have to say though that I will be pretty surprised if the
> performance is not matched in that test.

The next question then is whether anything in your postgres configuration
is preventing it getting useful performance from the OS. What settings
have you changed in postgresql.conf? Are you using any unusual settings
within the OS itself?

> Is there any chance there is some operation system conditional code in
> pg itself that might affect this behavior?

Unlikely.

>> Most likely your index is small enough that large parts of it will be
>> cached in RAM, so that the scan of the index to build the bitmap does
>> not need to hit the disk much if at all.
>
> Even so however, several seconds of CPU activity to scan the index for
> a few tens of thousands of entries sounds a bit excessive. Or does it
> not? Because at that level, the CPU bound period alone is approaching
> the time it would take to seek for each entry instead. But then I
> presume the amount of work is similar/the same for the other case,
> except it's being done at the beginning of the query instead of before
> each seek.

You're forgetting the LIMIT clause. For the straight index scan, the
query aborts when the LIMIT is reached having scanned only the specified
number of index rows (plus any index entries that turned out to be dead
in the heap). For the bitmap scan case, the limit can be applied only after
the heap scan is under way, therefore the index scan to build the bitmap
will need to scan ~50k rows, not the 10k specified in the limit, so the
amount of time spent scanning the index is 50 times larger than in the
straight index scan case.

However, I do suspect you have a problem here somewhere, because in my
tests the time taken to do the bitmap index scan on 50k rows, with the
index in cache, is on the order of 30ms (where the data is cached in
shared_buffers) to 60ms (where the data is cached by the OS). That's on
a 2.8GHz xeon.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services



[PERFORM] postgres 7.4 vs 8.x redux: query plans

2007-04-02 Thread Alex Deucher

and here are the query plans referenced in my last email (apologies if
you get these twice, they didn't seem to go through the first time,
perhaps due to size?).  I cut out the longer ones.

Thanks,

Alex

postgres 7.4

EXPLAIN ANALYZE select num, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10,
c11 from t1 where num like 'RT2350533%' or num like 'GH0405545%' or
num like 'KL8403192%';


  QUERY PLAN
--
Index Scan using t1_pkey, t1_pkey, t1_pkey on t1  (cost=0.00..17.93
rows=1 width=164) (actual time=0.103..0.238 rows=3 loops=1)
 Index Cond: ((((num)::text >= 'RT2350533'::character varying) AND
((num)::text < 'RT2350534'::character varying)) OR (((num)::text >=
'GH0405545'::character varying) AND ((num)::text <
'GH0405546'::character varying)) OR (((num)::text >=
'KL8403192'::character varying) AND ((num)::text <
'KL8403193'::character varying)))
 Filter: (((num)::text ~~ 'RT2350533%'::text) OR ((num)::text ~~
'GH0405545%'::text) OR ((num)::text ~~ 'KL8403192%'::text))
Total runtime: 0.427 ms
(4 rows)


postgres 8.2

EXPLAIN ANALYZE select num, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10,
c11 from t1 where num like 'RT2350533%' or num like 'GH0405545%' or
num like 'KL8403192%';
QUERY PLAN
-
Seq Scan on t1  (cost=0.00..918295.05 rows=1 width=156) (actual
time=15.674..26225.919 rows=3 loops=1)
 Filter: (((num)::text ~~ 'RT2350533%'::text) OR ((num)::text ~~
'GH0405545%'::text) OR ((num)::text ~~ 'KL8403192%'::text))
Total runtime: 26225.975 ms
(3 rows)


posgres 7.4

EXPLAIN ANALYZE select num, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10,
c11 from t1 where num in
('AB6253262','AB6145031','AB6091431','AB6286083','AB5857086','AB5649157','AB7089381','AB5557744','AB6314478','AB6505260','AB6249847','AB5832304');



 QUERY PLAN
---
Index Scan using t1_pkey, t1_pkey, t1_pkey, t1_pkey, t1_pkey,
t1_pkey, t1_pkey, t1_pkey, t1_pkey, t1_pkey, t1_pkey, t1_pkey on t1
(cost=0.00..71.97 rows=12 width=164) (actual time=0.132..0.729 rows=12
loops=1)
 Index Cond: (((num)::text = 'AB6253262'::text) OR ((num)::text =
'AB6145031'::text) OR ((num)::text = 'AB6091431'::text) OR
((num)::text = 'AB6286083'::text) OR ((num)::text = 'AB5857086'::text)
OR ((num)::text = 'AB5649157'::text) OR ((num)::text =
'AB7089381'::text) OR ((num)::text = 'AB5557744'::text) OR
((num)::text = 'AB6314478'::text) OR ((num)::text = 'AB6505260'::text)
OR ((num)::text = 'AB6249847'::text) OR ((num)::text =
'AB5832304'::text))
Total runtime: 1.019 ms
(3 rows)

postgres 8.2

EXPLAIN ANALYZE select num, c1, c2, c3, c4, c5, c6, c7, c8, c9, c10,
c11 from t1 where num in
('AB6253262','AB6145031','AB6091431','AB6286083','AB5857086','AB5649157','AB7089381','AB5557744','AB6314478','AB6505260','AB6249847','AB5832304');

 QUERY PLAN
---
Bitmap Heap Scan on t1  (cost=28.98..53.25 rows=12 width=156) (actual
time=61.442..61.486 rows=12 loops=1)
 Recheck Cond: ((num)::text = ANY
(('{AB6253262,AB6145031,AB6091431,AB6286083,AB5857086,AB5649157,AB7089381,AB5557744,AB6314478,AB6505260,AB6249847,AB5832304}'::character
varying[])::text[]))
 ->  Bitmap Index Scan on t1_pkey  (cost=0.00..28.98 rows=12
width=0) (actual time=61.429..61.429 rows=12 loops=1)
   Index Cond: ((num)::text = ANY
(('{AB6253262,AB6145031,AB6091431,AB6286083,AB5857086,AB5649157,AB7089381,AB5557744,AB6314478,AB6505260,AB6249847,AB5832304}'::character
varying[])::text[]))
Total runtime: 61.544 ms
(5 rows)



Re: [PERFORM] postgres 7.4 vs 8.x redux: query plans

2007-04-02 Thread Tom Lane
"Alex Deucher" <[EMAIL PROTECTED]> writes:
> and here are the query plans referenced in my last email (apologies if
> you get these twice, they didn't seem to go through the first time,
> perhaps due to size?).  I cut out the longer ones.

The first case looks a whole lot like 8.2 does not think it can use an
index for LIKE, which suggests strongly that you've used the wrong
locale in the 8.2 installation (ie, not C).

The second pair of plans may look a lot different but in principle they
ought to perform pretty similarly.  I think the performance differential
may at root be that string comparison is way more expensive in the 8.2
installation, which again is possible if you went from C locale to some
other locale.

In short: check out "show lc_collate" in both installations.
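
For example, run this in psql on each server (note that lc_collate is fixed at initdb time in 8.2 and cannot be changed afterwards; if the locales differ, the fix is to re-initdb the 8.2 cluster with --locale=C and reload the data):

```sql
SHOW lc_collate;   -- reports 'C' if the cluster was initialized with --locale=C
```

With a non-C locale, plain btree indexes cannot be used for LIKE prefix matches; on 8.x an alternative to re-initdb is to create an additional index with the text_pattern_ops operator class.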

regards, tom lane
