Re: [PERFORM] cost-based vacuum

2005-07-11 Thread Simon Riggs
On Fri, 2005-07-08 at 12:25 -0400, Ian Westmacott wrote:
> I am beginning to look at Postgres 8, and am particularly
> interested in cost-based vacuum/analyze.  I'm hoping someone
> can shed some light on the behavior I am seeing.
> 
> Suppose there are three threads:
> 
> writer_thread
>   every 1/15 second do
> BEGIN TRANSACTION
>   COPY table1 FROM stdin
>   ...
>   COPY tableN FROM stdin
>   perform several UPDATEs, DELETEs and INSERTs
> COMMIT
> 
> reader_thread
>   every 1/15 second do
> BEGIN TRANSACTION
>   SELECT FROM table1 ...
>   ...
>   SELECT FROM tableN ...
> COMMIT
> 
> analyze_thread
>   every 5 minutes do
> ANALYZE table1
> ...
> ANALYZE tableN
> 
> 
> Now, Postgres 8.0.3 out-of-the-box (all default configs) on a
> particular piece of hardware runs the Postgres connection for
> writer_thread at about 15% CPU (meaningless, I know, but for
> comparison) and runs the Postgres connection for reader_thread
> at about 30% CPU.  Latency for reader_thread seeing updates
> from writer_thread is well under 1/15s.  Impact of
> analyze_thread is negligible.
> 
> If I make the single configuration change of setting
> vacuum_cost_delay=1000, each iteration in analyze_thread takes
> much longer, of course.  But what I also see is that the CPU
> usage of the connections for writer_thread and reader_thread
> spike up to well over 80% each (this is a dualie) and latency
> drops to 8-10s, during the ANALYZEs.
> 
> I don't understand why this would be.  I don't think there
> are any lock issues, and I don't see any obvious I/O issues.
> Am I missing something?  Is there any way to get some
> insight into what those connections are doing?

The ANALYZE commands hold read locks on the tables you wish to write to.
If you slow them down, you merely slow down your write transactions
also, and then the read transactions that wait behind them. Every time
the ANALYZE sleeps it wakes up the other transactions, which then
realise they can't move because of locks and then wake up the ANALYZEs
for another shot. The end result is that you introduce more context-
switching, without any chance of doing more useful work while the
ANALYZEs sleep.

Don't use vacuum_cost_delay in this situation. You might try setting
it to 0 for the analyze_thread only.
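Something along these lines in the session that runs the ANALYZEs would do
it (a sketch only, reusing the table names from your example):

  SET vacuum_cost_delay = 0;  -- session-level SET; other backends keep the global value
  ANALYZE table1;
  ...
  ANALYZE tableN;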

Sounds like you could speed things up by splitting everything into two
sets of tables, with writer_thread1 and writer_thread2 etc. That way
your 2 CPUs would be able to get through more work independently,
without locking each other out.

Best Regards, Simon Riggs




Re: [PERFORM] cost-based vacuum

2005-07-11 Thread Ian Westmacott
On Mon, 2005-07-11 at 07:31, Simon Riggs wrote:
> The ANALYZE commands hold read locks on the tables you wish to write to.
> If you slow them down, you merely slow down your write transactions
> also, and then the read transactions that wait behind them. Every time
> the ANALYZE sleeps it wakes up the other transactions, which then
> realise they can't move because of locks and then wake up the ANALYZEs
> for another shot. The end result is that you introduce more context-
> switching, without any chance of doing more useful work while the
> ANALYZEs sleep.

Let me make sure I understand.  ANALYZE acquires a read
lock on the table, that it holds until the operation is
complete (including any sleeps).  That read lock blocks
the extension of that table via COPY.  Is that right?

According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE
lock on the table, and that conflicts only with ACCESS
EXCLUSIVE.  That's why I didn't think I had a lock issue,
since I think COPY only needs ROW EXCLUSIVE.  Or perhaps
the transaction needs something more?
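A pg_locks query like the following (just a sketch, using table1 from the
example) would show which lock modes each backend holds and whether anything
is actually waiting:

  SELECT l.pid, c.relname, l.mode, l.granted
    FROM pg_locks l
    JOIN pg_class c ON c.oid = l.relation
   WHERE c.relname = 'table1';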

Thanks,

--Ian





[PERFORM] Question

2005-07-11 Thread Alejandro Lemus
In the past week, a guy from the Unix Group in Colombia
said: "Postgrest in production is bad; if the power goes off
at any time the data is lost, because the data is kept in
plain files. Postgrest does not support databases with more
than 1 million records."
What can you tell me about this? Is Informix better, as he says?

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260 Fax: 571-2624070
email: [EMAIL PROTECTED]



Re: [PERFORM] Question

2005-07-11 Thread Richard Huxton

Perhaps choose a better subject than "question" next time?

Alejandro Lemus wrote:

> In the past week, a guy from the Unix Group in Colombia said:
> "Postgrest in production is bad; if the power goes off at any time
> the data is lost,

Wrong. And it's called "PostgreSQL".

> because the data is kept in plain files. Postgrest does not support
> databases with more than 1 million records."

Wrong.

> What can you tell me about this? Is Informix better, as he says?

Your contact in the Unix Group in Colombia obviously talks on subjects 
where he knows little. Perhaps re-evaluate anything else you've heard 
from him.


You can find details on PostgreSQL at http://www.postgresql.org/, 
including the manuals:

 http://www.postgresql.org/docs/8.0/static/index.html
The FAQ:
 http://www.postgresql.org/docs/faq/
Spanish/Brazilian communities, which might prove useful
 http://www.postgresql.org/community/international

PostgreSQL is licensed under the BSD licence, which means you can freely 
download or deploy it in a commercial setting if you desire.


--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] cost-based vacuum

2005-07-11 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
>> I don't understand why this would be.  I don't think there
>> are any lock issues, and I don't see any obvious I/O issues.

> The ANALYZE commands hold read locks on the tables you wish to write to.

Unless there were more commands that Ian didn't show us, he's not taking
any locks that would conflict with ANALYZE.  So I don't believe this is
the explanation.

regards, tom lane



Re: [PERFORM] Question

2005-07-11 Thread Magnus Hagander
> In the past week, a guy from the Unix Group in Colombia said:
> "Postgrest in production is bad; if the power goes off at any time
> the data is lost, because the data is kept in plain files. Postgrest
> does not support databases with more than 1 million records."
> What can you tell me about this? Is Informix better, as he says?

Both these statements are completely incorrect. 

Unlike some other "database systems", PostgreSQL *does* survive power
loss without any major problems, assuming you use a metadata-journaling
filesystem and don't run with a non-battery-backed write cache (but no db
can survive that..)

And having a million records is no problem at all. You may run into
considerations when you're talking billions, but you can do that as well
- it just takes a bit more knowledge before you can do it right.
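For a quick sanity check, a throwaway test like the following (just a
sketch; the table name is made up) builds a million-row table with nothing
but generate_series:

  CREATE TABLE million_test (id integer, payload text);
  INSERT INTO million_test
    SELECT g, 'row ' || g::text
      FROM generate_series(1, 1000000) AS g;
  CREATE INDEX million_test_id ON million_test (id);
  ANALYZE million_test;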

//Magnus



Re: [PERFORM] cost-based vacuum

2005-07-11 Thread Simon Riggs
On Mon, 2005-07-11 at 09:07 -0400, Ian Westmacott wrote:
> On Mon, 2005-07-11 at 07:31, Simon Riggs wrote:
> > The ANALYZE commands hold read locks on the tables you wish to write to.
> > If you slow them down, you merely slow down your write transactions
> > also, and then the read transactions that wait behind them. Every time
> > the ANALYZE sleeps it wakes up the other transactions, which then
> > realise they can't move because of locks and then wake up the ANALYZEs
> > for another shot. The end result is that you introduce more context-
> > switching, without any chance of doing more useful work while the
> > ANALYZEs sleep.
> 
> Let me make sure I understand.  ANALYZE acquires a read
> lock on the table, that it holds until the operation is
> complete (including any sleeps).  That read lock blocks
> the extension of that table via COPY.  Is that right?
> 
> According to the 8.0 docs, ANALYZE acquires an ACCESS SHARE
> lock on the table, and that conflicts only with ACCESS
> EXCLUSIVE.  That's why I didn't think I had a lock issue,
> since I think COPY only needs ROW EXCLUSIVE.  Or perhaps
> the transaction needs something more?

The docs are correct, but don't show catalog and buffer locks.

...but on further reading of the code there are no catalog locks or
buffer locks held across the sleep points. So, my explanation doesn't
work as an explanation for the sleep/no sleep difference you have
observed.

Best Regards, Simon Riggs







Re: [PERFORM] Question

2005-07-11 Thread Gregory S. Williamson
As a sometime Informix and PostgreSQL DBA, I disagree with the contentions 
below. We have many tables with tens of millions of rows in Postgres. We have 
had (alas) power issues in our lab on more than one occasion, and the 
afflicted servers have recovered like a champ, every time.

This person may not like postgres (or very much likes Informix), but he 
shouldn't conjure up spurious reasons to support his/her prejudice.

Informix is an excellent product, but it can be costly for web-related 
applications. PostgreSQL is also an excellent database. Each has differences 
which may make the decision between the two of them clear. But facts are 
necessary to have a real discussion.

Greg Williamson
DBA
GlobeXplorer LLC



Re: [PERFORM] Data Warehousing Tuning

2005-07-11 Thread Alexander Kirpa
>- Sun V250 server
>- 2*1.3GHz Sparc IIIi CPU
>- 8GB RAM
>- 8*73GB SCSI drives
>- Solaris 10
>- Postgres 8
>4) We moved the pg_xlog files off /data/postgres (disks 2-7) and into
>/opt/pg_xlog (disks 0-1), but it seemed like performance decreased, 
>so we moved them back again.
You have saturated the SCSI bus: a single 160 MB/s SCSI channel is too
small for 8 HDDs that each sustain 30-70 MB/s.
Possible solution: replace the CD/DVD/tape in the top two 5.25" bays
with two more HDDs (Ultra320 SCSI), install a PCI 64/66 Ultra320 SCSI
controller (or a simple RAID1 controller, to minimize saturation of the
PCI buses), attach the two new HDDs to it, and move /opt/pg_xlog onto
those drives.

Best regards,
 Alexander Kirpa





[PERFORM] join and query planner

2005-07-11 Thread Dario Pudlo
(first of all, sorry for my English)
Hi.
   - Does "left join" restrict the order in which the planner must join
tables? I've read about join, but I'm not sure about left join...
   - If so, can I avoid this behavior? I mean, make the planner resolve the
query using statistics (uniqueness, data distribution) rather than join
order.

My query looks like:
SELECT ...
  FROM a, b,
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d on (d.key=a.key)
  WHERE (a.key = b.key)  AND (b.column <= 100)

  b.column has much better selectivity, but the planner insists on
resolving c.key = a.key first.

Of course, I could rewrite something like:
SELECT ...
  FROM
   (SELECT ...
FROM a,b
LEFT JOIN d on (d.key=a.key)
WHERE (b.column <= 100)
)
as aa
  LEFT JOIN c ON (c.key = aa.key)

but this query is constructed by an application with a "multicolumn"
filter. It's dynamic: a user could choose to look for "c.column = 1000",
or any combination of filters.

So, I need the planner to choose the best plan...

I've already changed statistics, clustered the tables with CLUSTER, run
VACUUM ANALYZE, and changed work_mem and shared_buffers...

Greetings. TIA.




Re: [PERFORM] join and query planner

2005-07-11 Thread John A Meinel
Dario Pudlo wrote:
> (first of all, sorry for my English)
> Hi.
>- Does "left join" restrict the order in which the planner must join
> tables? I've read about join, but I'm not sure about left join...
>- If so, can I avoid this behavior? I mean, make the planner resolve the
> query using statistics (uniqueness, data distribution) rather than join
> order.
>
>   My query looks like:
>   SELECT ...
>   FROM a, b,
>   LEFT JOIN c ON (c.key = a.key)
>   LEFT JOIN d on (d.key=a.key)
>   WHERE (a.key = b.key)  AND (b.column <= 100)
>
>   b.column has much better selectivity, but the planner insists on
> resolving c.key = a.key first.
>
>   Of course, I could rewrite something like:
>   SELECT ...
>   FROM
>(SELECT ...
> FROM a,b
> LEFT JOIN d on (d.key=a.key)
> WHERE (b.column <= 100)
> )
> as aa
>   LEFT JOIN c ON (c.key = aa.key)
>
> but this query is constructed by an application with a "multicolumn"
> filter. It's dynamic: a user could choose to look for "c.column = 1000",
> or any combination of filters.
>
>   So, I need the planner to choose the best plan...

Probably forcing the other join earlier could help:
SELECT ...
  FROM a JOIN b ON (a.key = b.key)
  LEFT JOIN c ON (c.key = a.key)
...

I think the problem is that postgresql can't easily reorder explicit JOIN
syntax. But you can make the JOIN happen earlier.
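
A sketch of the complete rewrite (same placeholder tables and columns as
the original query, with both left joins kept):

SELECT ...
  FROM a JOIN b ON (a.key = b.key)
  LEFT JOIN c ON (c.key = a.key)
  LEFT JOIN d ON (d.key = a.key)
 WHERE b.column <= 100;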

John
=:->
>
> I've already changed statistics, clustered the tables with CLUSTER, run
> VACUUM ANALYZE, and changed work_mem and shared_buffers...
>
> Greetings. TIA.
>
>
>





[PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread jobapply
The 2 queries are almost the same, but ORDER BY x||t is FASTER than ORDER BY x.

How can that be possible?

Btw: x and x||t produce the same ordering.

phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=2282.65..2284.92 rows=907 width=946) (actual
time=74.982..79.114 rows=950 loops=1)
   Sort Key: (x || t)
   ->  Index Scan using i_i on test  (cost=0.00..2238.09 rows=907 width=946)
(actual time=0.077..51.015 rows=950 loops=1)
 Index Cond: (i < 20)
 Total runtime: 85.944 ms
(5 rows)

phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x;
                                    QUERY PLAN
----------------------------------------------------------------------------------
 Sort  (cost=2280.38..2282.65 rows=907 width=946) (actual
time=175.431..179.239 rows=950 loops=1)
   Sort Key: x
   ->  Index Scan using i_i on test  (cost=0.00..2235.82 rows=907 width=946)
(actual time=0.024..5.378 rows=950 loops=1)
 Index Cond: (i < 20)
 Total runtime: 183.317 ms
(5 rows)





phoeniks=> \d+ test
Table "public.test"
 Column |  Type   | Modifiers | Description
--------+---------+-----------+-------------
 i  | integer |   |
 t  | text|   |
 x  | text|   |
Indexes:
"i_i" btree (i)
"x_i" btree (xpath_string(x, 'data'::text))
"x_ii" btree (xpath_string(x, 'movie/characters/character'::text))
Has OIDs: no




Re: [PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread John A Meinel
jobapply wrote:
> The 2 queries are almost same, but ORDER BY x||t is FASTER than ORDER BY x..
>
> How can that be possible?
>
> Btw: x and x||t are same ordered
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
> QUERY PLAN
>

What types are x and t? I have the feeling "x || t" is actually a
boolean, so it is only a True/False sort, while ORDER BY x has to do
some sort of string comparison (which might actually be a locale-
dependent comparison, and strcoll can be very slow in some locales).
John
=:->

> 
> --
>  Sort  (cost=2282.65..2284.92 rows=907 width=946) (actual
> time=74.982..79.114 rows=950 loops=1)
>Sort Key: (x || t)
>->  Index Scan using i_i on test  (cost=0.00..2238.09 rows=907 width=946)
> (actual time=0.077..51.015 rows=950 loops=1)
>  Index Cond: (i < 20)
>  Total runtime: 85.944 ms
> (5 rows)
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x;
>QUERY PLAN
> 
> -
>  Sort  (cost=2280.38..2282.65 rows=907 width=946) (actual
> time=175.431..179.239 rows=950 loops=1)
>Sort Key: x
>->  Index Scan using i_i on test  (cost=0.00..2235.82 rows=907 width=946)
> (actual time=0.024..5.378 rows=950 loops=1)
>  Index Cond: (i < 20)
>  Total runtime: 183.317 ms
> (5 rows)
>
>
>
>
>
> phoeniks=> \d+ test
> Table "public.test"
>  Column |  Type   | Modifiers | Description
> +-+---+-
>  i  | integer |   |
>  t  | text|   |
>  x  | text|   |
> Indexes:
> "i_i" btree (i)
> "x_i" btree (xpath_string(x, 'data'::text))
> "x_ii" btree (xpath_string(x, 'movie/characters/character'::text))
> Has OIDs: no
>
>
>





Re: [PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread Tom Lane
"jobapply" <[EMAIL PROTECTED]> writes:
> The 2 queries are almost same, but ORDER BY x||t is FASTER than ORDER BY x..
> How can that be possible?

Hmm, how long are the x values?  Is it possible many of them are
TOASTed?
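
One quick way to check (a sketch, assuming the test table from the
original post):

SELECT avg(length(x))::int AS avg_len,
       max(length(x)) AS max_len,
       sum(CASE WHEN length(x) > 2000 THEN 1 ELSE 0 END) AS over_2k
  FROM test
 WHERE i < 20;

Values much over ~2 kB are the ones likely to be compressed or moved
out of line by TOAST.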

regards, tom lane



Re: [PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread John A Meinel
Chris Travers wrote:
> John A Meinel wrote:
>
>> jobapply wrote:
>>
>>
>>> The 2 queries are almost same, but ORDER BY x||t is FASTER than ORDER
>>> BY x..
>>>
>>> How can that be possible?
>>>
>>> Btw: x and x||t are same ordered
>>>
>>> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x
>>> || t;
>>>QUERY PLAN
>>>
>>>
>>
>>
>> What types are x and t, I have the feeling "x || t" is actually a
>> boolean, so it is only a True/False sort, while ORDER BY x has to do
>> some sort of string comparison (which might actually be a locale
>> depended comparison, and strcoll can be very slow on some locales)
>>
>>
>>
> Am I reading this that wrong?  I would think that x || t would mean
> "concatenate x  and t."

Sorry, I think you are right. I was getting my operators mixed up.
>
> This is interesting.  I never thought of writing a multicolumn sort this
> way.

I'm also surprised that the sort is faster with the concatenation. Are
you using UNICODE as the database format? I'm just wondering if it is
doing something funny like casting it to an easier-to-sort type.
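
That's easy to check from psql, something like:

SHOW server_encoding;   -- UNICODE/UTF8 vs SQL_ASCII etc.
SHOW lc_collate;        -- locale that governs string comparisons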

>
> Best Wishes,
> Chris Travers
> Metatron Technology Consulting

PS> Don't forget to Reply All so that your messages go back to the list.




Re: [PERFORM] Sorting on longer key is faster ?

2005-07-11 Thread John A Meinel
jobapply wrote:
> The 2 queries are almost same, but ORDER BY x||t is FASTER than ORDER BY x..
>
> How can that be possible?
>
> Btw: x and x||t are same ordered
>
> phoeniks=> explain analyze SELECT * FROM test WHERE i<20 ORDER BY x || t;
> QUERY PLAN

I also thought of another possibility. Are there a lot of similar
entries in X? Meaning that the same value is repeated over and over? It
is possible that the sort code has a weakness when sorting equal values.
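
Something like this would show how much duplication there is (a sketch,
using the test table from the original post):

SELECT count(*) AS total_rows,
       count(DISTINCT x) AS distinct_x
  FROM test
 WHERE i < 20;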

For instance, if it was doing a hash aggregation, you would have the
same hash repeated. (It isn't; I'm just mentioning a case where it might
affect something.)

If it is creating a tree representation, it might cause some sort of
pathological worst-case behavior, where all entries keep adding to the
same side of the tree, rather than being more balanced.

I don't know the internals of postgresql sorting, but just some ideas.

John
=:->


