[PERFORM] Reading recommendations

2005-03-30 Thread Marc Burgauer
Hi
I am looking for some references to literature. While we have used 
PostgreSQL in the past for a fair number of smaller projects, we are 
now starting to use it on a larger scale and hence need to get into 
issues of performance optimisation and tuning. While I am OK with using 
the EXPLAIN features, I am getting quite insecure when facing things 
like the options in the postgresql.conf file. For example, reading the 
man page on the fsync option, it tells me to "read the detailed 
documentation before using this!" I then read the Admin guide, where I 
am told that the benefits of this feature are a matter of debate, leaving 
me with little help in making up my mind on the issue. So I 
turn to this mailing list, but on starting to read the archives I realise 
that, compared to the standard of knowledge here, I am as wise as a baby.

I have read most of Bruce Momjian's book on PostgreSQL. (Can I update my 
2001 edition somehow? ;-))
I have Sams' PostgreSQL Developer's Handbook (which is OK too), but it 
offers little to nothing on operational issues.
I have read most of the admin (and user) guide, but it does not help me 
really understand the issues:
CPU_INDEX_TUPLE_COST (floating point) Sets the query optimizer’s 
estimate of the cost of processing each index tuple during an index 
scan. This is measured as a fraction of the cost of a sequential page 
fetch.
No idea what this means! (And should I feel bad for it?)
I am an application programmer with a little UNIX know-how.
What books or sources are out there that I can buy/download and that I 
should read to get to grips with the more advanced issues of running 
PostgreSQL?

More on what we do (for those interested):
We use PostgreSQL mainly with its PostGIS extension as the database 
backend for Zope-based applications. Adding PostGIS features is what 
has started to cause a noticeable increase in the server load.
We're using the GIS enabled system on this platform:
PostgreSQL 7.3.4
PostGIS 0.8
Zope 2.7.5
Python 2.3.5
(Database functions are written in PL/pgSQL, not Python!)

on a 2-CPU (450 MHz Intel P3) Compaq box (some ProLiant flavour)
With a SCSI 4-disk RAID system (mirrored and striped)
SunOS 5.8 (Which I think is Solaris 8)
The server is administered by our host (co-located). We cannot easily 
upgrade to a newer version of Solaris, because we could not find a 
driver for the disk controller used in this server. (And our host did 
not manage to write or patch one up.)

As a business, we create and operate on-line communities (for an 
example, go to http://www.theguidlife.net), not only from a technical 
point of view but also by supporting the communities in producing 
content.

BTW. If you are a SQL/python programmer in (or near) Lanarkshire, 
Scotland, we have a vacancy. ;-)

Cheers
Marc
--
Marc Burgauer
Sharedbase Ltd
http://www.sharedbase.com
Creating and supporting on-line communities
---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


[PERFORM] Weird index scan

2005-03-30 Thread Tambet Matiisen

I'm getting weird results for one of my queries. The actual time of this
index scan doesn't make any sense:

->  Index Scan using dok_dok_fk_i on dokumendid a  (cost=0.00..566.24
rows=184 width=8) (actual time=0.170..420806.563 rows=1 loops=1) 

dok_dok_fk_i is an index on dokumendid(dok_dok_id). Currently it contains
mostly NULLs:

pos1=# select dok_dok_id, count(1) from dokumendid group by dok_dok_id;
 dok_dok_id | count
------------+-------
            | 11423
       8034 |    76
(2 rows)

If I drop the index, seq scan + sort is used instead and everything is
fast again.

The PostgreSQL version:

pos1=# select version();
                                              version
------------------------------------------------------------------------------------------------------
 PostgreSQL 7.4.5 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.4 (Debian 1:3.3.4-9)
(1 row)

The full EXPLAIN ANALYZE output:

pos1=# explain analyze select * from v_inventuuri_vahed_kaubagrupiti;
 
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Subquery Scan v_inventuuri_vahed_kaubagrupiti  (cost=50896.04..50896.61
rows=46 width=128) (actual time=437007.670..437007.817 rows=45 loops=1)
   ->  Sort  (cost=50896.04..50896.15 rows=46 width=42) (actual
time=437007.664..437007.692 rows=45 loops=1)
 Sort Key: (COALESCE(sum(ir.summa_kmta), 0::numeric))::raha
 ->  HashAggregate  (cost=50893.85..50894.77 rows=46 width=42)
(actual time=437007.229..437007.488 rows=45 loops=1)
   ->  Hash Join  (cost=5533.44..50807.93 rows=5728
width=42) (actual time=436226.533..436877.499 rows=16271 loops=1)
 Hash Cond: ("outer".kau_kau_id = "inner".kau_id)
 ->  Merge Right Join  (cost=4759.52..49858.92
rows=15696 width=26) (actual time=436117.333..436600.653 rows=16271
loops=1)
   Merge Cond: (("outer".dok_dok_id =
"inner".dok_id) AND ("outer".kau_kau_id = "inner".kau_kau_id))
   ->  Index Scan using dor_dok_kau_i on
dokumentide_read ar  (cost=0.00..42789.44 rows=480962 width=19) (actual
time=0.023..7873.117 rows=205879 loops=1)
   ->  Sort  (cost=4759.52..4798.76 rows=15696
width=19) (actual time=428381.719..428392.204 rows=16271 loops=1)
 Sort Key: a.dok_id, ir.kau_kau_id
 ->  Merge Left Join
(cost=0.00..3665.65 rows=15696 width=19) (actual time=0.245..428279.595
rows=16258 loops=1)
   Merge Cond: ("outer".dok_id =
"inner".dok_dok_id)
   ->  Nested Loop
(cost=0.00..3620.23 rows=15696 width=19) (actual time=0.063..7243.529
rows=16258 loops=1)
 ->  Index Scan using dok_pk
on dokumendid i  (cost=0.00..3.73 rows=1 width=4) (actual
time=0.030..0.035 rows=1 loops=1)
   Index Cond: (dok_id =
8034)
   Filter: (tyyp =
'IN'::bpchar)
 ->  Index Scan using
dor_dok_fk_i on dokumentide_read ir  (cost=0.00..3459.55 rows=15696
width=19) (actual time=0.023..7150.257 rows=16258 loops=1)
   Index Cond: (8034 =
dok_dok_id)
   ->  Index Scan using dok_dok_fk_i
on dokumendid a  (cost=0.00..566.24 rows=184 width=8) (actual
time=0.170..420806.563 rows=1 loops=1)
 Filter: (tyyp =
'IA'::bpchar)
 ->  Hash  (cost=757.71..757.71 rows=6487 width=24)
(actual time=109.178..109.178 rows=0 loops=1)
   ->  Hash Join  (cost=15.56..757.71 rows=6487
width=24) (actual time=1.787..85.554 rows=17752 loops=1)
 Hash Cond: ("outer".kag_kag_id =
"inner".a_kag_id)
 ->  Seq Scan on kaubad k
(cost=0.00..588.52 rows=17752 width=8) (actual time=0.005..30.952
rows=17752 loops=1)
 ->  Hash  (cost=15.35..15.35 rows=83
width=24) (actual time=1.770..1.770 rows=0 loops=1)
   ->  Hash Join  (cost=5.39..15.35
rows=83 width=24) (actual time=0.276..1.491 rows=227 loops=1)
 Hash Cond:
("outer".y_kag_id = "inner".kag_id)
 ->  Seq Scan on
kaubagruppide_kaubagrupid gg  (cost=0.00..7.09 rows=409 width=8) (actual
time=0.004..0.405 rows=409 loops=1)
 ->  Hash  (cost=5.27..5.27
rows=46 width=20) (actual time=0.259..0.259 rows=0 loops=1)
   ->  Seq Scan on
kaubagrupid g  (cost=0.00.

Re: [PERFORM] VACUUM on duplicate DB gives FSM and total pages discrepancies

2005-03-30 Thread Alvaro Herrera
On Tue, Mar 29, 2005 at 05:52:58PM -0700, Karim Nassar wrote:
> I recently pg_dumpall'd my DB from a (used for testing) v20z install of
> postgresql 8.0.1, and restored it to my production (but not yet in
> service) v40z running the same version. The test DB has had multiple
> millions of rows created/dropped during testing. The results of VACUUM
> VERBOSE are giving me pause:

The FSM only stores pages that have some free space.  If the database
has only been through a restore, then probably there aren't many of
those.  After you start playing with the data some more pages will need
registering.  So the behavior you are seeing is expected.
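
As a rough illustration (the numbers and exact wording below are invented;
they only show what to look for), the tail of a database-wide VACUUM VERBOSE
reports how much free space map is actually needed:

    VACUUM VERBOSE;
    ...
    INFO:  free space map: 120 relations, 3840 pages stored; 5520 total pages needed
    DETAIL:  Allocated FSM size: 1000 relations + 20000 pages = 178 kB shared memory.

As long as the "pages needed" figure stays below max_fsm_pages (and the
relation count below max_fsm_relations) once the database has seen some real
update/delete activity, the FSM is large enough.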

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
"We are who we choose to be", sang the goldfinch
when the sun is high (Sandman)

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Michael Fuhr
On Wed, Mar 30, 2005 at 12:07:29PM +0100, Marc Burgauer wrote:
> 
> What books or sources are out there that I can buy/download and that I 
> should read to get to grips with the more advanced issues of running 
> PostgreSQL?

See the Power PostgreSQL Performance & Configuration documents:

http://www.powerpostgresql.com/Docs/
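
On the specific fsync question, the short version (a simplified sketch of the
postgresql.conf entry, not a substitute for the docs above):

    # postgresql.conf
    fsync = true    # default: force the WAL to disk at every commit; survives
                    # an OS crash or power loss at the cost of some speed
    #fsync = false  # faster, but a crash can leave the database corrupt

Leave it on unless you can afford to rebuild the database after a crash.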

> BTW. If you are a SQL/python programmer in (or near) Lanarkshire, 
> Scotland, we have a vacancy. ;-)

Allow telecommute from across the pond and I might be interested :-)

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Richard_D_Levine


[EMAIL PROTECTED] wrote on 03/30/2005 10:58:21 AM:

>
> Allow telecommute from across the pond and I might be interested :-)

Please post phone bills to this list.

>
> --
> Michael Fuhr
> http://www.fuhr.org/~mfuhr/
>
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Mohan, Ross
VOIP over BitTorrent?  

;-)


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Wednesday, March 30, 2005 11:27 AM
To: Michael Fuhr
Cc: Marc Burgauer; pgsql-performance@postgresql.org; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Reading recommendations




[EMAIL PROTECTED] wrote on 03/30/2005 10:58:21 AM:

>
> Allow telecommute from across the pond and I might be interested :-)

Please post phone bills to this list.

>
> --
> Michael Fuhr

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Steve Wampler
Mohan, Ross wrote:
> VOIP over BitTorrent?  

Now *that* I want to see.  Ought to be at least as interesting
as the "TCP/IP over carrier pigeon" experiment - and more
challenging to boot!


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Mohan, Ross
Yea, the upside is that you get better than the 1 byte/hour rate for 
pigeon-net. 

Downside is that simply because you use BitTorrent, the RIAA accuses you of 
everything from CD piracy to shipping pr*n to cyberterrorism, and you spend 
the next four years in Gitmo, comparing notes with your cellmates in Camp 
X-Ray, 
and watching pigeons fly overhead. 


-Original Message-
From: Steve Wampler [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, March 30, 2005 11:52 AM
To: Mohan, Ross
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Reading recommendations


Mohan, Ross wrote:
> VOIP over BitTorrent?

Now *that* I want to see.  Ought to be at least as interesting as the "TCP/IP 
over carrier pigeon" experiment - and more challenging to boot!


-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Dynamic query performance

2005-03-30 Thread Keith Worthington
Hi All,

I am developing a simple set returning function as my first step towards more
complicated processes.  I would like to understand the implications of using
the dynamic query capability.

I have built two forms of an identically performing function.  The first uses
a simple IF-THEN-ELSIF-THEN-ELSE structure to choose which query to run.  The
second builds the query dynamically using the FOR-IN-EXECUTE structure and a
CASE statement.

The documentation
(http://www.postgresql.org/docs/8.0/interactive/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING)
indicates that a dynamic query (EXECUTE) is replanned for every LOOP iteration.

  This is like the previous form, except that the source
  SELECT statement is specified as a string expression,
  which is evaluated and replanned on each entry to the
  FOR loop. This allows the programmer to choose the speed
  of a preplanned query or the flexibility of a dynamic
  query, just as with a plain EXECUTE statement.

That seems like a potential performance problem.  I don't understand why the
query would be planned for every LOOP iteration when the LOOP is over the
record set.

Your comments are appreciated.

Kind Regards,
Keith


CREATE OR REPLACE FUNCTION func_item_list("varchar")
  RETURNS SETOF VARCHAR AS
$BODY$
   DECLARE
  v_status ALIAS FOR $1;
  r_item_id RECORD;
   BEGIN
--Build the record set using the appropriate query.
  IF lower(v_status) = 'active' THEN
 FOR r_item_id IN SELECT tbl_item.id
FROM tbl_item
   WHERE NOT tbl_item.inactive
   ORDER BY tbl_item.id
 LOOP
RETURN NEXT r_item_id;
 END LOOP;
  ELSIF lower(v_status) = 'inactive' THEN
 FOR r_item_id IN SELECT tbl_item.id
FROM tbl_item
   WHERE tbl_item.inactive
   ORDER BY tbl_item.id
 LOOP
RETURN NEXT r_item_id;
 END LOOP;
  ELSE
 FOR r_item_id IN SELECT tbl_item.id
FROM tbl_item
   ORDER BY tbl_item.id
 LOOP
RETURN NEXT r_item_id;
 END LOOP;
  END IF;
  RETURN;
   END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

SELECT * FROM func_item_list('Active');



CREATE OR REPLACE FUNCTION func_item_list("varchar")
  RETURNS SETOF VARCHAR AS
$BODY$
   DECLARE
  v_status ALIAS FOR $1;
  r_item_id RECORD;
   BEGIN
--Build the record set using a dynamically built query.
  FOR r_item_id IN EXECUTE 'SELECT tbl_item.id
  FROM tbl_item' ||
  CASE WHEN lower(v_status) = 'active' THEN
  ' WHERE NOT tbl_item.inactive '
   WHEN lower(v_status) = 'inactive' THEN
  ' WHERE tbl_item.inactive '
   ELSE
  ' '
  END  ||
   ' ORDER BY tbl_item.id'
  LOOP
 RETURN NEXT r_item_id;
  END LOOP;
  RETURN;
   END;
$BODY$
  LANGUAGE 'plpgsql' VOLATILE;

SELECT * FROM func_item_list('AcTiVe');


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] Weird index scan

2005-03-30 Thread G u i d o B a r o s i o
Hi,

1) It seems that the table is a view - am I wrong? If so, please run a query 
against that table directly and try to work out whether there is already a 
bottleneck there.

2) Add an ORDER BY to the query and see whether it works better.

3) If you drop the index and no other index exists, it will always use a 
seq scan or some other method to gather the rows. Is there no other index that 
could plausibly be used there? (Perhaps ORDER BY an indexed column may help.)

A hint: drop that index, identify a usable index, and set enable_seqscan to 
off in your session (or as a global value in the conf file).
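
For example (a sketch of the session-level test, using the view from your mail):

    SET enable_seqscan TO off;   -- only for this session, only for testing
    EXPLAIN ANALYZE SELECT * FROM v_inventuuri_vahed_kaubagrupiti;
    SET enable_seqscan TO on;

If the plan and its timings improve with seq scans discouraged, that tells you 
which index the planner should have preferred.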

Best wishes,
Guido

> 
> I'm getting weird results for one of my queries. The actual time of this
> index scan doesn't make any sense:
> 
> ->  Index Scan using dok_dok_fk_i on dokumendid a  (cost=0.00..566.24
> rows=184 width=8) (actual time=0.170..420806.563 rows=1 loops=1) 
> 
> dok_dok_fk_i is index on dokumendid(dok_dok_id). Currently it contains
> mostly NULLs:
> 
> pos1=# select dok_dok_id, count(1) from dokumendid group by dok_dok_id;
>  dok_dok_id | count
> ------------+-------
>             | 11423
>        8034 |    76
> (2 rows)
> 
> If I drop the index, seq scan + sort is used instead and everything is
> fast again.
> 
> The PostgreSQL version:
> 
> pos1=# select version();
>                                               version
> ------------------------------------------------------------------------------------------------------
>  PostgreSQL 7.4.5 on i386-pc-linux-gnu, compiled by GCC i386-linux-gcc (GCC) 3.3.4 (Debian 1:3.3.4-9)
> (1 row)
> 
> The full EXPLAIN ANALYZE output:
> 
> pos1=# explain analyze select * from v_inventuuri_vahed_kaubagrupiti;
>  
>                                                         QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Subquery Scan v_inventuuri_vahed_kaubagrupiti  (cost=50896.04..50896.61
> rows=46 width=128) (actual time=437007.670..437007.817 rows=45 loops=1)
>->  Sort  (cost=50896.04..50896.15 rows=46 width=42) (actual
> time=437007.664..437007.692 rows=45 loops=1)
>  Sort Key: (COALESCE(sum(ir.summa_kmta), 0::numeric))::raha
>  ->  HashAggregate  (cost=50893.85..50894.77 rows=46 width=42)
> (actual time=437007.229..437007.488 rows=45 loops=1)
>->  Hash Join  (cost=5533.44..50807.93 rows=5728
> width=42) (actual time=436226.533..436877.499 rows=16271 loops=1)
>  Hash Cond: ("outer".kau_kau_id = "inner".kau_id)
>  ->  Merge Right Join  (cost=4759.52..49858.92
> rows=15696 width=26) (actual time=436117.333..436600.653 rows=16271
> loops=1)
>Merge Cond: (("outer".dok_dok_id =
> "inner".dok_id) AND ("outer".kau_kau_id = "inner".kau_kau_id))
>->  Index Scan using dor_dok_kau_i on
> dokumentide_read ar  (cost=0.00..42789.44 rows=480962 width=19) (actual
> time=0.023..7873.117 rows=205879 loops=1)
>->  Sort  (cost=4759.52..4798.76 rows=15696
> width=19) (actual time=428381.719..428392.204 rows=16271 loops=1)
>  Sort Key: a.dok_id, ir.kau_kau_id
>  ->  Merge Left Join
> (cost=0.00..3665.65 rows=15696 width=19) (actual time=0.245..428279.595
> rows=16258 loops=1)
>Merge Cond: ("outer".dok_id =
> "inner".dok_dok_id)
>->  Nested Loop
> (cost=0.00..3620.23 rows=15696 width=19) (actual time=0.063..7243.529
> rows=16258 loops=1)
>  ->  Index Scan using dok_pk
> on dokumendid i  (cost=0.00..3.73 rows=1 width=4) (actual
> time=0.030..0.035 rows=1 loops=1)
>Index Cond: (dok_id =
> 8034)
>Filter: (tyyp =
> 'IN'::bpchar)
>  ->  Index Scan using
> dor_dok_fk_i on dokumentide_read ir  (cost=0.00..3459.55 rows=15696
> width=19) (actual time=0.023..7150.257 rows=16258 loops=1)
>Index Cond: (8034 =
> dok_dok_id)
>->  Index Scan using dok_dok_fk_i
> on dokumendid a  (cost=0.00..566.24 rows=184 width=8) (actual
> time=0.170..420806.563 rows=1 loops=1)
>  Filter: (tyyp =
> 'IA'::bpchar)
>  ->  Hash  (cost=757.71..757.71 rows=6487 width=24)
> (actual time=109.178..109.178 rows=0 loops=1)
>->  Hash Join  (cost=15.56..757.71 rows=6487
> width=24) (actual time=1.787..85.554 rows=17752 loops=1)
>  Hash Cond: ("outer".kag_kag_id =
> "inner".a_kag_id)
>  ->  Seq Scan on kaubad k
> (cost=0.00..588.52 rows=17752 width=8) (a

Re: [PERFORM] Reading recommendations

2005-03-30 Thread Bruno Wolff III
On Wed, Mar 30, 2005 at 16:39:47 -,
  "Mohan, Ross" <[EMAIL PROTECTED]> wrote:
> VOIP over BitTorrent?  

Plain VOIP shouldn't be a problem. And if you want to do tricky things
you can use Asterisk on both ends. Asterisk is open source (GPL, dual
licensed from Digium) and runs on low-powered Linux boxes. A card that
talks to your existing analog phones and your existing phone line
costs $200. You don't need special cards if you have IP phones or a headset
connected to your computer and don't use your local phone company for
the calls.

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Dynamic query performance

2005-03-30 Thread John Arbash Meinel
Keith Worthington wrote:
Hi All,
I am developing a simple set returning function as my first step towards more
complicated processes.  I would like to understand the implications of using
the dynamic query capability.
I have built two forms of an identically performing function.  The first uses
a simple IF-THEN-ELSIF-THEN-ELSE structure to choose which query to run.  The
second builds the query dynamically using the FOR-IN-EXECUTE structure and a
CASE statement.
The documentation
(http://www.postgresql.org/docs/8.0/interactive/plpgsql-control-structures.html#PLPGSQL-RECORDS-ITERATING)
indicates that a dynamic query (EXECUTE) is replanned for every LOOP iteration.
 This is like the previous form, except that the source
 SELECT statement is specified as a string expression,
 which is evaluated and replanned on each entry to the
 FOR loop. This allows the programmer to choose the speed
 of a preplanned query or the flexibility of a dynamic
 query, just as with a plain EXECUTE statement.
That seems like a potential performance problem.  I don't understand why the
query would be planned for every LOOP iteration when the LOOP is over the
record set.

Reading the documentation and looking at the example, I don't think
your query will be re-planned for each entry in the loop.
I think it will be planned each time the FOR loop is started.
If you have the EXECUTE *inside* the LOOP, then it would be re-planned
for each entry.
At least that is the case for a normal EXECUTE without any for loop.
Each time the function is called, the statement is re-planned. Versus
without EXECUTE when the planning is done at function declaration time.
I would guess that the FOR .. IN EXECUTE .. LOOP runs the EXECUTE one
time, and generates the results which it then loops over. Because that
is what FOR .. IN SELECT .. LOOP does (you don't re-evaluate the SELECT
for each item in the result set).
On the other hand, I don't know of any way to test this, unless you have
a query that you know takes a long time to plan, and can compare the
performance of FOR IN EXECUTE versus FOR IN SELECT.
John
=:->
Your comments are appreciated.
Kind Regards,
Keith




signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Dynamic query performance

2005-03-30 Thread PFC

  which is evaluated and replanned on each entry to the
  FOR loop. This allows the programmer to choose the speed
	"On each entry" is not the same as "on each iteration". It would mean "every  
time the loop is started"...

Regards,
PFC
---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Richard_D_Levine
[EMAIL PROTECTED] wrote on 03/30/2005 11:52:13 AM:

> Mohan, Ross wrote:
> > VOIP over BitTorrent?
>
> Now *that* I want to see.  Ought to be at least as interesting
> as the "TCP/IP over carrier pigeon" experiment - and more
> challenging to boot!
>

It was very challenging.  I worked on the credit window sizing and
retransmission timer estimation algorithms.  We took into account weather
patterns, size and age of the bird, feeding times, and the average number
of times a bird circles before determining magnetic north.  Interestingly,
packet size had little effect in the final algorithms.

I would love to share them with all of you, but they're classified.

>
> --
> Steve Wampler -- [EMAIL PROTECTED]
> The gods that smiled on your birth are now laughing out loud.
>
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Left Outer Join much faster than non-outer Join?

2005-03-30 Thread rm_pg

Can anyone please help me make my JOIN find the right index to use?

It seems strange to me that in the two queries listed below, the 
LEFT OUTER JOIN can find the most efficient index to use, while 
the unadorned JOIN cannot.  The result is that my query is 
orders of magnitude slower than it seems it should be.



The table "tlid_smaller" (\d and explain analyze shown below) is a
large table contining integer IDs just like the fact table of any
traditional star-schema warehouse.

The tables *_lookup are simply tables that map strings to IDs, with
unique IDs associating strings to the IDs.

The table "tlid_smaller" has an index on (streetname_id, city_id) that
is extremely efficient at finding the desired row.  When I use a "LEFT
OUTER JOIN", the optimizer happily sees that it can use this index.
This is shown in the first explain analyze below.  However, when I
simply do a "JOIN", the optimizer does not use this index and instead
does a hash join comparing thousands of rows.

Note that the cost estimate using the good index is much better 
(16.94 vs 29209.16 - thousands of times better).  Any ideas why
the non-outer join didn't use it?






fli=# explain analyze
 select *
 from streetname_lookup as sl
 join city_lookup as cl on (true)
 left outer join tlid_smaller as ts on (sl.geo_streetname_id = 
ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
 where  str_name='alamo' and  city='san antonio' and state='TX'
;
fli-# fli-# fli-# fli-# fli-# fli-# 
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.00..16.94 rows=1 width=74) (actual 
time=0.115..0.539 rows=78 loops=1)
   ->  Nested Loop  (cost=0.00..9.03 rows=1 width=42) (actual time=0.077..0.084 
rows=1 loops=1)
 ->  Index Scan using streetname_lookup__str_name on streetname_lookup 
sl  (cost=0.00..3.01 rows=1 width=19) (actual time=0.042..0.044 rows=1 loops=1)
   Index Cond: (str_name = 'alamo'::text)
 ->  Index Scan using city_lookup__name on city_lookup cl  
(cost=0.00..6.01 rows=1 width=23) (actual time=0.026..0.028 rows=1 loops=1)
   Index Cond: ((city = 'san antonio'::text) AND (state = 
'TX'::text))
   ->  Index Scan using tlid_smaller__street_city on tlid_smaller ts  
(cost=0.00..7.86 rows=3 width=32) (actual time=0.029..0.176 rows=78 loops=1)
 Index Cond: (("outer".geo_streetname_id = ts.geo_streetname_id) AND 
("outer".geo_city_id = ts.geo_city_id))
 Total runtime: 0.788 ms
(9 rows)


fli=#
fli=# explain analyze
 select *
 from streetname_lookup as sl
 join city_lookup as cl on (true)
 join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id 
and cl.geo_city_id=ts.geo_city_id)
 where  str_name='alamo' and  city='san antonio' and state='TX'
;
fli-# fli-# fli-# fli-# fli-# fli-# 
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=6.01..29209.16 rows=1 width=74) (actual time=9.421..28.154 
rows=78 loops=1)
   Hash Cond: ("outer".geo_city_id = "inner".geo_city_id)
   ->  Nested Loop  (cost=0.00..29202.88 rows=52 width=51) (actual 
time=0.064..23.296 rows=4151 loops=1)
 ->  Index Scan using streetname_lookup__str_name on streetname_lookup 
sl  (cost=0.00..3.01 rows=1 width=19) (actual time=0.025..0.032 rows=1 loops=1)
   Index Cond: (str_name = 'alamo'::text)
 ->  Index Scan using tlid_smaller__street_zipint on tlid_smaller ts  
(cost=0.00..28994.70 rows=16413 width=32) (actual time=0.028..8.153 rows=4151 
loops=1)
   Index Cond: ("outer".geo_streetname_id = ts.geo_streetname_id)
   ->  Hash  (cost=6.01..6.01 rows=1 width=23) (actual time=0.073..0.073 rows=0 
loops=1)
 ->  Index Scan using city_lookup__name on city_lookup cl  
(cost=0.00..6.01 rows=1 width=23) (actual time=0.065..0.067 rows=1 loops=1)
   Index Cond: ((city = 'san antonio'::text) AND (state = 
'TX'::text))
 Total runtime: 28.367 ms
(11 rows)

fli=#

fli=#



fli=# \d tlid_smaller
Table "geo.tlid_smaller"
  Column   |  Type   | Modifiers
---+-+---
 tlid  | integer |
 geo_streetname_id | integer |
 geo_streettype_id | integer |
 geo_city_id   | integer |
 zipint| integer |
 tigerfile | integer |
 low   | integer |
 high  | integer |
Indexes:
"tlid_smaller__city" btree (geo_city_id)
"tlid_smaller__street_city" btree (geo_streetname_id, g

Re: [PERFORM] Left Outer Join much faster than non-outer Join?

2005-03-30 Thread Ron Mayer

Setting join_collapse_limit=1 improves my performance dramatically.

Even on a query with only 3 tables.

This surprised me: since there are only 3 tables being joined, I would
have assumed that the optimizer would have done the exhaustive search
(rather than the GEQO stuff) and that this exhaustive search would have
found the good plan.

Any reason it didn't?  Explain analyze results are shown below.



On Wed, 30 Mar 2005 [EMAIL PROTECTED] wrote:
> 
> Can anyone please help me make my JOIN find the right index to use?
>

fli=# set join_collapse_limit=1;
SET
fli=# explain analyze
 select *
 from streetname_lookup as sl
 join city_lookup as cl on (true)
 join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id 
and cl.geo_city_id=ts.geo_city_id)
 where  str_name='alamo' and  city='san antonio' and state='TX'
;
fli-# fli-# fli-# fli-# fli-# fli-# 
                                                        QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..16.94 rows=1 width=74) (actual time=0.116..0.528 
rows=78 loops=1)
   ->  Nested Loop  (cost=0.00..9.03 rows=1 width=42) (actual time=0.079..0.086 
rows=1 loops=1)
 ->  Index Scan using streetname_lookup__str_name on streetname_lookup 
sl  (cost=0.00..3.01 rows=1 width=19) (actual time=0.042..0.044 rows=1 loops=1)
   Index Cond: (str_name = 'alamo'::text)
 ->  Index Scan using city_lookup__name on city_lookup cl  
(cost=0.00..6.01 rows=1 width=23) (actual time=0.026..0.028 rows=1 loops=1)
   Index Cond: ((city = 'san antonio'::text) AND (state = 
'TX'::text))
   ->  Index Scan using tlid_smaller__street_city on tlid_smaller ts  
(cost=0.00..7.86 rows=3 width=32) (actual time=0.031..0.181 rows=78 loops=1)
 Index Cond: (("outer".geo_streetname_id = ts.geo_streetname_id) AND 
("outer".geo_city_id = ts.geo_city_id))
 Total runtime: 0.709 ms
(9 rows)


[with the default join_collapse_limit]---
> fli=# explain analyze
>  select *
>  from streetname_lookup as sl
>  join city_lookup as cl on (true)
>  join tlid_smaller as ts on (sl.geo_streetname_id = ts.geo_streetname_id 
> and cl.geo_city_id=ts.geo_city_id)
>  where  str_name='alamo' and  city='san antonio' and state='TX'
> ;
> fli-# fli-# fli-# fli-# fli-# fli-#   
>                                                         QUERY PLAN
> ------------------------------------------------------------------------------------------------------------------------
>  Hash Join  (cost=6.01..29209.16 rows=1 width=74) (actual time=9.421..28.154 
> rows=78 loops=1)
>Hash Cond: ("outer".geo_city_id = "inner".geo_city_id)
>->  Nested Loop  (cost=0.00..29202.88 rows=52 width=51) (actual 
> time=0.064..23.296 rows=4151 loops=1)
>  ->  Index Scan using streetname_lookup__str_name on 
> streetname_lookup sl  (cost=0.00..3.01 rows=1 width=19) (actual 
> time=0.025..0.032 rows=1 loops=1)
>Index Cond: (str_name = 'alamo'::text)
>  ->  Index Scan using tlid_smaller__street_zipint on tlid_smaller ts  
> (cost=0.00..28994.70 rows=16413 width=32) (actual time=0.028..8.153 rows=4151 
> loops=1)
>Index Cond: ("outer".geo_streetname_id = ts.geo_streetname_id)
>->  Hash  (cost=6.01..6.01 rows=1 width=23) (actual time=0.073..0.073 
> rows=0 loops=1)
>  ->  Index Scan using city_lookup__name on city_lookup cl  
> (cost=0.00..6.01 rows=1 width=23) (actual time=0.065..0.067 rows=1 loops=1)
>Index Cond: ((city = 'san antonio'::text) AND (state = 
> 'TX'::text))
>  Total runtime: 28.367 ms
> (11 rows)
> 

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Steve Wampler
[EMAIL PROTECTED] wrote:

>>Mohan, Ross wrote:
>>
>>>VOIP over BitTorrent?
>>
>>Now *that* I want to see.  Ought to be at least as interesting
>>as the "TCP/IP over carrier pigeon" experiment - and more
>>challenging to boot!
>>
> 
> 
> It was very challenging.  I worked on the credit window sizing and
> retransmission timer estimation algorithms.  We took into account weather
> patterns, size and age of the bird, feeding times, and the average number
> of times a bird circles before determining magnetic north.  Interestingly,
> packet size had little effect in the final algorithms.
> 
> I would love to share them with all of you, but they're classified.

Ah, but VOIPOBT requires many people all saying the same thing at the
same time.  The synchronization alone (since you need to distribute
these people adequately to avoid overloading a trunk line...) is probably
sufficiently hard to make it interesting.  Then there are the problems of
different accents, dialects, and languages ;)

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL on Solaris 8 and ufs

2005-03-30 Thread Brandon Metcalf
I just wanted to follow up and let everyone know that the biggest
improvement in performance came from moving the pg_xlog directory to
another filesystem (different set of disks) separate from the data
directory.

Thanks for the suggestions.

-- 
Brandon

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] fine tuning for logging server

2005-03-30 Thread Florin Andrei
Hardware: relatively modern Intel CPU, OS and database each on its own
IDE hard-drive (separate IDE cables). Enough memory, I think, but I
can't add too much (not beyond 1GB).
Software: Linux-2.6, pgsql-8.0.1

Function: Essentially a logging server. There are two applications (like
syslog) on the same box that are logging to pgsql, each one to its own
database. There are a few tables in one DB, and exactly one table in the
other.
Most of the time, the apps are just doing mindless INSERTs to the DB.
Every now and then, an admin performs some SELECTs via a PHP interface.

Objective: Make the DB as fast as possible. Of course I'd like the
SELECTs to be fast, but the INSERTs take precedence. It's gotta be able
to swallow as many messages per second as possible given the hardware.

Question: What are the pgsql parameters that need to be tweaked? What
are the guidelines for such a situation?

-- 
Florin Andrei

http://florin.myip.org/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread John Arbash Meinel
Florin Andrei wrote:
Hardware: relatively modern Intel CPU, OS and database each on its own
IDE hard-drive (separate IDE cables). Enough memory, I think, but I
can't add too much (not beyond 1GB).
Software: Linux-2.6, pgsql-8.0.1
Function: Essentially a logging server. There are two applications (like
syslog) on the same box that are logging to pgsql, each one to its own
database. There are a few tables in one DB, and exactly one table in the
other.
Most of the time, the apps are just doing mindless INSERTs to the DB.
Every now and then, an admin performs some SELECTs via a PHP interface.
Objective: Make the DB as fast as possible. Of course I'd like the
SELECTs to be fast, but the INSERTs take precedence. It's gotta be able
to swallow as many messages per second as possible given the hardware.
Question: What are the pgsql parameters that need to be tweaked? What
are the guidelines for such a situation?

Put pg_xlog onto the same drive as the OS, not the drive with the database.
Do as many inserts per transaction as you can get away with.
100-1000 is pretty good (see the sketch below).
Keep the number of indexes and foreign key references low to keep
INSERTs fast.
Keep a few indexes around to keep SELECTs reasonably speedy.
If you are doing lots and lots of logging, and need only archival and slow
access for old data but fast access to new data, consider partitioning
your table and then using a view to join the pieces back together.
If you only have a couple of processes accessing the db at any given
time, you can probably increase work_mem and maintenance_work_mem a bit.
If you have 1G RAM, maybe around 50M for work_mem. But really this is
only if you have 1-3 selects going on at a time.
With 2 disks, and fixed hardware, it's a lot more about configuring your
schema and the application. If you want more performance, adding more
disks is probably the first thing to do.
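
A rough sketch of the batching idea (the table and column names here are
made up; substitute your own schema):

    BEGIN;
    INSERT INTO log_messages (logged_at, source, message)
        VALUES (now(), 'app1', 'first message');
    INSERT INTO log_messages (logged_at, source, message)
        VALUES (now(), 'app1', 'second message');
    -- ... a few hundred more inserts ...
    COMMIT;

Most of the win comes from paying for one fsync at COMMIT instead of one
per INSERT.
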
John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread Florin Andrei
On Wed, 2005-03-30 at 17:50 -0800, Florin Andrei wrote:

> Function: Essentially a logging server. There are two applications (like
> syslog) on the same box that are logging to pgsql, each one to its own
> database. There are a few tables in one DB, and exactly one table in the
> other.
> Most of the time, the apps are just doing mindless INSERTs to the DB.
> Every now and then, an admin performs some SELECTs via a PHP interface.

For performance reasons, I was thinking of keeping the tables append-only,
and simply rotating them out every so often (daily?) and deleting those
tables that are too old. Is that a good idea?

-- 
Florin Andrei

http://florin.myip.org/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread John Arbash Meinel
Florin Andrei wrote:
On Wed, 2005-03-30 at 17:50 -0800, Florin Andrei wrote:

Function: Essentially a logging server. There are two applications (like
syslog) on the same box that are logging to pgsql, each one to its own
database. There are a few tables in one DB, and exactly one table in the
other.
Most of the time, the apps are just doing mindless INSERTs to the DB.
Every now and then, an admin performs some SELECTs via a PHP interface.

For performance reasons, i was thinking to keep the tables append-only,
and simply rotate them out every so often (daily?) and delete those
tables that are too old. Is that a good idea?

If you aren't doing updates, then I'm pretty sure the data stays packed
pretty well. I don't know that you need daily rotations, but you
certainly could consider some sort of rotation schedule.
The biggest performance improvement, though, is probably to group
inserts into transactions.
I had an application (in a different db, but it should be relevant),
where using a transaction changed the time from 6min -> 6 sec.
It was just thrashing on all the little inserts that it had to fsync to
disk.
How fast is fast? How many log messages are you expecting? 1/s, 100/s, 1000/s?
I think the hardware should be capable of the 10-100 range if things are
properly configured. Naturally that depends on all sorts of factors, but
it should give you an idea.
John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread Florin Andrei
On Wed, 2005-03-30 at 19:59 -0600, John Arbash Meinel wrote:

> Put pg_xlog onto the same drive as the OS, not the drive with the database.

I forgot to mention: the OS drive is purposefully made very slow - the
write cache is turned off and the FS is Ext3 with data=journal. Is it then
still OK to put pg_xlog on it?

The reason: if the power cord is yanked, the OS _must_ boot back up in
good condition. If the DB is corrupted, whatever, nuke it then re-
initialize it. But the OS must survive act-of-god events.

No, there is no uninterruptible power supply. It sucks, but that's how
it is. I cannot change that.

-- 
Florin Andrei

http://florin.myip.org/


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread Florin Andrei
On Wed, 2005-03-30 at 20:11 -0600, John Arbash Meinel wrote:
> Florin Andrei wrote:
> >
> >For performance reasons, i was thinking to keep the tables append-only,
> >and simply rotate them out every so often (daily?) and delete those
> >tables that are too old. Is that a good idea?
> >
> If you aren't doing updates, then I'm pretty sure the data stays packed
> pretty well. I don't know that you need daily rotations, but you
> certainly could consider some sort of rotation schedule.

(Sorry for re-asking; I'm coming from a MySQL mindset and I still have a
lot to learn about pgsql.)

So, it is indeed a bad idea to delete rows from tables, right? Better to
just rotate to preserve the performance.

Daily rotation may simplify the application logic - then I'll know that
each table is one day's worth of data.

> The biggest performance improvement, though, is probably to group
> inserts into transactions.

Yes, I know that. I have little control over the apps, though. I'll see
what I can do.

> How fast is fast? How many log messages are you expecting? 1/s 100/s 1000/s?

More is better. 
I guess I'll put it together and give it a spin and see just how far it
goes.

I actually have some control over the data that's being sent (in some
places I can limit the number of events/second), so that might save me
right there.

-- 
Florin Andrei

http://florin.myip.org/


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread John Arbash Meinel
Florin Andrei wrote:
On Wed, 2005-03-30 at 19:59 -0600, John Arbash Meinel wrote:

Put pg_xlog onto the same drive as the OS, not the drive with the database.

I forgot to mention: the OS drive is purposefully made very slow - the
write cache is turned off and the FS is Ext3 with data=journal. Is then
still ok to put pg_xlog on it?
The reason: if the power cord is yanked, the OS _must_ boot back up in
good condition. If the DB is corrupted, whatever, nuke it then re-
initialize it. But the OS must survive act-of-god events.
No, there is no uninterruptible power supply. It sucks, but that's how
it is. I cannot change that.

You don't want write cache for pg_xlog either. And you could always
create a second partition that used reiserfs, or something like that.
If you have to survive "act-of-god" events, you should probably consider making
the system into a RAID 1 instead of 2 separate drives (software RAID
should be fine).
'Cause a much worse act-of-god is having a drive crash. No matter what
you do in software, a failed platter will prevent you from booting. RAID
1 at least means 2 drives have to die.
If you need insert speed, and can't do custom transactions at the
application side, you could try creating a RAM disk for the insert
table, and then create a cron job that bulk pulls it out of that table
and inserts it into the rest of the system. That should let you get a
super-fast insert speed, and the bulk copies should stay reasonably fast.
Just realize that if your cron job stops running, your machine will
slowly eat up all of its RAM, and really not play nice. I think adding
an extra hard-drive is probably the best way to boost performance and
reliability, but if you have a $0 budget, this is a possibility.
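
A minimal sketch of that staging idea (all names are hypothetical, and the
column list would match your real log schema):

    -- staging table kept on the fast storage (e.g. the RAM disk):
    -- no indexes, no foreign keys
    CREATE TABLE log_staging (logged_at timestamptz, source text, message text);

    -- cron job, e.g. once a minute:
    BEGIN;
    LOCK TABLE log_staging IN EXCLUSIVE MODE;  -- briefly block writers during the flush
    INSERT INTO log_messages SELECT * FROM log_staging;
    DELETE FROM log_staging;
    COMMIT;

The explicit lock keeps a concurrent insert from slipping in between the copy
and the delete.
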
John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] fine tuning for logging server

2005-03-30 Thread John Arbash Meinel
Florin Andrei wrote:
On Wed, 2005-03-30 at 20:11 -0600, John Arbash Meinel wrote:

Florin Andrei wrote:

For performance reasons, i was thinking to keep the tables append-only,
and simply rotate them out every so often (daily?) and delete those
tables that are too old. Is that a good idea?

If you aren't doing updates, then I'm pretty sure the data stays packed
pretty well. I don't know that you need daily rotations, but you
certainly could consider some sort of rotation schedule.

(sorry for re-asking, i'm coming from a mysql mindset and i still have a
lot to learn about pgsql)
So, it is indeed a bad idea to delete rows from tables, right? Better
just rotate to preserve the performance.

The only problems are if you get a lot of old tuples in places you don't
want them. If you are always appending new values that are increasing,
and you are deleting from the other side, I think vacuum will do a fine
job at cleaning up. It's deleting/updating every 3rd entry that starts
to cause holes (though probably vacuum still does a pretty good job).
Daily rotation may simplify the application logic - then i'll know that
each table is one day's worth of data.

I don't think it is necessary, but if you like it, go for it. I would
tend to think that you would want a "today" table and an "everything
else" table, as it simplifies your queries and lets you have foreign
keys (though if you are coming from mysql, you may not be used to using them).
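
A sketch of that layout (hypothetical names; the applications would query the
UNION ALL view):

    CREATE TABLE log_today   (logged_at timestamptz, source text, message text);
    CREATE TABLE log_archive (logged_at timestamptz, source text, message text);

    CREATE VIEW log_all AS
        SELECT * FROM log_today
        UNION ALL
        SELECT * FROM log_archive;

    -- nightly job: move everything older than today into the archive
    BEGIN;
    INSERT INTO log_archive SELECT * FROM log_today WHERE logged_at < current_date;
    DELETE FROM log_today   WHERE logged_at < current_date;
    COMMIT;
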
The biggest performance improvement, though, is probably to group
inserts into transactions.

Yes, i know that. I have little control over the apps, though. I'll see
what i can do.

You could always add a layer in between. Or look at my mention of a fast
temp table with a periodic cron job to pull in the new data. You can
run cron as often as once a minute, which might be just right depending on your needs.
It also means that you could ignore foreign keys and indexes on the temp
table, and only evaluate them on the main table.

How fast is fast? How many log messages are you expecting? 1/s 100/s 1000/s?

More is better. 
I guess i'll put it together and give it a spin and see just how far it
goes.
I actually have some controls over the data that's being sent (in some
places i can limit the number of events/second), so that might save me
right there.

Good luck. And remember, tuning your queries can be just as important.
(Though if you are doing append-only inserts, there probably isn't much
that you can do.)
If all you are doing is append-only logging, the fastest thing is
probably just a flat file. You could have something that comes along
later to move it into the database (a sketch follows below). It doesn't
really sound like you are using any features a database provides
(normalization, foreign keys, indexes, etc.).
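
One way that "move it into the database" step might look (the path, table,
and column names are made up; COPY assumes a tab-delimited text file readable
by the server):

    -- after rotating the application's flat log file:
    COPY log_messages (logged_at, source, message)
        FROM '/var/log/app/rotated.log';

A bulk COPY like this is much faster than replaying the file as individual
INSERTs.
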
John
=:->


signature.asc
Description: OpenPGP digital signature


Re: [PERFORM] Left Outer Join much faster than non-outer Join?

2005-03-30 Thread Tom Lane
[EMAIL PROTECTED] writes:
>  select *
>  from streetname_lookup as sl
>  join city_lookup as cl on (true)
>  left outer join tlid_smaller as ts on (sl.geo_streetname_id = 
> ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
>  where  str_name='alamo' and  city='san antonio' and state='TX'
> ;

That's a fairly odd query; why don't you have any join condition between
streetname_lookup and city_lookup?

The planner won't consider Cartesian joins unless forced to, which is
why it fails to consider the join order "((sl join cl) join ts)" unless
you have an outer join in the mix.  I think that's generally a good
heuristic, and am disinclined to remove it ...

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Left Outer Join much faster than non-outer Join?

2005-03-30 Thread Ron Mayer
Tom Lane wrote:
[EMAIL PROTECTED] writes:
select *
from streetname_lookup as sl
join city_lookup as cl on (true)
left outer join tlid_smaller as ts on (sl.geo_streetname_id = 
ts.geo_streetname_id and cl.geo_city_id=ts.geo_city_id)
where  str_name='alamo' and  city='san antonio' and state='TX'
;
That's a fairly odd query; 
I think it's a very common type of query in data warehousing.
It's reasonably typical of a traditional star schema where
"streetname_lookup" and "city_lookup" are dimension tables
and "tlid_smaller" is the central fact table.
why don't you have any join condition between
streetname_lookup and city_lookup?
Those two tables shared no data.   They merely get the "id"s
for looking things up in the much larger central table.
Unique indexes on the city_lookup and street_lookup make the
cartesian join harmless (they each return only 1 value); and
the huge fact table has a multi-column index that takes both
of the ids from those lookups.
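
The definitions implied by the plans look roughly like this (reconstructed
from the index names and conditions in the EXPLAIN output, so treat them as
illustrative rather than exact):

    CREATE UNIQUE INDEX streetname_lookup__str_name ON streetname_lookup (str_name);
    CREATE UNIQUE INDEX city_lookup__name           ON city_lookup (city, state);
    CREATE INDEX tlid_smaller__street_city          ON tlid_smaller (geo_streetname_id, geo_city_id);
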
With the tables I have (shown below), how else could one
efficiently fetch the data for "Main St" "San Francisco"?
  streetname_lookup
  (for every street name used in the country)
   streetid |  name  | type
  ----------+--------+------
          1 |  Main  | St
          2 |  1st   | St

  city_lookup
  (for every city name used in the country)
   cityid |  name    | state
  --------+----------+-------
        1 |  Boston  | MA
        2 |  Alameda | CA

  tlid_smaller
  (containing a record for every city block in the country)
   city_id | street_id | addresses | demographics, etc.
  ---------+-----------+-----------+--------------------
         1 |         1 | 100 block | [lots of columns]
         1 |         1 | 200 block | [lots of columns]
         1 |         1 | 300 block | [lots of columns]
         1 |         2 | 100 block | [lots of columns]
         1 |         2 | 100 block | [lots of columns]
The planner won't consider Cartesian joins unless forced to, which is
why it fails to consider the join order "((sl join cl) join ts)" unless
you have an outer join in the mix.  I think that's generally a good
heuristic, and am disinclined to remove it ...
IMHO it's a shame it doesn't even consider it when the estimated
result is very small.  I think joins that merely look up IDs would
often be worth considering, because they make potential multi-column
indexes usable (as shown in the previous email's explain analyze
result, where the cartesian join was 30X faster than the other
approach since it could use the multi-column index on the very
large table).
Ron
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] Reading recommendations

2005-03-30 Thread Mohan, Ross
I can see that PG'ers have a wicked sense of humor. 

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Wampler
Sent: Wednesday, March 30, 2005 3:58 PM
To: [EMAIL PROTECTED]
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Reading recommendations


[EMAIL PROTECTED] wrote:

>>Mohan, Ross wrote:
>>
>>>VOIP over BitTorrent?
>>
>>Now *that* I want to see.  Ought to be at least as interesting as the 
>>"TCP/IP over carrier pigeon" experiment - and more challenging to 
>>boot!
>>
> 
> 
> It was very challenging.  I worked on the credit window sizing and 
> retransmission timer estimation algorithms.  We took into account 
> weather patterns, size and age of the bird, feeding times, and the 
> average number of times a bird circles before determining magnetic 
> north.  Interestingly, packet size had little effect in the final 
> algorithms.
> 
> I would love to share them with all of you, but they're classified.

Ah, but VOIPOBT requires many people all saying the same thing at the same 
time.  The synchronization alone (since you need to distribute these people 
adequately to avoid overloading a trunk line...) is probably sufficiently hard 
to make it interesting.  Then there are the problems of different accents, 
dialects, and languages ;)

-- 
Steve Wampler -- [EMAIL PROTECTED]
The gods that smiled on your birth are now laughing out loud.

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster