[PERFORM] help on explain analyse in psql 7.1.3 (linux)

2005-04-07 Thread S.Thanga Prakash
hi,

I am using psql 7.1.3

I didn't find option analyse in explain command..

how to get time taken by SQL procedure/query?

regards,
stp..



Re: [PERFORM] help on explain analyse in psql 7.1.3 (linux)

2005-04-07 Thread Christopher Kings-Lynne
I didn't find option analyse in explain command..
how to get time taken by SQL procedure/query?
Explain analyze was added in 7.2 - you really need to upgrade...
You can use \timing in psql to get an approximation...
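For illustration, once you are on a version that has it, a \timing session looks roughly like this (the prompt, table name and numbers here are made up):

    test=# \timing
    Timing is on.
    test=# SELECT count(*) FROM some_table;
     count
    -------
      1234
    (1 row)
    Time: 12.345 ms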
Chris


Re: [PERFORM] help on time calculation

2005-04-07 Thread Christopher Kings-Lynne
   how to find the time taken by a query/stored procedure?
In psql, use \timing for an approximate time.
I am using psql 7.1.3 in linux 7.2
how to execute 'explain analyse' in the psql? Is it supported at 7.1.3 ?
Explain analyze is NOT supported in PostgreSQL 7.1.  You really should 
upgrade your PostgreSQL to version 8.0.

Chris


[PERFORM] help on time calculation

2005-04-07 Thread S.Thanga Prakash
hi,

   how to find the time taken by a query/stored procedure?

I am using psql 7.1.3 in linux 7.2

how to execute 'explain analyse' in the psql? Is it supported at 7.1.3 ?


looking forward for replies..
regards,
stp.



Re: [PERFORM] help on explain analyse in psql 7.1.3 (linux)

2005-04-07 Thread S.Thanga Prakash
hi,

thanks for immediate response..

regards,
stp..

On Thu, 7 Apr 2005, Christopher Kings-Lynne wrote:

> > I didn't find option analyse in explain command..
> > 
> > how to get time taken by SQL procedure/query?
> 
> Explain analyze was added in 7.2 - you really need to upgrade...
> 
> You can use \timing in psql to get an approximation...
> 
> Chris
> 




Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Douglas J. Trainor
A good one-page discussion on the future of SCSI and SATA can
be found in the latest CHIPS (The Department of the Navy Information
Technology Magazine, formerly CHIPS AHOY) in an article by
Patrick G. Koehler and Lt. Cmdr. Stan Bush.
Click below if you don't mind being logged as visiting Space and Naval
Warfare Systems Center Charleston:
http://www.chips.navy.mil/archives/05_Jan/web_pages/scuzzy.htm


Re: [PERFORM] COPY Hacks (WAS: RE: Postgresql vs SQLserver for this application ?)

2005-04-07 Thread Harald Fuchs
In article <[EMAIL PROTECTED]>,
Mischa <[EMAIL PROTECTED]> writes:

> This thread seems to be focusing in on COPY efficiency,
> I'd like to ask something I got no answer to, a few months ago.

> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".

> To my (happy) surprise, I ended up with exactly that number of rows
> in the target table.

> Is this a bug? Is this fundamental to the protocol?

> Since it hasn't been documented (but then, "endcopy" isn't documented),
> I've been shy of investing in perf testing such mass copy calls.
> But, if it DOES work, it should be reducing the number of network 
> roundtrips.

> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

Using COPY from DBD::Pg _is_ documented - provided you use DBD::Pg
version 1.41, released just today.




[PERFORM] 4 way JOIN using aliases

2005-04-07 Thread Keith Worthington
Hi All,

Thanks to all on the NOVICE list that gave me help; I now have a query running
that returns the results I am after. :-)  Now of course I want it to run
faster.  Currently it clocks in at ~160ms.  I have checked over the indexes
and I believe that the tables are indexed properly.  The largest table,
tbl_item, only has 2000 rows.  Is it possible to reduce the time of this query
further?  I have included the output of EXPLAIN ANALYZE below the query.
Unfortunately I am still struggling to learn how to interpret the
output.  TIA

 SELECT tbl_item.id AS item_id,
tbl_item.item_type,
tbl_item.inactive AS item_inactive,
tbl_item.description AS item_description,
CASE WHEN tbl_item.item_class=0 THEN 'Non-Stock'
 WHEN tbl_item.item_class=1 THEN 'Stock'
 WHEN tbl_item.item_class=2 THEN 'Description'
 WHEN tbl_item.item_class=3 THEN 'Assembly'
 WHEN tbl_item.item_class=4 THEN 'Service'
 WHEN tbl_item.item_class=5 THEN 'Labor'
 WHEN tbl_item.item_class=6 THEN 'Activity'
 WHEN tbl_item.item_class=7 THEN 'Charge'
 ELSE 'Unrecognized'
END AS item_class,
tbl_item.sales_gl_account AS acct_sales_gl_nmbr,
sales_desc.description AS acct_sales_gl_name,
tbl_item.inventory_gl_account AS acct_inv_gl_nmbr,
inv_desc.description AS acct_inv_gl_name,
tbl_item.cogs_gl_account  AS acct_cogs_gl_nmbr,
cogs_desc.description AS acct_cogs_gl_name,
CASE WHEN tbl_item.costing_method=0 THEN 'Average'
 WHEN tbl_item.costing_method=1 THEN 'FIFO'
 WHEN tbl_item.costing_method=2 THEN 'LIFO'
 ELSE 'Unrecognized'
END AS acct_cost_method,
tbl_mesh.mesh_size,
tbl_mesh.unit_of_measure AS mesh_uom,
tbl_mesh.mesh_type,
tbl_item.purchase_description,
tbl_item.last_unit_cost AS purchase_unit_cost,
tbl_item.purchase_uom AS purchase_uom,
tbl_item.reorder_point AS purchase_point,
tbl_item.reorder_quantity AS purchase_quantity,
tbl_item.sales_description,
tbl_item.last_unit_cost/peachtree.tbl_item.ptos_uom_factor AS
   sales_unit_cost,
tbl_item.unit_of_measure AS sales_uom,
tbl_item.weight AS sales_weight,
tbl_current.last_count
   + tbl_current.received
   - tbl_current.shipped AS inv_on_hand,
tbl_current.allocated AS inv_committed,
tbl_current.last_count
   + tbl_current.received
   - tbl_current.shipped
   - tbl_current.allocated AS inv_available,
tbl_current.on_order AS inv_on_order
   FROM tbl_item
   LEFT JOIN tbl_mesh
 ON ( tbl_item.id = tbl_mesh.item_id )
   JOIN tbl_gl_account AS sales_desc
 ON ( tbl_item.sales_gl_account = sales_desc.account_id )
   JOIN tbl_gl_account AS inv_desc
 ON ( tbl_item.inventory_gl_account = inv_desc.account_id )
   JOIN tbl_gl_account AS cogs_desc
 ON ( tbl_item.cogs_gl_account = cogs_desc.account_id )
   LEFT JOIN tbl_current
 ON ( tbl_item.id = tbl_current.item_id )
  ORDER BY tbl_item.id;


Sort  (cost=5749.75..5758.98 rows=3691 width=333) (actual
time=154.923..156.070 rows=1906 loops=1)
  Sort Key: tbl_item.id
  ->  Hash Left Join  (cost=2542.56..5194.32 rows=3691 width=333) (actual
time=30.475..146.074 rows=1906 loops=1)
Hash Cond: (("outer".id)::text = ("inner".item_id)::text)
->  Hash Join  (cost=15.85..366.14 rows=3691 width=313) (actual
time=2.292..82.281 rows=1906 loops=1)
  Hash Cond: (("outer".sales_gl_account)::text =
("inner".account_id)::text)
  ->  Hash Join  (cost=11.18..305.81 rows=3749 width=290) (actual
time=1.632..61.052 rows=1906 loops=1)
Hash Cond: (("outer".cogs_gl_account)::text =
("inner".account_id)::text)
->  Hash Join  (cost=6.50..244.60 rows=3808 width=267)
(actual time=1.034..40.873 rows=1906 loops=1)
  Hash Cond: (("outer".inventory_gl_account)::text =
("inner".account_id)::text)
  ->  Hash Left Join  (cost=1.82..182.50 rows=3868
width=244) (actual time=0.407..20.878 rows=1936 loops=1)
Hash Cond: (("outer".id)::text =
("inner".item_id)::text)
->  Seq Scan on tbl_item  (cost=0.00..160.68
rows=3868 width=224) (actual time=0.131..5.022 rows=1936 loops=1)
->  Hash  (cost=1.66..1.66 rows=66 width=34)
(actual time=0.236..0.236 rows=0 loops=1)
  ->  Seq Scan on tbl_mesh 
(cost=0.00..1.66 rows=66 width=34) (actual time=0.031..0.149 rows=66 loops=1)
  ->  Hash  (cost=4.14..4.14 rows=214 width=32)
(actual time=0.573..0.573 rows=0 loops=1)
->  Seq Scan on tbl_gl_account inv_desc 
(cost=0.00..4.14 rows=214 width=32) (actual time=0.005..0.317 rows=214 loops=

Re: [PERFORM] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Bruno Wolff III
On Wed, Apr 06, 2005 at 18:09:37 -0400,
  Tom Lane <[EMAIL PROTECTED]> wrote:
> 
> Can anyone suggest a more general rule?  Do we need for example to
> consider whether the relation membership is the same in two clauses
> that might be opposite sides of a range restriction?  It seems like
> 
>   a.x > b.y AND a.x < b.z
> 
> probably can be treated as a range restriction on a.x for this purpose,
> but I'm much less sure that the same is true of
> 
>   a.x > b.y AND a.x < c.z
> 
> Thoughts?

I think it makes sense to guess that a smaller fraction of the rows will
be returned when a column value is bounded above and below than if it
is only bounded on one side, even if the bounds aren't fixed. You can
certainly be wrong. The difference between this and the normal case is that
column statistics aren't normally going to be that useful.

If date/time ranges are the common use for this construct, it might be better
to create date and/or time range types that use rtree or gist indexes.



Re: [PERFORM] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Tom Lane
Bruno Wolff III <[EMAIL PROTECTED]> writes:
>   Tom Lane <[EMAIL PROTECTED]> wrote:
>> Can anyone suggest a more general rule?

> I think it makes sense to guess that a smaller fraction of the rows will
> be returned when a column value is bounded above and below than if it
> is only bounded on one side, even if the bounds aren't fixed. You can
> certainly be wrong.

Yeah, the whole thing is only a heuristic anyway.  I've been coming
around to the view that relation membership shouldn't matter, because
of cases like

WHERE a.x > b.y AND a.x < 42

which surely should be taken as a range constraint.
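As a concrete example of that shape (table and column names invented):

    SELECT *
      FROM orders o
      JOIN customers c ON o.amount > c.credit_limit
     WHERE o.amount < 42;

Both clauses bound o.amount, so estimating them together as one range
restriction should beat multiplying two independent default selectivities.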

regards, tom lane



[PERFORM] Any way to speed this up?

2005-04-07 Thread Joel Fradkin
Running this explain on a windows box, but production is on linux; both are 8.0.1.

The MSSQL is beating me out for some reason on this query.

The linux box is much more powerful. I may have to increase the cache, but I am
pretty sure it's not an issue yet.

It has 8 gig of internal memory; any recommendation on the cache size to use?

explain analyze select * from viwassoclist where clientnum = 'SAKS'

"Merge Join  (cost=59871.79..60855.42 rows=7934 width=112) (actual time=46906.000..48217.000 rows=159959 loops=1)"
"  Merge Cond: ("outer".locationid = "inner".locationid)"
"  ->  Sort  (cost=393.76..394.61 rows=338 width=48) (actual time=62.000..62.000 rows=441 loops=1)"
"        Sort Key: l.locationid"
"        ->  Index Scan using ix_location on tbllocation l  (cost=0.00..379.56 rows=338 width=48) (actual time=15.000..62.000 rows=441 loops=1)"
"              Index Cond: ('SAKS'::text = (clientnum)::text)"
"  ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual time=46844.000..46985.000 rows=159960 loops=1)"
"        Sort Key: a.locationid"
"        ->  Merge Right Join  (cost=0.00..39739.84 rows=172618 width=75) (actual time=250.000..43657.000 rows=176431 loops=1)"
"              Merge Cond: ((("outer".clientnum)::text = ("inner".clientnum)::text) AND ("outer".id = "inner".jobtitleid))"
"              ->  Index Scan using ix_tbljobtitle_id on tbljobtitle jt  (cost=0.00..194.63 rows=6391 width=37) (actual time=32.000..313.000 rows=5689 loops=1)"
"                    Filter: (1 = presentationid)"
"              ->  Index Scan using ix_tblassoc_jobtitleid on tblassociate a  (cost=0.00..38218.08 rows=172618 width=53) (actual time=31.000..41876.000 rows=176431 loops=1)"
"                    Index Cond: ((clientnum)::text = 'SAKS'::text)"
"Total runtime: 48500.000 ms"

CREATE OR REPLACE VIEW viwassoclist AS
 SELECT a.clientnum, a.associateid, a.associatenum, a.lastname,
    a.firstname, jt.value AS jobtitle, l.name AS "location",
    l.locationid AS mainlocationid, l.divisionid, l.regionid, l.districtid,
    (a.lastname::text || ', '::text) || a.firstname::text AS assocname,
    a.isactive, a.isdeleted
   FROM tblassociate a
   LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id
     AND jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
   JOIN tbllocation l ON a.locationid = l.locationid
     AND l.clientnum::text = a.clientnum::text;

CREATE TABLE tblassociate
(
  clientnum varchar(16) NOT NULL,
  associateid int4 NOT NULL,
  associatenum varchar(10),
  firstname varchar(50),
  middleinit varchar(5),
  lastname varchar(50),
  ssn varchar(18),
  dob timestamp,
  address varchar(100),
  city varchar(50),
  state varchar(50),
  country varchar(50),
  zip varchar(10),
  homephone varchar(14),
  cellphone varchar(14),
  pager varchar(14),
  associateaccount varchar(50),
  doh timestamp,
  dot timestamp,
  rehiredate timestamp,
  lastdayworked timestamp,
  staffexecid int4,
  jobtitleid int4,
  locationid int4,
  deptid int4,
  positionnum int4,
  worktypeid int4,
  sexid int4,
  maritalstatusid int4,
  ethnicityid int4,
  weight float8,
  heightfeet int4,
  heightinches int4,
  haircolorid int4,
  eyecolorid int4,
  isonalarmlist bool NOT NULL DEFAULT false,
  isactive bool NOT NULL DEFAULT true,
  ismanager bool NOT NULL DEFAULT false,
  issecurity bool NOT NULL DEFAULT false,
  createdbyid int4,
  isdeleted bool NOT NULL DEFAULT false,
  militarybranchid int4,
  militarystatusid int4,
  patrontypeid int4,
  identificationtypeid int4,
  workaddress varchar(200),
  testtypeid int4,
  testscore int4,
  pin int4,
  county varchar(50),
  CONSTRAINT pk_tblassociate PRIMARY KEY (clientnum, associateid),
  CONSTRAINT ix_tblassociate UNIQUE (clientnum, associatenum)
)

CREATE TABLE tbljobtitle
(
  clientnum varchar(16) NOT NULL,
  id int4 NOT NULL,
  value varchar(50),
  code varchar(16),
  isdeleted bool DEFAULT false,
  presentationid int4 NOT NULL DEFAULT 1,
  CONSTRAINT pk_tbljobtitle PRIMARY KEY (clientnum, id, presentationid)
)

CREATE TABLE tbllocation
(
  clientnum varchar(16) NOT NULL,
  locationid int4 NOT NULL,
  districtid int4 NOT NULL,
  regionid int4 NOT NULL,
  divisionid int4 NOT NULL,
  locationnum varchar(8),
  name varchar(50),
  clientlocnum varchar(50),
  address varchar(100),
  address2 varchar(100),
  city varchar(50),
  state varchar(2) NOT NULL DEFAULT 'zz'::character varying,
  zip varchar(10),
  countryid int4,
  phone varchar(15),
  fax varchar(15),
  payname varchar(40),
  contact char(36),
  active bool NOT NULL DEFAULT true,
  coiprogram text,
  coilimit text,
  coiuser varchar(255),
  coidatetime varchar(32),
  ec_note_field varchar(1050),
  locationtypeid int4,
  open_time timestamp,
  close_time timestamp,
  insurance_loc_id varchar(50),
  lpregionid int4,
  sic int4,
  CONSTRAINT pk_tbllocation PRIMARY KEY (clientnum, locationid),
  CONSTRAINT ix_tbllocation_1

[PERFORM] Building postmaster with Profiling Support WAS "Tweaking a C Function I wrote"

2005-04-07 Thread Mohan, Ross
Adam - 

Is compiling postmaster with profiling support just a flag
in the build/make? Or is there something more involved? 

I'd like to be able to do this in the future and so am
curious about means/methods. 

If this is a RTFM, just let me know that (am currently 
Reading The F Manual), but if you have any "special sauce"
here, that'd be of great interest. 

Thanks

-Ross

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam Palmblad
Sent: Wednesday, April 06, 2005 7:23 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Tweaking a C Function I wrote


I wanted to see if I could squeeze any more performance out of a C set 
returning function I wrote.  As such, I looked to a profiler.  Is it 
possible to get profile information on the function I wrote?  I've got 
postmaster and my function compiled with profiling support, and can find 
the gmon.out files... can I actually look at the call tree that occurs 
when my function is being executed or will I be limited to viewing calls 
to functions in the postmaster binary?

-Adam




Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Keith Worthington
On Thu, 07 Apr 2005 11:13:57 -0400, Joel Fradkin wrote
[snip]
> "  ->  Sort  (cost=393.76..394.61 rows=338 width=48) (actual
> time=62.000..62.000 rows=441 loops=1)"
> 
> "Sort Key: l.locationid"
> 
> "->  Index Scan using ix_location on tbllocation l
> 
> (cost=0.00..379.56 rows=338 width=48) (actual time=15.000..62.000 rows=441
> loops=1)"
> 
> "  Index Cond: ('SAKS'::text = (clientnum)::text)"
> 
> "  ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
> time=46844.000..46985.000 rows=159960 loops=1)"
> 
> "Sort Key: a.locationid"
[snip]
> 
> CREATE TABLE tblassociate
[snip]
> 
>   CONSTRAINT pk_tblassociate PRIMARY KEY (clientnum, associateid),
> 
>   CONSTRAINT ix_tblassociate UNIQUE (clientnum, associatenum)
> 
[snip]
> 
> Joel Fradkin

Joel,

I am REALLY new at this and struggling to understand EXPLAIN ANALYZE output,
but for what it is worth it looks like the sort on a.locationid is taking up a
lot of the time.  I do not see an index on that column.  I would suggest
indexing tblassociate.locationid and seeing if that helps.
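If you want to try it, a minimal sketch would be (the index name is only a
suggestion):

    CREATE INDEX ix_tblassociate_locationid ON tblassociate (locationid);
    ANALYZE tblassociate;

Then re-run the EXPLAIN ANALYZE and compare.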

Kind Regards,
Keith



Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Richard_D_Levine
Another simple question: Why is SCSI more expensive?  After the
eleventy-millionth controller is made, it seems like SCSI and SATA are
using a controller board and a spinning disk.  Is somebody still making
money by licensing SCSI technology?

Rick

[EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:

> You asked for it!  ;-)
>
> If you want cheap, get SATA.  If you want fast under
> *load* conditions, get SCSI.  Everything else at this
> time is marketing hype, either intentional or learned.
> Ignoring dollars, expect to see SCSI beat SATA by 40%.
>
>  * * * What I tell you three times is true * * *
>
> Also, compare the warranty you get with any SATA
> drive with any SCSI drive.  Yes, you still have some
> change leftover to buy more SATA drives when they
> fail, but... it fundamentally comes down to some
> actual implementation and not what is printed on
> the cardboard box.  Disk systems are bound by the
> rules of queueing theory.  You can hit the sales rep
> over the head with your queueing theory book.
>
> Ultra320 SCSI is king of the hill for high concurrency
> databases.  If you're only streaming or serving files,
> save some money and get a bunch of SATA drives.
> But if you're reading/writing all over the disk, the
> simple first-come-first-serve SATA heuristic will
> hose your performance under load conditions.
>
> Next year, they will *try* bring out some SATA cards
> that improve on first-come-first-serve, but they ain't
> here now.  There are a lot of rigged performance tests
> out there...  Maybe by the time they fix the queueing
> problems, serial Attached SCSI (a/k/a SAS) will be out.
> Looks like Ultra320 is the end of the line for parallel
> SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the
> water.
>
> Ultra320 SCSI.
> Ultra320 SCSI.
> Ultra320 SCSI.
>
> Serial Attached SCSI.
> Serial Attached SCSI.
> Serial Attached SCSI.
>
> For future trends, see:
> http://www.incits.org/archive/2003/in031163/in031163.htm
>
> douglas
>
> p.s. For extra credit, try comparing SATA and SCSI drives
> when they're 90% full.
>
> On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:
>
> > I guess I'm setting myself up here, and I'm really not being ignorant,
> > but can someone explain exactly how is SCSI is supposed to better than
> > SATA?
> >
> > Both systems use drives with platters.  Each drive can physically only
> > read one thing at a time.
> >
> > SATA gives each drive it's own channel, but you have to share in SCSI.
> >  A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
> > SCSI can only do 320MB/sec across the entire array.
> >
> > What am I missing here?
> >
> > Alex Turner
> > netEconomist
>
>


Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread John Arbash Meinel
Joel Fradkin wrote:
Running this explain on windows box, but production on linux both 8.0.1
The MSSQL is beating me out for some reason on this query.
The linux box is much more powerful, I may have to increase the cache,
but I am pretty sure its not an issue yet.
It has 8 gig internal memory any recommendation on the cache size to use?

explain analyze select * from viwassoclist where clientnum = 'SAKS'

"Merge Join  (cost=59871.79..60855.42 rows=7934 width=112) (actual
time=46906.000..48217.000 rows=159959 loops=1)"
The first thing I noticed was this. Notice that the estimated rows is
8k, the actual rows is 160k. Which means the planner is mis-estimating
the selectivity of your merge.
"  ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
time=46844.000..46985.000 rows=159960 loops=1)"
"Sort Key: a.locationid"
This sort actually isn't taking very long. It starts at 46800 and runs
until 47000 so it takes < 1 second.
"->  Merge Right Join  (cost=0.00..39739.84 rows=172618
width=75) (actual time=250.000..43657.000 rows=176431 loops=1)"
"  Merge Cond: ((("outer".clientnum)::text =
("inner".clientnum)::text) AND ("outer".id = "inner".jobtitleid))"
"  ->  Index Scan using ix_tbljobtitle_id on tbljobtitle
jt  (cost=0.00..194.63 rows=6391 width=37) (actual
time=32.000..313.000 rows=5689 loops=1)"
"Filter: (1 = presentationid)"
"  ->  Index Scan using ix_tblassoc_jobtitleid on
tblassociate a  (cost=0.00..38218.08 rows=172618 width=53) (actual
time=31.000..41876.000 rows=176431 loops=1)"
"Index Cond: ((clientnum)::text = 'SAKS'::text)"
This is where the actual expense is. The merge right join starts at 250,
and runs until 43000. Which seems to be caused primarily by the index
scan of tblassociate. How many rows are in tblassociate? I'm assuming
quite a bit, since the planner thinks an index scan is faster than seq
scan for 170k rows. (If you have > 2M this is probably accurate)
I don't really know how long this should take, but 38s for 172k rows
seems a little long.
John
=:->




Re: [PERFORM] help on explain analyse in psql 7.1.3 (linux)

2005-04-07 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
>> I didn't find option analyse in explain command..
>> 
>> how to get time taken by SQL procedure/query?

> Explain analyze was added in 7.2 - you really need to upgrade...

> You can use \timing in psql to get an approximation...

7.1 psql hasn't got \timing either ...

regards, tom lane



Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Alex Turner
Based on the reading I'm doing, and somebody please correct me if I'm
wrong, it seems that SCSI drives contain an on-disk controller that
has to process the tagged queue.  SATA-I doesn't have this.  This
additional controller is basically an on-board computer that figures
out the best order in which to process commands.  I believe you are
also paying for the increased tolerances that generate better speed.
If you compare an 80Gig 7200RPM IDE drive to a WD Raptor 76G 10k RPM
to a Seagate 10k.6 drive to a Seagate Cheetah 15k drive, each one
represents a step up in parts and technology, thereby generating a
cost increase (at least that's what the manufacturers tell us).  I know
if you ever held a 15k drive in your hand, you can notice a
considerable weight difference between it and a 7200RPM IDE drive.

Alex Turner
netEconomist

On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED]
<[EMAIL PROTECTED]> wrote:
> Another simple question: Why is SCSI more expensive?  After the
> eleventy-millionth controller is made, it seems like SCSI and SATA are
> using a controller board and a spinning disk.  Is somebody still making
> money by licensing SCSI technology?
> 
> Rick
> 
> [EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:
> 
> > You asked for it!  ;-)
> >
> > If you want cheap, get SATA.  If you want fast under
> > *load* conditions, get SCSI.  Everything else at this
> > time is marketing hype, either intentional or learned.
> > Ignoring dollars, expect to see SCSI beat SATA by 40%.
> >
> >  * * * What I tell you three times is true * * *
> >
> > Also, compare the warranty you get with any SATA
> > drive with any SCSI drive.  Yes, you still have some
> > change leftover to buy more SATA drives when they
> > fail, but... it fundamentally comes down to some
> > actual implementation and not what is printed on
> > the cardboard box.  Disk systems are bound by the
> > rules of queueing theory.  You can hit the sales rep
> > over the head with your queueing theory book.
> >
> > Ultra320 SCSI is king of the hill for high concurrency
> > databases.  If you're only streaming or serving files,
> > save some money and get a bunch of SATA drives.
> > But if you're reading/writing all over the disk, the
> > simple first-come-first-serve SATA heuristic will
> > hose your performance under load conditions.
> >
> > Next year, they will *try* bring out some SATA cards
> > that improve on first-come-first-serve, but they ain't
> > here now.  There are a lot of rigged performance tests
> > out there...  Maybe by the time they fix the queueing
> > problems, serial Attached SCSI (a/k/a SAS) will be out.
> > Looks like Ultra320 is the end of the line for parallel
> > SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the
> > water.
> >
> > Ultra320 SCSI.
> > Ultra320 SCSI.
> > Ultra320 SCSI.
> >
> > Serial Attached SCSI.
> > Serial Attached SCSI.
> > Serial Attached SCSI.
> >
> > For future trends, see:
> > http://www.incits.org/archive/2003/in031163/in031163.htm
> >
> > douglas
> >
> > p.s. For extra credit, try comparing SATA and SCSI drives
> > when they're 90% full.
> >
> > On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:
> >
> > > I guess I'm setting myself up here, and I'm really not being ignorant,
> > > but can someone explain exactly how is SCSI is supposed to better than
> > > SATA?
> > >
> > > Both systems use drives with platters.  Each drive can physically only
> > > read one thing at a time.
> > >
> > > SATA gives each drive it's own channel, but you have to share in SCSI.
> > >  A SATA controller typicaly can do 3Gb/sec (384MB/sec) per drive, but
> > > SCSI can only do 320MB/sec across the entire array.
> > >
> > > What am I missing here?
> > >
> > > Alex Turner
> > > netEconomist
> >
> >


Re: [PERFORM] Building postmaster with Profiling Support WAS "Tweaking a C Function I wrote"

2005-04-07 Thread Tom Lane
"Mohan, Ross" <[EMAIL PROTECTED]> writes:
> Is compiling postmaster with profiling support just a flag
> in the build/make? Or is there something more involved? 

cd .../src/backend
make PROFILE="-pg -DLINUX_PROFILE" all
reinstall binary

You don't need -DLINUX_PROFILE if not on Linux, of course.

regards, tom lane



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Tom Lane
"Joel Fradkin" <[EMAIL PROTECTED]> writes:
> Running this explain on windows box, but production on linux both 8.0.1

Are you using any nondefault optimizer settings?  The vast bulk of the
time is going into the indexscan on tblassociate (almost 42 out of the
48 seconds), and I'm a bit surprised it didn't choose a seqscan and sort
instead.  Or even more likely, forget the merge joins altogether and use
hash joins --- the other tables are plenty small enough to fit in hash
tables.

regards, tom lane



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Tom Lane
John Arbash Meinel <[EMAIL PROTECTED]> writes:
>> "  ->  Sort  (cost=59478.03..59909.58 rows=172618 width=75) (actual
>> time=46844.000..46985.000 rows=159960 loops=1)"
>> 
>> "Sort Key: a.locationid"
>> 

> This sort actually isn't taking very long. It starts at 46800 and runs
> until 47000 so it takes < 1 second.

>> "->  Merge Right Join  (cost=0.00..39739.84 rows=172618
>> width=75) (actual time=250.000..43657.000 rows=176431 loops=1)"

You're not reading it quite right.  The first "actual" number is the
time at which the first result row was delivered, which for a sort is
after the completion of (the bulk of) the sorting work.  What you
really need to look at is the difference between the completion times
of the node and its immediate input(s).  In this case I'd blame the
sort for 46985.000 - 43657.000 msec.

Come to think of it, though, you should not be putting a whole lot of
trust in EXPLAIN ANALYZE numbers taken on Windows, because they are
based on gettimeofday which has absolutely awful resolution on that
platform.  (There's a workaround for this in our CVS, but it's not in
8.0.*.)  I think we can still conclude that the indexscan on
tblassociate is most of the cost, but I wouldn't venture to say that
it's exactly such-and-such percent.

regards, tom lane



Re: [PERFORM] How to improve db performance with $7K?

2005-04-07 Thread Richard_D_Levine
Yep, that's it, as well as increased quality control.  I found this from
Seagate:

http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

With this quote (note that ES stands for Enterprise System and PS stands
for Personal System):

There is significantly more silicon on ES products. The following
comparison comes from a study done in 2000:
 - the ES ASIC gate count is more than 2x a PS drive,
 - the embedded SRAM space for program code is 2x,
 - the permanent flash memory for program code is 2x,
 - data SRAM and cache SRAM space is more than 10x.
The complexity of the SCSI/FC interface compared to the
IDE/ATA interface shows up here due in part to the more
complex system architectures in which ES drives find themselves.
ES interfaces support multiple initiators or hosts. The
drive must keep track of separate sets of information for each
host to which it is attached, e.g., maintaining the processor
pointer sets for multiple initiators and tagged commands.
The capability of SCSI/FC to efficiently process commands
and tasks in parallel has also resulted in a higher overhead
"kernel" structure for the firmware. All of these complexities
and an overall richer command set result in the need for a
more expensive PCB to carry the electronics.

Rick

Alex Turner <[EMAIL PROTECTED]> wrote on 04/07/2005 10:46:31 AM:

> Based on the reading I'm doing, and somebody please correct me if I'm
> wrong, it seems that SCSI drives contain an on disk controller that
> has to process the tagged queue.  SATA-I doesn't have this.  This
> additional controller, is basicaly an on board computer that figures
> out the best order in which to process commands.  I believe you are
> also paying for the increased tolerance that generates a better speed.
>  If you compare an 80Gig 7200RPM IDE drive to a WD Raptor 76G 10k RPM
> to a Seagate 10k.6 drive to a Seagate Cheatah 15k drive, each one
> represents a step up in parts and technology, thereby generating a
> cost increase (at least thats what the manufactures tell us).  I know
> if you ever held a 15k drive in your hand, you can notice a
> considerable weight difference between it and a 7200RPM IDE drive.
>
> Alex Turner
> netEconomist
>
> On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED]
> <[EMAIL PROTECTED]> wrote:
> > Another simple question: Why is SCSI more expensive?  After the
> > eleventy-millionth controller is made, it seems like SCSI and SATA are
> > using a controller board and a spinning disk.  Is somebody still making
> > money by licensing SCSI technology?
> >
> > Rick
> >
> > [EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:
> >
> > > You asked for it!  ;-)
> > >
> > > If you want cheap, get SATA.  If you want fast under
> > > *load* conditions, get SCSI.  Everything else at this
> > > time is marketing hype, either intentional or learned.
> > > Ignoring dollars, expect to see SCSI beat SATA by 40%.
> > >
> > >  * * * What I tell you three times is true * * *
> > >
> > > Also, compare the warranty you get with any SATA
> > > drive with any SCSI drive.  Yes, you still have some
> > > change leftover to buy more SATA drives when they
> > > fail, but... it fundamentally comes down to some
> > > actual implementation and not what is printed on
> > > the cardboard box.  Disk systems are bound by the
> > > rules of queueing theory.  You can hit the sales rep
> > > over the head with your queueing theory book.
> > >
> > > Ultra320 SCSI is king of the hill for high concurrency
> > > databases.  If you're only streaming or serving files,
> > > save some money and get a bunch of SATA drives.
> > > But if you're reading/writing all over the disk, the
> > > simple first-come-first-serve SATA heuristic will
> > > hose your performance under load conditions.
> > >
> > > Next year, they will *try* bring out some SATA cards
> > > that improve on first-come-first-serve, but they ain't
> > > here now.  There are a lot of rigged performance tests
> > > out there...  Maybe by the time they fix the queueing
> > > problems, serial Attached SCSI (a/k/a SAS) will be out.
> > > Looks like Ultra320 is the end of the line for parallel
> > > SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the
> > > water.
> > >
> > > Ultra320 SCSI.
> > > Ultra320 SCSI.
> > > Ultra320 SCSI.
> > >
> > > Serial Attached SCSI.
> > > Serial Attached SCSI.
> > > Serial Attached SCSI.
> > >
> > > For future trends, see:
> > > http://www.incits.org/archive/2003/in031163/in031163.htm
> > >
> > > douglas
> > >
> > > p.s. For extra credit, try comparing SATA and SCSI drives
> > > when they're 90% full.
> > >
> > > On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:
> > >
> > > > I guess I'm setting myself up here, and I'm really not being
ignorant,
> > > > but can someone explain exactly how is SCSI is supposed to better
than
> > > > SATA?
> > > >
> > > > Both systems use drives with platters.  Each drive can physically
only
> > > > read one thing at a time.
> > > >
> > > > SATA gives e

Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Joel Fradkin
shared_buffers = 8000   # min 16, at least max_connections*2, 8KB each
work_mem = 8192 #1024   # min 64, size in KB
max_fsm_pages = 3   # min max_fsm_relations*16, 6 bytes each
effective_cache_size = 4 #1000  # typically 8KB each
random_page_cost = 1.2 #4   # units are one sequential page fetch cost

These are the items I changed.
In the development box I turned random page cost to .2 because I figured it
would all be faster using an index as all my data is at a minimum being
selected by clientnum.

But the analyze I sent in is from these settings above on a windows box.
If I was running the analyze (pgAdmin) on a windows box but connecting to a
linux box, would the times be accurate, or do I have to run the analyze on the
linux box for that to happen?

I am a little unclear why I would need an index on associate by location as
I thought it would be using indexes in location and jobtitle for their
joins.
I did not say where locationid = x in my query on the view.
I have so much to learn about SQL.
Joel






Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Tom Lane
"Joel Fradkin" <[EMAIL PROTECTED]> writes:
> random_page_cost = 1.2 #4   # units are one sequential page fetch cost

That is almost certainly overoptimistic; it's causing the planner to
use indexscans when it shouldn't.  Try 2 or 3 or thereabouts.
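For a quick experiment you can override it just for the current session and
re-run the query; the value below is merely one point in that suggested range:

    SET random_page_cost = 2.5;
    EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';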

regards, tom lane



Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread John Arbash Meinel
Joel Fradkin wrote:
shared_buffers = 8000   # min 16, at least max_connections*2, 8KB each
work_mem = 8192 #1024   # min 64, size in KB
max_fsm_pages = 3   # min max_fsm_relations*16, 6 bytes each
effective_cache_size = 4 #1000  # typically 8KB each
random_page_cost = 1.2 #4   # units are one sequential page fetch cost
These are the items I changed.
In the development box I turned random page cost to .2 because I figured it
would all be faster using an index as all my data is at a minimum being
selected by clientnum.

Your random_page_cost is *way* too low. I would probably change this
to no less than 2.0.
But the analyze I sent in is from these settings above on a windows box.
If I was running the analyze (pgadmin) on a windows box but connecting to a
linux box would the times be accurate or do I have to run the analyze on the
linux box for that to happen?

EXPLAIN ANALYZE is done on the server side, so it doesn't matter what
you use to connect to it. The \timing flag occurs on the local side, and
is thus influenced by network latency (but it only tells you the time
for the whole query anyway).
I am a little unclear why I would need an index on associate by location as
I thought it would be using indexes in location and jobtitle for their
joins.
I did not say where locationid = x in my query on the view.
I have so much to learn about SQL.
Joel
CREATE OR REPLACE VIEW viwassoclist AS
 SELECT a.clientnum, a.associateid, a.associatenum, a.lastname,
a.firstname, jt.value AS jobtitle, l.name AS "location", l.locationid
AS mainlocationid, l.divisionid, l.regionid, l.districtid,
(a.lastname::text || ', '::text) || a.firstname::text AS assocname,
a.isactive, a.isdeleted
   FROM tblassociate a
   LEFT JOIN tbljobtitle jt ON a.jobtitleid = jt.id AND
jt.clientnum::text = a.clientnum::text AND 1 = jt.presentationid
   JOIN tbllocation l ON a.locationid = l.locationid AND
l.clientnum::text = a.clientnum::text;
  ^^^
The locationid is defined in your view. This is the part that postgres
uses to merge all of the different tables together; it doesn't really
matter whether you restrict it with a WHERE clause.
Try just setting your random page cost back to something more
reasonable, and try again.
John
=:->




Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread Joel Fradkin
Here is the result after putting it back to 4, the original value (I had done
that prior to your suggestion of using 2 or 3), to see what might change.
I also vacuumed and thought I saw records deleted in associate, which I
found odd as this is a test site and no new records were added or deleted.

"Merge Join  (cost=86788.09..87945.00 rows=10387 width=112) (actual
time=19703.000..21154.000 rows=159959 loops=1)"
"  Merge Cond: ("outer".locationid = "inner".locationid)"
"  ->  Sort  (cost=1245.50..1246.33 rows=332 width=48) (actual
time=62.000..62.000 rows=441 loops=1)"
"Sort Key: l.locationid"
"->  Index Scan using ix_location on tbllocation l
(cost=0.00..1231.60 rows=332 width=48) (actual time=15.000..62.000 rows=441
loops=1)"
"  Index Cond: ('SAKS'::text = (clientnum)::text)"
"  ->  Sort  (cost=85542.59..86042.39 rows=199922 width=75) (actual
time=19641.000..19955.000 rows=159960 loops=1)"
"Sort Key: a.locationid"
"->  Merge Right Join  (cost=60850.40..62453.22 rows=199922
width=75) (actual time=13500.000..14734.000 rows=176431 loops=1)"
"  Merge Cond: (("outer".id = "inner".jobtitleid) AND
("outer"."?column4?" = "inner"."?column10?"))"
"  ->  Sort  (cost=554.11..570.13 rows=6409 width=37) (actual
time=94.000..94.000 rows=6391 loops=1)"
"Sort Key: jt.id, (jt.clientnum)::text"
"->  Seq Scan on tbljobtitle jt  (cost=0.00..148.88
rows=6409 width=37) (actual time=0.000..63.000 rows=6391 loops=1)"
"  Filter: (1 = presentationid)"
"  ->  Sort  (cost=60296.29..60796.09 rows=199922 width=53)
(actual time=13406.000..13859.000 rows=176431 loops=1)"
"Sort Key: a.jobtitleid, (a.clientnum)::text"
"->  Seq Scan on tblassociate a  (cost=0.00..38388.79
rows=199922 width=53) (actual time=62.000..10589.000 rows=176431 loops=1)"
"  Filter: ((clientnum)::text = 'SAKS'::text)"
"Total runtime: 22843.000 ms"

Joel Fradkin
 
-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Thursday, April 07, 2005 11:43 AM
To: Joel Fradkin
Cc: 'PostgreSQL Perform'
Subject: Re: [PERFORM] Any way to speed this up? 

"Joel Fradkin" <[EMAIL PROTECTED]> writes:
> random_page_cost = 1.2 #4   # units are one sequential page fetch cost

That is almost certainly overoptimistic; it's causing the planner to
use indexscans when it shouldn't.  Try 2 or 3 or thereabouts.

regards, tom lane




Re: [PERFORM] Any way to speed this up?

2005-04-07 Thread John Arbash Meinel
Joel Fradkin wrote:
Here is the result after putting it back to 4, the original value (I had done
that prior to your suggestion of using 2 or 3), to see what might change.
I also vacuumed and thought I saw records deleted in associate, which I
found odd as this is a test site and no new records were added or deleted.

Well, that looks 2x as fast, right?
You might try
SET enable_mergejoin TO off;
Just to see if you can force a hash-join and see how long that takes.
You might also try increasing work_mem.
You can do that just in the current session with
SET work_mem TO ;
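Roughly, in the same session (the work_mem figure below is only a guess; in
8.0 the setting is an integer number of KB):

    SET enable_mergejoin TO off;
    SET work_mem TO 65536;  -- assumed value, about 64 MB; adjust to your RAM
    EXPLAIN ANALYZE SELECT * FROM viwassoclist WHERE clientnum = 'SAKS';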
John
=:->




Re: [PERFORM] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Mischa
Quoting Tom Lane <[EMAIL PROTECTED]>:

> Yeah, the whole thing is only a heuristic anyway.  I've been coming
> around to the view that relation membership shouldn't matter, because
> of cases like
> 
>   WHERE a.x > b.y AND a.x < 42
> 
> which surely should be taken as a range constraint.

Out of curiosity, will the planner induce "b.y < 42" out of this?

-- 
"Dreams come true, not free."




Re: [PERFORM] Building postmaster with Profiling Support WAS "Tweaking a C

2005-04-07 Thread Adam Palmblad
The recompile was done by the sysadmin, but I believe the flags are -pg
-DLINUX_PROFILING for profiling, and -g for debug symbols.
This leaves gmon.out files around, on which you can then run "gprof
/usr/bin/postmaster gmon.out" to see what's going on.

My problem is that this gives me data on what functions are being called
with respect to the postmaster binary, but I don't know
which of my functions - in my shared library, in my C procedure - are
taking the most time.

-Adam
Mohan, Ross wrote:
Adam - 

Is compiling postmaster with profiling support just a flag
in the build/make? Or is there something more involved? 

I'd like to be able to do this in the future and so am
curious about means/methods. 

If this is a RTFM, just let me know that (am currently 
Reading The F Manual), but if you have any "special sauce"
here, that'd be of great interest. 

Thanks
-Ross
-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Adam Palmblad
Sent: Wednesday, April 06, 2005 7:23 PM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Tweaking a C Function I wrote
I wanted to see if I could squeeze any more performance out of a C set 
returning function I wrote.  As such, I looked to a profiler.  Is it 
possible to get profile information on the function I wrote?  I've got 
postmaster and my function compiled with profiling support, and can find 
the gmon.out files... can I actually look at the call tree that occurs 
when my function is being executed or will I be limited to viewing calls 
to functions in the postmaster binary?

-Adam


Re: [HACKERS] Recognizing range constraints (was Re: [PERFORM] Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Jim C. Nasby
On Wed, Apr 06, 2005 at 06:35:10PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <[EMAIL PROTECTED]> writes:
> > On Wed, Apr 06, 2005 at 06:09:37PM -0400, Tom Lane wrote:
> >> Can anyone suggest a more general rule?  Do we need for example to
> >> consider whether the relation membership is the same in two clauses
> >> that might be opposite sides of a range restriction?  It seems like
> >> 
> >> a.x > b.y AND a.x < b.z
> 
> > In a case like this, you could actually look at the  data in b and see
> > what the average range size is.
> 
> Not with the current statistics --- you'd need some kind of cross-column
> statistics involving both y and z.  (That is, I doubt it would be
> helpful to estimate the average range width by taking the difference of
> independently-calculated mean values of y and z ...)  But yeah, in
> principle it would be possible to make a non-default estimate.

Actually, it might be possible to take a SWAG at it using the histogram
and correlation stats.

You know... since getting universally useful cross-column stats seems
to be pretty pie-in-the-sky, would it be possible to generate more
complex stats on the fly from a sampling of a table? If you're looking
at a fairly sizeable table ISTM it would be worth sampling the rows on
10 or 20 random pages to see what you get. In this case, you'd want to
know the average difference between two fields. Other queries might want
something different.
-- 
Jim C. Nasby, Database Consultant   [EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"



Re: [PERFORM] Recognizing range constraints (was Re: Plan for relatively simple query seems to be very inefficient)

2005-04-07 Thread Tom Lane
Mischa <[EMAIL PROTECTED]> writes:
> Quoting Tom Lane <[EMAIL PROTECTED]>:
>> WHERE a.x > b.y AND a.x < 42

> Out of curiosity, will the planner induce "b.y < 42" out of this?

No.  There's some smarts about transitive equality, but none about
transitive inequalities.  Offhand I'm not sure if it'd be useful to add
such.  The transitive-equality code pulls its weight because you so
often have situations like

create view v as select a.x, ... from a join b on (a.x = b.y);

select * from v where x = 42;

but I'm less able to think of common use-cases for transitive
inequality ...

regards, tom lane



Re: [PERFORM] help on explain analyse in psql 7.1.3 (linux)

2005-04-07 Thread Greg Sabino Mullane


Christopher Kings-Lynne wrote:
> Explain analyze was added in 7.2 - you really need to upgrade...
>
> You can use \timing in psql to get an approximation...

Actually, \timing was not added until 7.2 either! So, the
original poster really, really needs to upgrade... :)

--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200504072129
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8






Re: COPY Hacks (WAS: RE: [PERFORM] Postgresql vs SQLserver for this application ?)

2005-04-07 Thread Greg Sabino Mullane


> Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> I accidentally strung together several \n-terminated input lines,
> and sent them to the server with a single "putline".
...
> So. Is it a feechur? Worth stress-testing? Could be VERY cool.

As explained elsewhere, not really a feature, more of a side-effect.
Keep in mind, however, that any network round-trip time saved has to
be balanced against some additional overhead of constructing the
combined strings in Perl before sending them over. Most times COPY
is used to parse a newline-separated file anyway. If you have a slow
network connection to the database, it *might* be a win, but my
limited testing shows that it is not an advantage for a "normal"
connection: I added 1 million rows via COPY using the normal way
(1 million pg_putline calls), via pg_putline of 1000 rows at a
time, and via 10,000 rows at a time. They all ran in 22 seconds,
with no statistical difference between them. (This was the "real" time,
the system time was actually much lower for the combined calls).

It can't hurt to test things out on your particular system and see
if it makes a real difference: it certainly does no harm as long as
you make sure the string you send always *ends* in a newline.

--
Greg Sabino Mullane [EMAIL PROTECTED]
PGP Key: 0x14964AC8 200504072201
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8






Re: [PERFORM] multi-line copy (was: Re: COPY Hacks)

2005-04-07 Thread Mischa Sandberg

Quoting Greg Sabino Mullane <[EMAIL PROTECTED]>:

> > Using COPY ... FROM STDIN via the Perl DBI (DBD::Pg) interface,
> > I accidentally strung together several \n-terminated input lines,
> > and sent them to the server with a single "putline".
> ...
> > So. Is it a feechur? Worth stress-testing? Could be VERY cool.
> 
> As explained elsewhere, not really a feature, more of a side-effect.
> Keep in mind, however, that any network round-trip time saved has to
> be balanced against some additional overhead of constructing the
> combined strings in Perl before sending them over. Most times COPY
> is used to parse a newline-separated file anyway. If you have a slow
> network connection to the database, it *might* be a win, but my
> limited testing shows that it is not an advantage for a "normal"
> connection: I added 1 million rows via COPY using the normal way
> (1 million pg_putline calls), via pg_putline of 1000 rows at a
> time, and via 10,000 rows at a time. They all ran in 22 seconds,
> with no statistical difference between them. (This was the "real" time,
> the system time was actually much lower for the combined calls).
> 
> It can't hurt to test things out on your particular system and see
> if it makes a real difference: it certainly does no harm as long as
> you make sure the string you send always *end* in a newline.

Many thanks for digging into it.

For the app I'm working with, the time delay between rows being posted 
is /just/ enough to exceed the TCP Nagle delay, so every row goes across
in its own packet :-( Reducing the number of network roundtrips 
by a factor of 40 is enough to cut elapsed time in half.
The cost of join("",@FortyRows), which produces a 1-4K string, is what's
negligible in this case.

-- 
"Dreams come true, not free" -- S.Sondheim, ITW

