[PERFORM] Query across a date range

2005-01-07 Thread David Jaquay
Summary: Doing a two or three table join for a date range performs
worse than doing the same query individually for each date in the
range.

What works: A query on just a single date or a date range (against just
one table) runs quickly; 'explain' shows it uses an index scan.  A query
on a single date for one store or for one market uses index scans
throughout, and runs quickly as well.

The problem: A query for a particular store or market over a date range
of more than a few days, though, does a sequential scan of
sales_tickets, and performs worse than running one single-date query for
each date in the range.  My 'explain' for one such query is below.

Background: I have two or three tables involved in a query.  One table
holds stores (7 rows at present), one holds sales tickets (about 5
million) and one holds line items (about 10 million).  It's test data
that I've generated and loaded using '\copy from'.  Each table has a
primary key, and line items have two dates, written and delivered, that
are indexed individually.  Each store has a market id; a market contains
multiple stores (in my case, 2 or 3).  Each sales ticket has 1-3 line
items.

Is there a way to tell postgres to use an index scan on sales_tickets?  

Curiously, in response to recent postings in the "Low Performance for
big hospital server" thread, when I flatten the tables by putting
storeid into line_items, it runs somewhat faster in all cases, and much
faster in some (I have timings, if anyone is interested).
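
The flattened variant looks roughly like this (a sketch, not the exact
DDL I used; the combined index on (storeId, writtenDate) is just one
plausible way to index the flattened table, not taken from the schema
below):

alter table line_items add column storeId integer;

update line_items
   set storeId = s.storeId
  from sales_tickets s
 where s.ticketId = line_items.ticketId;

-- combined index so a store + date-range predicate can be answered from
-- one index (assumed, not part of the original schema)
create index line_items_store_written
    on line_items (storeId, writtenDate);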

Thanks,
Dave


mydb=> explain select * from line_items t, sales_tickets s where
writtenDate >= '12/01/2002' and writtenDate <= '12/31/2002' and
t.ticketId = s.ticketId and s.storeId = 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=93865.46..114054.74 rows=19898 width=28)
   Hash Cond: ("outer".ticketId = "inner".ticketId)
   ->  Index Scan using line_items_written on line_items t  (cost=0.00..3823.11 rows=158757 width=16)
         Index Cond: ((writtendate >= '2002-12-01'::date) AND (writtendate <= '2002-12-31'::date))
   ->  Hash  (cost=89543.50..89543.50 rows=626783 width=12)
         ->  Seq Scan on sales_tickets s  (cost=0.00..89543.50 rows=626783 width=12)
               Filter: (storeid = 1)
(7 rows)

mydb=> explain select * from line_items t, sales_tickets s where
writtenDate = '12/01/2002' and t.ticketId = s.ticketId and s.storeid = 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..16942.25 rows=697 width=28)
   ->  Index Scan using line_items_written on soldtrx t  (cost=0.00..121.97 rows=5554 width=16)
         Index Cond: (writtendate = '2002-12-01'::date)
   ->  Index Scan using sales_tickets_pkey on sales_tickets s  (cost=0.00..3.02 rows=1 width=12)
         Index Cond: ("outer".ticketId = s.ticketId)
         Filter: (storeid = 1)
(6 rows)


The tables:

create table stores   -- 7 rows
(
storeId  integer not null,
marketId integer not null
);

create table sales_tickets   -- 500,000 rows
(
ticketId integer primary key,
storeId  integer not null,
custId   integer not null
);

create table line_items   -- 1,000,000 rows
(
lineItemId    integer primary key,
ticketId      integer references sales_tickets,
writtenDate   date    not null,
deliveredDate date    not null
);

create index line_items_written   on line_items (writtenDate);
create index line_items_delivered on line_items (deliveredDate);



Re: [PERFORM] Query across a date range

2005-01-07 Thread Josh Berkus
David,

> The problem: A query for a particular store or market over a date range
> of more than a few days, though, does a sequential scan of
> sales_tickets, and performs worse than running one single-date query for
> each date in the range.  My 'explain' for one such query is below.

Can you run EXPLAIN ANALYZE instead of just EXPLAIN?  That will show you the 
discrepancy between estimated and actual costs, and probably show you what 
needs fixing.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Query across a date range

2005-01-07 Thread Tom Lane
David Jaquay <[EMAIL PROTECTED]> writes:
> Summary: Doing a two or three table join for a date range performs
> worse than doing the same query individually for each date in the
> range.

Could we see EXPLAIN ANALYZE, not just EXPLAIN, results?

Also, have you ANALYZEd lately?  If the estimated row counts are at all
accurate, I doubt that forcing a nestloop indexscan would improve the
situation.

Also, what PG version is this?

regards, tom lane



Re: [PERFORM] Query across a date range

2005-01-07 Thread David Jaquay
'explain analyze' output is below.  I have ANALYZEd recently, and am
using pg 7.4.2 on SuSE 9.1.  I'd be curious to know how to force "a
nestloop indexscan" so I can try it out.

Thanks,
Dave

mydb=> explain analyze select * from line_items t, sales_tickets s
where writtenDate >= '12/01/2002' and writtenDate <= '12/31/2002' and
t.ticketid = s.ticketId and s.storeId = 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=93865.46..114054.74 rows=19898 width=28) (actual time=25419.088..32140.217 rows=23914 loops=1)
   Hash Cond: ("outer".ticketid = "inner".ticketid)
   ->  Index Scan using line_items_written on line_items t  (cost=0.00..3823.11 rows=158757 width=16) (actual time=100.621..3354.818 rows=169770 loops=1)
         Index Cond: ((writtendate >= '2002-12-01'::date) AND (writtendate <= '2002-12-31'::date))
   ->  Hash  (cost=89543.50..89543.50 rows=626783 width=12) (actual time=22844.146..22844.146 rows=0 loops=1)
         ->  Seq Scan on sales_tickets s  (cost=0.00..89543.50 rows=626783 width=12) (actual time=38.017..19387.447 rows=713846 loops=1)
               Filter: (storeid = 1)
 Total runtime: 32164.948 ms
(8 rows)




On Fri, 7 Jan 2005 11:35:11 -0800, Josh Berkus  wrote:
> Can you run EXPLAIN ANALYZE instead of just EXPLAIN?  That will show you the
> discrepancy between estimated and actual costs, and probably show you what
> needs fixing.

Also, Tom Lane wrote:
> Could we see EXPLAIN ANALYZE, not just EXPLAIN, results?
> 
> Also, have you ANALYZEd lately?  If the estimated row counts are at all
> accurate, I doubt that forcing a nestloop indexscan would improve the
> situation.
> 
> Also, what PG version is this?



Re: [PERFORM] Query across a date range

2005-01-07 Thread Josh Berkus
Dave,

Ah ...

>  ->  Seq Scan on sales_tickets s  (cost=0.00..89543.50
> rows=626783 width=12) (actual time=38.017..19387.447 rows=713846
> loops=1)

This is more than half the total time of your query.   The issue is that you're
pulling 713,000 rows (PG estimates 626,000, which is in the right ballpark)
and PG thinks that at that row count a seq scan is faster than an index scan.
It could be right.

You can test that by forcing an index scan:
SET enable_seqscan = FALSE;
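
Something like this end to end, as a sketch (the query is the one from
your earlier post; turn the setting back on afterwards, since it applies
to the rest of your session):

SET enable_seqscan = FALSE;

EXPLAIN ANALYZE
SELECT *
  FROM line_items t, sales_tickets s
 WHERE writtenDate >= '12/01/2002'
   AND writtenDate <= '12/31/2002'
   AND t.ticketId = s.ticketId
   AND s.storeId = 1;

-- re-enable seq scans once you're done experimenting
SET enable_seqscan = TRUE;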

Also, please remember to run each query 3 times and report the time of the
*last* run to us.   We don't want differences in caching to throw off your
evaluation.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



[PERFORM] Does "HYPERTHREADING" do any harm if we use with RH9.0 and postgresql?

2005-01-07 Thread amrit
I use RH 9.0 with postgresql 7.3.2 and 4 GB RAM [server spec: dual Xeon 3.0],
and someone mentioned that hyperthreading will not help, but if I leave it
enabled will it do any harm to the system?
Any comments please.
Amrit
Thailand




Re: [PERFORM] Does "HYPERTHREADING" do any harm if we use with RH9.0 and postgresql?

2005-01-07 Thread Josh Berkus
Amrit,

> I use RH 9.0 with postgresql 7.3.2 and 4 GB RAM [server spec: dual Xeon
> 3.0], and someone mentioned that hyperthreading will not help, but if I
> leave it enabled will it do any harm to the system?
> Any comments please.

Sometimes.  Run a test and take a look at your CS (context switch) levels on 
VMSTAT.   If they're high, turn HT off.

If it's a dedicated PG system, though, just turn HT off.  We can't use it.

Also, upgrade PostgreSQL to 7.3.8 at least.   7.3.2 is known-buggy.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Does "HYPERTHREADING" do any harm if we use with RH9.0

2005-01-07 Thread Steve Poe
Josh Berkus wrote:
> Amrit,
>
> > I use RH 9.0 with postgresql 7.3.2 and 4 GB RAM [server spec: dual Xeon
> > 3.0], and someone mentioned that hyperthreading will not help, but if I
> > leave it enabled will it do any harm to the system?
> > Any comments please.
>
> Sometimes.  Run a test and take a look at your CS (context switch) levels on
> VMSTAT.   If they're high, turn HT off.
>
> If it's a dedicated PG system, though, just turn HT off.  We can't use it.
>
> Also, upgrade PostgreSQL to 7.3.8 at least.   7.3.2 is known-buggy.

Sorry for the "dumb" question, but what would be considered high
regarding CS levels?  We just upgraded our servers to dual 2.8GHz Xeon
CPUs from dual 1.8GHz Xeons, which unfortunately have HT built in.  We
also upgraded our database from version 7.3.4 to 7.4.2.

Thanks.
Steve Poe



[PERFORM] TEXT field and Postgresql Performance

2005-01-07 Thread Loren M. Lang
Do large TEXT or VARCHAR entries in postgresql cause any performance
degradation when a query is being executed to search for data in a table
where the TEXT/VARCHAR fields aren't being searched themselves?

Since, according to the postgresql docs, there's no performance
difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
to TEXT with a restriction set on the length.  And since TEXT
can be of any possible size, they must be stored independently of
the rest of the table, which is probably all stored in fixed-size rows,
since all or nearly all of the other types have a specific size,
including CHAR.  Therefore TEXT entries must be in some other hash table
that only needs to be looked up when that column is referenced.  If this
is the case, then all other row data will need to be read in for an
unindexed query, but the TEXT columns will only be read if they're being
searched through or queried.  And if they're only being queried, then only
the rows that matched will need their TEXT columns read in, which should
have minimal impact on performance even if they contain kilobytes of
information.

-- 
I sense much NT in you.
NT leads to Bluescreen.
Bluescreen leads to downtime.
Downtime leads to suffering.
NT is the path to the darkside.
Powerful Unix is.

Public Key: ftp://ftp.tallye.com/pub/lorenl_pubkey.asc
Fingerprint: B3B9 D669 69C9 09EC 1BCD  835A FAF3 7A46 E4A3 280C
 



Re: [PERFORM] TEXT field and Postgresql Performance

2005-01-07 Thread Bruno Wolff III
On Fri, Jan 07, 2005 at 19:36:47 -0800,
  "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
> Do large TEXT or VARCHAR entries in postgresql cause any performance
> degradation when a query is being executed to search for data in a table
> where the TEXT/VARCHAR fields aren't being searched themselves?

Yes, in that the data is more spread out because of the wider rows, which
results in more disk blocks being read to get the desired data.
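
If you want to measure that, relpages in pg_class is one way to see how
many disk blocks a table occupies (the table name below is hypothetical):

-- relpages is the table's size in 8 kB disk blocks as of the last
-- VACUUM/ANALYZE; reltuples is the estimated row count.
SELECT relname, relpages, reltuples
  FROM pg_class
 WHERE relname = 'mytable';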

> Since, according to the postgresql docs, there's no performance
> difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
> to TEXT with a restriction set on the length.  And since TEXT
> can be of any possible size, they must be stored independently of

No.

> the rest of the table, which is probably all stored in fixed-size rows,

No, Postgres uses variable length records.



Re: [PERFORM] TEXT field and Postgresql Performance

2005-01-07 Thread Alex Turner
I guess the question that follows is: when does it work best to start
using BLOBs/CLOBs (I forget whether pg has CLOBs) instead of text
fields?  Your table is going to balloon in disk blocks if you have large
amounts of data, and all the fields you want to search on would have to
be indexed, increasing insert time substantially.

Does it ever pay to use text rather than a CLOB, unless your text is going
to be short, in which case why not just varchar?  That line of thinking
suggests the text datatype is just bad.

Alex Turner
NetEconomist


On Fri, 7 Jan 2005 22:03:23 -0600, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 07, 2005 at 19:36:47 -0800,
>   "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
> > Do large TEXT or VARCHAR entries in postgresql cause any performance
> > degradation when a query is being executed to search for data in a table
> > where the TEXT/VARCHAR fields aren't being searched themselves?
> 
> Yes, in that the data is more spread out because of the wider rows, which
> results in more disk blocks being read to get the desired data.
> 
> > Since, according to the postgresql docs, there's no performance
> > difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
> > to TEXT with a restriction set on the length.  And since TEXT
> > can be of any possible size, they must be stored independently of
> 
> No.
> 
> > the rest of the table, which is probably all stored in fixed-size rows,
> 
> No, Postgres uses variable length records.
> 



Re: [PERFORM] TEXT field and Postgresql Performance

2005-01-07 Thread Michael Fuhr
On Fri, Jan 07, 2005 at 10:03:23PM -0600, Bruno Wolff III wrote:
> On Fri, Jan 07, 2005 at 19:36:47 -0800, "Loren M. Lang" <[EMAIL PROTECTED]> 
> wrote:
> 
> > Since, according to the postgresql docs, there's no performance
> > difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
> > to TEXT with a restriction set on the length.  And since TEXT
> > can be of any possible size, they must be stored independently of
> 
> No.
> 
> > the rest of the table, which is probably all stored in fixed-size rows,
> 
> No, Postgres uses variable length records.

A discussion of TOAST and ALTER TABLE SET STORAGE might be appropriate
here, but I'll defer that to somebody who understands such things
better than me.
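
For reference, the command is along these lines (table and column names
here are hypothetical, and which storage mode you want depends on the data):

-- EXTENDED (the default for text) allows compression and out-of-line
-- TOAST storage; EXTERNAL stores out-of-line without compression;
-- MAIN prefers compressed in-line storage.
ALTER TABLE documents ALTER COLUMN body SET STORAGE EXTERNAL;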

-- 
Michael Fuhr
http://www.fuhr.org/~mfuhr/



Re: [PERFORM] TEXT field and Postgresql Performance

2005-01-07 Thread Tom Lane
Bruno Wolff III <[EMAIL PROTECTED]> writes:
> On Fri, Jan 07, 2005 at 19:36:47 -0800,
>   "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
>> Do large TEXT or VARCHAR entries in postgresql cause any performance
>> degradation when a query is being executed to search for data in a table
>> where the TEXT/VARCHAR fields aren't being searched themselves?

> Yes, in that the data is more spread out because of the wider rows, which
> results in more disk blocks being read to get the desired data.

You are overlooking the effects of TOAST.  Fields wider than a kilobyte
or two will be pushed out-of-line and will thereby not impose a penalty
on queries that only access the other fields in the table.

(If Loren's notion of "large" is "a hundred bytes" then there may be a
measurable impact.  If it's "a hundred K" then there won't be.)
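
If you're not sure which camp your data falls into, something like this
shows the byte sizes of the stored values (table and column names here
are made up):

-- octet_length() reports each value's size in bytes; values of a couple
-- of kilobytes or more are TOAST candidates, values of ~100 bytes stay
-- in the main heap rows.
SELECT octet_length(description) AS bytes
  FROM products
 ORDER BY bytes DESC
 LIMIT 10;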

regards, tom lane
