[PERFORM] Query across a date range
Summary: Doing a two or three table join for a date range performs worse
than doing the same query individually for each date in the range.

What works: Doing a query on just a single date or a date range (against
just one table) runs quickly; 'explain' says it uses an index scan. Doing
a query on a single date for one store or for one market uses all index
scans, and runs quickly as well.

The problem: Doing a query for a date range on a particular store or
market, though, for a date range of more than a few days does a sequential
scan of sales_tickets, and performs worse than doing one single-date query
for each date. My 'explain' for one such query is below.

Background: I have two or three tables involved in a query. One table holds
stores (7 rows at present), one holds sales tickets (about 5 million) and
one holds line items (about 10 million). It's test data that I've generated
and loaded using '\copy from'. Each has a primary key, and line items have
two dates, written and delivered, that are indexed individually. Store has
a market id; a market contains multiple stores (in my case, 2 or 3). Each
sales ticket has 1-3 line items.

Is there a way to tell postgres to use an index scan on sales_tickets?

Curiously, in response to recent postings in the "Low Performance for big
hospital server" thread, when I flatten the tables by putting storeid into
line_items, it runs somewhat faster in all cases, and much faster in some.
(I have times, if anyone is interested.)

Thanks,
Dave

mydb=> explain select * from line_items t, sales_tickets s where
writtenDate >= '12/01/2002' and writtenDate <= '12/31/2002' and
t.ticketId = s.ticketId and s.storeId = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=93865.46..114054.74 rows=19898 width=28)
   Hash Cond: ("outer".ticketId = "inner".ticketId)
   ->  Index Scan using line_items_written on line_items t  (cost=0.00..3823.11 rows=158757 width=16)
         Index Cond: ((writtendate >= '2002-12-01'::date) AND (writtendate <= '2002-12-31'::date))
   ->  Hash  (cost=89543.50..89543.50 rows=626783 width=12)
         ->  Seq Scan on sales_tickets s  (cost=0.00..89543.50 rows=626783 width=12)
               Filter: (storeid = 1)
(7 rows)

mydb=> explain select * from line_items t, sales_tickets s where
writtenDate = '12/01/2002' and t.ticketId = s.ticketId and s.storeid = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..16942.25 rows=697 width=28)
   ->  Index Scan using line_items_written on soldtrx t  (cost=0.00..121.97 rows=5554 width=16)
         Index Cond: (writtendate = '2002-12-01'::date)
   ->  Index Scan using sales_tickets_pkey on sales_tickets s  (cost=0.00..3.02 rows=1 width=12)
         Index Cond: ("outer".ticketId = s.ticketId)
         Filter: (storeid = 1)
(6 rows)

The tables:

create table stores              -- 7 rows
(
    storeId   integer not null,
    marketId  integer not null
);

create table sales_tickets      -- 500,000 rows
(
    ticketId  integer primary key,
    storeId   integer not null,
    custId    integer not null
);

create table line_items         -- 1,000,000 rows
(
    lineItemId     integer primary key,
    ticketId       integer references sales_tickets,
    writtenDate    date not null,
    deliveredDate  date not null
);

create index line_items_written   on line_items (writtenDate);
create index line_items_delivered on line_items (deliveredDate);
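[For reference, a hedged sketch of the "flattening" Dave mentions above. The
added column, the index name, and the backfill step are assumptions for
illustration only; they are not something actually posted in the thread. The
idea is that once storeid lives on line_items, the store filter and the
date-range filter can be answered from one table and one composite index,
so the join no longer drives the store filter onto sales_tickets:]

-- Assumed names, illustrative only:
alter table line_items add column storeId integer;
update line_items li set storeId = s.storeId
  from sales_tickets s where s.ticketId = li.ticketId;
create index line_items_store_written on line_items (storeId, writtenDate);

-- The range-per-store query keeps its join, but the store filter is now
-- satisfied from the indexed line_items column:
select * from line_items t, sales_tickets s
 where t.storeId = 1
   and t.writtenDate >= '2002-12-01' and t.writtenDate <= '2002-12-31'
   and t.ticketId = s.ticketId;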
Re: [PERFORM] Query across a date range
David,

> The problem: Doing a query for a date range on a particular store or
> market, though, for a date range of more than a few days does a
> sequential scan of sales_tickets, and performs worse than doing one
> single date query for each date. My 'explain' for one such query is
> below.

Can you run EXPLAIN ANALYZE instead of just EXPLAIN? That will show you the
discrepancy between estimated and actual costs, and probably show you what
needs fixing.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Query across a date range
David Jaquay <[EMAIL PROTECTED]> writes:
> Summary: Doing a two or three table join for a date range performs
> worse than doing the same query individually for each date in the
> range.

Could we see EXPLAIN ANALYZE, not just EXPLAIN, results?

Also, have you ANALYZEd lately? If the estimated row counts are at all
accurate, I doubt that forcing a nestloop indexscan would improve the
situation.

Also, what PG version is this?

                        regards, tom lane
Re: [PERFORM] Query across a date range
'explain analyze' output is below. I have done analyze recently, and am
using pg 7.4.2 on SuSE 9.1. I'd be curious to know how to force "a nestloop
indexscan" to try it out.

Thanks,
Dave

mydb=> explain analyze select * from line_items t, sales_tickets s where
writtenDate >= '12/01/2002' and writtenDate <= '12/31/2002' and
t.ticketid = s.ticketId and s.storeId = 1;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Hash Join  (cost=93865.46..114054.74 rows=19898 width=28) (actual time=25419.088..32140.217 rows=23914 loops=1)
   Hash Cond: ("outer".ticketid = "inner".ticketid)
   ->  Index Scan using line_items_written on line_items t  (cost=0.00..3823.11 rows=158757 width=16) (actual time=100.621..3354.818 rows=169770 loops=1)
         Index Cond: ((writtendate >= '2002-12-01'::date) AND (writtendate <= '2002-12-31'::date))
   ->  Hash  (cost=89543.50..89543.50 rows=626783 width=12) (actual time=22844.146..22844.146 rows=0 loops=1)
         ->  Seq Scan on sales_tickets s  (cost=0.00..89543.50 rows=626783 width=12) (actual time=38.017..19387.447 rows=713846 loops=1)
               Filter: (storeid = 1)
 Total runtime: 32164.948 ms
(8 rows)

On Fri, 7 Jan 2005 11:35:11 -0800, Josh Berkus wrote:
> Can you run EXPLAIN ANALYZE instead of just EXPLAIN? That will show you the
> discrepancy between estimated and actual costs, and probably show you what
> needs fixing.

Also, Tom Lane wrote:
> Could we see EXPLAIN ANALYZE, not just EXPLAIN, results?
>
> Also, have you ANALYZEd lately? If the estimated row counts are at all
> accurate, I doubt that forcing a nestloop indexscan would improve the
> situation.
>
> Also, what PG version is this?
Re: [PERFORM] Query across a date range
Dave,

Ah:

> ->  Seq Scan on sales_tickets s  (cost=0.00..89543.50 rows=626783
>     width=12) (actual time=38.017..19387.447 rows=713846 loops=1)

This is just more than 1/2 the time of your query. The issue is that you're
pulling 713,000 rows (PG estimates 626,000, which is in the right ballpark)
and PG thinks that this is enough rows that a seq scan is faster. It could
be right.

You can test that: force an indexscan by doing

SET enable_seqscan = FALSE;

Also, please remember to run each query 3 times and report the time of the
*last* run to us. We don't want differences in caching to throw off your
evaluation.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
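[A minimal sketch of that experiment, using the setting Josh names and the
query from Dave's EXPLAIN ANALYZE above. The setting only affects the
current session, and RESET puts the planner back to its default afterwards:]

-- Disable seq scans for this session only, then re-time the problem query:
SET enable_seqscan = FALSE;

EXPLAIN ANALYZE
select * from line_items t, sales_tickets s
 where writtenDate >= '2002-12-01' and writtenDate <= '2002-12-31'
   and t.ticketId = s.ticketId and s.storeId = 1;

RESET enable_seqscan;  -- restore normal planner behavior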
[PERFORM] Does "HYPERTHREADING" do any harm if we use with RH9.0 and postgresql?
I use RH 9.0 with postgresql 7.3.2 and 4 Gb ram [server spec. Dual Xeon
3.0], and someone mentioned that hyperthreading will not help, but if I
leave it on will there be any harm to the system?

Any comment please.

Amrit
Thailand
Re: [PERFORM] Does "HYPERTHREADING" do any harm if we use with RH9.0 and postgresql?
Amrit,

> I use RH 9.0 with postgresql 7.3.2 and 4 Gb ram [server spec. Dual Xeon
> 3.0], and someone mentioned that hyperthreading will not help, but if I
> leave it on will there be any harm to the system?
> Any comment please.

Sometimes. Run a test and take a look at your CS (context switch) levels in
VMSTAT. If they're high, turn HT off. If it's a dedicated PG system, though,
just turn HT off; we can't use it.

Also, upgrade PostgreSQL to 7.3.8 at least. 7.3.2 is known-buggy.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Does "HYPERTHREADING" do any harm if we use with RH9.0
Josh Berkus wrote:

> Amrit,
>
>> I use RH 9.0 with postgresql 7.3.2 and 4 Gb ram [server spec. Dual Xeon
>> 3.0], and someone mentioned that hyperthreading will not help, but if I
>> leave it on will there be any harm to the system?
>> Any comment please.
>
> Sometimes. Run a test and take a look at your CS (context switch) levels
> in VMSTAT. If they're high, turn HT off. If it's a dedicated PG system,
> though, just turn HT off; we can't use it.
>
> Also, upgrade PostgreSQL to 7.3.8 at least. 7.3.2 is known-buggy.

Sorry for the "dumb" question, but what would be considered high regarding
CS levels? We just upgraded our servers to dual 2.8GHz Xeon CPUs from dual
1.8GHz Xeons, which unfortunately have HT built-in. We also upgraded our
database from version 7.3.4 to 7.4.2.

Thanks.

Steve Poe
[PERFORM] TEXT field and Postgresql Perfomance
Do large TEXT or VARCHAR entries in postgresql cause any performance
degradation when a query is being executed to search for data in a table
where the TEXT/VARCHAR fields aren't being searched themselves?

Since, according to the postgresql docs, there's no performance difference
between VARCHAR and TEXT, I'm assuming VARCHAR is identical to TEXT entries
with a restriction set on the length. And since TEXT can be of any possible
size, then they must be stored independently of the rest of the table, which
is probably all stored in fixed-size rows, since all or nearly all of the
other types have a specific size, including CHAR. Therefore TEXT entries
must be in some other hash table that only needs to be looked up when that
column is referenced.

If this is the case, then all other row data will need to be read in for an
unindexed query, but the TEXT columns will only be read if they're being
searched through or queried. And if they're only being queried, then only
the rows that matched will need the TEXT columns read in, which should have
minimal impact on performance even if they contain kilobytes of information.

--
I sense much NT in you.
NT leads to Bluescreen.
Bluescreen leads to downtime.
Downtime leads to suffering.
NT is the path to the darkside.
Powerful Unix is.

Public Key: ftp://ftp.tallye.com/pub/lorenl_pubkey.asc
Fingerprint: B3B9 D669 69C9 09EC 1BCD 835A FAF3 7A46 E4A3 280C
Re: [PERFORM] TEXT field and Postgresql Perfomance
On Fri, Jan 07, 2005 at 19:36:47 -0800,
  "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
> Do large TEXT or VARCHAR entries in postgresql cause any performance
> degradation when a query is being executed to search for data in a table
> where the TEXT/VARCHAR fields aren't being searched themselves?

Yes, in that the data is more spread out because of the wider rows, and
that results in more disk blocks being looked at to get the desired data.

> Since, according to the postgresql docs, there's no performance
> difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
> to TEXT entries with a restriction set on the length. And since TEXT
> can be of any possible size, then they must be stored independently of

No.

> the rest of the table which is probably all stored in fixed-size rows

No, Postgres uses variable length records.
Re: [PERFORM] TEXT field and Postgresql Perfomance
I guess my question that would follow is: when does it work best to start
using BLOBs/CLOBs (I forget if pg has CLOBs) instead of text fields? Your
table is going to balloon in disk blocks if you have large amounts of data,
and all the fields you want to search on would have to be indexed,
increasing insert time substantially. Does it ever pay to use text rather
than a CLOB unless your text is going to be short (in which case, why not
just varchar?), leading to the thought that the text datatype is just bad?

Alex Turner
NetEconomist

On Fri, 7 Jan 2005 22:03:23 -0600, Bruno Wolff III <[EMAIL PROTECTED]> wrote:
> On Fri, Jan 07, 2005 at 19:36:47 -0800,
>   "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
> > Do large TEXT or VARCHAR entries in postgresql cause any performance
> > degradation when a query is being executed to search for data in a table
> > where the TEXT/VARCHAR fields aren't being searched themselves?
>
> Yes, in that the data is more spread out because of the wider rows, and
> that results in more disk blocks being looked at to get the desired data.
>
> > Since, according to the postgresql docs, there's no performance
> > difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
> > to TEXT entries with a restriction set on the length. And since TEXT
> > can be of any possible size, then they must be stored independently of
>
> No.
>
> > the rest of the table which is probably all stored in fixed-size rows
>
> No, Postgres uses variable length records.
Re: [PERFORM] TEXT field and Postgresql Perfomance
On Fri, Jan 07, 2005 at 10:03:23PM -0600, Bruno Wolff III wrote:
> On Fri, Jan 07, 2005 at 19:36:47 -0800,
>   "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
>
> > Since, according to the postgresql docs, there's no performance
> > difference between VARCHAR and TEXT, I'm assuming VARCHAR is identical
> > to TEXT entries with a restriction set on the length. And since TEXT
> > can be of any possible size, then they must be stored independently of
>
> No.
>
> > the rest of the table which is probably all stored in fixed-size rows
>
> No, Postgres uses variable length records.

A discussion of TOAST and ALTER TABLE SET STORAGE might be appropriate here,
but I'll defer that to somebody who understands such things better than me.

--
Michael Fuhr
http://www.fuhr.org/~mfuhr/
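[For reference, a hedged sketch of the command mentioned above. The table
and column names are made up for illustration; only the ALTER TABLE ... SET
STORAGE syntax itself comes from PostgreSQL:]

-- Per-column storage strategy for a wide text column ("documents"/"body"
-- are illustrative names, not from this thread):
alter table documents alter column body set storage external;  -- out-of-line, uncompressed
alter table documents alter column body set storage extended;  -- out-of-line, compressed (the default for text)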
Re: [PERFORM] TEXT field and Postgresql Perfomance
Bruno Wolff III <[EMAIL PROTECTED]> writes:
> On Fri, Jan 07, 2005 at 19:36:47 -0800,
>   "Loren M. Lang" <[EMAIL PROTECTED]> wrote:
>> Do large TEXT or VARCHAR entries in postgresql cause any performance
>> degradation when a query is being executed to search for data in a table
>> where the TEXT/VARCHAR fields aren't being searched themselves?

> Yes, in that the data is more spread out because of the wider rows, and
> that results in more disk blocks being looked at to get the desired data.

You are overlooking the effects of TOAST. Fields wider than a kilobyte or
two will be pushed out-of-line and will thereby not impose a penalty on
queries that only access the other fields in the table.

(If Loren's notion of "large" is "a hundred bytes" then there may be a
measurable impact. If it's "a hundred K" then there won't be.)

                        regards, tom lane
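[As a rough illustration of that point; the table, column names, and sample
query below are assumptions for the sketch, not taken from the thread:]

-- Values much larger than a kilobyte or two in "body" are pushed into the
-- table's TOAST relation, so they live out-of-line from the main rows.
create table notes (
    id      integer primary key,
    created date not null,
    body    text                 -- large values stored out-of-line by TOAST
);

-- Touches only id/created; the out-of-line "body" data is never fetched:
select id, created from notes where created >= '2002-12-01';

-- Fetching body for a few matching rows pulls in only those TOAST entries:
select body from notes where id = 42;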