Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Yeb Havinga
Hello David, The table aggregates 237 million rows from its child tables. The sluggishness comes from this part of the query: m.taken BETWEEN /* Start date. */ (extract( YEAR FROM m.taken )||'-01-01')::date AND /* End date. Calculated by checking to see if the end da

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 06:06, David Jarvis wrote: > Hi, > > I recently switched to PostgreSQL from MySQL so that I can use PL/R for data > analysis. The query in MySQL form (against a more complex table structure) > takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish, > as it takes o

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Wed, 19 May 2010, David Jarvis wrote: extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND That portion of the WHERE clause cannot use an index on m.taken. Postgres does not look inside functions (like extract) to see if something indexable is present. To get an index to work, you c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
Matthew Wakeling writes: > On Wed, 19 May 2010, David Jarvis wrote: >> extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND > That portion of the WHERE clause cannot use an index on m.taken. Postgres > does not look inside functions (like extract) to see if something > indexable is present. T

Re: [PERFORM] merge join killing performance

2010-05-20 Thread Tom Lane
Scott Marlowe writes: > So, Tom, so you think it's possible that the planner isn't noticing > all those nulls and thinks it'll just take a row or two to get to the > value it needs to join on? Could be. I don't have time right now to chase through the code, but that sounds like a plausible theor

Re: [PERFORM] merge join killing performance

2010-05-20 Thread Scott Marlowe
On Thu, May 20, 2010 at 8:28 AM, Tom Lane wrote: > Scott Marlowe writes: >> So, Tom, so you think it's possible that the planner isn't noticing >> all those nulls and thinks it'll just take a row or two to get to the >> value it needs to join on? > > Could be.  I don't have time right now to chas

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, I have posted an image of the user inputs here: http://i.imgur.com/MUkuZ.png The problem is that I am given a range of days (Dec 22 - Mar 22) over a range of years (1900 - 2009) and the range of days can span from one year to the next. This is not the same as saying Dec 22, 1900 to Mar 22, 2

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 17:36, David Jarvis wrote: > Hi, Thom. > > The query is given two items: > > Range of years > Range of days > > I need to select all data between the range of days (e.g., Dec 22 - Mar 22) > over the range of years (e.g., 1950 - 1970), such as shown here: > > http://i.imgur.com/MUkuZ

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 19:36, Thom Brown wrote: > On 20 May 2010 17:36, David Jarvis wrote: >> Hi, Thom. >> >> The query is given two items: >> >> Range of years >> Range of days >> >> I need to select all data between the range of days (e.g., Dec 22 - Mar 22) >> over the range of years (e.g., 1950 - 197

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
Thom Brown writes: > On 20 May 2010 17:36, David Jarvis wrote: > Okay, get your app to convert the month-date to a day of year, so we > have year_start, year_end, day_of_year_start, day_of_year_end > and your where clause would something like this: > WHERE extract(YEAR from m.taken) BETWEEN yea

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Thom Brown
On 20 May 2010 20:02, Tom Lane wrote: > Thom Brown writes: >> On 20 May 2010 17:36, David Jarvis wrote: >> Okay, get your app to convert the month-date to a day of year, so we >> have year_start, year_end, day_of_year_start, day_of_year_end > >> and your where clause would something like this: >

Re: [PERFORM] [BUGS] Query causing explosion of temp space with join involving partitioning

2010-05-20 Thread Tom Lane
Krzysztof Nienartowicz writes: > surveys-> SELECT t1.SURVEY_PK, t1.SOURCE_PK, t1.TSTYPE, t1.VALS > surveys-> FROM sources t0 ,TS t1 where > surveys-> (t0.SURVEYID = 16 AND t0.SRCID >= 203510110032281 AND > t0.SRCID <= 203520107001677 and t0.SURVEYID = t1.SURVEY_PK AND t0.SRCID = > t1.SOURCE_

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
When using MySQL, the performance was okay (~5 seconds per query) using: date( concat_ws( '-', y.year, m.month, d.day ) ) between -- Start date. date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND -- End date. Calculated by checking to see if the end date wraps -- into the

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > I was thinking that I could add three more columns to the measurement table: > year_taken, month_taken, day_taken > Then index those. That should allow me to avoid extracting years, months, > and days from the *m.taken* date column. You could, but I don't think there's any

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
What if I were to have the application pass in two sets of date ranges? For the condition of Dec 22 to Mar 22: Dec 22 would become: - Dec 22 - Dec 31 Mar 22 would become: - Jan 1 - Mar 22 The first range would always be for the current year; the second range would always be for the year

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > What if I were to have the application pass in two sets of date ranges? > For the condition of Dec 22 to Mar 22: > Dec 22 would become: >- Dec 22 - Dec 31 > Mar 22 would become: >- Jan 1 - Mar 22 I think what you're essentially describing here is removing the OR fro

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 ) AS text)||'-12-31')::date That uses

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Tom Lane
David Jarvis writes: > I was hoping to eliminate this part of the query: > (cast(extract( YEAR FROM m.taken ) + greatest( -1 * > sign( > (extract( YEAR FROM m.taken )||'-12-31')::date - > (extract( YEAR FROM m.taken )||'-01-01')::date ), 0 > ) AS t

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, I was still referring to the measurement table. You have an index on > stationid, but still seem to be getting a sequential scan. Maybe the planner > does not realise that you are selecting a small number of stations. Posting > an EXPLAIN ANALYSE would really help here. > Here is the result f

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
The greatest() expression reduces to either the current year (year + 0) or the next year (year + 1) by taking the sign of the difference in start/end days. This allows me to derive an end date, such as: Dec 22, 1900 to Mar 22, 1901 Then I check if the measured date falls between those two dates.

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > I was hoping to eliminate this part of the query: > (cast(extract( YEAR FROM m.taken ) + greatest( -1 * > sign( > (extract( YEAR FROM m.taken )||'-12-31')::date - > (extract( YEAR FROM m.taken )||'-01-01')::date

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Yeb Havinga
Tom Lane wrote: David Jarvis writes: I was hoping to eliminate this part of the query: (cast(extract( YEAR FROM m.taken ) + greatest( -1 * sign( (extract( YEAR FROM m.taken )||'-12-31')::date - (extract( YEAR FROM m.taken )||'-01-01')::date ), 0

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > There are 72 child tables, each having a year index and a station index, > which are defined as follows: S, my thoughts: Partition by something that makes sense... Typically, I'd say that you would do it by the category id and when the measuremen

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Stephen Frost
* David Jarvis (thanga...@gmail.com) wrote: > I was still referring to the measurement table. You have an index on > > stationid, but still seem to be getting a sequential scan. Maybe the planner > > does not realise that you are selecting a small number of stations. Posting > > an EXPLAIN ANALYSE

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, ~300 million measurements ~12000 stations (not 7 as I mentioned before) ~5500 cities some serious data tho, at least. Basically, PG is sequentially scanning > through all of the tables in your partitioning setup. What is > constraint_exclusion set to? What version of PG is this? Do th

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, check (taken >= '1913-12-01' and taken <= '1913-12-31') > I don't think I want to constrain by year, for a few reasons: 1. There are a lot of years -- over 110. 2. There will be more years added (both in the future for 2010 and in the past as I get data from other sources). Currently

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, Something in here really smells fishy to me. Those extract's above are > working on values which are from the table.. Why aren't you using these > functions to figure out how to construct the actual dates based on the > values provided by the *user*..? > Because I've only been using Postgre

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: "Result (cost=0.00..21781.18 rows=8090 width=28)" " -> Append (cost=0.00..21781.18 rows=8090 width=28)" "-> Seq Scan on measur

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread Matthew Wakeling
On Thu, 20 May 2010, David Jarvis wrote: I took out the date conditions: SELECT m.* FROM climate.measurement m WHERE m.category_id = 1 and m.station_id = 2043 This uses the station indexes: Yes, because there is only one station_id selected. That's exactly what an index is for. Then c

Re: [PERFORM] Optimize date query for large child tables: GiST or GIN?

2010-05-20 Thread David Jarvis
Hi, (An EXPLAIN ANALYSE would be better here). Look at the expected number of > stations "Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual time=0.053..41.173 rows=78 loops=1)" " Join Filter: ((6371.009::double precision * sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::