Hello David,
The table aggregates 237 million rows from its child tables. The
sluggishness comes from this part of the query:
m.taken BETWEEN
/* Start date. */
(extract( YEAR FROM m.taken )||'-01-01')::date AND
/* End date. Calculated by checking to see if the end da
On 20 May 2010 06:06, David Jarvis wrote:
> Hi,
>
> I recently switched to PostgreSQL from MySQL so that I can use PL/R for data
> analysis. The query in MySQL form (against a more complex table structure)
> takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish,
> as it takes o
On Wed, 19 May 2010, David Jarvis wrote:
extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
That portion of the WHERE clause cannot use an index on m.taken. Postgres
does not look inside functions (like extract) to see if something
indexable is present. To get an index to work, you c
Matthew Wakeling writes:
> On Wed, 19 May 2010, David Jarvis wrote:
>> extract( YEAR FROM m.taken ) BETWEEN 1900 AND 2009 AND
> That portion of the WHERE clause cannot use an index on m.taken. Postgres
> does not look inside functions (like extract) to see if something
> indexable is present. T
Scott Marlowe writes:
> So, Tom, so you think it's possible that the planner isn't noticing
> all those nulls and thinks it'll just take a row or two to get to the
> value it needs to join on?
Could be. I don't have time right now to chase through the code, but
that sounds like a plausible theor
On Thu, May 20, 2010 at 8:28 AM, Tom Lane wrote:
> Scott Marlowe writes:
>> So, Tom, so you think it's possible that the planner isn't noticing
>> all those nulls and thinks it'll just take a row or two to get to the
>> value it needs to join on?
>
> Could be. I don't have time right now to chas
Hi,
I have posted an image of the user inputs here:
http://i.imgur.com/MUkuZ.png
The problem is that I am given a range of days (Dec 22 - Mar 22) over a
range of years (1900 - 2009) and the range of days can span from one year to
the next. This is not the same as saying Dec 22, 1900 to Mar 22, 2
On 20 May 2010 17:36, David Jarvis wrote:
> Hi, Thom.
>
> The query is given two items:
>
> Range of years
> Range of days
>
> I need to select all data between the range of days (e.g., Dec 22 - Mar 22)
> over the range of years (e.g., 1950 - 1970), such as shown here:
>
> http://i.imgur.com/MUkuZ
On 20 May 2010 19:36, Thom Brown wrote:
> On 20 May 2010 17:36, David Jarvis wrote:
>> Hi, Thom.
>>
>> The query is given two items:
>>
>> Range of years
>> Range of days
>>
>> I need to select all data between the range of days (e.g., Dec 22 - Mar 22)
>> over the range of years (e.g., 1950 - 197
Thom Brown writes:
> On 20 May 2010 17:36, David Jarvis wrote:
> Okay, get your app to convert the month-date to a day of year, so we
> have year_start, year_end, day_of_year_start, day_of_year_end
> and your where clause would something like this:
> WHERE extract(YEAR from m.taken) BETWEEN yea
On 20 May 2010 20:02, Tom Lane wrote:
> Thom Brown writes:
>> On 20 May 2010 17:36, David Jarvis wrote:
>> Okay, get your app to convert the month-date to a day of year, so we
>> have year_start, year_end, day_of_year_start, day_of_year_end
>
>> and your where clause would something like this:
>
Krzysztof Nienartowicz writes:
> surveys-> SELECT t1.SURVEY_PK, t1.SOURCE_PK, t1.TSTYPE, t1.VALS
> surveys-> FROM sources t0 ,TS t1 where
> surveys-> (t0.SURVEYID = 16 AND t0.SRCID >= 203510110032281 AND
> t0.SRCID <= 203520107001677 and t0.SURVEYID = t1.SURVEY_PK AND t0.SRCID =
> t1.SOURCE_
When using MySQL, the performance was okay (~5 seconds per query) using:
date( concat_ws( '-', y.year, m.month, d.day ) ) between
-- Start date.
date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND
-- End date. Calculated by checking to see if the end date wraps
-- into the
David Jarvis writes:
> I was thinking that I could add three more columns to the measurement table:
> year_taken, month_taken, day_taken
> Then index those. That should allow me to avoid extracting years, months,
> and days from the *m.taken* date column.
You could, but I don't think there's any
What if I were to have the application pass in two sets of date ranges?
For the condition of Dec 22 to Mar 22:
Dec 22 would become:
- Dec 22 - Dec 31
Mar 22 would become:
- Jan 1 - Mar 22
The first range would always be for the current year; the second range would
always be for the year
David Jarvis writes:
> What if I were to have the application pass in two sets of date ranges?
> For the condition of Dec 22 to Mar 22:
> Dec 22 would become:
>- Dec 22 - Dec 31
> Mar 22 would become:
>- Jan 1 - Mar 22
I think what you're essentially describing here is removing the OR fro
I was hoping to eliminate this part of the query:
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
That uses
David Jarvis writes:
> I was hoping to eliminate this part of the query:
> (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
> sign(
> (extract( YEAR FROM m.taken )||'-12-31')::date -
> (extract( YEAR FROM m.taken )||'-01-01')::date ), 0
> ) AS t
Hi,
I was still referring to the measurement table. You have an index on
> stationid, but still seem to be getting a sequential scan. Maybe the planner
> does not realise that you are selecting a small number of stations. Posting
> an EXPLAIN ANALYSE would really help here.
>
Here is the result f
The greatest() expression reduces to either the current year (year + 0) or
the next year (year + 1) by taking the sign of the difference in start/end
days. This allows me to derive an end date, such as:
Dec 22, 1900 to Mar 22, 1901
Then I check if the measured date falls between those two dates.
* David Jarvis (thanga...@gmail.com) wrote:
> I was hoping to eliminate this part of the query:
> (cast(extract( YEAR FROM m.taken ) + greatest( -1 *
> sign(
> (extract( YEAR FROM m.taken )||'-12-31')::date -
> (extract( YEAR FROM m.taken )||'-01-01')::date
Tom Lane wrote:
David Jarvis writes:
I was hoping to eliminate this part of the query:
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
* David Jarvis (thanga...@gmail.com) wrote:
> There are 72 child tables, each having a year index and a station index,
> which are defined as follows:
S, my thoughts:
Partition by something that makes sense... Typically, I'd say that you
would do it by the category id and when the measuremen
* David Jarvis (thanga...@gmail.com) wrote:
> I was still referring to the measurement table. You have an index on
> > stationid, but still seem to be getting a sequential scan. Maybe the planner
> > does not realise that you are selecting a small number of stations. Posting
> > an EXPLAIN ANALYSE
Hi,
~300 million measurements
~12000 stations (not 7 as I mentioned before)
~5500 cities
some serious data tho, at least. Basically, PG is sequentially scanning
> through all of the tables in your partitioning setup. What is
> constraint_exclusion set to? What version of PG is this? Do th
Hi,
check (taken >= '1913-12-01' and taken <= '1913-12-31')
>
I don't think I want to constrain by year, for a few reasons:
1. There are a lot of years -- over 110.
2. There will be more years added (both in the future for 2010 and in the
past as I get data from other sources).
Currently
Hi,
Something in here really smells fishy to me. Those extract's above are
> working on values which are from the table.. Why aren't you using these
> functions to figure out how to construct the actual dates based on the
> values provided by the *user*..?
>
Because I've only been using Postgre
I took out the date conditions:
SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043
This uses the station indexes:
"Result (cost=0.00..21781.18 rows=8090 width=28)"
" -> Append (cost=0.00..21781.18 rows=8090 width=28)"
"-> Seq Scan on measur
On Thu, 20 May 2010, David Jarvis wrote:
I took out the date conditions:
SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043
This uses the station indexes:
Yes, because there is only one station_id selected. That's exactly what an
index is for.
Then c
Hi,
(An EXPLAIN ANALYSE would be better here). Look at the expected number of
> stations
"Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual
time=0.053..41.173 rows=78 loops=1)"
" Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::
30 matches
Mail list logo