On Thu, Oct 20, 2011 at 8:42 AM, Jeff Adams <jeff.ad...@noaa.gov> wrote:

> Thanks for the reply and assistance. I share your concern that the approach
> may be slow. I am not adverse to creating some sort of table to store the
> time periods with years for the temporal extent of the data if that would
> speed up the process. In that situation there would be multiple records
> (one
> record for each year) that it would need to check against. I suppose I
> could
> then create some sort of aggregate to then count the number of records in
> which the date falls within the range. Anything over 0 would indicate that
> the date does fall within the range?
>
> From: Adam Cornett [mailto:adam.corn...@gmail.com]
> Sent: Wednesday, October 19, 2011 6:51 PM
> To: Jeff Adams
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Date Range Using Months and Days Only
>
>
> On Wed, Oct 19, 2011 at 12:40 PM, Jeff Adams <jeff.ad...@noaa.gov> wrote:
> Greetings,
>
> I have to write a query on a fairly large table of data (>100 million rows)
> where I need to check to see if a date (epoch) falls between a range of
> values. The catch is that the range is defined only by month and day
> values.
> For example the record containing the epoch value will be linked to a table
> that containing columns named start_month, start_day, end_month, end_day
> that define the range. With respect to the range, year does not matter,
> however, some of the ranges will start in November and end in April of the
> next year. Has anyone come across this type of query? I could certainly
> write a function or even include criteria in a query that would extract
> date
> parts of the epoch and then compare against the values in the start_month,
> start_day, end_month, end_day (it might get complex with respect to ranges
> where the start year and end year are different), but I am worried about
> performance. I thought I would seek some input before I floundered through
> the many iterations of poor performing alternatives! Any thoughts would be
> greatly appreciated.
>
> Thanks in advance...
> Jeff
>
>
>
>
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>
> create table a (
>     id_a integer,
>     epoch integer
> );
>
> create table b (
>     id_b integer,
>     start_month integer,
>     start_day integer,
>     end_month integer,
>     end_day integer
> );
> CREATE OR REPLACE FUNCTION get_timestamps(sm integer, sd integer, em
> integer, ed integer, year integer)
>   RETURNS timestamp without time zone[] AS
> $BODY$
> declare
>     syear integer := year;
>     eyear integer := year;
>     tstamps timestamp[];
> begin
>     if(sm>em) then
>      -- assume that since the end month is less than the start month is in
> the next year
>      eyear := eyear+1;
>     end if;
>     tstamps[0] := (syear|| '-' || sm || '-' || sd)::timestamp;
>     tstamps[1] := (eyear|| '-' || em || '-' || ed)::timestamp;
>     return tstamps;
>
> end
> $BODY$
>   LANGUAGE plpgsql STABLE;
>
> create view a_timestamp as
> SELECT id_a, TIMESTAMP WITH TIME ZONE 'epoch' + epoch * INTERVAL '1 second'
> as tstamp from a;
>
> with ab as(
> select
> id_a,id_b,get_timestamps(b.start_month,b.start_day,b.end_month,b.end_day,
> extract(year from a.tstamp)::integer) ts_arr,a.tstamp
>  from a_timestamp as a,b
> )
> select * from ab
> where ab.tstamp between ts_arr[0] and ts_arr[1]
>
> This obviously isn't a fast solution to your problem, although converting
> the integer epoch to a timestamp in table a would eliminate the view
> a_timestamp and you can index the column for some speed up, the real
> problem
> you're facing is that your ranges don't have years, otherwise you could
> store everything as a timestamp and then just join using 'between' and
> postgres would just need to do an index scan on each table.
>
> -Adam
>
>
Here is an updated method, it involves creating a third table to store the
timestamp values from table 'b' to allow for better lookups, I've also added
a timestamp column to table 'a' (which can be calculated from the epoch
using the view in my previous email.

create table a (
    id_a integer,
    epoch integer,
    ts timestamp
);

CREATE INDEX  ON a (ts);

create table b (
    id_b integer,
    start_month integer,
    start_day integer,
    end_month integer,
    end_day integer
);
CREATE OR REPLACE FUNCTION get_timestamps(sm integer, sd integer, em
integer, ed integer, year integer)
  RETURNS timestamp without time zone[] AS
$BODY$
declare
    syear integer := year;
    eyear integer := year;
    tstamps timestamp[];
begin
    if(sm>em) then
     -- assume that since the end month is less than the start month is in
the next year
     eyear := eyear+1;
    end if;
    tstamps[0] := (syear|| '-' || sm || '-' || sd)::timestamp;
    tstamps[1] := (eyear|| '-' || em || '-' || ed)::timestamp;
    return tstamps;

end
$BODY$
  LANGUAGE plpgsql STABLE;


create table b_calc (
id_b integer,
year integer,
range_start timestamp,
range_end timestamp
);

CREATE INDEX ON b_calc (range_start, range_end);

-- generate the ranges for the last 10 years
insert into b_calc
with bb as(
select id_b,get_timestamps(b.start_month,b.start_day,b.end_month,b.end_day,
yr) ts_arr, yr
 from b, generate_series(1991,2011) as yr
)
select id_b,yr,ts_arr[0],ts_arr[1] from bb;


-- the actual query; explain shows that it uses the two indexes to match up
the ranges

select id_a,id_b from a, b_calc b where a.ts between b.range_start and
b.range_end

Reply via email to