On Wed, Oct 19, 2011 at 12:40 PM, Jeff Adams <jeff.ad...@noaa.gov> wrote:
> Greetings, > > I have to write a query on a fairly large table of data (>100 million rows) > where I need to check to see if a date (epoch) falls between a range of > values. The catch is that the range is defined only by month and day > values. > For example the record containing the epoch value will be linked to a table > that containing columns named start_month, start_day, end_month, end_day > that define the range. With respect to the range, year does not matter, > however, some of the ranges will start in November and end in April of the > next year. Has anyone come across this type of query? I could certainly > write a function or even include criteria in a query that would extract > date > parts of the epoch and then compare against the values in the start_month, > start_day, end_month, end_day (it might get complex with respect to ranges > where the start year and end year are different), but I am worried about > performance. I thought I would seek some input before I floundered through > the many iterations of poor performing alternatives! Any thoughts would be > greatly appreciated. > > Thanks in advance... > Jeff > > > > > > -- > Sent via pgsql-general mailing list (pgsql-general@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-general > create table a ( id_a integer, epoch integer ); create table b ( id_b integer, start_month integer, start_day integer, end_month integer, end_day integer ); CREATE OR REPLACE FUNCTION get_timestamps(sm integer, sd integer, em integer, ed integer, year integer) RETURNS timestamp without time zone[] AS $BODY$ declare syear integer := year; eyear integer := year; tstamps timestamp[]; begin if(sm>em) then -- assume that since the end month is less than the start month is in the next year eyear := eyear+1; end if; tstamps[0] := (syear|| '-' || sm || '-' || sd)::timestamp; tstamps[1] := (eyear|| '-' || em || '-' || ed)::timestamp; return tstamps; end $BODY$ LANGUAGE plpgsql STABLE; create view a_timestamp as SELECT id_a, TIMESTAMP WITH TIME ZONE 'epoch' + epoch * INTERVAL '1 second' as tstamp from a; with ab as( select id_a,id_b,get_timestamps(b.start_month,b.start_day,b.end_month,b.end_day, extract(year from a.tstamp)::integer) ts_arr,a.tstamp from a_timestamp as a,b ) select * from ab where ab.tstamp between ts_arr[0] and ts_arr[1] This obviously isn't a fast solution to your problem, although converting the integer epoch to a timestamp in table a would eliminate the view a_timestamp and you can index the column for some speed up, the real problem you're facing is that your ranges don't have years, otherwise you could store everything as a timestamp and then just join using 'between' and postgres would just need to do an index scan on each table. -Adam