On Mon, Mar 16, 2015 at 4:16 PM, Israel Brewster <isr...@ravnalaska.net> wrote:
> On Mar 16, 2015, at 2:22 PM, David G. Johnston <david.g.johns...@gmail.com> wrote:
>
>> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver <adrian.kla...@aklaver.com> wrote:
>>
>>> On 03/16/2015 02:57 PM, Israel Brewster wrote:
>>>
>>>> I have a table with two timestamp columns for the start time and end
>>>> time of each record (call them start and end). I'm trying to figure out
>>>> if there is a way to group these records by "hour of day", that is, the
>>>> record should be included in the group if the hour of the day for the
>>>> group falls anywhere in the range [start, end]. Obviously each record
>>>> may well fall into multiple groups under this scenario.
>>>>
>>>> The goal here is to figure out, for each hour of the day, a) what is the
>>>> total number of "active" records for that hour, and b) what is the total
>>>> "active" time for those records during the hour, with an ultimate goal
>>>> of figuring out the average active time per record per hour.
>>>>
>>>> So, for a simplified example, if the table contained three records:
>>>>
>>>>       start           |       end
>>>> ----------------------+---------------------
>>>>  2015-03-15 08:15     | 2015-03-15 10:45
>>>>  2015-03-15 09:30     | 2015-03-15 10:15
>>>>  2015-03-15 10:30     | 2015-03-15 11:30
>>>>
>>>> then the results should break out something like this:
>>>>
>>>>  hour | count | sum
>>>> ------+-------+------
>>>>     8 |     1 | 0.75
>>>>     9 |     2 | 1.5
>>>>    10 |     3 | 1.5
>>>>    11 |     1 | 0.5
>>>>
>>>> I can then easily manipulate these values to get my ultimate goal of the
>>>> average, which would of course always be less than or equal to 1. Is
>>>> this doable in Postgres? Or would it be a better idea to simply pull
>>>> the raw data and post-process it in code? Thanks.
>>>
>>> Do not have an answer for you, but a question:
>>>
>>> What version of Postgres are you on?
>>>
>>> This will help determine what tools are available to work with.
>>
>> The following will give you endpoints for your bounds.
>> Version is important since "range types" could be very useful in this
>> situation - but you'd still need to generate the bounds info regardless.
>>
>> SELECT *
>> FROM
>>   (SELECT * FROM generate_series('2015-03-15'::timestamptz,
>>     '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s
>> CROSS JOIN
>>   (SELECT end_ts + '1 hour'::interval AS end_ts FROM
>>     generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz,
>>     '1 hour'::interval) e (end_ts)) AS e
>>
>> You would join this using an ON condition with an OR (start BETWEEN [...]
>> OR end BETWEEN [...]) - range logic will be better, and you may want to
>> adjust the upper bound by negative 1 (nano-second?) to allow for easier
>> "<=" logic if using BETWEEN.
>
> Thanks, that is very helpful, but are you sure CROSS JOIN is what you
> wanted here? Using that, I get a 625-row result set where each row from
> the first SELECT is paired up with EVERY row from the second SELECT. I
> would think I would want the first row of the first SELECT paired up with
> only the first row of the second, the second row of the first paired with
> the second row of the second, etc. - i.e., 24 start and end bounds. Or am
> I missing something?

No, I rushed things... :( Sorry. My concept is good, though, but indeed you
want to end up with a table having only 24 rows (for the sample). LATERAL
may work here, but I haven't had a chance to play with it yet. A simple
ordinal column to join on would be sufficient.

David J.
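For reference, the "post-process in code" route Israel mentions is straightforward to sketch. The following is a minimal Python illustration (function and variable names are hypothetical, not from the thread): for each record, walk the hour buckets it overlaps and accumulate the count and the fractional hours of overlap. Run against the three sample records, it reproduces the expected table from the original question.

```python
from datetime import datetime, timedelta
from collections import defaultdict

def active_by_hour(records):
    """For each hour of day, count records active during that hour and
    sum their active time within it, in fractional hours."""
    counts = defaultdict(int)
    sums = defaultdict(float)
    for start, end in records:
        # Start at the top of the hour containing `start` and walk
        # one-hour buckets until we pass `end`.
        bucket = start.replace(minute=0, second=0, microsecond=0)
        while bucket < end:
            nxt = bucket + timedelta(hours=1)
            # Overlap of [start, end] with [bucket, nxt), in hours.
            overlap = (min(end, nxt) - max(start, bucket)).total_seconds() / 3600
            if overlap > 0:
                counts[bucket.hour] += 1
                sums[bucket.hour] += overlap
            bucket = nxt
    return {h: (counts[h], sums[h]) for h in sorted(counts)}

records = [
    (datetime(2015, 3, 15, 8, 15), datetime(2015, 3, 15, 10, 45)),
    (datetime(2015, 3, 15, 9, 30), datetime(2015, 3, 15, 10, 15)),
    (datetime(2015, 3, 15, 10, 30), datetime(2015, 3, 15, 11, 30)),
]
print(active_by_hour(records))
# {8: (1, 0.75), 9: (2, 1.5), 10: (3, 1.5), 11: (1, 0.5)}
```

Note this keys buckets on hour-of-day alone, as in the single-day sample; data spanning multiple days would need the date folded into the key or handled per the desired grouping.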
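David's closing fix - join the two generated series on an ordinal instead of CROSS JOINing them - can be illustrated in miniature. This Python sketch (names hypothetical, standing in for the SQL) shows why the cross join explodes to 625 rows while pairing each timestamp with its successor yields exactly the 24 hour buckets wanted:

```python
from datetime import datetime, timedelta

# Stand-in for generate_series('2015-03-15', '2015-03-16', '1 hour'):
# 25 hourly timestamps, both endpoints inclusive.
series = [datetime(2015, 3, 15) + timedelta(hours=h) for h in range(25)]

# CROSS JOIN pairs every row with every row: 25 * 25 = 625 combinations.
cross = [(s, e) for s in series for e in series]

# Joining on an ordinal (each timestamp paired with its successor)
# yields the 24 [start, end) hour buckets actually wanted.
buckets = list(zip(series, series[1:]))

print(len(cross), len(buckets))
# 625 24
```

In SQL the same ordinal pairing could be done with, e.g., row_number() over the series, or generate_series(...) WITH ORDINALITY (available from PostgreSQL 9.4) - which is one reason the version question upthread matters.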