Re: [GENERAL] Summing activity intervals without any obvious column to group by

Carey Tilden Mon, 13 Aug 2012 18:23:20 -0700

On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <pol...@yahoo.com> wrote:


> On Aug 13, 2012, at 20:28, Carey Tilden <carey.til...@gmail.com> wrote:
>
> > Apologies for the awkward title.  I haven't quite thought of the right
> way to describe my problem, which may be why I've had a hard time figuring
> out how to solve it.  I have a list of program start/stop times, and I want
> to know how long each run takes to complete.  The thing that's really
> tripping me up is there are gaps in the sequence.  I've figured out how to
> collapse the results down to a single row per attempt, but I can't quite
> figure out how to further collapse down each full run to its own row.  It'd
> be easy if I had a session_id or something to group on, but I don't.  All I
> have are the start/stop times.
> >
> > Here's some sample data.  Hopefully this clarifies what I'm talking
> about:
> >
> >     drop table if exists program_runs;
> >
> >     create temporary table program_runs (
> >         id serial,
> >         time_stamp timestamptz,
> >         action text
> >     );
> >
> >     insert into program_runs (time_stamp, action) values
> >         ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00
> PST', 'stopped early'),
> >         ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00
> PST', 'stopped early'),
> >         ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00
> PST', 'completed'),
> >         ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00
> PST', 'stopped early'),
> >         ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00
> PST', 'completed'),
> >         ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00
> PST', 'stopped early'),
> >         ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00
> PST', 'stopped early'),
> >         ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00
> PST', 'stopped early'),
> >         ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00
> PST', 'completed');
> >
> >     select
> >         this_time_stamp as starting_time_stamp,
> >         next_time_stamp - this_time_stamp as time_elapsed,
> >         next_action as closing_action
> >     from (
> >         select
> >             time_stamp as this_time_stamp, lead(time_stamp) over (order
> by id) as next_time_stamp,
> >             action as this_action, lead(action) over (order by id) as
> next_action,
> >             id as this_id, lead(id) over (order by id) as next_id
> >         from program_runs
> >     ) q
> >     where this_action = 'started';
> >
> > Note that each run has a pair of entries in the table.  The first is
> always "started", but the second may be either "stopped early" or
> "completed".  The final results I'd like to see are:
> >
> >       starting_time_stamp   | total_time_elapsed
> >     ------------------------+--------------------
> >      2012-01-01 10:00:00-08 | 00:27:00
> >      2012-01-01 10:50:00-08 | 00:13:00
> >      2012-01-01 11:20:00-08 | 00:39:00
> >
> > Hope that's enough detail.  Any ideas or suggestions gladly accepted!
> >
> > Regards,
> > Carey
>
> First artificially generate row (pair) identifiers by integer dividing the
> ordered row number by 2.
>
> Using window or sub-queries identify the bookends for each group (i.e.,
> the identifier for each completed and the prior completed).  Give these
> groups artificial session identifiers/row numbers.
>
> Assign the artificial session id to each transaction row by using the
> bookends.
>

This is the part where I draw a blank.  How would I do that?  Seems like it
should be easy with window functions, but I just can't think of the way to
do it.


> Now you have identifiers with which to group.
>
> This makes a number of assumptions regarding the form of the input data.
> It will solve for your example data but it may not generalize.  In
> particular it assumes non-overlapping sessions.
>

The assumptions hold fairly well.  Sessions do not overlap, thankfully.
There are different program runs to untangle, but that's simple enough
(order by program_name, time_stamp).

Thanks for the suggestions so far!

Carey

Re: [GENERAL] Summing activity intervals without any obvious column to group by

Reply via email to