Re: [GENERAL] Summing activity intervals without any obvious column to group by

David Johnston Mon, 13 Aug 2012 19:10:41 -0700

On Aug 13, 2012, at 21:22, Carey Tilden <carey.til...@gmail.com> wrote:


> 
> 
> On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <pol...@yahoo.com> wrote:
> On Aug 13, 2012, at 20:28, Carey Tilden <carey.til...@gmail.com> wrote:
> 
> > Apologies for the awkward title.  I haven't quite thought of the right way 
> > to describe my problem, which may be why I've had a hard time figuring out 
> > how to solve it.  I have a list of program start/stop times, and I want to 
> > know how long each run takes to complete.  The thing that's really tripping 
> > me up is there are gaps in the sequence.  I've figured out how to collapse 
> > the results down to a single row per attempt, but I can't quite figure out 
> > how to further collapse down each full run to its own row.  It'd be easy if 
> > I had a session_id or something to group on, but I don't.  All I have are 
> > the start/stop times.
> >
> > Here's some sample data.  Hopefully this clarifies what I'm talking about:
> >
> >     drop table if exists program_runs;
> >
> >     create temporary table program_runs (
> >         id serial,
> >         time_stamp timestamptz,
> >         action text
> >     );
> >
> >     insert into program_runs (time_stamp, action) values
> >         ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 PST', 
> > 'stopped early'),
> >         ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 PST', 
> > 'stopped early'),
> >         ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 PST', 
> > 'completed'),
> >         ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 PST', 
> > 'stopped early'),
> >         ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 PST', 
> > 'completed'),
> >         ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 PST', 
> > 'stopped early'),
> >         ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 PST', 
> > 'stopped early'),
> >         ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 PST', 
> > 'stopped early'),
> >         ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 PST', 
> > 'completed');
> >
> >     select
> >         this_time_stamp as starting_time_stamp,
> >         next_time_stamp - this_time_stamp as time_elapsed,
> >         next_action as closing_action
> >     from (
> >         select
> >             time_stamp as this_time_stamp, lead(time_stamp) over (order by 
> > id) as next_time_stamp,
> >             action as this_action, lead(action) over (order by id) as 
> > next_action,
> >             id as this_id, lead(id) over (order by id) as next_id
> >         from program_runs
> >     ) q
> >     where this_action = 'started';
> >
> > Note that each run has a pair of entries in the table.  The first is always 
> > "started", but the second may be either "stopped early" or "completed".  
> > The final results I'd like to see are:
> >
> >       starting_time_stamp   | total_time_elapsed
> >     ------------------------+--------------------
> >      2012-01-01 10:00:00-08 | 00:27:00
> >      2012-01-01 10:50:00-08 | 00:13:00
> >      2012-01-01 11:20:00-08 | 00:39:00
> >
> > Hope that's enough detail.  Any ideas or suggestions gladly accepted!
> >
> > Regards,
> > Carey
> 
> First artificially generate row (pair) identifiers by integer dividing the 
> ordered row number by 2.
> 
> Using window or sub-queries identify the bookends for each group (i.e., the 
> identifier for each completed and the prior completed).  Give these groups 
> artificial session identifiers/row numbers.
> 
> Assign the artificial session id to each transaction row by using the 
> bookends.
> 
> This is the part where I draw a blank.  How would I do that?  Seems like it 
> should be easy with window functions, but I just can't think of the way to do 
> it.

With detail as ()
, bookmarks as ()
Select detail.id, bookmarks.id, ...
From detail
Join bookmarks on ( detail.id between bookmarks.startid and bookmarks.endid )

Bookmarks as (
Select row_number() over () as id,
detail.id, coalesce(min(detail.id) over (rows 1 preceeding),0) --coalesce for 
the first row
From detail where 'completed'
)

Probably need to play with row ordering but this should get you started.

>  
> Now you have identifiers with which to group.
> 
> This makes a number of assumptions regarding the form of the input data. It 
> will solve for your example data but it may not generalize.  In particular it 
> assumes non-overlapping sessions.
> 
> The assumptions hold fairly well.  Sessions do not overlap, thankfully.  
> There are different program runs to untangle, but that's simple enough (order 
> by program_name, time_stamp).
>  
> Thanks for the suggestions so far!
> 
> Carey
>

Re: [GENERAL] Summing activity intervals without any obvious column to group by

Reply via email to