On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <pol...@yahoo.com> wrote:
> On Aug 13, 2012, at 20:28, Carey Tilden <carey.til...@gmail.com> wrote: > > > Apologies for the awkward title. I haven't quite thought of the right > way to describe my problem, which may be why I've had a hard time figuring > out how to solve it. I have a list of program start/stop times, and I want > to know how long each run takes to complete. The thing that's really > tripping me up is there are gaps in the sequence. I've figured out how to > collapse the results down to a single row per attempt, but I can't quite > figure out how to further collapse down each full run to its own row. It'd > be easy if I had a session_id or something to group on, but I don't. All I > have are the start/stop times. > > > > Here's some sample data. Hopefully this clarifies what I'm talking > about: > > > > drop table if exists program_runs; > > > > create temporary table program_runs ( > > id serial, > > time_stamp timestamptz, > > action text > > ); > > > > insert into program_runs (time_stamp, action) values > > ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 > PST', 'stopped early'), > > ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 > PST', 'stopped early'), > > ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 > PST', 'completed'), > > ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 > PST', 'stopped early'), > > ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 > PST', 'completed'), > > ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 > PST', 'stopped early'), > > ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 > PST', 'stopped early'), > > ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 > PST', 'stopped early'), > > ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 > PST', 'completed'); > > > > select > > this_time_stamp as starting_time_stamp, > > next_time_stamp - this_time_stamp as time_elapsed, > > next_action as closing_action > > from ( > > select > > time_stamp as this_time_stamp, lead(time_stamp) over (order > by id) as next_time_stamp, > > action as this_action, lead(action) over (order by id) as > next_action, > > id as this_id, lead(id) over (order by id) as next_id > > from program_runs > > ) q > > where this_action = 'started'; > > > > Note that each run has a pair of entries in the table. The first is > always "started", but the second may be either "stopped early" or > "completed". The final results I'd like to see are: > > > > starting_time_stamp | total_time_elapsed > > ------------------------+-------------------- > > 2012-01-01 10:00:00-08 | 00:27:00 > > 2012-01-01 10:50:00-08 | 00:13:00 > > 2012-01-01 11:20:00-08 | 00:39:00 > > > > Hope that's enough detail. Any ideas or suggestions gladly accepted! > > > > Regards, > > Carey > > First artificially generate row (pair) identifiers by integer dividing the > ordered row number by 2. > > Using window or sub-queries identify the bookends for each group (i.e., > the identifier for each completed and the prior completed). Give these > groups artificial session identifiers/row numbers. > > Assign the artificial session id to each transaction row by using the > bookends. > This is the part where I draw a blank. How would I do that? Seems like it should be easy with window functions, but I just can't think of the way to do it. > Now you have identifiers with which to group. > > This makes a number of assumptions regarding the form of the input data. > It will solve for your example data but it may not generalize. In > particular it assumes non-overlapping sessions. > The assumptions hold fairly well. Sessions do not overlap, thankfully. There are different program runs to untangle, but that's simple enough (order by program_name, time_stamp). Thanks for the suggestions so far! Carey