On Aug 13, 2012, at 21:22, Carey Tilden <carey.til...@gmail.com> wrote:
> > > On Mon, Aug 13, 2012 at 6:01 PM, David Johnston <pol...@yahoo.com> wrote: > On Aug 13, 2012, at 20:28, Carey Tilden <carey.til...@gmail.com> wrote: > > > Apologies for the awkward title. I haven't quite thought of the right way > > to describe my problem, which may be why I've had a hard time figuring out > > how to solve it. I have a list of program start/stop times, and I want to > > know how long each run takes to complete. The thing that's really tripping > > me up is there are gaps in the sequence. I've figured out how to collapse > > the results down to a single row per attempt, but I can't quite figure out > > how to further collapse down each full run to its own row. It'd be easy if > > I had a session_id or something to group on, but I don't. All I have are > > the start/stop times. > > > > Here's some sample data. Hopefully this clarifies what I'm talking about: > > > > drop table if exists program_runs; > > > > create temporary table program_runs ( > > id serial, > > time_stamp timestamptz, > > action text > > ); > > > > insert into program_runs (time_stamp, action) values > > ('2012-01-01 10:00:00 PST', 'started'), ('2012-01-01 10:10:00 PST', > > 'stopped early'), > > ('2012-01-01 10:20:00 PST', 'started'), ('2012-01-01 10:30:00 PST', > > 'stopped early'), > > ('2012-01-01 10:40:00 PST', 'started'), ('2012-01-01 10:47:00 PST', > > 'completed'), > > ('2012-01-01 10:50:00 PST', 'started'), ('2012-01-01 11:00:00 PST', > > 'stopped early'), > > ('2012-01-01 11:10:00 PST', 'started'), ('2012-01-01 11:13:00 PST', > > 'completed'), > > ('2012-01-01 11:20:00 PST', 'started'), ('2012-01-01 11:30:00 PST', > > 'stopped early'), > > ('2012-01-01 11:40:00 PST', 'started'), ('2012-01-01 11:50:00 PST', > > 'stopped early'), > > ('2012-01-01 12:00:00 PST', 'started'), ('2012-01-01 12:10:00 PST', > > 'stopped early'), > > ('2012-01-01 12:20:00 PST', 'started'), ('2012-01-01 12:29:00 PST', > > 'completed'); > > > > select > > this_time_stamp as starting_time_stamp, > > next_time_stamp - this_time_stamp as time_elapsed, > > next_action as closing_action > > from ( > > select > > time_stamp as this_time_stamp, lead(time_stamp) over (order by > > id) as next_time_stamp, > > action as this_action, lead(action) over (order by id) as > > next_action, > > id as this_id, lead(id) over (order by id) as next_id > > from program_runs > > ) q > > where this_action = 'started'; > > > > Note that each run has a pair of entries in the table. The first is always > > "started", but the second may be either "stopped early" or "completed". > > The final results I'd like to see are: > > > > starting_time_stamp | total_time_elapsed > > ------------------------+-------------------- > > 2012-01-01 10:00:00-08 | 00:27:00 > > 2012-01-01 10:50:00-08 | 00:13:00 > > 2012-01-01 11:20:00-08 | 00:39:00 > > > > Hope that's enough detail. Any ideas or suggestions gladly accepted! > > > > Regards, > > Carey > > First artificially generate row (pair) identifiers by integer dividing the > ordered row number by 2. > > Using window or sub-queries identify the bookends for each group (i.e., the > identifier for each completed and the prior completed). Give these groups > artificial session identifiers/row numbers. > > Assign the artificial session id to each transaction row by using the > bookends. > > This is the part where I draw a blank. How would I do that? Seems like it > should be easy with window functions, but I just can't think of the way to do > it. With detail as () , bookmarks as () Select detail.id, bookmarks.id, ... From detail Join bookmarks on ( detail.id between bookmarks.startid and bookmarks.endid ) Bookmarks as ( Select row_number() over () as id, detail.id, coalesce(min(detail.id) over (rows 1 preceeding),0) --coalesce for the first row From detail where 'completed' ) Probably need to play with row ordering but this should get you started. > > Now you have identifiers with which to group. > > This makes a number of assumptions regarding the form of the input data. It > will solve for your example data but it may not generalize. In particular it > assumes non-overlapping sessions. > > The assumptions hold fairly well. Sessions do not overlap, thankfully. > There are different program runs to untangle, but that's simple enough (order > by program_name, time_stamp). > > Thanks for the suggestions so far! > > Carey >