On Thu, 3 Sep 2020 at 20:59, Michael Lewis <mle...@entrata.com> wrote:
> It seems like you are maybe wanting this- If the previous row is the same, > then get the previous row's run_nr. If it is different, then increment. > > case when lag( property_A ) over() = property_A and lag( property_B ) > over() = property_B then coalesce( lag( run_nr ) over(), 1 ) else lag( > run_nr ) over() + 1 end > > Perhaps there is a much simpler implementation though. > That would work were it not that the very column we're defining is the one to be aliased run_nr. The data does not contain that information, it's what I'm trying to enrich it with and what I'm having trouble wrapping my head around. Your query (adopted a tiny bit) unfortunately results in: select datetime, property_A, property_B , first_value(datetime::time) over run_win as swap_time , case when lag(property_A) over time_win = property_A and lag(property_B) over time_win = property_B then coalesce(lag(run_nr) over time_win, 1) else lag(run_nr) over time_win +1 end , value from process_data window time_win as (order by datetime) , run_win as (partition by property_A, property_B order by datetime) order by datetime ; ERROR: column "run_nr" does not exist LINE 6: then coalesce(lag(run_nr) over time_win, 1) ^ SQL state: 42703 Character: 221 I turned my example into a proper test-case (better late than never): CREATE TABLE process_data ( datetime timestamp without time zone NOT NULL, property_a text NOT NULL, property_b text NOT NULL, value numeric(12,3) ); COPY process_data (datetime, property_a, property_b, value) FROM stdin; 2020-09-03 15:06:00 tea earl grey 0.230 2020-09-03 15:07:00 tea earl grey 0.220 2020-09-03 15:08:00 tea ceylon 0.340 2020-09-03 15:09:00 coffee cappucino 0.450 2020-09-03 15:10:00 coffee cappucino 0.430 2020-09-03 15:11:00 tea earl grey 0.230 \. With the desired result (note that swap_time and run_nr are calculated columns): datetime | property_a | property_b | swap_time | run_nr | value ---------------------+------------+------------+-----------+--------+------- 2020-09-03 15:06:00 | tea | earl grey | 15:06:00 | 1 | 0.230 2020-09-03 15:07:00 | tea | earl grey | 15:06:00 | 1 | 0.220 2020-09-03 15:08:00 | tea | ceylon | 15:08:00 | 2 | 0.340 2020-09-03 15:09:00 | coffee | cappucino | 15:09:00 | 3 | 0.450 2020-09-03 15:10:00 | coffee | cappucino | 15:09:00 | 3 | 0.430 2020-09-03 15:11:00 | tea | earl grey | 15:06:00 | 4 | 0.230 (6 rows) I've been looking around on the Internet in the meantime, and it seems people either solve this with a recursive CTE (referencing the previous row by row_number() over (...)) or by writing a set-returning function that walks over the data in datetime order using a cursor. Since the actual query is growing more and more state-tracking flags, using a function has the added benefit that referencing state columns from the previous row gets a lot easier (lots of repeated window functions otherwise). It would become a procedural solution instead of a set-based one, but considering that this data is order-sensitive (on datetime), that's probably what a set-based solution would also end up doing anyway. Regards, Alban. -- If you can't see the forest for the trees, Cut the trees and you'll see there is no forest.