I have a table called stop_event (a stop event is one bus passing one bus stop at a given time, for a given route and direction), and I'd like to compute the average interval between consecutive stop events for each stop/route/direction combination.
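(To make the goal concrete with made-up numbers: if buses on a single route and direction pass a single stop at 10:00, 10:12, and 10:26, the two intervals are 12 and 14 minutes, so the average interval for that combination is 13 minutes.)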
A few hundred new events are written to the table once every minute. No rows are ever updated (or deleted, except in development). stop_event looks like this:

             Table "public.stop_event"
   Column  |            Type             | Modifiers
 ----------+-----------------------------+-----------
 stop_time | timestamp without time zone | not null
 stop      | integer                     | not null
 bus       | integer                     | not null
 direction | integer                     | not null
 route     | integer                     | not null
 Foreign-key constraints:
     "stop_event_direction_id_fkey" FOREIGN KEY (direction) REFERENCES direction(id)
     "stop_event_route_fkey" FOREIGN KEY (route) REFERENCES route(id)
     "stop_event_stop" FOREIGN KEY (stop) REFERENCES stop(id)

And my query looks like this:

    SELECT (floor(date_part(E'epoch', avg(interval))) / 60)::INTEGER,
           route,
           direction,
           name,
           st_asgeojson(stop_location)::JSON
    FROM (SELECT (stop_time - (lag(stop_time) OVER w)) AS interval,
                 route,
                 direction,
                 name,
                 stop_location
          FROM stop_event
          INNER JOIN stop ON (stop_event.stop = stop.id)
          WINDOW w AS (PARTITION BY route, direction, stop ORDER BY stop_time)
         ) AS all_intervals
    WHERE (interval IS NOT NULL)
    GROUP BY route, direction, name, stop_location;

With around 1.2 million rows, this takes 20 seconds to run. 1.2 million rows is only about a week's worth of data, so I'd like to find a way to make this faster.

The EXPLAIN ANALYZE output is at http://explain.depesz.com/s/ntC. Clearly the bulk of the time is spent sorting the rows of the original table, and then sorting the results of the subselect again. But I'm afraid I don't really know what to do with this information.

Is there any way I can speed this up? Is my use of a composite key for stop_event causing problems? Would a synthetic key help?

Thank you for any help you can provide,
-Eli
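P.S. In case it helps to see concretely what I've been wondering about, here is a sketch of an index that mirrors the window clause. The index name is just a placeholder, and I haven't verified that the planner would actually use it to avoid the first sort:

    -- Placeholder name; the columns mirror the window's
    -- PARTITION BY (route, direction, stop) and ORDER BY stop_time.
    CREATE INDEX stop_event_window_idx
        ON stop_event (route, direction, stop, stop_time);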