Triggered by Michael mentioning subqueries, I ended up trying

explain
select d.short_id, mv.timestamp, mv.I64_01
from device d, device_configuration dc, measurement_value mv
where mv.device_id = d.short_id and dc.device_id = d.id and dc.latest = true and
      mv.timestamp > '2020-11-06'::timestamp - (select tzn.utc_offset from pg_timezone_names tzn where tzn.name = dc.timezone) and
      mv.timestamp < '2020-11-07'::timestamp - (select tzn.utc_offset from pg_timezone_names tzn where tzn.name = dc.timezone)
==>

Nested Loop  (cost=25.85..84540074.64 rows=16996885 width=20)
  ->  Seq Scan on device_configuration dc  (cost=0.00..470.01 rows=615 width=30)
        Filter: latest
  ->  Nested Loop  (cost=25.85..137027.83 rows=43494 width=36)
        ->  Index Scan using device_pkey on device d  (cost=0.28..7.23 rows=1 width=20)
              Index Cond: (id = dc.device_id)
        ->  Index Scan using measurement_values_pkey on measurement_value mv  (cost=25.58..136585.66 rows=43494 width=20)
              Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - (SubPlan 1))) AND ("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - (SubPlan 2))))
              SubPlan 1
                ->  Function Scan on pg_timezone_names  (cost=0.00..12.50 rows=5 width=16)
                      Filter: (name = dc.timezone)
              SubPlan 2
                ->  Function Scan on pg_timezone_names pg_timezone_names_1  (cost=0.00..12.50 rows=5 width=16)
                      Filter: (name = dc.timezone)

This now returns the 320K rows in less than 5 seconds. Until now I was convinced that correlated subqueries and joins are equivalent. I guess I was wrong :). I do wonder how stable this plan will be, though.

Peter

> On 10 Nov 2020, at 09:06, Peter Coppens <peter.copp...@datylon.com> wrote:
> 
> Pavel
> 
> Tx for the tip. But given that the index is used if the I64_01 column is removed from the select list, I guess the cast is not likely to be the cause.
> 
> Like so
> 
> explain
> select d.short_id, mv.timestamp --,mv.I64_01
> from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
> where mv.device_id = d.short_id and dc.device_id = d.id and dc.latest = true and dc.timezone = tzn.name and
>       mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and
>       mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
> 
> ==>
> 
> Nested Loop  (cost=1.13..6217004.08 rows=60425437 width=12)
>   ->  Nested Loop  (cost=0.56..21334.84 rows=2186 width=20)
>         Join Filter: (dc.timezone = pg_timezone_names.name)
>         ->  Nested Loop  (cost=0.56..7497.34 rows=615 width=18)
>               ->  Index Scan using device_short_id_key on device d  (cost=0.28..2423.90 rows=683 width=20)
>               ->  Index Scan using device_configuration_device_latest_idx on device_configuration dc  (cost=0.28..7.42 rows=1 width=30)
>                     Index Cond: ((device_id = d.id) AND (latest = true))
>                     Filter: latest
>         ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=48)
>   ->  Index Only Scan using measurement_values_pkey on measurement_value mv  (cost=0.57..2399.33 rows=43492 width=12)
>         Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND ("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
> 
> Peter
> 
>> On 10 Nov 2020, at 08:25, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>> 
>> On Tue, 10 Nov 2020 at 8:18, Peter Coppens <peter.copp...@datylon.com> wrote:
>> Michael
>> 
>> Many thanks for spending your time on this. Your alternative does not help, unfortunately (see execution plan).
>> 
>> Still a sequential scan on the complete table. I have tried many alternatives, and somehow whenever I add a column that is not in the index (I64_01), the optimizer decides not to use the index. If I remove that column, the index is used. I guess it estimates that the extra indirection from the index pages to the row pages is more costly than scanning the 168M records.
>> Pretty sure it's not, but I cannot explain it to the stubborn thing :)
>> 
>> Btw, thanks for the >= tip (I was aware of it)
>> 
>> Wkr,
>> 
>> Peter
>> 
>> 
>> Hash Join  (cost=683.93..7270857.46 rows=458127 width=20)
>>   Hash Cond: (mv_inner.device_id = d.short_id)
>>   Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
>>   ->  Seq Scan on measurement_value mv_inner  (cost=0.00..7166797.33 rows=1287989 width=1006)
>>         Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
>> 
>> When you see a cast in a filter, you should check type equality in the constraints. With some exceptions, Postgres uses indexes only when the filtered value has the same type as the column type.
>> 
>> Maybe there is an inconsistency between timestamp (with time zone) and timestamp without time zone.
>> 
>> Regards
>> 
>> Pavel
>> 
>> 
>>   ->  Hash  (cost=656.61..656.61 rows=2186 width=20)
>>         ->  Hash Join  (cost=77.87..656.61 rows=2186 width=20)
>>               Hash Cond: (dc.timezone = pg_timezone_names.name)
>>               ->  Hash Join  (cost=55.37..533.83 rows=615 width=18)
>>                     Hash Cond: (dc.device_id = d.id)
>>                     ->  Seq Scan on device_configuration dc  (cost=0.00..470.01 rows=615 width=30)
>>                           Filter: latest
>>                     ->  Hash  (cost=46.83..46.83 rows=683 width=20)
>>                           ->  Seq Scan on device d  (cost=0.00..46.83 rows=683 width=20)
>>               ->  Hash  (cost=10.00..10.00 rows=1000 width=48)
>>                     ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=48)
>> 
>>> On 10 Nov 2020, at 01:15, Michael Lewis <mle...@entrata.com> wrote:
>>> 
>>> On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <peter.copp...@datylon.com> wrote:
>>> Adding the tzn.utc_offset results in the fact that the execution plan no longer considers using the index on the measurement_value table. Is there any way the SQL can be rewritten so that the index is used? Or any other solution so that the query with the timezone offset returns in a comparable time?
>>> 
>>> I am not aware of a best practice to handle this. Your where condition on mv.timestamp now depends on several joins to do a filtering that used to be a static range that could be scanned in the index as a first node in the plan. I have sometimes used a sub-query on a broader condition that allows the use of the index, and then fully reduced the set later. Something like this:
>>> 
>>> select d.short_id, mv.timestamp, mv.I64_01
>>> from device d, device_configuration dc, (
>>>     select mv_inner.*
>>>     from measurement_value AS mv_inner
>>>     where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day' and
>>>           mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
>>>     offset 0 /* to prevent in-lining the join to the outside set */
>>> ) mv, pg_timezone_names tzn
>>> where mv.device_id = d.short_id and dc.device_id = d.id and dc.latest = true and dc.timezone = tzn.name and
>>>       mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and
>>>       mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
>>> 
>>> By the way, it seems a little odd to be exclusive on both the begin and end.
>>> I'd usually expect timestamp >= start_date and timestamp < end_date + interval '1 day' to fully capture a 24-hour period. Right now, you are excluding any data that happens to have a timestamp value with .000000 seconds (midnight exactly).
>> 
> 
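As a concrete illustration of the half-open range Michael describes above, here is a minimal sketch reusing the measurement_value table and the dates from this thread; the timezone offset arithmetic is deliberately left out, so the bounds are plain constants:

select mv.device_id, mv.timestamp
from measurement_value mv
where mv.timestamp >= '2020-11-06'::timestamp                      -- inclusive lower bound
  and mv.timestamp <  '2020-11-06'::timestamp + interval '1 day';  -- exclusive upper bound: first instant of the next day

Written this way, a row stamped exactly at midnight ('2020-11-06 00:00:00') is counted once and only once, whereas a pair of strict inequalities on both ends silently drops it.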