Pavel Tx for the tip. But given that if the I64_01 column is removed from the select list, the index is used I guess the cast is not likely to be the cause.
Like so explain select d.short_id,mv.timestamp --,mv.I64_01 from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn where mv.device_id=d.short_id and dc.device_id = d.id and dc.latest=true and dc.timezone=tzn.name and mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset ==> Nested Loop (cost=1.13..6217004.08 rows=60425437 width=12) -> Nested Loop (cost=0.56..21334.84 rows=2186 width=20) Join Filter: (dc.timezone = pg_timezone_names.name) -> Nested Loop (cost=0.56..7497.34 rows=615 width=18) -> Index Scan using device_short_id_key on device d (cost=0.28..2423.90 rows=683 width=20) -> Index Scan using device_configuration_device_latest_idx on device_configuration dc (cost=0.28..7.42 rows=1 width=30) Index Cond: ((device_id = d.id) AND (latest = true)) Filter: latest -> Function Scan on pg_timezone_names (cost=0.00..10.00 rows=1000 width=48) -> Index Only Scan using measurement_values_pkey on measurement_value mv (cost=0.57..2399.33 rows=43492 width=12) Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND ("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset))) Peter > On 10 Nov 2020, at 08:25, Pavel Stehule <pavel.steh...@gmail.com> wrote: > > > > út 10. 11. 2020 v 8:18 odesílatel Peter Coppens <peter.copp...@datylon.com > <mailto:peter.copp...@datylon.com>> napsal: > Michael > > Many thanks for spending your time on this. Your alternative does not help > unfortunately (see execution plan) > > Still a sequential scan on the complete table. I have tried many alternatives > and somehow whenever I add a column that is not in the index (I64_01) the > optimizer decides not to use the index. If I remove that column, the index is > used. I guess it estimates that the extra indirection from index pages to the > row pages is more costly than scanning the 168M records. Pretty sure it’s > not, but I cannot explain it to the stubborn thing :) > > Btw, thanks for the >= tip (I was aware of it) > > Wkr, > > Peter > > > Hash Join (cost=683.93..7270857.46 rows=458127 width=20) > Hash Cond: (mv_inner.device_id = d.short_id) > Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp > without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" > < ('2020-11-07 00:00:00'::timestamp without time zone - > pg_timezone_names.utc_offset))) > -> Seq Scan on measurement_value mv_inner (cost=0.00..7166797.33 > rows=1287989 width=1006) > Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time > zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone)) > > when you see cast in filter, then you should check type equality in > constraints. With some exception Postgres uses indexes only when filtered > value has same type like column type. > > Maybe there is inconsistency between timestamp (with time zone), and > timestamp without time zone > > Regards > > Pavel > > > -> Hash (cost=656.61..656.61 rows=2186 width=20) > -> Hash Join (cost=77.87..656.61 rows=2186 width=20) > Hash Cond: (dc.timezone = pg_timezone_names.name > <http://pg_timezone_names.name/>) > -> Hash Join (cost=55.37..533.83 rows=615 width=18) > Hash Cond: (dc.device_id = d.id <http://d.id/>) > -> Seq Scan on device_configuration dc > (cost=0.00..470.01 rows=615 width=30) > Filter: latest > -> Hash (cost=46.83..46.83 rows=683 width=20) > -> Seq Scan on device d (cost=0.00..46.83 > rows=683 width=20) > -> Hash (cost=10.00..10.00 rows=1000 width=48) > -> Function Scan on pg_timezone_names (cost=0.00..10.00 > rows=1000 width=48) > > > >> On 10 Nov 2020, at 01:15, Michael Lewis <mle...@entrata.com >> <mailto:mle...@entrata.com>> wrote: >> >> On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <peter.copp...@datylon.com >> <mailto:peter.copp...@datylon.com>> wrote: >> Adding the tzn.utc_offset results in the fact that the execution plan no >> longer considers to use the index on the measurement_value table. Is there >> any way the SQL can be rewritten so that the index is used? Or any other >> solution so that the query with the timezone offset returns in a comparable >> time? >> >> I am not aware of a best practice to handle this. Your where condition on >> mv.timestamp now depends on several joins to do a filtering that used to be >> a static range that can be scanned into the index as a first node in the >> plan. I have sometimes used a sub-query on a broader condition that allows >> the use of the index, and then fully reducing the set later. Something like >> this- >> >> select d.short_id,mv.timestamp,mv.I64_01 >> from device d, device_configuration dc, ( >> select mv.* >> from measurement_value AS mv_inner >> where mv.timestamp > '2020-11-06'::timestamp - interval '1 day' and >> mv.timestamp < '2020-11-07'::timestamp + interval '1 day' >> offset 0 /* to prevent in-lining the join to the outside set */ >> ) mv, pg_timezone_names tzn >> where mv.device_id=d.short_id and dc.device_id = d.id <http://d.id/> and >> dc.latest=true and dc.timezone=tzn.name <http://tzn.name/> and >> mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and >> mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset >> >> By the way, it seems a little odd to be exclusive on both the begin and end. >> I'd usually expect timestamp >= start_date and timestamp < end_date + >> interval '1 day' to fully capture a 24 hour period. Right now, you are >> excluding any data that happens to have a timestamp value with .000000 >> seconds (midnight exactly). >