Pavel

Thanks for the tip. But given that the index is used when the I64_01 column is removed from the select list, I guess the cast is not likely to be the cause.

Like so

explain
  select d.short_id, mv.timestamp --, mv.I64_01
  from device d, device_configuration dc, measurement_value mv, pg_timezone_names tzn
  where mv.device_id = d.short_id and dc.device_id = d.id and dc.latest = true
        and dc.timezone = tzn.name
        and mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset
        and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset


==>

Nested Loop  (cost=1.13..6217004.08 rows=60425437 width=12)
  ->  Nested Loop  (cost=0.56..21334.84 rows=2186 width=20)
        Join Filter: (dc.timezone = pg_timezone_names.name)
        ->  Nested Loop  (cost=0.56..7497.34 rows=615 width=18)
              ->  Index Scan using device_short_id_key on device d  (cost=0.28..2423.90 rows=683 width=20)
              ->  Index Scan using device_configuration_device_latest_idx on device_configuration dc  (cost=0.28..7.42 rows=1 width=30)
                    Index Cond: ((device_id = d.id) AND (latest = true))
                    Filter: latest
        ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=48)
  ->  Index Only Scan using measurement_values_pkey on measurement_value mv  (cost=0.57..2399.33 rows=43492 width=12)
        Index Cond: ((device_id = d.short_id) AND ("timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND ("timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
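Since the fast plan above is an Index Only Scan that disappears as soon as the non-indexed I64_01 column is selected, one option worth trying is a covering index that carries that column as payload. This is only a sketch under assumptions: PostgreSQL 11+ (for INCLUDE), the column and table names from this thread, and a made-up index name.

```sql
-- Hypothetical covering index: stores i64_01 in the index leaf pages
-- (INCLUDE, PostgreSQL 11+) so the planner can keep using an
-- index-only scan even when i64_01 appears in the select list.
CREATE INDEX measurement_value_device_ts_covering_idx
    ON measurement_value (device_id, "timestamp")
    INCLUDE (i64_01);
```

Note that index-only scans only pay off when the table is well vacuumed, since tuple visibility is checked against the visibility map.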


Peter

> On 10 Nov 2020, at 08:25, Pavel Stehule <pavel.steh...@gmail.com> wrote:
> 
> 
> 
> On Tue, 10 Nov 2020 at 08:18, Peter Coppens <peter.copp...@datylon.com> wrote:
> Michael
> 
> Many thanks for spending your time on this. Your alternative does not help 
> unfortunately (see execution plan)
> 
> Still a sequential scan on the complete table. I have tried many alternatives 
> and somehow whenever I add a column that is not in the index (I64_01) the 
> optimizer decides not to use the index. If I remove that column, the index is 
> used. I guess it estimates that the extra indirection from index pages to the 
> row pages is more costly than scanning the 168M records. Pretty sure it’s 
> not, but I cannot explain it to the stubborn thing :)
> 
> Btw, thanks for the >= tip (I was aware of it)
> 
> Wkr,
> 
> Peter
> 
> 
> Hash Join  (cost=683.93..7270857.46 rows=458127 width=20)
>   Hash Cond: (mv_inner.device_id = d.short_id)
>   Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
>   ->  Seq Scan on measurement_value mv_inner  (cost=0.00..7166797.33 rows=1287989 width=1006)
>         Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
> 
> When you see a cast in a filter, you should check the type equality in your constraints. With some exceptions, Postgres uses an index only when the filtered value has the same type as the column.
> 
> Maybe there is an inconsistency between timestamp with time zone and timestamp without time zone.
> 
> Regards
> 
> Pavel
> 
> 
>   ->  Hash  (cost=656.61..656.61 rows=2186 width=20)
>         ->  Hash Join  (cost=77.87..656.61 rows=2186 width=20)
>               Hash Cond: (dc.timezone = pg_timezone_names.name)
>               ->  Hash Join  (cost=55.37..533.83 rows=615 width=18)
>                     Hash Cond: (dc.device_id = d.id)
>                     ->  Seq Scan on device_configuration dc  (cost=0.00..470.01 rows=615 width=30)
>                           Filter: latest
>                     ->  Hash  (cost=46.83..46.83 rows=683 width=20)
>                           ->  Seq Scan on device d  (cost=0.00..46.83 rows=683 width=20)
>               ->  Hash  (cost=10.00..10.00 rows=1000 width=48)
>                     ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=48)
> 
> 
> 
>> On 10 Nov 2020, at 01:15, Michael Lewis <mle...@entrata.com> wrote:
>> 
>> On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <peter.copp...@datylon.com> wrote:
>> Adding the tzn.utc_offset results in the fact that the execution plan no 
>> longer considers to use the index on the measurement_value table. Is there 
>> any way the SQL can be rewritten so that the index is used? Or any other 
>> solution so that the query with the timezone offset returns in a comparable 
>> time?
>> 
>> I am not aware of a best practice to handle this. Your where condition on 
>> mv.timestamp now depends on several joins, where it used to be a static 
>> range that could be scanned from the index as the first node in the plan. I 
>> have sometimes used a sub-query with a broader condition that allows the use 
>> of the index, and then fully reduced the set later. Something like this-
>> 
>> select d.short_id, mv.timestamp, mv.I64_01
>>   from device d, device_configuration dc, (
>>     select mv_inner.*
>>       from measurement_value AS mv_inner
>>       where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day' and
>>             mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
>>     offset 0 /* to prevent in-lining the join to the outside set */
>>   ) mv, pg_timezone_names tzn
>>   where mv.device_id = d.short_id and dc.device_id = d.id and
>>         dc.latest = true and dc.timezone = tzn.name and
>>         mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset and
>>         mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
>> 
>> By the way, it seems a little odd to be exclusive on both the begin and end. 
>> I'd usually expect timestamp >= start_date and timestamp < end_date + 
>> interval '1 day' to fully capture a 24 hour period. Right now, you are 
>> excluding any data that happens to have a timestamp value with .000000 
>> seconds (midnight exactly).
> 
