Michael

Many thanks for spending your time on this. Unfortunately, your alternative 
does not help (see the execution plan below).

Still a sequential scan over the complete table. I have tried many 
alternatives, and somehow whenever I add a column that is not in the index 
(I64_01), the optimizer decides not to use the index. If I remove that column, 
the index is used. I guess it estimates that the extra indirection from the 
index pages to the row pages is more costly than scanning the 168M records. I 
am pretty sure it is not, but I cannot explain that to the stubborn thing :)
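
One thing I may still try is a covering index, so the plan can answer the 
query with an index-only scan and skip the heap fetches altogether. A sketch, 
assuming PostgreSQL 11 or later; the index name is made up and the key columns 
are a guess at the existing index definition:

CREATE INDEX CONCURRENTLY measurement_value_device_ts_covering
    ON measurement_value (device_id, "timestamp")  -- assumed key columns
    INCLUDE (i64_01);  -- payload column only, keeps the index keys small

An index-only scan also needs a reasonably fresh visibility map, so a VACUUM 
on measurement_value may be required before the planner will pick it.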

Btw, thanks for the >= tip (I was aware of it)

Wkr,

Peter


Hash Join  (cost=683.93..7270857.46 rows=458127 width=20)
  Hash Cond: (mv_inner.device_id = d.short_id)
  Join Filter: ((mv_inner."timestamp" > ('2020-11-06 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)) AND (mv_inner."timestamp" < ('2020-11-07 00:00:00'::timestamp without time zone - pg_timezone_names.utc_offset)))
  ->  Seq Scan on measurement_value mv_inner  (cost=0.00..7166797.33 rows=1287989 width=1006)
        Filter: (("timestamp" > '2020-11-05 00:00:00'::timestamp without time zone) AND ("timestamp" < '2020-11-08 00:00:00'::timestamp without time zone))
  ->  Hash  (cost=656.61..656.61 rows=2186 width=20)
        ->  Hash Join  (cost=77.87..656.61 rows=2186 width=20)
              Hash Cond: (dc.timezone = pg_timezone_names.name)
              ->  Hash Join  (cost=55.37..533.83 rows=615 width=18)
                    Hash Cond: (dc.device_id = d.id)
                    ->  Seq Scan on device_configuration dc  (cost=0.00..470.01 rows=615 width=30)
                          Filter: latest
                    ->  Hash  (cost=46.83..46.83 rows=683 width=20)
                          ->  Seq Scan on device d  (cost=0.00..46.83 rows=683 width=20)
              ->  Hash  (cost=10.00..10.00 rows=1000 width=48)
                    ->  Function Scan on pg_timezone_names  (cost=0.00..10.00 rows=1000 width=48)
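
To double-check the cost model, I can also force the index plan in a single 
transaction and compare the actual timings against the plan above. A sketch of 
the experiment, using the inner seq scan's predicate and guessing at the 
selected columns:

begin;
-- transaction-local experiment: forbid seq scans so the planner
-- produces the index plan, then compare the actual run times
set local enable_seqscan = off;
explain (analyze, buffers)
select device_id, "timestamp", i64_01
  from measurement_value
 where "timestamp" > '2020-11-05' and "timestamp" < '2020-11-08';
rollback;

If the index plan is in fact faster, lowering random_page_cost (e.g. to 1.1 
on SSD storage) tells the cost model that the index indirection is cheaper 
than it currently assumes.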



> On 10 Nov 2020, at 01:15, Michael Lewis <mle...@entrata.com> wrote:
> 
> On Mon, Nov 9, 2020 at 1:11 PM Peter Coppens <peter.copp...@datylon.com> wrote:
> Adding the tzn.utc_offset results in an execution plan that no longer 
> considers using the index on the measurement_value table. Is there any way 
> the SQL can be rewritten so that the index is used? Or any other solution so 
> that the query with the timezone offset returns in a comparable time?
> 
> I am not aware of a best practice to handle this. Your where condition on 
> mv.timestamp now depends on several joins, whereas it used to be a static 
> range that could drive an index scan as the first node in the plan. I have 
> sometimes used a sub-query with a broader condition that still allows the 
> index to be used, and then fully reduced the set later. Something like this-
> 
> select d.short_id, mv.timestamp, mv.I64_01
>   from device d, device_configuration dc, (
>     select mv_inner.*
>       from measurement_value AS mv_inner
>      where mv_inner.timestamp > '2020-11-06'::timestamp - interval '1 day'
>        and mv_inner.timestamp < '2020-11-07'::timestamp + interval '1 day'
>     offset 0 /* to prevent in-lining the join to the outside set */
>   ) mv, pg_timezone_names tzn
>  where mv.device_id = d.short_id
>    and dc.device_id = d.id
>    and dc.latest = true
>    and dc.timezone = tzn.name
>    and mv.timestamp > '2020-11-06'::timestamp - tzn.utc_offset
>    and mv.timestamp < '2020-11-07'::timestamp - tzn.utc_offset
> 
> By the way, it seems a little odd to be exclusive on both the begin and the 
> end. I'd usually expect timestamp >= start_date and timestamp < end_date + 
> interval '1 day' to fully capture a 24-hour period. Right now, you are 
> excluding any row whose timestamp is exactly midnight (.000000 seconds).
