>
>                ->  Hash Join  (cost=644250.54..10734700.30 rows=22333224
> width=223) (actual time=7864.023..44546.392 rows=22325462 loops=1)
>                      Hash Cond: (file_1.id = granule_file_1.file_id)
>                      Buffers: shared hit=780882 read=8345236
>                      ->  Seq Scan on file file_1  (cost=0.00..9205050.88
> rows=22068888 width=207) (actual time=402.706..25222.525 rows=22057988
> loops=1)
>                            Buffers: shared hit=639126 read=8345236
>                      ->  Hash  (cost=365085.24..365085.24 rows=22333224
> width=20) (actual time=7288.228..7288.235 rows=22325462 loops=1)
>                            Buckets: 33554432  Batches: 1  Memory Usage:
> 1391822kB
>                            Buffers: shared hit=141753
>                            ->  Seq Scan on granule_file granule_file_1  
> (cost=0.00..365085.24
> rows=22333224 width=20) (actual time=0.030..2151.380 rows=22325462 loops=1)
>                                  Buffers: shared hit=141753


This part above is the most expensive so far, and taking a look at your
`granule_file` table on the first message, it has no indexes nor
constraints, which certainly looks like a mistake. I'd start optimizing
this, you could add an index on it, but seems that you need a primary key
on both columns of this (junction?) table:

    ALTER TABLE granule_file ADD PRIMARY KEY (granule_uuid, file_id);

There are certainly more things to optimize on this query, but I prefer
doing one thing at a time. Could you try with the PK and send the EXPLAIN
ANALYZE of the query again after that?

Best regards,
Matheus de Oliveira

Reply via email to