pá 4. 2. 2022 v 10:11 odesílatel Les <nagy...@gmail.com> napsal:

> Hello,
>
> I have a table that contains folders, and another one that contains files.
>
> Here are the table definitions. I have removed most of the columns because
> they are not important for this question. (There are lots of columns.)
>
> CREATE TABLE media.oo_folder (
> id int8 NOT NULL,
> is_active bool NOT NULL DEFAULT true,
> title text NOT NULL,
> relpath text NOT NULL,
> CONSTRAINT chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
> CONSTRAINT oo_folder_chk_no_slash CHECK (("position"(title, '/'::text) =
> 0)),
> CONSTRAINT pk_oo_folder PRIMARY KEY (id),
> CONSTRAINT fk_oo_folder_parent_id FOREIGN KEY (parent_id) REFERENCES
> media.oo_folder(id) ON DELETE CASCADE DEFERRABLE
> );
> CREATE INDEX oo_folder_idx_parent ON media.oo_folder USING btree
> (parent_id);
> CREATE INDEX oo_folder_idx_relpath ON media.oo_folder USING btree
> (relpath);
> CREATE UNIQUE INDEX uidx_oo_folder_active_title ON media.oo_folder USING
> btree (parent_id, title) WHERE is_active;
>
>
> CREATE TABLE media.oo_file (
> id int8 NOT NULL,
> is_active bool NOT NULL DEFAULT true,
> title text NOT NULL,
> ext text NULL,
> relpath text NOT NULL,
> sha1 text NOT NULL,
> CONSTRAINT chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
> CONSTRAINT oo_file_chk_no_slash CHECK (("position"(title, '/'::text) = 0)),
> CONSTRAINT pk_oo_file PRIMARY KEY (id),
> CONSTRAINT fk_oo_file_oo_folder_id FOREIGN KEY (oo_folder_id) REFERENCES
> media.oo_folder(id) ON DELETE CASCADE DEFERRABLE,
> );
> CREATE INDEX oo_file_idx_oo_folder_id ON media.oo_file USING btree
> (oo_folder_id);
> CREATE INDEX oo_file_idx_relpath ON media.oo_file USING btree (relpath);
> CREATE INDEX oo_file_idx_sha1 ON media.oo_file USING btree (sha1);
> CREATE UNIQUE INDEX uidx_oo_file_active_title ON media.oo_file USING btree
> (oo_folder_id, title) WHERE is_active;
>
> The "replath" field contains the path of the file/folder. For example:
> "/folder1/folder2/folder3/filename4.ext5".  The replath field is managed by
> triggers. There are about 1M rows for files and 600K folder rows in the
> database. The files are well distributed between folders, and there are
> only 45 root folders ( parent_id is null)
>
> This query runs very fast:
>
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS) select id, title from
> media.oo_folder f where f.parent_id is null
>
> QUERY PLAN
>                                                                |
>
> ------------------------------------------------------------------------------------------------------------------------------------------+
> Index Scan using oo_folder_idx_parent on media.oo_folder f
>  (cost=0.42..73.70 rows=20 width=25) (actual time=0.030..0.159 rows=45
> loops=1)|
>   Output: id, title
>                                                                 |
>   Index Cond: (f.parent_id IS NULL)
>                                                                 |
>   Buffers: shared hit=40
>                                                                |
> Planning Time: 0.123 ms
>                                                                 |
> Execution Time: 0.187 ms
>                                                                |
>
> My task is to write a query that tells if a folder has any active file
> inside it - directly or in subfolders. Here is the query for that:
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
>
> select id, title,
> (exists (select f2.id from media.oo_file f2 where f2.relpath like
> f.relpath || '%')) as has_file
> from media.oo_folder f where f.parent_id is null
>
> QUERY PLAN
>
>        |
>
> --------------------------------------------------------------------------------------------------------------------------------------------------------------+
> Index Scan using oo_folder_idx_parent on media.oo_folder f
>  (cost=0.42..488.02 rows=20 width=26) (actual time=713.419..25414.969
> rows=45 loops=1)             |
>   Output: f.id, f.title, (SubPlan 1)
>
>          |
>   Index Cond: (f.parent_id IS NULL)
>
>         |
>   Buffers: shared hit=7014170
>
>         |
>   SubPlan 1
>
>         |
>     ->  Index Only Scan using oo_file_idx_relpath on media.oo_file f2
>  (cost=0.55..108499.27 rows=5381 width=0) (actual time=564.756..564.756
> rows=0 loops=45)|
>           Filter: (f2.relpath ~~ (f.relpath || '%'::text))
>
>        |
>           Rows Removed by Filter: 792025
>
>        |
>           Heap Fetches: 768960
>
>        |
>           Buffers: shared hit=7014130
>
>         |
> Planning Time: 0.361 ms
>
>         |
> Execution Time: 25415.088 ms
>
>        |
>
> It also returns 45 rows, but in 25 seconds which is unacceptable.
>
> It I execute the "has_file" subquery for one specific relpath then it
> speeds up again, to < 1msec:
>
> EXPLAIN (ANALYZE, COSTS, VERBOSE, BUFFERS)
> select exists ( select id from media.oo_file of2  where relpath  like
> 'Felhasználók%')
> QUERY PLAN
>                                                |
>
> --------------------------------------------------------------------------------------------------------------------------+
> Result  (cost=1.66..1.67 rows=1 width=1) (actual time=0.049..0.050 rows=1
> loops=1)                                        |
>   Output: $0
>                                                |
>   Buffers: shared hit=2
>                                                 |
>   InitPlan 1 (returns $0)
>                                                 |
>     ->  Seq Scan on media.oo_file of2  (cost=0.00..144714.70 rows=86960
> width=0) (actual time=0.044..0.044 rows=1 loops=1)|
>           Filter: (of2.relpath ~~ 'Felhasználók%'::text)
>                                                |
>           Rows Removed by Filter: 15
>                                                |
>           Buffers: shared hit=2
>                                                 |
> Planning Time: 0.290 ms
>                                                 |
> Execution Time: 0.076 ms
>                                                |
>
> In other words, I could write a pl/sql function with a nested loop instead
> of the problematic query, and it will be 1000 times faster.
>
> What am I missing?
>

I don't understand how it is possible in the slow case Rows Removed by
Filter: 792025 (returns 0 row) and in the second case Rows Removed by
Filter: 15 (returns 1 row).

It is strange.




> Thanks,
>
>    Laszlo
>

Reply via email to