I’m sending this again.  I don’t see that it made it to the list but there is 
also new info here.

> On Apr 16, 2022, at 10:33, Tom Lane <t...@sss.pgh.pa.us 
> <mailto:t...@sss.pgh.pa.us>> wrote:
> 
> Perry Smith <p...@easesoftware.com <mailto:p...@easesoftware.com>> writes:
>> Currently I have one table that mimics a file system.  Each entry has a 
>> parent_id and a base name where parent_id is an id in the table that must 
>> exist in the table or be null with cascade on delete.
>> I’ve started a delete of a root entry with about 300,000 descendants.  The 
>> table currently has about 22M entries and I’m adding about 1600 entries per 
>> minute still.  Eventually there will not be massive amounts of entries being 
>> added and the table will be mostly static.
> 
> The most obvious question is do you have an index on the referencing
> column.  PG doesn't require one to exist to create an FK; but if you
> don't, deletes of referenced rows had better be uninteresting to you
> performance-wise, because each one will cause a seqscan.

To try to reply to Peter’s question, I jstarted:

psql -c "explain analyze delete from dateien where basename = 
'/mnt/pedz/Visual_Media'” find_dups

I did this last night at 10 p.m. and killed it just now at 6:30 without any 
response.

This is inside a BSD “jail” on a NAS.  I don’t know how much CPU the jail is 
given.

For Tom’s question, here is the description of the table:

psql -c '\d dateien' find_dups
                                          Table "public.dateien"
   Column   |              Type              | Collation | Nullable |           
    Default
------------+--------------------------------+-----------+----------+-------------------------------------
 id         | bigint                         |           | not null | 
nextval('dateien_id_seq'::regclass)
 basename   | character varying              |           | not null |
 parent_id  | bigint                         |           |          |
 dev        | bigint                         |           | not null |
 ftype      | character varying              |           | not null |
 uid        | bigint                         |           | not null |
 gid        | bigint                         |           | not null |
 ino        | bigint                         |           | not null |
 mode       | bigint                         |           | not null |
 mtime      | timestamp without time zone    |           | not null |
 nlink      | bigint                         |           | not null |
 size       | bigint                         |           | not null |
 sha1       | character varying              |           |          |
 created_at | timestamp(6) without time zone |           | not null |
 updated_at | timestamp(6) without time zone |           | not null |
Indexes:
    "dateien_pkey" PRIMARY KEY, btree (id)
    "unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 
'directory'::text
    "unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, 
'-1'::integer::bigint), basename)
Foreign-key constraints:
    "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON 
DELETE CASCADE
Referenced by:
    TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) 
REFERENCES dateien(id) ON DELETE CASCADE

To do a simple delete of a node that has no children takes about 11 seconds:

time psql -c "delete from dateien where id = 13498939;" find_dups
DELETE 1
psql -c "delete from dateien where id = 13498939;" find_dups  0.00s user 0.01s 
system 0% cpu 11.282 total

I’m implementing the suggestion that I do the recession myself but at this rate 
it will take about 38 days to delete 300K entries.  I must be doing something 
horribly wrong.  I hope you guys can enlighten me.

Thank you for your time,
Perry















Attachment: signature.asc
Description: Message signed with OpenPGP

Reply via email to