Thank you TOM!!! So… I did:
create index parent_id_index on dateien(parent_id); And now things are going much faster. As you can see, I had an index kinda sorta on the parent id but I guess the way I did it prevented Postgres from using it. > On Apr 17, 2022, at 06:58, Perry Smith <p...@easesoftware.com> wrote: > > I’m sending this again. I don’t see that it made it to the list but there is > also new info here. > >> On Apr 16, 2022, at 10:33, Tom Lane <t...@sss.pgh.pa.us >> <mailto:t...@sss.pgh.pa.us>> wrote: >> >> Perry Smith <p...@easesoftware.com <mailto:p...@easesoftware.com>> writes: >>> Currently I have one table that mimics a file system. Each entry has a >>> parent_id and a base name where parent_id is an id in the table that must >>> exist in the table or be null with cascade on delete. >>> I’ve started a delete of a root entry with about 300,000 descendants. The >>> table currently has about 22M entries and I’m adding about 1600 entries per >>> minute still. Eventually there will not be massive amounts of entries >>> being added and the table will be mostly static. >> >> The most obvious question is do you have an index on the referencing >> column. PG doesn't require one to exist to create an FK; but if you >> don't, deletes of referenced rows had better be uninteresting to you >> performance-wise, because each one will cause a seqscan. > > To try to reply to Peter’s question, I jstarted: > > psql -c "explain analyze delete from dateien where basename = > '/mnt/pedz/Visual_Media'” find_dups > > I did this last night at 10 p.m. and killed it just now at 6:30 without any > response. > > This is inside a BSD “jail” on a NAS. I don’t know how much CPU the jail is > given. > > For Tom’s question, here is the description of the table: > > psql -c '\d dateien' find_dups > Table "public.dateien" > Column | Type | Collation | Nullable | > Default > ------------+--------------------------------+-----------+----------+------------------------------------- > id | bigint | | not null | > nextval('dateien_id_seq'::regclass) > basename | character varying | | not null | > parent_id | bigint | | | > dev | bigint | | not null | > ftype | character varying | | not null | > uid | bigint | | not null | > gid | bigint | | not null | > ino | bigint | | not null | > mode | bigint | | not null | > mtime | timestamp without time zone | | not null | > nlink | bigint | | not null | > size | bigint | | not null | > sha1 | character varying | | | > created_at | timestamp(6) without time zone | | not null | > updated_at | timestamp(6) without time zone | | not null | > Indexes: > "dateien_pkey" PRIMARY KEY, btree (id) > "unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = > 'directory'::text > "unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, > '-1'::integer::bigint), basename) > Foreign-key constraints: > "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON > DELETE CASCADE > Referenced by: > TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) > REFERENCES dateien(id) ON DELETE CASCADE > > To do a simple delete of a node that has no children takes about 11 seconds: > > time psql -c "delete from dateien where id = 13498939;" find_dups > DELETE 1 > psql -c "delete from dateien where id = 13498939;" find_dups 0.00s user > 0.01s system 0% cpu 11.282 total > > I’m implementing the suggestion that I do the recession myself but at this > rate it will take about 38 days to delete 300K entries. I must be doing > something horribly wrong. I hope you guys can enlighten me. > > Thank you for your time, > Perry > > > > > > > > > > > > > > >
signature.asc
Description: Message signed with OpenPGP