Re: Help with large delete

Perry Smith Sun, 17 Apr 2022 05:23:00 -0700

Thank you TOM!!!

So… I did:


create index parent_id_index on dateien(parent_id);

And now things are going much faster.  As you can see, I had an index kinda 
sorta on the parent id but I guess the way I did it prevented Postgres from 
using it.

> On Apr 17, 2022, at 06:58, Perry Smith <p...@easesoftware.com> wrote:
> 
> I’m sending this again.  I don’t see that it made it to the list but there is 
> also new info here.
> 
>> On Apr 16, 2022, at 10:33, Tom Lane <t...@sss.pgh.pa.us 
>> <mailto:t...@sss.pgh.pa.us>> wrote:
>> 
>> Perry Smith <p...@easesoftware.com <mailto:p...@easesoftware.com>> writes:
>>> Currently I have one table that mimics a file system.  Each entry has a 
>>> parent_id and a base name where parent_id is an id in the table that must 
>>> exist in the table or be null with cascade on delete.
>>> I’ve started a delete of a root entry with about 300,000 descendants.  The 
>>> table currently has about 22M entries and I’m adding about 1600 entries per 
>>> minute still.  Eventually there will not be massive amounts of entries 
>>> being added and the table will be mostly static.
>> 
>> The most obvious question is do you have an index on the referencing
>> column.  PG doesn't require one to exist to create an FK; but if you
>> don't, deletes of referenced rows had better be uninteresting to you
>> performance-wise, because each one will cause a seqscan.
> 
> To try to reply to Peter’s question, I jstarted:
> 
> psql -c "explain analyze delete from dateien where basename = 
> '/mnt/pedz/Visual_Media'” find_dups
> 
> I did this last night at 10 p.m. and killed it just now at 6:30 without any 
> response.
> 
> This is inside a BSD “jail” on a NAS.  I don’t know how much CPU the jail is 
> given.
> 
> For Tom’s question, here is the description of the table:
> 
> psql -c '\d dateien' find_dups
>                                           Table "public.dateien"
>    Column   |              Type              | Collation | Nullable |         
>       Default
> ------------+--------------------------------+-----------+----------+-------------------------------------
>  id         | bigint                         |           | not null | 
> nextval('dateien_id_seq'::regclass)
>  basename   | character varying              |           | not null |
>  parent_id  | bigint                         |           |          |
>  dev        | bigint                         |           | not null |
>  ftype      | character varying              |           | not null |
>  uid        | bigint                         |           | not null |
>  gid        | bigint                         |           | not null |
>  ino        | bigint                         |           | not null |
>  mode       | bigint                         |           | not null |
>  mtime      | timestamp without time zone    |           | not null |
>  nlink      | bigint                         |           | not null |
>  size       | bigint                         |           | not null |
>  sha1       | character varying              |           |          |
>  created_at | timestamp(6) without time zone |           | not null |
>  updated_at | timestamp(6) without time zone |           | not null |
> Indexes:
>     "dateien_pkey" PRIMARY KEY, btree (id)
>     "unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 
> 'directory'::text
>     "unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, 
> '-1'::integer::bigint), basename)
> Foreign-key constraints:
>     "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON 
> DELETE CASCADE
> Referenced by:
>     TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) 
> REFERENCES dateien(id) ON DELETE CASCADE
> 
> To do a simple delete of a node that has no children takes about 11 seconds:
> 
> time psql -c "delete from dateien where id = 13498939;" find_dups
> DELETE 1
> psql -c "delete from dateien where id = 13498939;" find_dups  0.00s user 
> 0.01s system 0% cpu 11.282 total
> 
> I’m implementing the suggestion that I do the recession myself but at this 
> rate it will take about 38 days to delete 300K entries.  I must be doing 
> something horribly wrong.  I hope you guys can enlighten me.
> 
> Thank you for your time,
> Perry
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
>

signature.asc
Description: Message signed with OpenPGP

Re: Help with large delete

Reply via email to