What have I done!?!?!? :-)

2022-04-06 Thread Perry Smith
Rather than explain how I got here, I’ll just explain the state I’m in.

From psql:

files_development=# \d files
  Table "public.files"
   Column   |              Type              | Collation | Nullable |              Default
------------+--------------------------------+-----------+----------+-----------------------------------
 id         | bigint                         |           | not null | nextval('files_id_seq'::regclass)
 basename   | character varying              |           | not null | 
 parent_id  | bigint                         |           | not null | 
 dev        | bigint                         |           | not null | 
 ftype      | character varying              |           | not null | 
 uid        | bigint                         |           | not null | 
 gid        | bigint                         |           | not null | 
 ino        | bigint                         |           | not null | 
 mode       | bigint                         |           | not null | 
 mtime      | time without time zone         |           | not null | 
 nlink      | bigint                         |           | not null | 
 size       | bigint                         |           | not null | 
 created_at | timestamp(6) without time zone |           | not null | 
 updated_at | timestamp(6) without time zone |           | not null | 
Indexes:
"files_pkey" PRIMARY KEY, btree (id)
"index_files_on_parent_id" btree (parent_id)
Foreign-key constraints:
"fk_rails_15605042e6" FOREIGN KEY (parent_id) REFERENCES files(id)
Referenced by:
TABLE "files" CONSTRAINT "fk_rails_15605042e6" FOREIGN KEY (parent_id) 
REFERENCES files(id)

Notice that parent_id is supposed to refer to an id in the same table — at 
least, that is what I’m trying to do.  I’m trying to create a “root” entry 
whose parent points to itself, but I botched the code the first time around and 
now I have this:

files_development=# select * from files;
 id | basename | parent_id |    dev    |   ftype   | uid  | gid  | ino | mode  |     mtime      | nlink | size |         created_at         |         updated_at
----+----------+-----------+-----------+-----------+------+------+-----+-------+----------------+-------+------+----------------------------+----------------------------
 11 | pedz     |      1234 | 687931150 | directory | 1000 | 1002 |   2 | 16877 | 18:43:29.65271 |    31 |   34 | 2022-04-06 21:58:43.570539 | 2022-04-06 21:58:43.570539
 12 | pedz     |        12 | 687931150 | directory | 1000 | 1002 |   2 | 16877 | 18:43:29.65271 |    31 |   34 | 2022-04-06 22:00:29.087417 | 2022-04-06 22:00:29.115021
(2 rows)


The record with id 11 has a parent id of 1234 which doesn’t exist.

My question isn’t how do I fix it, my question is why didn’t Postgres back out 
the botched record?  Why isn’t it complaining?

I’m using Active Record with the PostgreSQL adapter.  It has a 
disable_referential_integrity method which takes a block of code.  When the 
block exits, the constraints are put back.  At least, that is what I thought.

I’m wondering if the disabled constraints are still disabled somehow.  If so, 
how would I check for that and how would I turn them back on?  Or am I way off 
in the weeds?
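A sketch of how one might check (my assumption being that the RI constraint triggers are what got disabled): the internal triggers show up in pg_trigger, with tgenabled = 'O' when enabled and 'D' when disabled:

```sql
-- Internal RI triggers for the table; tgenabled = 'D' means disabled.
SELECT tgname, tgisinternal, tgenabled
FROM pg_trigger
WHERE tgrelid = 'files'::regclass;

-- If any show 'D', this re-enables them (requires table ownership
-- or superuser):
ALTER TABLE files ENABLE TRIGGER ALL;
```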

Thank you for your time
Perry Smith



signature.asc
Description: Message signed with OpenPGP


Re: What have I done!?!?!? :-)

2022-04-06 Thread Perry Smith


> On Apr 6, 2022, at 18:05, Lionel Bouton  wrote:
> 
> Hi Perry,
> 
> Le 07/04/2022 à 00:25, Perry Smith a écrit :
>> [...]
> I'd say the latter: in your case I would use a NULL parent_id for root(s). 
> Your way leads you to bend PostgreSQL until its back breaks

Yea.  This is definitely walking up the down escalator.

But… I did learn something.  So… bandage up my wounds, learn, and grow…
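For anyone following along, a hypothetical minimal sketch of the NULL-root layout Lionel describes (not the actual schema above):

```sql
-- Hypothetical sketch: roots carry parent_id = NULL, so the FK simply
-- does not apply to them and no self-referencing bootstrap row is needed.
CREATE TABLE files_sketch (
    id        bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    basename  varchar NOT NULL,
    parent_id bigint REFERENCES files_sketch (id)  -- NULL allowed: root
);

INSERT INTO files_sketch (basename, parent_id) VALUES ('/', NULL);
```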

Thank you,
Perry






Re: What have I done!?!?!? :-)

2022-04-08 Thread Perry Smith


> On Apr 8, 2022, at 07:47, Jan Wieck  wrote:
> 
> On 4/8/22 01:57, Nikolay Samokhvalov wrote:
>> On Thu, Apr 7, 2022 at 8:10 AM Jan Wieck wrote:
>>So **IF** Active Record is using that feature, then it can dump any
>>amount of garbage into your PostgreSQL database and PostgreSQL will
>>happily accept it with zero integrity checking.
>> It's DISABLE TRIGGER ALL 
>> https://github.com/rails/rails/blob/831031a8cec5bfe59ef653ae2857d4fe64c5698d/activerecord/lib/active_record/connection_adapters/postgresql/referential_integrity.rb#L12
>>  
>> 
> 
> Similar poison, same side effect.
> 
> Looking further at that code it also directly updates the PostgreSQL system 
> catalog. This is a big, red flag.
> 
> Why do the Rails developers think they need a sledgehammer like that? It 
> seems to be doing that for over 7 years, so it is hard to tell from the 
> commit log why they need to disable RI at all.


It has been a long time since I’ve done Rails stuff.  What follows is the best 
I can recall but please take it with a grain of salt.

The first problem is that generally Rails does not put constraints in the 
database.  There were others like me who thought that was insane and would put 
constraints in the database — this includes foreign key constraints, check 
constraints, etc.  About the only constraint that could be added into the DB 
using native Rails was the “not null” constraint.

When foreign key and other constraints were added, they broke something Rails 
calls “fixtures”, which are preset DB states that are loaded into the DB during 
testing.  To “fix” that, I (and others) would add this to our code base (I’m 
including it to see what you think of it — is it safer / better, or just as 
insane?):

  # Used only during testing: run the block inside a single transaction
  # with all deferrable constraints deferred, then restore immediate
  # checking even if the block raises.
  def disable_referential_integrity
    transaction {
      begin
        execute "SET CONSTRAINTS ALL DEFERRED"
        yield
      ensure
        execute "SET CONSTRAINTS ALL IMMEDIATE"
      end
    }
  end

The above would only be used during testing.
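(One caveat, as far as I understand it: SET CONSTRAINTS only affects constraints declared DEFERRABLE, and a default foreign key is NOT DEFERRABLE, so it is still checked per statement.  A sketch, using the constraint name from my earlier table:)

```sql
-- SET CONSTRAINTS ALL DEFERRED silently skips NOT DEFERRABLE
-- constraints, so the FK must be made deferrable first:
ALTER TABLE files
    ALTER CONSTRAINT fk_rails_15605042e6 DEFERRABLE INITIALLY IMMEDIATE;

BEGIN;
SET CONSTRAINTS ALL DEFERRED;  -- FK now checked at COMMIT, not per row
-- ... load fixture rows in any order ...
COMMIT;
```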

For this project, I wondered if it was in the AR code base and found a method 
with the same name.  Note that my method was all done under one transaction.  
The code they have uses multiple transactions.  I’m curious to know if either 
method is better / safer.

For the longest time, the Rails developers refused to put constraints into the 
DB, insisting that the app could enforce them.  Eventually (surprise) they were 
proven wrong.  Users can now add foreign key constraints using native AR 
constructs, as well as general check constraints and indexes.  Before that, it 
had to be done with raw SQL via Rails’ general-purpose “execute” statement 
(shown above).





Re: What have I done!?!?!? :-)

2022-04-08 Thread Perry Smith


> On Apr 8, 2022, at 08:10, Magnus Hagander  wrote:
> 
> 
> 
> On Fri, Apr 8, 2022 at 3:07 PM Jan Wieck wrote:
> On 4/8/22 08:58, Magnus Hagander wrote:
> > A side-note on this, which of course won't help the OP at this point,
> > but if the general best practice of not running the application with a
> > highly privileged account is followed, the problem won't occur (it will
> > just fail early before breaking things). DISABLE TRIGGER ALL requires
> > either ownership of the table or superuser permissions, none of which
> > it's recommended that the application run with. Doesn't help once the
> > problem has occurred of course, but can help avoid it happening in the
> > future.
> 
> It gets even better further down in that code, where it UPDATEs
> pg_constraint directly. That not only requires superuser but also catupd
> permissions (which are separate from superuser for a reason).
> 
> Indeed. The fact that's in the code is sadly an indicator of how many people 
> run their app as superuser :(

Interesting conversation.  Yes, the developer generally has DB superuser 
rights because they need to create the DB, etc.  And during testing, I’m sure 
their test user has superuser privileges as well.  I bet they don’t set up the 
DB with one user and then test with a non-superuser account, and the corollary 
is that I bet the DB user used in production is also a superuser in the 
majority of installations.  I’ve never seen this discussed.  I’m not hugely 
active in the Rails community.  They never liked me (or perhaps that was 
projection on my part).





Constraint ordering

2022-04-09 Thread Perry Smith
I think (hope) I’ve made a bad assumption.  My DB has one table with two 
constraints on new entries.  The “first” requires that parent_id plus basename 
be unique.  The “second” requires that the devno and inode be unique if the 
entry is a directory.

When I was doing my early testing, the parent+basename constraint would fire 
first if it needed to.  Now that I’m doing a longer test run, the second 
constraint is firing at a time that I wasn’t expecting.  I’m debugging but it 
takes time to hit this again.

My assumption was that if the error reported that the “second” constraint 
failed, then the “first” constraint had passed.  But I bet that isn’t a valid 
assumption at all.

All that to ask: is there a predictable ordering of constraints?
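(A hypothetical illustration of what I mean: when one row violates two unique constraints, only one error comes back, and nothing documents which one:)

```sql
-- Hypothetical example: the second insert violates both unique
-- constraints, but PostgreSQL stops at the first index it checks
-- and reports only that one constraint.
CREATE TABLE t (a int UNIQUE, b int UNIQUE);
INSERT INTO t VALUES (1, 1);
INSERT INTO t VALUES (1, 1);  -- ERROR names one constraint; which one
                              -- is an implementation detail
```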

Thank you,
Perry





Help with large delete

2022-04-16 Thread Perry Smith
Currently I have one table that mimics a file system.  Each entry has a 
parent_id and a basename, where parent_id either references an id that exists 
in the same table or is NULL, with ON DELETE CASCADE.

I’ve started a delete of a root entry with about 300,000 descendants.  The 
table currently has about 22M entries and I’m adding about 1600 entries per 
minute still.  Eventually there will not be massive amounts of entries being 
added and the table will be mostly static.

I started the delete before from a terminal that got detached.  So I killed 
that process and started it up again from a terminal less likely to get 
detached.

My question is basically: how can I make life easier for Postgres?  I believe 
(hope) the deletes will be few and far between, but they will happen from time 
to time.  In this case, Dropbox — it’s a long story that isn’t really pertinent. 
 The point is that @#$% happens.

“What can I do” includes starting completely over if necessary.  I’ve only got 
about a week invested in this and it’s just machine time at zero cost.  I could 
stop the other processes that are adding entries and let the delete finish if 
that would help, etc.

Thank you for your time,
Perry





Re: ***SPAM*** Re: Help with large delete

2022-04-16 Thread Perry Smith


> On Apr 16, 2022, at 10:33, Tom Lane  wrote:
> 
> Perry Smith  writes:
>> Currently I have one table that mimics a file system.  Each entry has a 
>> parent_id and a base name where parent_id is an id in the table that must 
>> exist in the table or be null with cascade on delete.
>> I’ve started a delete of a root entry with about 300,000 descendants.  The 
>> table currently has about 22M entries and I’m adding about 1600 entries per 
>> minute still.  Eventually there will not be massive amounts of entries being 
>> added and the table will be mostly static.
> 
> The most obvious question is do you have an index on the referencing
> column.  PG doesn't require one to exist to create an FK; but if you
> don't, deletes of referenced rows had better be uninteresting to you
> performance-wise, because each one will cause a seqscan.

To try to reply to Peter’s question, I just now started:

psql -c "explain analyze delete from dateien where basename = 
'/mnt/pedz/Visual_Media'" find_dups

And it hasn’t replied yet.  I hope you are not slapping your head muttering 
“this guy is an idiot!!” because this by itself would not give you the plan you 
are asking for...

This is inside a BSD “jail” on a NAS.  I’m wondering if the jail has a limited 
CPU allotment and the other processes have consumed it all.  In any case, if / 
when it replies, I will post the results.

For Tom’s question, here is the description of the table:

psql -c '\d dateien' find_dups
  Table "public.dateien"
   Column   |              Type              | Collation | Nullable |               Default
------------+--------------------------------+-----------+----------+-------------------------------------
 id         | bigint                         |           | not null | nextval('dateien_id_seq'::regclass)
 basename   | character varying              |           | not null | 
 parent_id  | bigint                         |           |          | 
 dev        | bigint                         |           | not null | 
 ftype      | character varying              |           | not null | 
 uid        | bigint                         |           | not null | 
 gid        | bigint                         |           | not null | 
 ino        | bigint                         |           | not null | 
 mode       | bigint                         |           | not null | 
 mtime      | timestamp without time zone    |           | not null | 
 nlink      | bigint                         |           | not null | 
 size       | bigint                         |           | not null | 
 sha1       | character varying              |           |          | 
 created_at | timestamp(6) without time zone |           | not null | 
 updated_at | timestamp(6) without time zone |           | not null | 
Indexes:
"dateien_pkey" PRIMARY KEY, btree (id)
"unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 
'directory'::text
"unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, 
'-1'::integer::bigint), basename)
Foreign-key constraints:
"fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON 
DELETE CASCADE
Referenced by:
TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) 
REFERENCES dateien(id) ON DELETE CASCADE
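Note that neither index above has a bare parent_id as its leading column.  A sketch of how one might check for and add such an index (my assumption being that the FK lookup cannot use the COALESCE expression index):

```sql
-- List the existing index definitions:
SELECT indexdef FROM pg_indexes WHERE tablename = 'dateien';

-- The FK's cascade scan needs an index whose leading column is the
-- bare parent_id; an expression index on COALESCE(parent_id, ...)
-- does not qualify:
CREATE INDEX index_dateien_on_parent_id ON dateien (parent_id);
```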






Re: Help with large delete

2022-04-16 Thread Perry Smith


> On Apr 16, 2022, at 12:57, Jan Wieck  wrote:
> 
> Make your connection immune to disconnects by using something like the screen 
> utility.

Exactly… I’m using emacs in a server (daemon) mode so it stays alive.  Then I 
do “shell” within it.


> On Sat, Apr 16, 2022, 09:26 Perry Smith <p...@easesoftware.com> wrote:
> Currently I have one table that mimics a file system.  Each entry has a 
> parent_id and a base name where parent_id is an id in the table that must 
> exist in the table or be null with cascade on delete.
> 
> I’ve started a delete of a root entry with about 300,000 descendants.  The 
> table currently has about 22M entries and I’m adding about 1600 entries per 
> minute still.  Eventually there will not be massive amounts of entries being 
> added and the table will be mostly static.
> 
> I started the delete before from a terminal that got detached.  So I killed 
> that process and started it up again from a terminal less likely to get 
> detached.˘
> 
> My question is basically how can I make life easier for Postgres?  I believe 
> (hope) the deletes will be few and far between but they will happen from time 
> to time.  In this case, Dropbox — its a long story that isn’t really 
> pertinent.  The point is that @#$% happens.
> 
> “What can I do” includes starting completely over if necessary.  I’ve only 
> got about a week invested in this and its just machine time at zero cost.  I 
> could stop the other processes that are adding entries and let the delete 
> finish if that would help.  etc.
> 
> Thank you for your time,
> Perry
> 





Re: Help with large delete

2022-04-16 Thread Perry Smith


> On Apr 16, 2022, at 13:56, Rob Sargent  wrote:
> 
> 
> 
>> On Apr 16, 2022, at 12:24 PM, Perry Smith  wrote:
>> 
>> 
>> 
>>> On Apr 16, 2022, at 12:57, Jan Wieck <j...@wi3ck.info> wrote:
>>> 
>>> Make your connection immune to disconnects by using something like the 
>>> screen utility.
>> 
>> Exactly… I’m using emacs in a server (daemon) mode so it stays alive.  Then 
>> I do “shell” within it.
> I use emacs a lot.  It doesn’t keep the terminal alive in my experience. 
> Perhaps nohup?

https://www.emacswiki.org/emacs/EmacsAsDaemon

Doing: emacs --daemon

You will see a couple of messages about loading the customization file, and 
then it detaches and you get back to the prompt.





Re: Help with large delete

2022-04-17 Thread Perry Smith
I’m sending this again.  I don’t see that it made it to the list but there is 
also new info here.

> On Apr 16, 2022, at 10:33, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Perry Smith <p...@easesoftware.com> writes:
>> Currently I have one table that mimics a file system.  Each entry has a 
>> parent_id and a base name where parent_id is an id in the table that must 
>> exist in the table or be null with cascade on delete.
>> I’ve started a delete of a root entry with about 300,000 descendants.  The 
>> table currently has about 22M entries and I’m adding about 1600 entries per 
>> minute still.  Eventually there will not be massive amounts of entries being 
>> added and the table will be mostly static.
> 
> The most obvious question is do you have an index on the referencing
> column.  PG doesn't require one to exist to create an FK; but if you
> don't, deletes of referenced rows had better be uninteresting to you
> performance-wise, because each one will cause a seqscan.

To try to reply to Peter’s question, I just started:

psql -c "explain analyze delete from dateien where basename = 
'/mnt/pedz/Visual_Media'" find_dups

I did this last night at 10 p.m. and killed it just now at 6:30 without any 
response.

This is inside a BSD “jail” on a NAS.  I don’t know how much CPU the jail is 
given.

For Tom’s question, here is the description of the table:

psql -c '\d dateien' find_dups
  Table "public.dateien"
   Column   |              Type              | Collation | Nullable |               Default
------------+--------------------------------+-----------+----------+-------------------------------------
 id         | bigint                         |           | not null | nextval('dateien_id_seq'::regclass)
 basename   | character varying              |           | not null | 
 parent_id  | bigint                         |           |          | 
 dev        | bigint                         |           | not null | 
 ftype      | character varying              |           | not null | 
 uid        | bigint                         |           | not null | 
 gid        | bigint                         |           | not null | 
 ino        | bigint                         |           | not null | 
 mode       | bigint                         |           | not null | 
 mtime      | timestamp without time zone    |           | not null | 
 nlink      | bigint                         |           | not null | 
 size       | bigint                         |           | not null | 
 sha1       | character varying              |           |          | 
 created_at | timestamp(6) without time zone |           | not null | 
 updated_at | timestamp(6) without time zone |           | not null | 
Indexes:
"dateien_pkey" PRIMARY KEY, btree (id)
"unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 
'directory'::text
"unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, 
'-1'::integer::bigint), basename)
Foreign-key constraints:
"fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON 
DELETE CASCADE
Referenced by:
TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) 
REFERENCES dateien(id) ON DELETE CASCADE

To do a simple delete of a node that has no children takes about 11 seconds:

time psql -c "delete from dateien where id = 13498939;" find_dups
DELETE 1
psql -c "delete from dateien where id = 13498939;" find_dups  0.00s user 0.01s 
system 0% cpu 11.282 total

I’m implementing the suggestion that I do the recursion myself, but at this 
rate it will take about 38 days to delete 300K entries.  I must be doing 
something horribly wrong.  I hope you guys can enlighten me.

Thank you for your time,
Perry





Re: Help with large delete

2022-04-17 Thread Perry Smith
Thank you TOM!!!

So… I did:

create index parent_id_index on dateien(parent_id);

And now things are going much faster.  As you can see, I had an index kinda 
sorta on the parent id (the COALESCE expression index), but I guess the way I 
wrote it prevented Postgres from using it for the foreign key checks.
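(For the record, my understanding of why the old expression index didn’t help, sketched:)

```sql
-- The internal RI check for a cascaded delete is roughly:
--   SELECT 1 FROM dateien WHERE parent_id = $1 FOR KEY SHARE;
-- An index on COALESCE(parent_id, '-1'::bigint) only matches queries
-- that use that exact expression, so each check fell back to a
-- sequential scan.  A plain btree matches the bare column directly:
CREATE INDEX parent_id_index ON dateien (parent_id);
```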

> On Apr 17, 2022, at 06:58, Perry Smith  wrote:
> 
> I’m sending this again.  I don’t see that it made it to the list but there is 
> also new info here.
> 
>> On Apr 16, 2022, at 10:33, Tom Lane <t...@sss.pgh.pa.us> wrote:
>> 
>> Perry Smith <p...@easesoftware.com> writes:
>>> Currently I have one table that mimics a file system.  Each entry has a 
>>> parent_id and a base name where parent_id is an id in the table that must 
>>> exist in the table or be null with cascade on delete.
>>> I’ve started a delete of a root entry with about 300,000 descendants.  The 
>>> table currently has about 22M entries and I’m adding about 1600 entries per 
>>> minute still.  Eventually there will not be massive amounts of entries 
>>> being added and the table will be mostly static.
>> 
>> The most obvious question is do you have an index on the referencing
>> column.  PG doesn't require one to exist to create an FK; but if you
>> don't, deletes of referenced rows had better be uninteresting to you
>> performance-wise, because each one will cause a seqscan.
> 
> To try to reply to Peter’s question, I jstarted:
> 
> psql -c "explain analyze delete from dateien where basename = 
> '/mnt/pedz/Visual_Media'” find_dups
> 
> I did this last night at 10 p.m. and killed it just now at 6:30 without any 
> response.
> 
> This is inside a BSD “jail” on a NAS.  I don’t know how much CPU the jail is 
> given.
> 
> For Tom’s question, here is the description of the table:
> 
> psql -c '\d dateien' find_dups
>   Table "public.dateien"
>Column   |  Type  | Collation | Nullable | 
>   Default
> ++---+--+-
>  id | bigint |   | not null | 
> nextval('dateien_id_seq'::regclass)
>  basename   | character varying  |   | not null |
>  parent_id  | bigint |   |  |
>  dev| bigint |   | not null |
>  ftype  | character varying  |   | not null |
>  uid| bigint |   | not null |
>  gid| bigint |   | not null |
>  ino| bigint |   | not null |
>  mode   | bigint |   | not null |
>  mtime  | timestamp without time zone|   | not null |
>  nlink  | bigint |   | not null |
>  size   | bigint |   | not null |
>  sha1   | character varying  |   |  |
>  created_at | timestamp(6) without time zone |   | not null |
>  updated_at | timestamp(6) without time zone |   | not null |
> Indexes:
> "dateien_pkey" PRIMARY KEY, btree (id)
> "unique_dev_ino_for_dirs" UNIQUE, btree (dev, ino) WHERE ftype::text = 
> 'directory'::text
> "unique_parent_basename" UNIQUE, btree (COALESCE(parent_id, 
> '-1'::integer::bigint), basename)
> Foreign-key constraints:
> "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) REFERENCES dateien(id) ON 
> DELETE CASCADE
> Referenced by:
> TABLE "dateien" CONSTRAINT "fk_rails_c01ebbd0bf" FOREIGN KEY (parent_id) 
> REFERENCES dateien(id) ON DELETE CASCADE
> 
> To do a simple delete of a node that has no children takes about 11 seconds:
> 
> time psql -c "delete from dateien where id = 13498939;" find_dups
> DELETE 1
> psql -c "delete from dateien where id = 13498939;" find_dups  0.00s user 
> 0.01s system 0% cpu 11.282 total
> 
> I’m implementing the suggestion that I do the recession myself but at this 
> rate it will take about 38 days to delete 300K entries.  I must be doing 
> something horribly wrong.  I hope you guys can enlighten me.
> 
> Thank you for your time,
> Perry
> 





Oddity that I don't understand

2022-08-11 Thread Perry Smith
I’m tempted to ask “Is this a bug” but I predict there is an explanation.

I have a view:
find_dups=# \sv+ dateien
1   CREATE OR REPLACE VIEW public.dateien AS
2SELECT d.id,
3   d.basename,
4   d.parent_id,
5   d.ino,
6   d.ext,
7   i.ftype,
8   i.uid,
9   i.gid,
10  i.mode,
11  i.mtime,
12  i.nlink,
13  i.size,
14  i.sha1,
15  i.file_type
16 FROM dirents d
17   FULL JOIN inodes i USING (ino)

find_dups=# \d inodes
Table "public.inodes"
   Column   |              Type              | Collation | Nullable | Default
------------+--------------------------------+-----------+----------+---------
 ino        | bigint                         |           | not null | 
 ftype      | character varying              |           | not null | 
 uid        | bigint                         |           | not null | 
 gid        | bigint                         |           | not null | 
 mode       | bigint                         |           | not null | 
 mtime      | timestamp without time zone    |           | not null | 
 nlink      | bigint                         |           | not null | 
 size       | bigint                         |           | not null | 
 sha1       | character varying              |           |          | 
 created_at | timestamp(6) without time zone |           | not null | 
 updated_at | timestamp(6) without time zone |           | not null | 
 file_type  | character varying              |           |          | 
Indexes:
"inodes_pkey" PRIMARY KEY, btree (ino)
"index_inodes_on_ftype_and_size_and_file_type_and_sha1_and_nlink" btree 
(ftype, size, file_type, sha1, nlink)
Referenced by:
TABLE "dirents" CONSTRAINT "fk_rails_f076303053" FOREIGN KEY (ino) 
REFERENCES inodes(ino)

find_dups=# \d dirents
  Table "public.dirents"
   Column   |              Type              | Collation | Nullable |               Default
------------+--------------------------------+-----------+----------+-------------------------------------
 id         | bigint                         |           | not null | nextval('dirents_id_seq'::regclass)
 basename   | character varying              |           | not null | 
 parent_id  | bigint                         |           |          | 
 ino        | bigint                         |           | not null | 
 created_at | timestamp(6) without time zone |           | not null | 
 updated_at | timestamp(6) without time zone |           | not null | 
 ext        | character varying              |           |          | 
Indexes:
"dirents_pkey" PRIMARY KEY, btree (id)
"index_dirents_on_basename" btree (basename)
"index_dirents_on_ext" btree (ext)
"index_dirents_on_ino" btree (ino)
"index_dirents_on_parent_id_and_basename" UNIQUE, btree (parent_id, 
basename)
Foreign-key constraints:
"fk_rails_4dfefe0fc0" FOREIGN KEY (parent_id) REFERENCES dirents(id) ON 
DELETE CASCADE
"fk_rails_f076303053" FOREIGN KEY (ino) REFERENCES inodes(ino)
Referenced by:
TABLE "dirents" CONSTRAINT "fk_rails_4dfefe0fc0" FOREIGN KEY (parent_id) 
REFERENCES dirents(id) ON DELETE CASCADE

 I do a select and I get:

find_dups=# select id, basename, ext, parent_id, ino, sha1 from dateien where 
sha1 = '36f53d60353e0de6ed55d9da70a36b17559039f3' order by parent_id;
    id    |   basename   | ext | parent_id |   ino    |                   sha1
----------+--------------+-----+-----------+----------+------------------------------------------
 85276821 | VC5Y8191.CR2 | CR2 |  85273064 | 70163023 | 36f53d60353e0de6ed55d9da70a36b17559039f3
 85829158 | VC5Y8191.CR2 | CR2 |  85827904 |    79366 | 36f53d60353e0de6ed55d9da70a36b17559039f3
          |              |     |           |          | 36f53d60353e0de6ed55d9da70a36b17559039f3
(3 rows)

How can the third line exist?  Or, perhaps I should ask, what is the third line 
telling me?

Thank you,
Perry







Re: Oddity that I don't understand

2022-08-11 Thread Perry Smith
I see why…  The view’s SELECT picks d.ino, which is NULL for rows where inodes 
has no matching dirents row; the FULL JOIN still emits those rows with all of 
the dirents columns NULL.

Thanks guys!

> On Aug 11, 2022, at 08:23, Perry Smith  wrote:
> 
> I’m tempted to ask “Is this a bug” but I predict there is an explanation.
> 
> I have a view:
> find_dups=# \sv+ dateien
> 1   CREATE OR REPLACE VIEW public.dateien AS
> 2SELECT d.id <http://d.id/>,
> 3   d.basename,
> 4   d.parent_id,
> 5   d.ino,
> 6   d.ext,
> 7   i.ftype,
> 8   i.uid,
> 9   i.gid,
> 10  i.mode,
> 11  i.mtime,
> 12  i.nlink,
> 13  i.size,
> 14  i.sha1,
> 15  i.file_type
> 16 FROM dirents d
> 17   FULL JOIN inodes i USING (ino)
> 
> find_dups=# \d inodes
> Table "public.inodes"
>Column   |  Type  | Collation | Nullable | Default
> ++---+--+-
>  ino| bigint |   | not null |
>  ftype  | character varying  |   | not null |
>  uid| bigint |   | not null |
>  gid| bigint |   | not null |
>  mode   | bigint |   | not null |
>  mtime  | timestamp without time zone|   | not null |
>  nlink  | bigint |   | not null |
>  size   | bigint |   | not null |
>  sha1   | character varying  |   |  |
>  created_at | timestamp(6) without time zone |   | not null |
>  updated_at | timestamp(6) without time zone |   | not null |
>  file_type  | character varying  |   |  |
> Indexes:
> "inodes_pkey" PRIMARY KEY, btree (ino)
> "index_inodes_on_ftype_and_size_and_file_type_and_sha1_and_nlink" btree 
> (ftype, size, file_type, sha1, nlink)
> Referenced by:
> TABLE "dirents" CONSTRAINT "fk_rails_f076303053" FOREIGN KEY (ino) 
> REFERENCES inodes(ino)
> 
> find_dups=# \d dirents
>   Table "public.dirents"
>Column   |  Type  | Collation | Nullable | 
>   Default
> ++---+--+-
>  id | bigint |   | not null | 
> nextval('dirents_id_seq'::regclass)
>  basename   | character varying  |   | not null |
>  parent_id  | bigint |   |  |
>  ino| bigint |   | not null |
>  created_at | timestamp(6) without time zone |   | not null |
>  updated_at | timestamp(6) without time zone |   | not null |
>  ext| character varying  |   |  |
> Indexes:
> "dirents_pkey" PRIMARY KEY, btree (id)
> "index_dirents_on_basename" btree (basename)
> "index_dirents_on_ext" btree (ext)
> "index_dirents_on_ino" btree (ino)
> "index_dirents_on_parent_id_and_basename" UNIQUE, btree (parent_id, 
> basename)
> Foreign-key constraints:
> "fk_rails_4dfefe0fc0" FOREIGN KEY (parent_id) REFERENCES dirents(id) ON 
> DELETE CASCADE
> "fk_rails_f076303053" FOREIGN KEY (ino) REFERENCES inodes(ino)
> Referenced by:
> TABLE "dirents" CONSTRAINT "fk_rails_4dfefe0fc0" FOREIGN KEY (parent_id) 
> REFERENCES dirents(id) ON DELETE CASCADE
> 
>  I do a select and I get:
> 
> find_dups=# select id, basename, ext, parent_id, ino, sha1 from dateien where 
> sha1 = '36f53d60353e0de6ed55d9da70a36b17559039f3' order by parent_id;
> id|   basename   | ext | parent_id |   ino|   sha1
> --+--+-+---+--+--
>  85276821 | VC5Y8191.CR2 | CR2 |  85273064 | 70163023 | 
> 36f53d60353e0de6ed55d9da70a36b17559039f3
>  85829158 | VC5Y8191.CR2 | CR2 |  85827904 |79366 | 
> 36f53d60353e0de6ed55d9da70a36b17559039f3
>   |  | |   |  | 
> 36f53d60353e0de6ed55d9da70a36b17559039f3
> (3 rows)
> 
> How can the third line exist?  Or, perhaps I should ask, what is the third 
> line telling me?
> 
> Thank you,
> Perry
> 
> 
> 





Can I get the number of results plus the results with a single query?

2022-08-15 Thread Perry Smith
I like to have what I call “baby sitting” messages such as “Completed 15 out of 
1023”.  To do this, I need the number of results a query returns but I also 
need the results.

Currently I’m doing this with two queries such as:

SELECT COUNT(*) FROM table WHERE …. expression …
SELECT * FROM table WHERE …. expression …

But this requires two queries.  Is there a way to do the same thing with just 
one query somehow?

I’ve been toying with row_number() and then sort by row_number descending and 
pick off the first row as the total number.  The problem is that mucks with the 
order of the original query because of the sort on row_number.  I’ve even 
thought about reversing the order and then reversing the order a second time 
but that seems to be getting more than a little absurd.

Thank you for your time,
Perry





Re: ***SPAM*** Re: Can I get the number of results plus the results with a single query?

2022-08-15 Thread Perry Smith


> On Aug 15, 2022, at 08:55, David G. Johnston  
> wrote:
> 
> On Monday, August 15, 2022, Perry Smith <p...@easesoftware.com> wrote:
> I’ve been toying with row_number() and then sort by row_number descending and 
> pick off the first row as the total number.
> 
> Use count as a window function.

I see others are commenting after David’s update so:

Thank you David.

This seems to work for me:

SELECT count(*) OVER (), id, basename, sha1 FROM dateien WHERE (lower(ext) in ( 
'pxd' ) and ftype = 'file') ORDER BY sha1;

This puts the total count (e.g. 73) in the first column of every row.





With Recursive / Recursive View question

2022-08-20 Thread Perry Smith
This select is almost instant:

WITH RECURSIVE pathname(id, parent_id, basename) AS (
SELECT child.id, child.parent_id, child.basename
FROM dirents child
WHERE basename = '10732.emlx'
  UNION ALL
SELECT parent.id, parent.parent_id, CONCAT(parent.basename, '/', 
child.basename)
FROM dirents parent, pathname child
WHERE parent.id = child.parent_id
)
SELECT basename FROM pathname where parent_id IS NULL;

Note that the non-recursive term selects the children and the recursion is 
“out” towards the ancestors.

This select doesn’t complete before I get impatient:

CREATE RECURSIVE VIEW pathname(id, basename, parent_id, ino, ext, fullpath) AS
 SELECT id, basename, parent_id, ino, ext, basename
 FROM dirents
 WHERE parent_id IS NULL
   UNION ALL
 SELECT child.id, child.basename, child.parent_id, child.ino, child.ext, 
CONCAT(parent.fullpath, '/', child.basename)
 FROM dirents child, pathname parent
 WHERE parent.id = child.parent_id;

SELECT * FROM pathname WHERE basename = '10732.emlx';

In this case, the non-recursive term starts at the top of the directory trees 
and the recursion works “in” towards the children.

I’m not surprised that the first is fast and the second is very slow.  My 
problem is I currently have a file called recurse.sql which is the top query.  
I go in and edit that file and then execute it via psql -f recurse.sql.  What 
I’m attempting to do in the second example is to create a view and then use 
select on the view to select the rows that I’m looking for.

To rephrase, is it possible to write a view that would work from the child 
terms out towards the ancestors?

Thank you for your time,
Perry





Re: ***SPAM*** Re: With Recursive / Recursive View question

2022-08-20 Thread Perry Smith


> On Aug 20, 2022, at 19:38, Christophe Pettus  wrote:
> 
> 
> 
>> On Aug 20, 2022, at 15:42, Perry Smith  wrote:
>> 
>> To rephrase, is it possible to write a view that would work from the child 
>> terms out towards the ancestors?
> 
> Assuming that the concern is that you want to parameterize this predicate:
> 
>   WHERE basename = '10732.emlx'
> 
> ... you might consider an SQL function taking basename as a parameter.

Yea.  If I did a function, I would just pass in the id.  I’ve used functions 
only rarely.  For whatever reason, I’ve always been very skittish around them.  
But perhaps I need to grow up.

Thank you again,
Perry





Re: ***SPAM*** Re: With Recursive / Recursive View question

2022-08-20 Thread Perry Smith

> On Aug 20, 2022, at 19:38, Christophe Pettus  wrote:
> 
> 
>> On Aug 20, 2022, at 15:42, Perry Smith  wrote:
>> 
>> To rephrase, is it possible to write a view that would work from the child 
>> terms out towards the ancestors?
> 
> Assuming that the concern is that you want to parameterize this predicate:
> 
>   WHERE basename = '10732.emlx'
> 
> ... you might consider an SQL function taking basename as a parameter.

That wasn’t so bad…

CREATE OR REPLACE FUNCTION pathname(in_id bigint)
RETURNS character varying AS
$$
DECLARE
  fullpath character varying;
BEGIN
  WITH RECURSIVE pathname(id, parent_id, basename) AS (
      SELECT child.id, child.parent_id, child.basename
      FROM dirents child
      WHERE child.id = in_id
    UNION ALL
      SELECT parent.id, parent.parent_id,
             CONCAT(parent.basename, '/', child.basename)
      FROM dirents parent, pathname child
      WHERE parent.id = child.parent_id
  )
  SELECT basename INTO fullpath FROM pathname WHERE parent_id IS NULL;
  RETURN fullpath;
END;
$$ LANGUAGE plpgsql;

SELECT pathname(id) FROM dirents WHERE basename = 'OSX';
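Since the body is a single query, the same thing can also be written as a plain SQL-language function (a sketch, not tested against this schema; the name pathname_sql is made up so it doesn’t collide with the plpgsql version above):

```sql
CREATE OR REPLACE FUNCTION pathname_sql(in_id bigint)
RETURNS character varying AS
$$
  WITH RECURSIVE pathname(id, parent_id, basename) AS (
      SELECT child.id, child.parent_id, child.basename
      FROM dirents child
      WHERE child.id = in_id
    UNION ALL
      SELECT parent.id, parent.parent_id,
             CONCAT(parent.basename, '/', child.basename)
      FROM dirents parent, pathname child
      WHERE parent.id = child.parent_id
  )
  SELECT basename FROM pathname WHERE parent_id IS NULL;
$$ LANGUAGE sql STABLE;
```

The SQL version skips the DECLARE / SELECT INTO plumbing, and marking it STABLE tells the planner the function only reads the database.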

Thank you … again! :-)
Perry





WARNING: could not open statistics file "pg_stat_tmp/global.stat": Operation not permitted

2022-09-08 Thread Perry Smith
This is an issue when PostgreSQL is running inside a container.  In my quest to 
find an answer, I’ve discovered three instances where it has come up, and various 
people have talked about fixes, but no one seemed to notice what I found.

I opened an issue here[1].

From within the container, files which I assume are created by PostgreSQL are 
ending up being owned by root rather than Postgres.  Thus, to me, it appears to 
NOT be an issue of mapping internal UIDs and GIDs to external IDs since there 
should not be anything outside the PostgreSQL container creating files inside 
Postgres’ data directory.

The reason I’m sending this note to the general list is to ask how bad this 
error is.  Some “solutions” make the pg_stat_tmp directory internal to the 
image, and that somehow resolves the issue, but I don’t think anyone really 
understands why, and things like that bother me.  I’m also curious whether 
that is a viable solution.  The result will be that when Postgres is stopped 
and the container exits, the pg_stat_tmp directory will be gone the next time 
Postgres starts back up.  Is that ok?  Does Postgres store anything there 
that needs to survive a restart?
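
For what it’s worth, the location of these files is configurable.  One arrangement suggested for containers (an assumption about this setup; note that stats_temp_directory only exists through PostgreSQL 14, since the statistics collector was reworked in 15) is to point it at a tmpfs inside the container:

```
# postgresql.conf (PostgreSQL 14 and earlier; the path is an example)
stats_temp_directory = '/dev/shm/pg_stat_tmp'
```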

Thank you for your help,
Perry
[1] https://github.com/docker-library/docs/issues/2188#issue-1367170047 






Re: ***SPAM*** Re: WARNING: could not open statistics file "pg_stat_tmp/global.stat": Operation not permitted

2022-09-08 Thread Perry Smith


> On Sep 8, 2022, at 10:43 PM, Tom Lane  wrote:
> 
> Perry Smith  writes:
>> From within the container, files which I assume are created by
>> PostgreSQL are ending up being owned by root rather than Postgres.
> 
> If it looks that way from *inside* the container, that's not good
> --- wouldn't that prevent Postgres from reading the files?
> 
>> The reason I’m sending this note to the general list is to ask how bad
>> is this error?  Some “solutions” are to make the pg_stat_tmp directory
>> internal to the image and that somehow resolves the issue but I don’t
>> think anyone really understands why and things like that bother me.  But
>> I’m also curious if that appears to be a viable solution.  The result
>> will be that when the Postgres is stopped and the container exited, the
>> next time Postgres starts back up, the pg_stat_tmp directory will be
>> gone.  Is that ok?
> 
> pg_stat_tmp exists specifically because it holds only temporary files,
> cf
> 
> https://www.postgresql.org/docs/devel/storage-file-layout.html
> 
> It's explicitly cleared out during server start.
> 
> The only reason to put it outside the data directory is to make it
> *less* persistent than the rest of PG's files, say by putting it
> on a RAM disk.  You sound like you've set it up to be *more*
> persistent (ie outside the container not inside), which surely is
> exactly backwards.

The data directory is outside the container so it is persistent; pg_stat_tmp is 
inside the data directory.  This is how whoever builds the “official” Postgres 
images set things up.

If the right solution is to not make it part of the data directory, I can 
change that.  One of the solutions explains how, which solves the Postgres 
issue but still leaves open the question of how and why some of the files end 
up owned by root (which is probably a Docker or a Linux issue).