Question about accessing partitions whose name includes the schema name and a period - is this correct?

2023-04-19 Thread Jay Stanley



I've come across some interesting behavior when creating a partition 
whose name begins with the schema name and a period, so that the 
resulting qualified name looks like 
"my_schema"."my_schema.my_table_should_not_work".


After creating it, most SQL won't access it at all, even when 
double-quoting the table name exactly, though DROP seems to work.  It's 
very repeatable, at least in versions up to v14.5.


Here's a script that demonstrates the issue -- in pl/pgsql, I'm able to 
create a partition whose name includes the schema and a period by 
building the statement with format(), but then I'm unable to access the 
partition at all:




drop schema if exists my_schema cascade;

create schema my_schema;

create table my_schema.my_table(
  i  bigint not null primary key,
  dat  text)
  partition by range(i);

create table my_table_default partition of my_schema.my_table DEFAULT;
create table my_table_1 partition of my_schema.my_table for values from (1) to (100);


SELECT pt.relname AS partition_name
FROM pg_class pc
JOIN pg_inherits i ON i.inhparent = pc.oid
JOIN pg_class pt ON pt.oid = i.inhrelid
WHERE pc.relnamespace::regnamespace::text = 'my_schema';

create procedure my_schema.test()
language plpgsql
as $BODY$
begin
  execute format('create table %I partition of %I.%I for values from (%s) to (%s)',
                 'my_schema.my_table_should_not_work', 'my_schema', 'my_table', '100', '200');
end;
$BODY$;

call my_schema.test();

SELECT pt.relname AS partition_name
FROM pg_class pc
JOIN pg_inherits i ON i.inhparent = pc.oid
JOIN pg_class pt ON pt.oid = i.inhrelid
WHERE pc.relnamespace::regnamespace::text = 'my_schema';

alter table "my_schema"."my_schema.my_table_should_not_work" rename to 
'fixed;


alter table my_schema.my_table detach partition 
"my_schema.my_table_should_not_work";


select count(*) from "my_schema"."my_schema.my_table_should_not_work";
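

For reference, the format() call in the procedure expands to something 
like the following -- %I quotes the whole first argument as a single, 
unqualified identifier:

create table "my_schema.my_table_should_not_work" partition of my_schema.my_table for values from (100) to (200);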



When run, the partition name ends up like this:

SELECT pt.relname AS partition_name
FROM pg_class pc
JOIN pg_inherits i ON i.inhparent = pc.oid
JOIN pg_class pt ON pt.oid = i.inhrelid
WHERE pc.relnamespace::regnamespace::text = 'my_schema';

             partition_name
-----------------------------------------
 my_table_default
 my_table_default_pkey
 my_table_1
 my_table_1_pkey
 my_schema.my_table_should_not_work
 my_schema.my_table_should_not_work_pkey
(6 rows)

and none of the 'alter table' SQL works; each attempt fails with an 
error like the following -- the relation simply can't be found, even 
with correct quoting:


SQL> select count(*) from "my_schema"."my_schema.my_table_should_not_work";
ERROR:  relation "my_schema.my_schema.my_table_should_not_work" does not exist


Does anyone have any insights regarding how this partition could be 
renamed or detached?  I've reviewed the mailing list archive and other 
sources and haven't been able to find anything similar.


Thanks everyone!

-Jay Stanley, DBA
-Cycorp: The Why behind AI

Long-time lurker, first-time poster.

Re: Question about accessing partitions whose name includes the schema name and a period - is this correct?

2023-04-19 Thread Jay Stanley



On 2023-04-19 21:42, Tom Lane wrote:


Jay Stanley writes:


I've come across some interesting behavior when creating a partition
whose name begins with the schema name and a period, so that the
resulting qualified name looks like
"my_schema"."my_schema.my_table_should_not_work".
After creating it, most SQL won't access it at all, even when
double-quoting the table name exactly, though DROP seems to work.


I think this has little to do with the funny table names, and much
to do with your being careless about which schema the partitions
end up in.  We intentionally don't constrain partitions to live
in the same schema as their parent.  So when you do


create schema my_schema;



create table my_schema.my_table(
i  bigint not null primary key,
dat  text)
partition by range(i);



create table my_table_default partition of my_schema.my_table DEFAULT;
create table my_table_1 partition of my_schema.my_table for values from (1) to (100);


the parent "my_table" is in "my_schema", but the partitions are
(probably) in schema "public".  Your catalog-investigation query
doesn't show that, adding to your confusion.  The commands
that don't work for you are failing because you assume the
partitions are in "my_schema", except in some places where
you leave that off, and then it does work because public
is in your search_path.

regards, tom lane
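

(A variant of the original catalog query that also reports each 
partition's schema makes this visible; a minimal sketch:

SELECT pt.relnamespace::regnamespace AS partition_schema,
       pt.relname AS partition_name
FROM pg_class pc
JOIN pg_inherits i ON i.inhparent = pc.oid
JOIN pg_class pt ON pt.oid = i.inhrelid
WHERE pc.relnamespace::regnamespace::text = 'my_schema';

Run against the original script, this shows the partitions sitting in 
whatever schema heads the search_path, not in my_schema.)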


Thanks, Tom!

the parent "my_table" is in "my_schema", but the partitions are 
(probably) in schema "public".


You are correct -- that example put the partition in the first schema 
in the search_path (cycdba in this case); I apologize for the 
less-than-ideal example.  The name of the partition created in the 
search_path schema does contain the schema name:


postgres=# select oid, relname, relnamespace, relnamespace::regnamespace::text as text_schema, reltype
           from pg_class where relname like '%should_not_work%';
  oid   |                 relname                 | relnamespace | text_schema | reltype
--------+-----------------------------------------+--------------+-------------+---------
 184482 | my_schema.my_table_should_not_work_pkey |        16612 | cycdba      |       0
 184479 | my_schema.my_table_should_not_work      |        16612 | cycdba      |  184481
(2 rows)

Modifying my example a bit, I can make it end up in my_schema:
postgres=# create procedure my_schema.test()
language plpgsql
as $BODY$
begin
  execute format('create table %I.%I partition of %I.%I for values from (%s) to (%s)',
                 'my_schema', 'my_schema.my_table_should_not_work', 'my_schema', 'my_table', '100', '200');
end;
$BODY$;
CREATE PROCEDURE

postgres=# call my_schema.test();
CALL

postgres=# select oid, relname, relnamespace, relnamespace::regnamespace::text as text_schema, reltype
           from pg_class where relname like '%should_not_work%';
  oid   |                 relname                 | relnamespace | text_schema | reltype
--------+-----------------------------------------+--------------+-------------+---------
 184978 | my_schema.my_table_should_not_work_pkey |       184954 | my_schema   |       0
 184975 | my_schema.my_table_should_not_work      |       184954 | my_schema   |  184977
(2 rows)

After re-testing, I found that double-quoting the table name works for 
inserts, updates, and deletes, for example:


postgres=# insert into "my_schema"."my_schema.my_table_should_not_work" (i,dat) values (101,'test');

INSERT 0 1

However, it's failing on partition-management SQL like:

postgres=# alter table my_schema.my_table drop partition "my_schema"."my_schema.my_table_should_not_work";

ERROR:  syntax error at or near ""my_schema""
LINE 1: alter table my_schema.my_table drop partition "my_schema"."m...
  ^
-or-

postgres=# alter table my_schema.my_table drop partition my_schema."my_schema.my_table_should_not_work";

ERROR:  syntax error at or near "my_schema"
LINE 1: alter table my_schema.my_table drop partition my_schema."my_...
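
(Worth noting: PostgreSQL has no ALTER TABLE ... DROP PARTITION clause, 
so the syntax errors above come from the parser rather than from name 
resolution.  The supported operation is DETACH PARTITION, which does 
accept a schema-qualified partition name; a minimal sketch:

postgres=# alter table my_schema.my_table detach partition my_schema."my_schema.my_table_should_not_work";

Dropping the detached table afterwards is a separate DROP TABLE.)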

I noticed this while maintaining an in-house partition-management 
procedure that had been updated to construct the 'create table ... 
partition of' SQL with plpgsql format() instead of the more naive 
string concatenation it used before.
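
The underlying format() behavior is easy to reproduce in isolation -- 
%I quotes its entire argument as one identifier, dot included, while 
%I.%I builds a schema-qualified name.  A minimal sketch:

select format('%I', 'my_schema.my_table');
-- result: "my_schema.my_table"   (a single identifier containing a dot)

select format('%I.%I', 'my_schema', 'my_table');
-- result: my_schema.my_table     (a schema-qualified reference)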


-jay

Re: PostgreSQL DB in prod, test, debug

2024-02-15 Thread Jay Stanley

On 2024-02-15 23:51, Peter J. Holzer wrote:

On 2024-02-14 11:25:48 +0100, Daniel Gustafsson wrote:

On 14 Feb 2024, at 10:59, Simon Connah wrote:

This is probably a stupid question, so I apologise in advance.

There is no such thing.

What I think is the best way to do this is to do a pg_dump of the
database (using the --schema-only flag) and then load it into a
test-only database that gets created at the start of the unit tests and
destroyed at the end. The automated tests will insert, update,
delete and select data to test if it all still works.

If the source of truth for your schema is the database, then sure.  If
the source of truth is a .sql file in your source code repository then
you should use that.


I sort of do both for one of my projects:

I originally created the SQL script by running pg_dump on a manually
constructed test database (I do have code to create the empty schema,
but I had to insert the test data manually). That script went into the
git repo.

The test files all contain a fixture which drops and recreates the test
database using that sql file.

When I add a new test case I try to make do with the existing test data.

When I need additional data for a new test case, I create a new pristine
test database using the sql file, add the new data, and then create a
new sql file using pg_dump, which is then committed with the test cases.

Same for migrations: If I need to migrate the schema, I run the
migration on the test database, then dump and commit it.

This project is small enough (86 tests in 10 files) that all test cases
can use the same test data. However, I could easily use different test
data for different tests.

hp

I had a very similar issue a few years ago; our large-ish codebase needs 
a lot of test suites (about 100) to run daily against specific database 
conditions -- each test involves specific data in around 50 tables.  At 
first we had 'source of truth' SQL scripts checked into git, and Jenkins 
would run each one to create a database for the test, load it with the 
exact data needed for that suite of tests, run the tests and record the 
results, then drop the database.


This worked fine for a few tests but became unmanageable as more tests 
were added and the data volume increased.  Instead, I created a 'master' 
script that creates one _database template_ for each suite of tests by 
creating a blank database and running those same build scripts from git, 
then disconnecting from that target database and switching it to be a 
template.  This is re-done very infrequently -- only when we want to use 
a different testing database, need to refresh test content, or drop or 
add tests.  That's right -- we have about 100 templates; I've found 
copying a template is FAR faster than reloading it from SQL.


When each test is run, it creates a new database from the template 
appropriate for that suite of tests.  When the test suite completes, the 
new database is dropped.  This sped up our tests by at least 2 orders of 
magnitude, and it was in a way more reliable because each test gets a 
binary copy of the test database including exactly how vacuumed each 
table is, exact state of the indexes, every block in the same place, 
etc.   Results were definitely more consistent in how long they ran.  
Note that while a database is in template mode, it cannot be changed 
(unless a DBA switches it back to non-template-mode).


This has been in production for a few years, in multiple projects, and 
hasn't really hit any issues; the one caveat is that you can't switch a 
database to 'template mode' while anything is logged into the target.  
Going by the docs, I don't see an upper limit on the number of template 
databases, but there must be one -- I haven't tested it with thousands 
of templates.


See the docs for 'create database DBNAME template TEMPLATENAME' as well 
as 'alter database DBNAME is_template = true|false'.
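
A minimal sketch of the whole cycle, with illustrative names:

-- build the template once per test suite
create database suite_orders_tmpl;
-- ...run the suite's build scripts against suite_orders_tmpl, then disconnect...
alter database suite_orders_tmpl is_template = true;

-- each test run clones the template, runs, and drops the clone
create database suite_orders_run template suite_orders_tmpl;
-- ...run the test suite against suite_orders_run...
drop database suite_orders_run;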


- jay stanley
- https://cyc.com; The Next Generation of Enterprise AI