On Mon, Jun 12, 2023 at 01:47:02AM +0000, Wei Wang (Fujitsu) wrote:
> # load test cases from the regression tests
> -my @regress_tests = split /\s+/, $ENV{REGRESS};
> +#my @regress_tests = split /\s+/, $ENV{REGRESS};
> +my @regress_tests = ("create_type", "create_schema", "create_rule", "create_index");
>
> I think @regress_tests should include all SQL files, instead of just four.
> BTW, the old way (using `split /\s+/, $ENV{REGRESS}`) doesn't work in meson.
I have been studying what is happening on this thread, and got a look
at the full patch set posted on 2023-06-08.

First, the structure of the patch set does not really help much in
understanding what the final structure aimed for would be.  I mean, I
understand that the goal is to transform the DDL Nodes at a given
point in time, fetched from the event trigger code paths, into a
structure on top of which we want to be able to apply filtering rules
easily, because there could be schema changes or even more..  But,
take patch 0001.  It introduces ObjTree to handle the transformation
state from the DDL nodes, and that gets replaced by jsonb later on in
0008.  This needs to be reworked and presented in a shape that's
suited for review, split into more parts so that this is manageable.
In terms of diffs, for a total of 12k lines, the new test module
represents 4k and the backend footprint is a bit more than 6k.

Here is a short summary before going into more detail: I am very
concerned by the design choices made.  As presented, this stuff has
an extremely high long-term maintenance cost, because it requires
anybody doing a change in the parsed tree structures of the DDLs
replicated to also change the code paths introduced by this patch
set.  It is very hard to follow what should be changed in them in
such cases, and which rules should be respected to avoid breaking the
transformation of the parsed trees into the parsable structure on top
of which some filtering rules could be applied (like schema changes
across nodes, for example).

+            case AT_DropOf:
+                new_jsonb_VA(state, "NOT OF", false, 1,
+                             "type", jbvString, "not of");
+                break;
[...]
+        case REPLICA_IDENTITY_DEFAULT:
+            new_jsonb_VA(state, NULL, true, 1, "ident", jbvString, "DEFAULT");
+            break;

The patch is made of a lot of one-to-one mappings between enum
structures and hardcoded text used in the JSON objects, making it
hard to maintain if a node field is added, removed or even updated
into something else.  I have mentioned that during PGCon, but here is
something more durable: why don't we generate more of this code
automatically based on the structure of the nodes we are looking at?

As far as I understand, there are two key concepts here:
1) We want to generate structures of the DDL nodes at a given point
in time, to be able to pass them across the board and to change their
structure easily.  This pretty much looks like what we are doing for
the Nodes in src/backend/nodes/.  A bit more on that below..
2) We want to apply rules to the generated structures.  Rules would
apply across a logical replication setup (on the subscriber, because
that's the place where we have a higher version number than the
origin for major upgrades or even minor upgrades, right?).  If I am
not missing something, that would be a pretty good fit for a catalog,
with potentially some contents generated from a .dat to handle the
upgrade cases where the DDL nodes have structure changes.  This could
be updated once a year, for example, but that should make the
maintenance cost much more palatable in the long term.

Point 1) is the important bit to tackle first, and that's where I
don't see us going far if we don't have more automation when it comes
to the generation of this code.  As a first version of this feature,
we could live with restrictions that allow us to build a sound basic
infrastructure.
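To make that a bit more concrete, here is a rough sketch of what
generated deparse code could look like, following the way
gen_node_support.pl builds outfuncs.c from the Node definitions.
This is an assumption of mine, not something from the patch: the
JSONB_WRITE_* macros and the function name are invented for
illustration, and only the fields of AlterTableCmd are real.

/*
 * Hypothetical generated code for AlterTableCmd.  Each macro would
 * push a key/value pair into the JsonbParseState, the way the
 * WRITE_*_FIELD macros of outfuncs.c append to a StringInfo, with
 * enum fields serialized using the name of their C value.
 */
static void
_jsonbAlterTableCmd(JsonbParseState **state, const AlterTableCmd *node)
{
    JSONB_WRITE_ENUM_FIELD(subtype);     /* "AT_DropOf", etc. */
    JSONB_WRITE_STRING_FIELD(name);
    JSONB_WRITE_NODE_FIELD(def);
    JSONB_WRITE_ENUM_FIELD(behavior);
    JSONB_WRITE_BOOL_FIELD(missing_ok);
    /* remaining fields elided */
}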
Anyway, the more I look at the patch set, the more I see myself doing
again what I have been doing recently with query jumbling and
pg_node_attr, where we go through a Node tree and build up a state
depending on what we are scanning: deparse_utility_command() and
deparse_simple_command() are the entry points, generating a JSON blob
from the trees.  The automation of the code has its limits, though,
which is where we need to be careful about what kinds of node_attr
there should be.  Note that assigning node_attr in the structure
definitions makes it really easy to document the choices made, which
is surely something everybody will need to care about when
manipulating the Node structures of the DDLs.  Here are some takes
based on my read of the patch:
- The code switching on the structures is close to outfuncs.c.  I was
wondering first if there could be an argument for changing outfuncs.c
to use a JSON format, but discarded that based on the next point.
- The enum/text translation worries me a lot.  The key thing is that
we translate the enums into hardcoded text fields.  Thinking
differently: why don't we use in the JSON blobs the *same* strings as
the C enum values?  For example, take the subcommands of ALTER TABLE:
AT_DropConstraint is changed to "DROP CONSTRAINT blah".  With some
automation we could switch that to a simpler structure where a key
would be a subcommand, followed by an array with all its elements,
say roughly:
"AT_DropConstraint":["if_exists":true/false,"constraint":"name"..]
- DefElems have similar issues, where their values are hardcoded in
the deparsing path.
- Nodes that do not have support for deparsing could use an attribute
to ignore them in the generated code, with the node_attr inherited so
that it reflects on anything below them (see the sketch at the bottom
of this message).
- Some of the node fields could be ignored, as required.  For
example, "TABLE-ELEMENTS array creation" caught my attention in this
area.
- I am not sure if that's entirely necessary, but scans of a
catalog's attribute for a given value in the nodes could be enforced
as a node_attr, as these can optionally take arguments.  Similarly,
the surrounding code could rely on macros, with the internals handled
as separate functions.

In passing, I have noticed a few things while looking at the full
diffs..

CREATE ROLE ddl_testing_role SUPERUSER;
+WARNING:  roles created by regression test cases should have names starting with "regress_"
[...]
CREATE TABLESPACE ddl_tblspace LOCATION '';
+WARNING:  tablespaces created by regression test cases should have names starting with "regress_"

The regression tests use incompatible names for one role and one
tablespace.  This can be triggered when compiling with
-DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS.

The tests generate a bunch of stuff like that:
+-- parent table defintion
+CREATE TABLE orders(
+    id int,
+    name varchar,
+    description text,
+    price float4,
+    quantity int,
+    purchase_date date
+);
+NOTICE:  deparsed json: [very long one-liner]

Any diffs produced because of a change or another are impossible to
follow in this format.  Using something like JSON means that you
could use a pretty output.  Still, bloating the output of the tests
with thousands of lines may not be the best thing ever in terms of
debuggability.
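As a rough sketch of the node_attr idea from the list above, based on
the existing pg_node_attr() machinery used for query jumbling in
src/include/nodes/, something like the following could be done.  The
attribute names no_ddl_deparse and ddl_deparse_ignore are invented
for this example, as is the choice of which node and field to mark;
the structure definitions themselves are the real ones.

/* Node skipped entirely by the generated deparse code, with the
 * attribute inherited by anything below it. */
typedef struct ClusterStmt
{
    pg_node_attr(no_ddl_deparse)

    NodeTag     type;
    RangeVar   *relation;
    char       *indexname;
    List       *params;
} ClusterStmt;

/* Field ignored when building the JSON blob; a parse location has no
 * business in the deparsed output. */
typedef struct RangeVar
{
    NodeTag     type;
    char       *catalogname;
    char       *schemaname;
    char       *relname;
    bool        inh;
    char        relpersistence;
    Alias      *alias;
    int         location pg_node_attr(ddl_deparse_ignore);
} RangeVar;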
--
Michael