Re: SQL Property Graph Queries (SQL/PGQ)

Junwang Zhao Sun, 31 Aug 2025 04:05:56 -0700

Hi Ashutosh,

Thanks for the new patch set.


On Thu, Aug 7, 2025 at 6:47 PM Ashutosh Bapat
<[email protected]> wrote:
>
> Hi Amit,
> Thanks for your comments
>
> On Wed, Jul 23, 2025 at 2:08 PM Amit Langote <[email protected]> wrote:
> >
> > Hi Ashutosh,
> >
> > On Tue, Jul 22, 2025 at 8:40 PM Ashutosh Bapat
> > <[email protected]> wrote:
> > > Here's the patchset rebased on the latest master.
> >
> > Thanks for the update. I was going through the patch series today and
> > had a few comments on the structure that might help with reviewing and
> > eventually committing, whoever ends up doing that. :)
> >
> > I noticed that some functions, like graph_table_property_reference(),
> > are introduced in one patch and then renamed or removed in a later
> > one. It’s easier to review if such refactoring is avoided across
> > patches -- ideally, each patch should introduce code in its final
> > intended form. That also helps reduce the number of patches and makes
> > each one more self-contained.
> >
> > One of the later patches (0004) collects several follow-up changes.
> > While it fixes a real bug -- a crash when GRAPH_TABLE is used inside
> > plpgsql due to a conflict with the columnref hook -- it also includes
> > incidental cleanups like switching to foreach_node(), updating
> > comments, and adjusting function signatures. Those would be better
> > folded into the patches that introduced the relevant code, rather than
> > grouped into a catch-all at the end. That keeps each patch focused and
> > easier to review -- and easier to merge when committing.
> >
> > A general principle that might help: if you're refactoring existing
> > code, a standalone preliminary patch makes sense. But if you're
> > tweaking something just added in the same series, it’s usually better
> > to squash that into the original patch. The rename from
> > graph_table_property_reference() to transformGraphTablePropertyRef()
> > may be a fair exception since it reflects a shift prompted by the bug
> > fix -- but most other adjustments could be folded in without loss of
> > clarity.
> >
> > I understand the intent to spell out each change, but if the details
> > are incidental to the overall design, they don’t necessarily need to
> > be split out. Explaining the reasoning in the thread is always
> > helpful, but consolidating the patches once the design has settled
> > makes things easier for both reviewers and committers.
>
> 0001 is almost the same patch that Peter posted almost a year ago.
> Each of 0002 onwards patches contains logically grouped changes. I
> have changed 0001 minimally (so that it continues to apply, compile
> and pass regression). I think this arrangement will make it easy for
> Peter to review the changes. It will help him understand what all
> changed since 0001 and each change in its own context. That has led to
> a lot of overlap between 0002 and 0001 which I think should be
> squashed together even now. But I would like to know what Peter thinks
> - what would make his review easier. This arrangement might also help
> Junwang, who has reviewed some patches, to continue his incremental
> review.
>
> I have rearranged the patches 0002 to 0014 so that the same change
> isn't revised in multiple patches. Hope that helps.
>
> I am also attaching a single patch as well to this email, so that
> newer reviewers can review the changes as a whole.
>
> SQL_PGQ_20250807.zip is separate patches zipped together.
> SQL_PGQ_one_patch_20250807.zip is a single diff with changes from all
> the patches squashed together. It's still zipped to avoid the email
> being held for moderation.
>
> >
> > Finally, attaching the patches directly, with versioned names like
> > v8-000x, instead of zipping them helps.  Many folks (myself included)
> > will casually skip a zip file because of the small hassle of unzipping
> > just to read a patch. I once postponed reading such a patch and didn’t
> > get back to it for quite a while. :)
>
> I used to attach those as separate patches till [1], but after that
> the total size of the patchset grew so much that my emails used to get
> held back for moderation. So I started attaching zip files. The
> increase in size is mostly due to the 0014 patch. If we think that
> it's not needed or can be trimmed down, we will be able to attach the
> patchset as separate attachments.
>
> Here's what changed between the last patchset and this
>
> 0001 has some TODOs, FIXMEs and XXXs. I and Junwang had fixed some of
> them earlier. This patchset fixes some more. Below is my analysis of
> each of them.
> +
> + <!-- TODO: multiple path patterns in a graph pattern (comma-separated) -->
> +
> + <!-- TODO: quantifiers -->
>
> I think we should remove these TODOs since we are not going to support
> these cases in this patchset. We will add the documentation when we
> implement these features.
>
> +/*
> + * 15.2
> + * PG_DEFINED_LABEL_SETS view
> + */
> +
> +-- TODO
> +
> +
> +/*
> + * 15.3
> + * PG_DEFINED_LABEL_SET_LABELS view
> + */
> +
> +-- TODO
> +
> +
> +/*
> + * 15.4
> + * PG_EDGE_DEFINED_LABEL_SETS view
> + */
> +
> +-- TODO
> +
> +/*
> + * 15.6
> + * PG_EDGE_TRIPLETS view
> + */
> +
> +-- TODO
> +
> +/*
> + * 15.15
> + * PG_VERTEX_DEFINED_LABEL_SETS view
> + */
> +
> +-- TODO
>
> All these views are related to the defined label set. IIUC, defined
> label set is a set of labels which is shared by multiple edges or
> vertices of a property graph. Please note it's the whole set shared;
> not just individual labels in the set. In that sense, I think, the
> labels associated with each edge or vertex table in a property graph
> form a defined label set. That way the property graph element's OID
> becomes the defined label set's identifier. I am not sure if this view
> has any practical use for tabular property graphs. It's not clear to
> me what these views will contain. Since they are not essential for
> SQL/PGQ functionality, I have not implemented them.
>
> + /*
> + * Now check number and names.
> + *
> + * XXX We could provide more detail in the error messages, but that would
> + * probably only be useful for some ALTER commands, because otherwise it's
> + * not really clear which label definition is the wrong one, and so you'd
> + * have to construct a rather verbose report to be of any use. Let's keep
> + * it simple for now.
> + */
>
> Agree with XXX. I think the code is good enough as is. We can leave
> XXX there in case we expect someone to improve it in future. Otherwise
> we should remove XXX as well.
>
> +
> + rel = table_open(PropgraphLabelPropertyRelationId, RowShareLock);
> + ScanKeyInit(&key[0],
> + Anum_pg_propgraph_label_property_plppropid,
> + BTEqualStrategyNumber, F_OIDEQ,
> + ObjectIdGetDatum(propoid));
> + scan = systable_beginscan(rel, InvalidOid /* FIXME */ ,
> + true, NULL, 1, key);
>
> I think the FIXME is to add the correct index OID.
> pg_propgraph_label_property_label_prop_index contains plppropid as a
> key but it's not the first column. So probably some other, yet not
> defined, index is expected here?
>
> +ALTER PROPERTY GRAPH g3 ALTER VERTEX TABLE t3 DROP LABEL t3l3;
> +ALTER PROPERTY GRAPH g3 DROP VERTEX TABLES (t2); -- fail (TODO:
> dubious error message)
> +ERROR: cannot drop vertex t2 of property graph g3 because other
> objects depend on it
> +DETAIL: edge e1 of property graph g3 depends on vertex t2 of property graph 
> g3
> +HINT: Use DROP ... CASCADE to drop the dependent objects too.
>
> I don't see what's dubious about this error message. edge e1 connects
> t2 and t1. Dropping t2 would leave e1 dangling; error is preventing
> it. Looks similar to how we prevent a referenced table being dropped.
> Or is it expected that e1 be dropped without specifying cascade when
> t2 is dropped since e1 has no existence without t2?
>
> Also the patchset fixes the CI failure from the previous patchset.
>
> The patches are reordered slightly. Each patch contains a commit
> message that explains the changes in that patch.
>
> [1] 
> https://www.postgresql.org/message-id/caexhw5sm+zgvzr1428fsdhuwc-fu2-6zqr5j91klh6vzawy...@mail.gmail.com
>
>
>
>
>
> --
> Best Wishes,
> Ashutosh Bapat

I have some review comments, and hope some of them are helpful.

1.

doc/src/sgml/ddl.sgml

+<programlisting>
+CREATE PROPERTY GRAPH myshop
+    VERTEX TABLES (
+        products LABEL product,
+        customers LABEL customer,
+        orders LABEL order
+    )
+    EDGE TABLES (
+        order_items SOURCE orders DESTINATION products LABEL contains,
+        customer_orders SOURCE customers DESTINATION orders LABEL has
+    );
+</programlisting>

order is a reserved keyword, so the following example will fail, we
should not give a wrong example in our document.

2.

doc/src/sgml/information_schema.sgml

+  <title><literal>pg_element_table_key_columns</literal></title>
+
+  <para>
+   The view <literal>pg_element_key_columns</literal> identifies which columns
+   are part of the keys of the element tables of property graphs defined in
+   the current database.  Only those property graphs are shown that the
+   current user has access to (by way of being the owner or having some
+   privilege).
+  </para>


+  <title><literal>pg_element_table_properties</literal></title>
+
+  <para>
+   The view <literal>pg_element_table_labels</literal> shows the definitions
+   of the properties for the element tables of property graphs defined in the
+   current database.  Only those property graphs are shown that the current
+   user has access to (by way of being the owner or having some privilege).
+  </para>

the <title> and the <para> doesn't match.

3.

doc/src/sgml/queries.sgml

+   <para>
+    A path pattern thus matches a sequence of vertices and edges.  The
+    simplest possible path pattern is
+<programlisting>
+()
+</programlisting>
+    which matches a single vertex.  The next simplest pattern would be
+<programlisting>
+()-[]->()
+</programlisting>
+    which matches a vertex followed by an edge followed by a vertex.  The
+    characters <literal>()</literal> are a vertex pattern and the characters
+    <literal>-[]-></literal> are an edge pattern.
+   </para>

the description seems wrong, when a user writes (), it should match
all vertices in a property graph, for example:

explain SELECT customer_name FROM GRAPH_TABLE (myshop MATCH () COLUMNS
(1 AS customer_name));

[local] zjw@postgres:5432-73666=# explain SELECT customer_name FROM
GRAPH_TABLE (myshop MATCH () COLUMNS (1 AS customer_name));
                            QUERY PLAN
------------------------------------------------------------------
 Append  (cost=0.00..89.40 rows=3960 width=4)
   ->  Seq Scan on customers  (cost=0.00..18.50 rows=850 width=4)
   ->  Seq Scan on orders  (cost=0.00..32.60 rows=2260 width=4)
   ->  Seq Scan on products  (cost=0.00..18.50 rows=850 width=4)
(4 rows)

4.

doc/src/sgml/ref/create_property_graph.sgml

+<programlisting>
+CREATE PROPERTY GRAPH g1
+    VERTEX TABLES (
+        v1 LABEL foo PROPERTIES (x AS a * 2) LABEL bar PROPERTIES (x AS a * 2)
+    ) ...
+</programlisting>
+      but this would not:
+<programlisting>
+CREATE PROPERTY GRAPH g1
+    VERTEX TABLES (
+        v1 LABEL foo PROPERTIES (x AS a * 2) LABEL bar PROPERTIES (x AS a * 10)
+    ) ...
+</programlisting></para>

The expr after  PROPERTIES should be (a * 2 AS x)

5.

src/backend/catalog/sql_features.txt

+G034 Path concatenation YES SQL/PGQ required

Do we support path concatenation?

+G070 Label expression: label disjunction NO SQL/PGQ required

I think this should be a YES?

6.

src/backend/commands/tablecmds.c

The description RenameRelation and RemoveRelations should adapt
property graph, below are diffs I suggest:

 /*
- * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/MATERIALIZED VIEW/FOREIGN TABLE
+ * Execute ALTER TABLE/INDEX/SEQUENCE/VIEW/MATERIALIZED VIEW/FOREIGN
TABLE/PROPERTY GRAPH
  * RENAME
  */
 ObjectAddress
@@ -4210,8 +4210,8 @@ RenameRelation(RenameStmt *stmt)

        /*
         * Grab an exclusive lock on the target table, index, sequence, view,
-        * materialized view, or foreign table, which we will NOT release until
-        * end of transaction.
+        * materialized view, foreign table or property graph, which we will NOT
+        * release until end of transaction.


 /*
  * RemoveRelations
  *             Implements DROP TABLE, DROP INDEX, DROP SEQUENCE, DROP VIEW,
- *             DROP MATERIALIZED VIEW, DROP FOREIGN TABLE
+ *             DROP MATERIALIZED VIEW, DROP FOREIGN TABLE, DROP PROPERTY GRAPH

7.

src/backend/nodes/nodeFuncs.c

exprLocation should have an arm for T_GraphPropertyRef? I suggest:

@@ -1811,6 +1811,9 @@ exprLocation(const Node *expr)
                case T_PartitionRangeDatum:
                        loc = ((const PartitionRangeDatum *) expr)->location;
                        break;
+               case T_GraphPropertyRef:
+                       loc = ((const GraphPropertyRef *) expr)->location;
+                       break;

8.

src/backend/rewrite/rewriteGraphTable.c

patch 0002

+ /*
+ * Label expressions from two element patterns need to be
+ * conjucted. Label conjuction is not supported right now

typo, should be conjuncted and conjunction.

9.

The line length of some code is quite long, which isn't ideal, but
it's not a major issue. It can be formatted before being committed.

BTW, the patch set seems not to apply to the current master.

I will go another round of review of the test cases, thanks.

-- 
Regards
Junwang Zhao

Re: SQL Property Graph Queries (SQL/PGQ)

Reply via email to