davidzollo commented on issue #10593:
URL: https://github.com/apache/seatunnel/issues/10593#issuecomment-4060751302
Thanks for the detailed report and for confirming the workaround works.
**Root Cause Analysis (Source Code Level):**
There are two separate code paths that block partitioned table support:
**1. Table Discovery (`TableDiscoveryUtils.listTables()`, lines 35-90):**
The table listing query filters by `TABLE_TYPE = 'BASE TABLE'`. In
PostgreSQL, partitioned parent tables created with `PARTITION BY` have
`TABLE_TYPE = 'PARTITIONED TABLE'` in `INFORMATION_SCHEMA.TABLES`, not `'BASE
TABLE'`. This means the parent table is silently excluded from discovery.
**2. Row Count Estimation (`PostgresUtils.queryApproximateRowCnt()`, lines
87-106):**
```java
"SELECT reltuples FROM pg_class r WHERE relkind = 'r' AND relname = '%s';"
```
This only matches regular tables (`relkind = 'r'`). Partitioned parent
tables have `relkind = 'p'`. Even if the table were discovered, this query
would fail with `SQLException("No result returned...")`.
**Recommended Fix:**
The fix requires changes in two places:
1. **`TableDiscoveryUtils.listTables()`**: Add `'PARTITIONED TABLE'` to the
`TABLE_TYPE` filter:
```sql
WHERE TABLE_TYPE IN ('BASE TABLE', 'PARTITIONED TABLE')
```
2. **`PostgresUtils.queryApproximateRowCnt()`**: Change `relkind = 'r'` to
`relkind IN ('r', 'p')`:
```sql
SELECT reltuples FROM pg_class r WHERE relkind IN ('r', 'p') AND relname
= '%s';
```
Note: For partitioned tables, `reltuples` on the parent may be 0 in some
PostgreSQL versions. A more robust approach would be to sum child partitions:
```sql
SELECT COALESCE(
(SELECT reltuples FROM pg_class WHERE relkind = 'r' AND relname = ?),
(SELECT SUM(c.reltuples) FROM pg_inherits i
JOIN pg_class c ON c.oid = i.inhrelid
WHERE i.inhparent = (SELECT oid FROM pg_class WHERE relname = ?))
)
```
3. **CDC-specific consideration**: For Debezium-based CDC, the `pgoutput`
plugin publishes WAL changes for child partitions. When subscribing to the
parent table's publication, Debezium should receive changes from all child
partitions. Verify that the publication is set up correctly:
```sql
CREATE PUBLICATION seatunnel_pub FOR TABLE lazada_order_details;
-- PostgreSQL 13+ automatically includes partitions
```
**Workaround (current):** As you've confirmed, listing child partition
tables individually works. For pg_partman-managed tables, you could use a
wildcard pattern if supported, or query `pg_inherits` to dynamically discover
child partitions.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]