davidzollo commented on issue #10593:
URL: https://github.com/apache/seatunnel/issues/10593#issuecomment-4060751302

   Thanks for the detailed report and for confirming the workaround works.
   
   **Root Cause Analysis (Source Code Level):**
   
   There are two separate code paths that block partitioned table support:
   
   **1. Table Discovery (`TableDiscoveryUtils.listTables()`, lines 35-90):**
   
   The table listing query filters by `TABLE_TYPE = 'BASE TABLE'`. In 
PostgreSQL, partitioned parent tables created with `PARTITION BY` have 
`TABLE_TYPE = 'PARTITIONED TABLE'` in `INFORMATION_SCHEMA.TABLES`, not `'BASE 
TABLE'`. This means the parent table is silently excluded from discovery.
   
   **2. Row Count Estimation (`PostgresUtils.queryApproximateRowCnt()`, lines 
87-106):**
   
   ```java
   "SELECT reltuples FROM pg_class r WHERE relkind = 'r' AND relname = '%s';"
   ```
   
   This only matches regular tables (`relkind = 'r'`). Partitioned parent 
tables have `relkind = 'p'`. Even if the table were discovered, this query 
would fail with `SQLException("No result returned...")`.
   
   **Recommended Fix:**
   
   The fix requires changes in two places:
   
   1. **`TableDiscoveryUtils.listTables()`**: Add `'PARTITIONED TABLE'` to the 
`TABLE_TYPE` filter:
      ```sql
      WHERE TABLE_TYPE IN ('BASE TABLE', 'PARTITIONED TABLE')
      ```
   
   2. **`PostgresUtils.queryApproximateRowCnt()`**: Change `relkind = 'r'` to 
`relkind IN ('r', 'p')`:
      ```sql
      SELECT reltuples FROM pg_class r WHERE relkind IN ('r', 'p') AND relname 
= '%s';
      ```
      Note: For partitioned tables, `reltuples` on the parent may be 0 in some 
PostgreSQL versions. A more robust approach would be to sum child partitions:
      ```sql
      SELECT COALESCE(
        (SELECT reltuples FROM pg_class WHERE relkind = 'r' AND relname = ?),
        (SELECT SUM(c.reltuples) FROM pg_inherits i
         JOIN pg_class c ON c.oid = i.inhrelid
         WHERE i.inhparent = (SELECT oid FROM pg_class WHERE relname = ?))
      )
      ```
   
   3. **CDC-specific consideration**: For Debezium-based CDC, the `pgoutput` 
plugin publishes WAL changes for child partitions. When subscribing to the 
parent table's publication, Debezium should receive changes from all child 
partitions. Verify that the publication is set up correctly:
      ```sql
      CREATE PUBLICATION seatunnel_pub FOR TABLE lazada_order_details;
      -- PostgreSQL 13+ automatically includes partitions
      ```
   
   **Workaround (current):** As you've confirmed, listing child partition 
tables individually works. For pg_partman-managed tables, you could use a 
wildcard pattern if supported, or query `pg_inherits` to dynamically discover 
child partitions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to