[ 
https://issues.apache.org/jira/browse/FLINK-36812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093139#comment-18093139
 ] 

Chanhae Oh edited comment on FLINK-36812 at 7/2/26 6:14 AM:
------------------------------------------------------------

I came across this ticket and had some thoughts on the implementation approach.

The most robust solution for parallel reads without a numeric column would be 
leveraging DB-native row identifiers (Oracle ROWID, PostgreSQL ctid, etc.), as 
they follow the physical row layout and therefore tend to split a table evenly.

However, this is limited to specific databases and requires per-Dialect 
abstraction, making it difficult to support generically.

As a practical complement that works across all databases, one option could be 
a `scan.partition.boundary-query` option for VARCHAR columns: users provide a 
SQL query returning ordered split points, and the system samples 
`scan.partition.num` - 1 of those values to generate range predicates 
(col < v1, v1 <= col < v2, ..., col >= vN). Split values would be used 
internally only, not exposed in results.
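
To make the idea concrete, here is a rough Java sketch of how sampled split 
points could be turned into range predicates. The class and method names are 
made up for illustration and are not existing connector API; the sketch also 
assumes the boundary query has already been executed and its ordered results 
collected into a list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of the Flink JDBC connector.
final class BoundarySplitSketch {

    /** Picks up to numPartitions - 1 evenly spaced split points from ordered candidates. */
    static List<String> pickSplitPoints(List<String> sortedCandidates, int numPartitions) {
        List<String> splits = new ArrayList<>();
        int n = sortedCandidates.size();
        if (n == 0) {
            return splits;
        }
        for (int i = 1; i < numPartitions; i++) {
            // Position of the i-th quantile among the candidates.
            int idx = (int) ((long) i * n / numPartitions);
            String v = sortedCandidates.get(idx);
            // Skip repeated values so that no range is empty by construction.
            if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(v)) {
                splits.add(v);
            }
        }
        return splits;
    }

    /** Builds predicates: col < v1, v1 <= col < v2, ..., col >= vN. */
    static List<String> toPredicates(String column, List<String> splits) {
        List<String> predicates = new ArrayList<>();
        if (splits.isEmpty()) {
            predicates.add("1 = 1"); // no split points: a single full scan
            return predicates;
        }
        predicates.add(column + " < " + quote(splits.get(0)));
        for (int i = 1; i < splits.size(); i++) {
            predicates.add(column + " >= " + quote(splits.get(i - 1))
                    + " AND " + column + " < " + quote(splits.get(i)));
        }
        predicates.add(column + " >= " + quote(splits.get(splits.size() - 1)));
        return predicates;
    }

    /** Renders a SQL string literal, doubling embedded single quotes. */
    static String quote(String v) {
        return "'" + v.replace("'", "''") + "'";
    }
}
```

With eight ordered candidates and four partitions this yields three split 
points and four predicates. Two things the sketch leaves out: rows where the 
column is NULL match none of the ranges and would need an extra 
`col IS NULL` split, and a real implementation would more likely bind the 
split values as statement parameters than inline them as literals.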

[~ouyangwuli]  If you are planning to work on this, I'm curious whether this 
aligns with your thinking or if you have a different approach in mind.


> The flink jdbc connector's 'scan.partition.column' supports the varchar field 
> type.
> -----------------------------------------------------------------------------------
>
>                 Key: FLINK-36812
>                 URL: https://issues.apache.org/jira/browse/FLINK-36812
>             Project: Flink
>          Issue Type: Improvement
>          Components: Connectors / JDBC
>    Affects Versions: jdbc-3.2.0
>            Reporter: ouyangwulin
>            Priority: Major
>             Fix For: jdbc-4.1.0
>
>
> The scan.partition.column must be a numeric, date, or timestamp column from 
> the table in question.  But in many cases, tables don't have numeric, date, 
> or timestamp columns, and we need to support a varchar field to make 
> concurrent ingest useful in a wider range of scenarios.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
