[ https://issues.apache.org/jira/browse/FLINK-36812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18093139#comment-18093139 ]
Chanhae Oh edited comment on FLINK-36812 at 7/2/26 6:14 AM:
------------------------------------------------------------
I came across this ticket and had some thoughts on the implementation approach.
The most robust solution for parallel reads without a numeric column would
probably be to leverage DB-native row identifiers (Oracle ROWID, PostgreSQL
ctid, etc.), since they address rows by physical location and can be split
into roughly even ranges.
However, this only works on specific databases and needs a per-dialect
abstraction, which makes it hard to support generically.
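To make the row-identifier idea concrete, here is a minimal sketch for the
PostgreSQL case only. It assumes the table's block count has already been
fetched (for example from pg_relation_size divided by the block size) and that
the read runs against a stable snapshot, since ctid values change when rows are
updated or the table is rewritten. The function name `ctid_block_ranges` is
hypothetical and not part of the connector.

```python
from typing import List


def ctid_block_ranges(total_blocks: int, num_partitions: int) -> List[str]:
    """Split a PostgreSQL heap into contiguous block ranges expressed as ctid predicates.

    Hypothetical helper: total_blocks is assumed to be known up front.
    """
    if total_blocks < 0:
        raise ValueError("total_blocks must be >= 0")
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")

    # Never create more partitions than there are blocks, but always at least one.
    parts = max(1, min(num_partitions, total_blocks))
    predicates = []
    for i in range(parts):
        start = total_blocks * i // parts
        end = total_blocks * (i + 1) // parts
        if i == parts - 1:
            # Leave the last range open-ended so blocks appended after the
            # size was measured are still read.
            predicates.append(f"ctid >= '({start},0)'::tid")
        else:
            predicates.append(
                f"ctid >= '({start},0)'::tid AND ctid < '({end},0)'::tid"
            )
    return predicates


if __name__ == "__main__":
    for predicate in ctid_block_ranges(total_blocks=10, num_partitions=3):
        print(predicate)
```

Each dialect would need its own equivalent of this function, which is the
per-dialect cost mentioned above.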
As a practical complement that works across all databases, one option could be
a `scan.partition.boundary-query` option for VARCHAR columns: users provide a
SQL query that returns ordered split points, and the connector samples
`scan.partition.num` - 1 of those values to generate range predicates
(col < v1, v1 <= col < v2, ..., col >= vN). The split values would be used
internally only and not exposed in the results.
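A rough sketch of how the split-point sampling and predicate generation could
work is below. It is only an illustration under assumptions: the function
names are hypothetical, the boundary query is assumed to return values already
ordered by the database's collation, and NULL handling (folding NULLs into the
first partition) is one possible choice rather than a settled design.

```python
from typing import List, Sequence, Tuple

Predicate = Tuple[str, Tuple[str, ...]]


def sample_split_points(boundaries: Sequence[str], num_partitions: int) -> List[str]:
    """Pick at most num_partitions - 1 evenly spaced values from ordered boundaries."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    # Drop duplicates but keep the database's ordering; re-sorting here could
    # disagree with the column's collation.
    distinct = list(dict.fromkeys(boundaries))
    wanted = num_partitions - 1
    if wanted == 0 or not distinct:
        return []
    if len(distinct) <= wanted:
        return distinct
    step = len(distinct) / num_partitions
    return [distinct[int(step * i)] for i in range(1, num_partitions)]


def build_range_predicates(column: str, splits: Sequence[str]) -> List[Predicate]:
    """Turn split points into (sql_fragment, params) pairs covering every row once."""
    if not splits:
        # No split points: a single partition reads the whole table.
        return [("1 = 1", ())]
    # NULLs satisfy no comparison, so they are assigned to the first partition.
    predicates: List[Predicate] = [
        (f"({column} < ? OR {column} IS NULL)", (splits[0],))
    ]
    for low, high in zip(splits, splits[1:]):
        predicates.append((f"{column} >= ? AND {column} < ?", (low, high)))
    predicates.append((f"{column} >= ?", (splits[-1],)))
    return predicates


if __name__ == "__main__":
    splits = sample_split_points(["a", "c", "e", "g", "i", "k"], 3)
    for sql, params in build_range_predicates("name", splits):
        print(sql, params)
```

The split values are passed as bind parameters rather than inlined, so they
only ever appear in the WHERE clause and never in the selected columns.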
[~ouyangwuli] If you are planning to work on this, I'm curious whether this
aligns with your thinking or if you have a different approach in mind.
> The flink jdbc connector's 'scan.partition.column' supports the varchar field
> type.
> -----------------------------------------------------------------------------------
>
> Key: FLINK-36812
> URL: https://issues.apache.org/jira/browse/FLINK-36812
> Project: Flink
> Issue Type: Improvement
> Components: Connectors / JDBC
> Affects Versions: jdbc-3.2.0
> Reporter: ouyangwulin
> Priority: Major
> Fix For: jdbc-4.1.0
>
>
> The scan.partition.column must be a numeric, date, or timestamp column from
> the table in question. But in many cases, tables don't have numeric, date,
> or timestamp columns, and we need to support a varchar field to make
> concurrent ingest useful in a wider range of scenarios.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)