Hi! You don't need these options just to read from a database. They are there to speed up reads by letting the source read data in parallel.
This optimization works by dividing the data into several partitions (scan.partition.num of them), and each partition is read by one task slot (not a task manager, as a task manager may have multiple task slots). You can set scan.partition.column to specify the partition key, and scan.partition.lower-bound and scan.partition.upper-bound to give the range of that key.

Let's say your partition key is the column "k", which ranges from 0 to 999. If you set the lower bound to 0, the upper bound to 999 and the number of partitions to 10, then all rows with 0 <= k < 100 go into the first partition and are read by the first task slot, all rows with 100 <= k < 200 go into the second partition and are read by the second task slot, and so on.

So these options have nothing to do with the number of rows you have; they should be set according to the range of your partition key.

Qihua Yang <yang...@gmail.com> wrote on Thu, Oct 7, 2021 at 7:43 AM:

> Hi,
>
> I am trying to read data from database with JDBC driver. From [1], I have
> to config below parameters. I am not quite sure if I understand it
> correctly. lower-bound is smallest value of the first partition,
> upper-bound is largest value of the last partition. For example, if the db
> table has 1000 rows. lower-bound is 0, upper-bound is 999. Is that correct?
> If setting scan.partition.num to 10, each partition read 100 row?
> if I set scan.partition.num to 10 and I have 10 task managers. Each task
> manager will pick a partition to read?
>
>    - scan.partition.column: The column name used for partitioning the
>    input.
>    - scan.partition.num: The number of partitions.
>    - scan.partition.lower-bound: The smallest value of the first
>    partition.
>    - scan.partition.upper-bound: The largest value of the last partition.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/jdbc/
>
> Thanks,
> Qihua
>
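
To make the "k" example from the reply concrete, here is a minimal Python sketch of how a key range might be cut into equal-width partitions. The function name split_range is hypothetical, and the even split is an assumption made for illustration, not necessarily the exact algorithm the connector uses. Ranges are written with inclusive ends, so 0 <= k < 100 shows up as (0, 99) for an integer key.

```python
def split_range(lower, upper, num_partitions):
    """Split the inclusive key range [lower, upper] into contiguous sub-ranges.

    Illustrative only: this assumes an even split of the key range and is
    not taken from the connector's source code.
    """
    total = upper - lower + 1
    if total <= 0 or num_partitions <= 0:
        raise ValueError("need lower <= upper and num_partitions > 0")

    # There cannot be more non-empty partitions than distinct key values.
    num_partitions = min(num_partitions, total)

    # Spread any remainder over the first few partitions.
    base, extra = divmod(total, num_partitions)

    ranges = []
    start = lower
    for i in range(num_partitions):
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


# Key "k" ranges from 0 to 999 and is read with 10 partitions.
partitions = split_range(0, 999, 10)
print(partitions[0])   # (0, 99)
print(partitions[1])   # (100, 199)
print(partitions[-1])  # (900, 999)
```

Note that the function takes only the bounds and the partition count. The number of rows in the table never enters the calculation, which is the point made in the reply.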