Hi, Sorry for asking again. I plan to use JDBC connector to scan a database. How do I know if it is done? Are there any metrics I can track? We want to monitor the progress, stop flink application when it is done.
Thanks, Qihua On Fri, Oct 8, 2021 at 10:07 AM Qihua Yang <yang...@gmail.com> wrote: > It is pretty clear. Thanks Caizhi! > > On Thu, Oct 7, 2021 at 7:27 PM Caizhi Weng <tsreape...@gmail.com> wrote: > >> Hi! >> >> These configurations are not required to merely read from a database. >> They are here to accelerate the reads by allowing sources to read data in >> parallel. >> >> This optimization works by dividing the data into several >> (scan.partition.num) partitions and each partition will be read by a task >> slot (not a task manager, as a task manager may have multiple task slots). >> You can set scan.partition.column to specify the partition key and also set >> the lower and upper bounds for the range of data. >> >> Let's say your partition key is the column "k" which ranges from 0 to >> 999. If you set the lower bound to 0, the upper bound to 999 and the number >> of partitions to 10, then all data satisfying 0 <= k < 100 will be divided >> into the first partition and read by the first task slot, all 100 <= k < >> 200 will be divided into the second partition and read by the second task >> slot and so on. So these configurations should have nothing to do with the >> number of rows you have, but should be related to the range of your >> partition key. >> >> Qihua Yang <yang...@gmail.com> 于2021年10月7日周四 上午7:43写道: >> >>> Hi, >>> >>> I am trying to read data from database with JDBC driver. From [1], I >>> have to config below parameters. I am not quite sure if I understand it >>> correctly. lower-bound is smallest value of the first partition, >>> upper-bound is largest value of the last partition. For example, if the db >>> table has 1000 rows. lower-bound is 0, upper-bound is 999. Is that correct? >>> If setting scan.partition.num to 10, each partition read 100 row? >>> if I set scan.partition.num to 10 and I have 10 task managers. Each task >>> manager will pick a partition to read? >>> >>> - scan.partition.column: The column name used for partitioning the >>> input. >>> - scan.partition.num: The number of partitions. >>> - scan.partition.lower-bound: The smallest value of the first >>> partition. >>> - scan.partition.upper-bound: The largest value of the last >>> partition. >>> >>> [1] >>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/jdbc/ >>> >>> Thanks, >>> Qihua >>> >>