K8s should not restart a finished job. Are you seeing this? How did you configure the job?
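For context on why this symptom appears at all: when the Flink JobManager runs under a Kubernetes Deployment, Kubernetes recreates the pod whenever its process exits, even after a successful batch run, because a Deployment's purpose is to keep pods alive. Deploying the application cluster as a Kubernetes Job instead lets a finished run stay finished. A minimal sketch (image name, job classname, and resource names are illustrative, not taken from this thread):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: flink-batch-scan        # illustrative name
spec:
  backoffLimit: 2               # retry at most twice on failure
  template:
    spec:
      restartPolicy: OnFailure  # do NOT restart after a clean (exit 0) finish
      containers:
        - name: jobmanager
          image: flink:1.13     # assumption: Flink 1.13, matching the docs linked below
          args: ["standalone-job", "--job-classname", "com.example.ScanJob"]
```

With `kind: Deployment` the pod is recreated on any exit; with `kind: Job` and `restartPolicy: OnFailure`, a successful batch job terminates and is not rerun.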
On Wed, Oct 13, 2021 at 7:39 AM Qihua Yang <yang...@gmail.com> wrote:

> Hi,
>
> If I configure batch mode, the application will stop after the job is
> complete, right? Then k8s will restart the pod and rerun the job. That is
> not what we want.
>
> Thanks,
> Qihua
>
> On Tue, Oct 12, 2021 at 7:27 PM Caizhi Weng <tsreape...@gmail.com> wrote:
>
>> Hi!
>>
>> It seems that you want to run a batch job instead of a streaming job.
>> Call EnvironmentSettings.newInstance().inBatchMode().build() to create
>> your environment settings for a batch job.
>>
>> Qihua Yang <yang...@gmail.com> wrote on Wed, Oct 13, 2021 at 5:50 AM:
>>
>>> Hi,
>>>
>>> Sorry for asking again. I plan to use the JDBC connector to scan a
>>> database. How do I know when it is done? Are there any metrics I can
>>> track? We want to monitor the progress and stop the Flink application
>>> when it is done.
>>>
>>> Thanks,
>>> Qihua
>>>
>>> On Fri, Oct 8, 2021 at 10:07 AM Qihua Yang <yang...@gmail.com> wrote:
>>>
>>>> It is pretty clear. Thanks Caizhi!
>>>>
>>>> On Thu, Oct 7, 2021 at 7:27 PM Caizhi Weng <tsreape...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi!
>>>>>
>>>>> These configurations are not required to merely read from a database.
>>>>> They are there to accelerate reads by allowing sources to read data
>>>>> in parallel.
>>>>>
>>>>> This optimization works by dividing the data into several
>>>>> (scan.partition.num) partitions, and each partition will be read by a
>>>>> task slot (not a task manager, as a task manager may have multiple
>>>>> task slots). You can set scan.partition.column to specify the
>>>>> partition key, and also set the lower and upper bounds for the range
>>>>> of data.
>>>>>
>>>>> Let's say your partition key is the column "k", which ranges from 0
>>>>> to 999.
>>>>> If you set the lower bound to 0, the upper bound to 999 and the
>>>>> number of partitions to 10, then all data satisfying 0 <= k < 100
>>>>> will go into the first partition and be read by the first task slot,
>>>>> all data satisfying 100 <= k < 200 will go into the second partition
>>>>> and be read by the second task slot, and so on. So these
>>>>> configurations have nothing to do with the number of rows you have;
>>>>> they relate to the range of your partition key.
>>>>>
>>>>> Qihua Yang <yang...@gmail.com> wrote on Thu, Oct 7, 2021 at 7:43 AM:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I am trying to read data from a database with the JDBC driver. From
>>>>>> [1], I have to configure the parameters below. I am not quite sure I
>>>>>> understand them correctly. lower-bound is the smallest value of the
>>>>>> first partition, and upper-bound is the largest value of the last
>>>>>> partition. For example, if the db table has 1000 rows, lower-bound
>>>>>> is 0 and upper-bound is 999. Is that correct? If I set
>>>>>> scan.partition.num to 10, does each partition read 100 rows? If I
>>>>>> set scan.partition.num to 10 and I have 10 task managers, will each
>>>>>> task manager pick a partition to read?
>>>>>>
>>>>>> - scan.partition.column: The column name used for partitioning the
>>>>>>   input.
>>>>>> - scan.partition.num: The number of partitions.
>>>>>> - scan.partition.lower-bound: The smallest value of the first
>>>>>>   partition.
>>>>>> - scan.partition.upper-bound: The largest value of the last
>>>>>>   partition.
>>>>>>
>>>>>> [1]
>>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/jdbc/
>>>>>>
>>>>>> Thanks,
>>>>>> Qihua
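The range-splitting described above can be checked with a few lines of arithmetic. The sketch below illustrates the idea of dividing the inclusive key range [lower-bound, upper-bound] into scan.partition.num contiguous partitions; it is an illustration of the concept, not Flink's exact internal code:

```java
// Sketch of how scan.partition.lower-bound / upper-bound / num split a key
// range into partitions. Not Flink's internal implementation, just the idea.
public class PartitionRanges {

    /** Split the inclusive key range [lower, upper] into `num` contiguous partitions. */
    static long[][] ranges(long lower, long upper, int num) {
        long count = upper - lower + 1;   // size of the KEY space, not the row count
        long base = count / num;          // keys per partition
        long rem = count % num;           // first `rem` partitions get one extra key
        long[][] out = new long[num][2];
        long start = lower;
        for (int i = 0; i < num; i++) {
            long size = base + (i < rem ? 1 : 0);
            out[i][0] = start;            // inclusive start of partition i
            out[i][1] = start + size - 1; // inclusive end of partition i
            start += size;
        }
        return out;
    }

    public static void main(String[] args) {
        // Caizhi's example: k in [0, 999], 10 partitions
        // -> 0..99, 100..199, ..., 900..999
        for (long[] p : ranges(0, 999, 10)) {
            System.out.println(p[0] + ".." + p[1]);
        }
    }
}
```

Each range then corresponds to one split read by one task slot; roughly speaking, the source issues one query per partition with a range predicate on the partition column (e.g. covering k from 0 to 99 for the first split). Note that the bounds describe the key range, not the row count: if a 1000-row table had k values spread over 0 to 9999, upper-bound should be 9999, and partitions would then hold unequal numbers of rows.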