Hi,

If I configure batch mode, the application will stop after the job is
complete, right? Then k8s will restart the pod and rerun the job. That is
not what we want.

Thanks,
Qihua

On Tue, Oct 12, 2021 at 7:27 PM Caizhi Weng <tsreape...@gmail.com> wrote:

> Hi!
>
> It seems that you want to run a batch job instead of a streaming job.
> Call EnvironmentSettings.newInstance().inBatchMode().build() to create your
> environment settings for a batch job.
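
For instance, with the Python Table API (a minimal sketch assuming PyFlink 1.13 is installed; the Java call named above is analogous):

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

# Batch-mode settings: the job finishes once every bounded source
# (such as a JDBC scan) has been fully read.
settings = EnvironmentSettings.new_instance().in_batch_mode().build()
table_env = TableEnvironment.create(settings)
```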
>
> Qihua Yang <yang...@gmail.com> wrote on Wed, Oct 13, 2021, 5:50 AM:
>
>> Hi,
>>
>> Sorry for asking again. I plan to use the JDBC connector to scan a
>> database. How do I know when it is done? Are there any metrics I can
>> track? We want to monitor the progress and stop the Flink application
>> when it is done.
>>
>> Thanks,
>> Qihua
>>
>> On Fri, Oct 8, 2021 at 10:07 AM Qihua Yang <yang...@gmail.com> wrote:
>>
>>> It is pretty clear. Thanks Caizhi!
>>>
>>> On Thu, Oct 7, 2021 at 7:27 PM Caizhi Weng <tsreape...@gmail.com> wrote:
>>>
>>>> Hi!
>>>>
>>>> These configurations are not required merely to read from a database;
>>>> they are there to accelerate reads by letting the source read data in
>>>> parallel.
>>>>
>>>> This optimization works by dividing the data into several
>>>> (scan.partition.num) partitions and each partition will be read by a task
>>>> slot (not a task manager, as a task manager may have multiple task slots).
>>>> You can set scan.partition.column to specify the partition key and also set
>>>> the lower and upper bounds for the range of data.
>>>>
>>>> Let's say your partition key is the column "k", which ranges from 0 to
>>>> 999. If you set the lower bound to 0, the upper bound to 999, and the
>>>> number of partitions to 10, then all rows satisfying 0 <= k < 100 go
>>>> into the first partition and are read by the first task slot, all rows
>>>> with 100 <= k < 200 go into the second partition and are read by the
>>>> second task slot, and so on. So these configurations have nothing to do
>>>> with the number of rows you have; they relate to the range of your
>>>> partition key.
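
The range arithmetic described above can be sketched in plain Python (this mirrors the description in this thread; the function name and rounding are illustrative, not Flink's internal code):

```python
def partition_ranges(lower_bound, upper_bound, num_partitions):
    """Split [lower_bound, upper_bound] into num_partitions key ranges,
    mirroring how the scan.partition.* options divide a JDBC scan."""
    stride = (upper_bound - lower_bound + 1) / num_partitions
    ranges = []
    for i in range(num_partitions):
        start = lower_bound + int(i * stride)
        end = lower_bound + int((i + 1) * stride)
        ranges.append((start, end))  # keys with start <= k < end
    return ranges

# lower=0, upper=999, 10 partitions -> [0,100), [100,200), ..., [900,1000)
ranges = partition_ranges(0, 999, 10)
print(ranges[0])   # (0, 100)
print(ranges[-1])  # (900, 1000)
```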
>>>>
>>>> Qihua Yang <yang...@gmail.com> wrote on Thu, Oct 7, 2021, 7:43 AM:
>>>>
>>>>> Hi,
>>>>>
>>>>> I am trying to read data from a database with the JDBC driver. From
>>>>> [1], I have to configure the parameters below. I am not quite sure if I
>>>>> understand them correctly. lower-bound is the smallest value of the
>>>>> first partition, and upper-bound is the largest value of the last
>>>>> partition. For example, if the db table has 1000 rows, lower-bound is 0
>>>>> and upper-bound is 999. Is that correct?
>>>>> If I set scan.partition.num to 10, does each partition read 100 rows?
>>>>> If I set scan.partition.num to 10 and I have 10 task managers, will
>>>>> each task manager pick a partition to read?
>>>>>
>>>>>    - scan.partition.column: The column name used for partitioning the
>>>>>    input.
>>>>>    - scan.partition.num: The number of partitions.
>>>>>    - scan.partition.lower-bound: The smallest value of the first
>>>>>    partition.
>>>>>    - scan.partition.upper-bound: The largest value of the last
>>>>>    partition.
>>>>>
>>>>> [1]
>>>>> https://nightlies.apache.org/flink/flink-docs-release-1.13/docs/connectors/table/jdbc/
>>>>>
>>>>> Thanks,
>>>>> Qihua
>>>>>
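
On the "each partition reads 100 rows" question above: the partitions split the key range, not the row count, so rows per partition depend on how the keys are distributed. A small sketch in plain Python (made-up data, not Flink code):

```python
# 1000 rows whose partition keys all cluster in 0..299, even though the
# configured bounds are scan.partition.lower-bound=0 / upper-bound=999.
keys = [k % 300 for k in range(1000)]

num_partitions, lower, upper = 10, 0, 999
stride = (upper - lower + 1) // num_partitions  # 100 key values per range

counts = [0] * num_partitions
for k in keys:
    counts[min((k - lower) // stride, num_partitions - 1)] += 1

# Equal key ranges, but very unequal row counts per partition:
print(counts)  # [400, 300, 300, 0, 0, 0, 0, 0, 0, 0]
```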
>>>>
