Hi everyone,
Thanks for all the comments! I will initiate the vote tomorrow if there is
no further discussion.
Best,
Xia
Leonard Xu wrote on Fri, Nov 24, 2023, 18:50:
Thanks Xia and Zhu Zhu for driving this work.
It will help unify parallelism inference for all operators of batch jobs;
the updated FLIP looks good to me.
Best,
Leonard
> On Nov 24, 2023, at 5:53 PM, Xia Sun wrote:
Hi all,
I discussed this offline with Zhu Zhu and Leonard Xu, and we have reached
the following three points of consensus:
1. Rename the interface method Context#getMaxSourceParallelism proposed by
the FLIP to Context#getParallelismInferenceUpperBound, to make the meaning
of the method clearer. See [1] fo
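To make the renaming concrete, here is a minimal sketch of what the discussed method might look like on the source side. The names `Context` and `getParallelismInferenceUpperBound` come from this thread; everything else (the enclosing class, the `inferParallelism` helper, its signature) is an illustrative assumption, not the FLIP's actual code:

```java
// Hypothetical sketch of the renamed context method discussed above.
// Only the method name comes from the thread; the rest is assumed.
public class ParallelismInferenceSketch {

    /** Runtime-provided context handed to the source, as discussed. */
    interface Context {
        // Renamed from getMaxSourceParallelism: it is an upper bound
        // for inference, not a guaranteed or target parallelism.
        int getParallelismInferenceUpperBound();
    }

    /** A source-side inference that caps its own estimate at the bound. */
    static int inferParallelism(Context context, int estimatedFromSplits) {
        return Math.min(estimatedFromSplits,
                        context.getParallelismInferenceUpperBound());
    }

    public static void main(String[] args) {
        Context ctx = () -> 32; // upper bound supplied by the scheduler
        System.out.println(inferParallelism(ctx, 100)); // capped at 32
        System.out.println(inferParallelism(ctx, 8));   // estimate kept: 8
    }
}
```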
Thanks Xia for the reply, and sorry for my late response.
> Thanks for pointing out the issue, the current wording does indeed seem to
> be confusing. It involves the existing implementation of the
> AdaptiveBatchScheduler, where the dynamically inferred parallelism cannot
> exceed the JobVertex's maxP
Thanks Leonard for the detailed feedback and input.
> The 'max source parallelism' is the information that the runtime offers to
> the Source as a hint for inferring the actual parallelism. A name with a
> 'max' prefix that is calculated with a minimum value confused me a lot,
> especially when I read the HiveSource pseud
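The naming confusion described above arises because the hint is a cap computed as the tighter of several limits. A minimal sketch of that idea, where the method name and both limit parameters are illustrative assumptions rather than the scheduler's actual code:

```java
// Sketch of why a value named "max ... parallelism" is computed with min():
// names and values here are illustrative assumptions, not Flink internals.
public class UpperBoundSketch {

    static int parallelismUpperBound(int jobVertexMaxParallelism,
                                     int configuredGlobalMax) {
        // The hint handed to the source is a cap, so it takes the tighter
        // (smaller) of the two limits -- hence min() despite the "max" name.
        return Math.min(jobVertexMaxParallelism, configuredGlobalMax);
    }

    public static void main(String[] args) {
        System.out.println(parallelismUpperBound(128, 64)); // 64
        System.out.println(parallelismUpperBound(16, 64));  // 16
    }
}
```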
Thanks Xia for the detailed reply.
>> `How can users disable parallelism inference if they want to use a fixed
>> source parallelism?`
>> `Could you explain the priority between the static parallelism set at the
>> table layer and the proposed dynamic source parallelism?`
>
> From the user's perspective, if
Thanks Leonard for the feedback and sorry for my late response.
> `How can users disable parallelism inference if they want to use a fixed
source parallelism?`
> `Could you explain the priority between the static parallelism set at the
table layer and the proposed dynamic source parallelism?`
From the user'
Thanks Xia and Zhu Zhu for kicking off this discussion.
Dynamic source parallelism inference is a useful feature for the batch story.
I have some comments about the current design.
1. How can users disable parallelism inference if they want to use a fixed
source parallelism? They can configure fixed paralle
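The disable/priority question above can be pictured as a simple precedence rule: a user-pinned parallelism, if present, wins over the dynamically inferred one. This is only an illustration of the behavior being asked about, with hypothetical method names, not the FLIP's actual resolution logic:

```java
// Illustrative precedence rule for the question raised in the thread:
// a user-set fixed parallelism effectively disables dynamic inference.
// Method and parameter names are hypothetical.
public class PrioritySketch {

    static int decideParallelism(Integer userFixedParallelism,
                                 int dynamicallyInferred) {
        // If the user pinned a parallelism, use it; otherwise fall back
        // to the dynamically inferred value.
        return userFixedParallelism != null
                ? userFixedParallelism
                : dynamicallyInferred;
    }

    public static void main(String[] args) {
        System.out.println(decideParallelism(4, 32));    // user value wins: 4
        System.out.println(decideParallelism(null, 32)); // inference used: 32
    }
}
```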
Thanks Lijie for the comments!
1. For Hive source, dynamic parallelism inference in batch scenarios is a
superset of static parallelism inference. As a follow-up task, we can
consider changing the default value of
'table.exec.hive.infer-source-parallelism' to false.
2. I think that both dynamic pa
Hi Xia,
Thanks for driving this FLIP, +1 for the proposal.
I have 2 questions about the relationship between static inference and
dynamic inference:
1. AFAIK, currently the Hive table source enables static inference by
default. In this case, which one (static vs. dynamic) will take effect? I
thin
Thanks for opening the FLIP and kicking off this discussion, Xia!
The proposed changes fill in an important missing piece of the dynamic
parallelism inference of the adaptive batch scheduler.
Besides that, it is also a good step towards supporting dynamic
parallelism inference for streaming sources,