Re: [blog article] Howto create a batch source with the new Source framework

yuxia Thu, 30 Mar 2023 19:29:03 -0700

Hi, Etienne.

Thanks, Etienne, for sharing this article. I really like it and learned a lot 
from it.


I'd like to raise some questions about implementing a batch source. Other devs 
are welcome to share their insights on them.

The first question is how to generate splits.
As the article mentions:
"Whenever possible, it is preferable to generate the splits lazily, meaning 
that each time a reader asks the enumerator for a split, the enumerator 
generates one on demand and assigns it to the reader."
I think this may not hold in all cases. In some cases, generating a split may 
be time-consuming, so it may be better to generate a batch of splits on demand 
to amortize the cost.
But this raises another question: how many splits should be generated per 
batch? Too many may well cause an OOM; too few may not make good use of batched 
split generation.
To solve this, I think we could provide a configuration option that lets users 
set how many splits are generated per batch.
What's your opinion on this? Have you ever encountered this problem in your 
implementation?
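
To make the idea concrete, here is a minimal sketch in plain Java. It is not 
Flink API: `BatchedSplitGenerator`, `batchSize`, and the string splits are all 
hypothetical names, and the expensive split generation is replaced by a trivial 
stand-in. It only shows the trade-off: splits are still handed out on demand, 
but at most `batchSize` of them are generated and buffered at a time.

```java
import java.util.ArrayDeque;
import java.util.Optional;
import java.util.Queue;

/**
 * Hypothetical sketch: hands out splits on demand, but generates them in
 * batches of a configurable size. Splits are modeled as plain strings; a real
 * connector would use its own split type.
 */
class BatchedSplitGenerator {
    private final int totalSplits; // stand-in for the number of splits the source yields
    private final int batchSize;   // the proposed user-facing setting (name is an assumption)
    private final Queue<String> buffer = new ArrayDeque<>();
    private int nextIndex = 0;

    BatchedSplitGenerator(int totalSplits, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.totalSplits = totalSplits;
        this.batchSize = batchSize;
    }

    /** Returns the next split, refilling the buffer with one batch when it is empty. */
    Optional<String> nextSplit() {
        if (buffer.isEmpty()) {
            refill();
        }
        return Optional.ofNullable(buffer.poll());
    }

    /** Stand-in for the expensive call; never buffers more than batchSize splits. */
    private void refill() {
        int end = Math.min(nextIndex + batchSize, totalSplits);
        while (nextIndex < end) {
            buffer.add("split-" + nextIndex++);
        }
    }

    /** Number of generated splits currently held in memory. */
    int bufferedCount() {
        return buffer.size();
    }
}
```

With `batchSize = 1` this degenerates to the fully lazy approach from the 
article; a larger value trades memory for fewer expensive generation calls.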


The second question is how to assign splits.
What's your split assignment strategy? In Flink, we provide 
`LocalityAwareSplitAssigner`, which uses locality to assign splits to readers. 
But it may not be perfect for the failover case, for which we intend to 
introduce another split assignment strategy [1].
I do think it should be configurable, so that advanced users can decide which 
assignment strategy to use.
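
As a rough illustration of what "configurable" could mean, here is a small 
sketch in plain Java. None of these names exist in Flink: `Split`, 
`SplitAssignStrategy`, `FifoStrategy`, `LocalityFirstStrategy`, and the 
`"fifo"` / `"locality"` option values are all made up for the example, and the 
locality logic is far simpler than what `LocalityAwareSplitAssigner` does.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical split: an id plus the host that stores its data. */
class Split {
    final String id;
    final String host;

    Split(String id, String host) {
        this.id = id;
        this.host = host;
    }
}

/** Hypothetical strategy interface: pick the next split for a requesting reader. */
interface SplitAssignStrategy {
    Optional<Split> next(String readerHost);
}

/** Ignores locality and hands out splits in their original order. */
class FifoStrategy implements SplitAssignStrategy {
    private final Deque<Split> pending;

    FifoStrategy(List<Split> splits) {
        this.pending = new ArrayDeque<>(splits);
    }

    @Override
    public Optional<Split> next(String readerHost) {
        return Optional.ofNullable(pending.poll());
    }
}

/** Prefers a split stored on the reader's host, else falls back to the oldest one. */
class LocalityFirstStrategy implements SplitAssignStrategy {
    private final Deque<Split> pending;

    LocalityFirstStrategy(List<Split> splits) {
        this.pending = new ArrayDeque<>(splits);
    }

    @Override
    public Optional<Split> next(String readerHost) {
        Iterator<Split> it = pending.iterator();
        while (it.hasNext()) {
            Split candidate = it.next();
            if (candidate.host.equals(readerHost)) {
                it.remove();
                return Optional.of(candidate);
            }
        }
        return Optional.ofNullable(pending.poll());
    }
}

/** Selects a strategy from a user-supplied option value (values are assumptions). */
class SplitAssignStrategies {
    static SplitAssignStrategy create(String name, List<Split> splits) {
        switch (name) {
            case "fifo":
                return new FifoStrategy(splits);
            case "locality":
                return new LocalityFirstStrategy(splits);
            default:
                throw new IllegalArgumentException("Unknown strategy: " + name);
        }
    }
}
```

The enumerator would only depend on the interface, so a failover-oriented 
strategy could be added later without touching the rest of the source.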


Other devs are welcome to share their opinions.

[1]: https://issues.apache.org/jira/browse/FLINK-31065







Best regards,
Yuxia

----- Original Message -----
From: "Etienne Chauchot" <echauc...@apache.org>
To: "dev" <dev@flink.apache.org>
Cc: "Chesnay Schepler" <ches...@apache.org>
Sent: Thursday, March 30, 2023, 10:36:39 PM
Subject: [blog article] Howto create a batch source with the new Source framework

Hi all,

After creating the Cassandra source connector (thanks Chesnay for the 
review!), I wrote a blog article about how to create a batch source with 
the new Source framework [1]. It gives field feedback on how to 
implement the different components.

I felt it could be useful to people interested in contributing or 
migrating connectors.

=> Can you give me your opinion?

=> I think it could also be useful to post the article on the official Flink 
blog, if you agree.

=> Same remark as for my previous article [2]: what about publishing it on the 
official Flink blog?


[1] https://echauchot.blogspot.com/2023/03/flink-howto-create-batch-source-with.html

[2] https://echauchot.blogspot.com/2022/11/flink-howto-migrate-real-life-batch.html
 


Best

Etienne
