Re: [ANNOUNCE] New PMC member: Dian Fu

2020-08-27 Thread Yuan Mei
Congrats!

On Thu, Aug 27, 2020 at 5:38 PM Xingbo Huang  wrote:

> Congratulations Dian!
>
> Best,
> Xingbo
>
> jincheng sun wrote on Thu, Aug 27, 2020 at 5:24 PM:
>
>> Hi all,
>>
>> On behalf of the Flink PMC, I'm happy to announce that Dian Fu is now
>> part of the Apache Flink Project Management Committee (PMC).
>>
>> Dian Fu has been very active on the PyFlink component, working on various
>> important features such as Python UDF and Pandas integration. He keeps
>> checking and voting for our releases, has successfully produced two
>> releases (1.9.3 & 1.11.1) as RM, and is currently working as RM to push
>> forward the release of Flink 1.12.
>>
>> Please join me in congratulating Dian Fu for becoming a Flink PMC Member!
>>
>> Best,
>> Jincheng (on behalf of the Flink PMC)
>>
>


Re: How does at least once checkpointing work

2021-01-11 Thread Yuan Mei
Hey Rex,

You probably will find the link below helpful; it explains how
at-least-once (which does not require alignment) differs from exactly-once
(which does), and how the alignment phase is skipped in at-least-once mode.

https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/stateful-stream-processing.html#exactly-once-vs-at-least-once

At a high level, in at-least-once mode a task with multiple input channels:
1. does NOT block processing to wait for barriers from all inputs, meaning
the task keeps processing data after receiving a barrier even if it has
multiple inputs;
2. but still takes a snapshot only after seeing the checkpoint barrier
from all input channels.

In this way, Snapshot N may contain data changes coming from Epoch N+1;
that's where "at least once" comes from.
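To make the difference concrete, here is a toy, pure-Python simulation of one task reading two input channels. This is an illustration of the barrier handling described above, not Flink code; the channel contents are hypothetical:

```python
from collections import deque

def run_task(channels, align):
    """Round-robin over input channels; snapshot once barriers from all
    channels have arrived. With align=True (exactly-once) a channel is
    blocked after its barrier arrives; with align=False (at-least-once)
    the task keeps consuming it."""
    state = []        # records folded into the snapshot state
    barriers = set()  # channels whose barrier has arrived
    while any(channels.values()):
        for ch, q in channels.items():
            if not q or (align and ch in barriers):
                continue
            item = q.popleft()
            if item == "BARRIER":
                barriers.add(ch)
                if len(barriers) == len(channels):
                    return state  # take snapshot N now
            else:
                state.append(item)
    return state

def make_channels():
    # channel "a" receives its barrier early, channel "b" late
    return {
        "a": deque(["a:epochN", "BARRIER", "a:epochN+1"]),
        "b": deque(["b:epochN", "b:epochN", "BARRIER"]),
    }

snap_exactly = run_task(make_channels(), align=True)
snap_at_least = run_task(make_channels(), align=False)
print(snap_exactly)   # epoch-N records only
print(snap_at_least)  # also contains "a:epochN+1" -> replayed after recovery
```

Note how the at-least-once snapshot already contains an epoch-N+1 record, which the source will replay after recovery.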

On Tue, Jan 12, 2021 at 1:03 PM Rex Fenley  wrote:

> Hello,
>
> We're using the TableAPI and want to optimize for checkpoint alignment
> times. We received some advice to possibly use at-least-once. I'd like to
> understand how checkpointing works in at-least-once mode so I understand
> the caveats and can evaluate whether or not that will work for us.
>
> Thanks!
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com  |  BLOG   |
>  FOLLOW US   |  LIKE US
> 
>


Re: How does at least once checkpointing work

2021-01-12 Thread Yuan Mei
>
>
> It sounds like any state which does not have some form of uniqueness could
> end up being incorrect.
>
> At-least-once usually works if the use case can tolerate a certain level
of duplication, or if the computation is idempotent.


> Specifically in my case, all rows passing through the execution graph have
> unique ids. However, any operator from groupby foreign_key then sum/count
> could end up with an inconsistent count. Normally a retract (-1) and then
> insert (+1) would keep the count correct, but with "at least once" a
> retract (-1) may be from epoch n+1 and therefore played twice, making the
> count equal less than it should actually be.
>
>
I'm not completely sure how the "retract (-1)" and "insert (+1)" work in
your case, but "input data" that leads to a state change (count/sum change)
may indeed be played twice after a recovery.
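As a toy illustration (plain Python, not Flink code; the event sequence is hypothetical), here is how a retract replayed after recovery can drive a count below its true value:

```python
def apply(count, event):
    """Fold one changelog event into a count state."""
    return count + (1 if event == "insert" else -1)

# True event stream for one grouping key; the live-row count should be 2:
# insert, insert, retract, insert -> +1 +1 -1 +1 = 2
events = ["insert", "insert", "retract", "insert"]
true_count = 0
for e in events:
    true_count = apply(true_count, e)

# At-least-once: the snapshot was taken after the epoch-N+1 "retract" had
# already been folded in (insert, insert, retract -> 1), but the source
# offset in the checkpoint still points *before* that retract ...
snapshot = 1

# ... so after recovery the retract is applied a second time:
count = snapshot
for e in ["retract", "insert"]:
    count = apply(count, e)

print(true_count, count)  # 2 1 -- the recovered count drifted low by one
```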


> Am I understanding this correctly?
>
> Thanks!
>
> On Mon, Jan 11, 2021 at 10:06 PM Yuan Mei  wrote:
>
>> Hey Rex,
>>
>> You probably will find the link below helpful; it explains how
>> at-least-once (which does not require alignment) differs from exactly-once
>> (which does), and how the alignment phase is skipped in at-least-once mode.
>>
>>
>> https://ci.apache.org/projects/flink/flink-docs-release-1.12/concepts/stateful-stream-processing.html#exactly-once-vs-at-least-once
>>
>> At a high level, in at-least-once mode a task with multiple input
>> channels:
>> 1. does NOT block processing to wait for barriers from all inputs,
>> meaning the task keeps processing data after receiving a barrier even if
>> it has multiple inputs;
>> 2. but still takes a snapshot only after seeing the checkpoint barrier
>> from all input channels.
>>
>> In this way, Snapshot N may contain data changes coming from Epoch N+1;
>> that's where "at least once" comes from.
>>
>> On Tue, Jan 12, 2021 at 1:03 PM Rex Fenley  wrote:
>>
>>> Hello,
>>>
>>> We're using the TableAPI and want to optimize for checkpoint alignment
>>> times. We received some advice to possibly use at-least-once. I'd like to
>>> understand how checkpointing works in at-least-once mode so I understand
>>> the caveats and can evaluate whether or not that will work for us.
>>>
>>> Thanks!
>>> --
>>>
>>> Rex Fenley  |  Software Engineer - Mobile and Backend
>>>
>>>
>>> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>
>>>  |  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
>>> <https://www.facebook.com/remindhq>
>>>
>>
>
> --
>
> Rex Fenley  |  Software Engineer - Mobile and Backend
>
>
> Remind.com <https://www.remind.com/> |  BLOG <http://blog.remind.com/>  |
>  FOLLOW US <https://twitter.com/remindhq>  |  LIKE US
> <https://www.facebook.com/remindhq>
>


Re: [ANNOUNCE] Yu Li is now part of the Flink PMC

2020-06-16 Thread Yuan Mei
Congrats, Yu!

GXGX & well deserved!!

Best Regards,

Yuan

On Wed, Jun 17, 2020 at 9:15 AM jincheng sun 
wrote:

> Hi all,
>
> On behalf of the Flink PMC, I'm happy to announce that Yu Li is now
> part of the Apache Flink Project Management Committee (PMC).
>
> Yu Li has been very active on Flink's Statebackend component, working on
> various improvements, for example the RocksDB memory management for 1.10.
> He keeps checking and voting for our releases, and has also successfully
> produced two releases (1.10.0 & 1.10.1) as RM.
>
> Congratulations & Welcome Yu Li!
>
> Best,
> Jincheng (on behalf of the Flink PMC)
>


Re: how to propagate watermarks across multiple jobs

2021-03-04 Thread Yuan Mei
Hey Yidan,

KafkaShuffle was initially motivated by supporting shuffle data
materialization on Kafka, and started as a limited version supporting
hash partitioning only. The watermark is maintained and forwarded as part
of the shuffle data. So you are right: the watermark storing/forwarding
logic has nothing to do with whether the stream is keyed or not. The
current approach in KafkaShuffle should also work for non-keyed streams,
if I remember correctly. So, yes, the logic can be extracted and
generalized.
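For intuition, the broadcast-and-regenerate watermark logic described above can be sketched in plain Python. This is only an illustration of the idea, not the KafkaShuffle implementation; partition count and record values are made up:

```python
# Producer side: data records are hash-partitioned, watermarks broadcast.
partitions = [[] for _ in range(3)]

def emit(record, key):
    # data goes to exactly one partition, chosen by key
    partitions[hash(key) % 3].append(("data", record))

def emit_watermark(wm):
    # watermarks must reach every partition, or a reader could stall
    for p in partitions:
        p.append(("wm", wm))

emit("a", key="x"); emit("b", key="y")
emit_watermark(10)
emit("c", key="z")
emit_watermark(20)

# Consumer side: each partition tracks the highest watermark it has seen;
# the regenerated watermark is the minimum across all partitions read.
def regen_watermark(parts):
    per_part = []
    for p in parts:
        wms = [v for (t, v) in p if t == "wm"]
        per_part.append(max(wms) if wms else float("-inf"))
    return min(per_part)

print(regen_watermark(partitions))  # 20: every partition saw both marks
```

Nothing in this sketch depends on the stream being keyed; only the data routing in `emit` does.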

Best,

Yuan

On Thu, Mar 4, 2021 at 4:26 PM yidan zhao  wrote:

> One more question: if I only need the watermark logic, not a KeyedStream,
> why not provide methods such as writeDataStream and readDataStream? They
> could use similar mechanics: the Kafka producer sinks records and
> broadcasts the watermark to partitions, and then the Kafka consumers read
> it and regenerate the watermark. I think that would be more general. In
> this way, the Kafka consumer reads the stream from Kafka and can continue
> to call keyBy to get a KeyedStream. I don't know why KafkaShuffle only
> considers the KeyedStream case.
>
> Piotr Nowojski wrote on Thu, Mar 4, 2021 at 3:54 PM:
>
>> Great :)
>>
>> Just one more note. Currently FlinkKafkaShuffle has a critical bug [1]
>> that probably will prevent you from using it directly. I hope it will be
>> fixed in some next release. In the meantime you can just inspire your
>> solution with the source code.
>>
>> Best,
>> Piotrek
>>
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-21317
>>
>> On Thu, Mar 4, 2021 at 03:48, yidan zhao wrote:
>>
>>> Yes, you are right, and thank you. I took a brief look at what
>>> FlinkKafkaShuffle is doing; it seems to be what I need and I will give it a try.
>>>



Re: [ANNOUNCE] Apache Flink 1.12.2 released

2021-03-05 Thread Yuan Mei
Cheers!

Thanks, Roman, for doing the most time-consuming and difficult part of the
release!

Best,

Yuan

On Fri, Mar 5, 2021 at 5:41 PM Roman Khachatryan  wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink 1.12.2, which is the second bugfix release for the Apache Flink 1.12
> series.
>
> Apache Flink® is an open-source stream processing framework for
> distributed, high-performing, always-available, and accurate data streaming
> applications.
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Please check out the release blog post for an overview of the improvements
> for this bugfix release:
> https://flink.apache.org/news/2021/03/03/release-1.12.2.html
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?version=12349502&projectId=12315522
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Special thanks to Yuan Mei for managing the release and PMC members Robert
> Metzger, Chesnay Schepler and Piotr Nowojski.
>
> Regards,
> Roman
>


Re: [ANNOUNCE] Dian Fu becomes a Flink committer

2020-01-16 Thread Yuan Mei
Congrats!

Best

Yuan

On Thu, Jan 16, 2020 at 5:59 PM jincheng sun 
wrote:

> Hi everyone,
>
> I'm very happy to announce that Dian accepted the offer of the Flink PMC to
> become a committer of the Flink project.
>
> Dian Fu has been contributing to Flink for many years. Dian Fu played an
> essential role in the PyFlink/CEP/SQL/Table API modules. He has
> contributed several major features, reported and fixed many bugs, spent a
> lot of time reviewing pull requests, and also frequently helps out on the
> user mailing lists and checks/votes on releases.
>
> Please join me in congratulating Dian for becoming a Flink committer!
>
> Best,
> Jincheng (on behalf of the Flink PMC)
>


Re: [ANNOUNCE] Welcome to join the Apache Flink community on Slack

2022-06-03 Thread Yuan Mei
Thanks, Xintong and Jark, for the great effort driving this, and everyone
for making this possible.

I've also tweeted this announcement from our Apache Flink Twitter account.

Best

Yuan



On Fri, Jun 3, 2022 at 12:54 AM Jing Ge  wrote:

> Thanks everyone for your effort!
>
> Best regards,
> Jing
>
> On Thu, Jun 2, 2022 at 4:17 PM Martijn Visser 
> wrote:
>
>> Thanks everyone for joining! It's good to see so many have joined in such
>> a short time already. I've just refreshed the link which you can always
>> find on the project website [1]
>>
>> Best regards, Martijn
>>
>> [1] https://flink.apache.org/community.html#slack
>>
>> On Thu, Jun 2, 2022 at 11:42, Jingsong Li wrote:
>>
>>> Thanks Xintong, Jark, Martijn and Robert for making this possible!
>>>
>>> Best,
>>> Jingsong
>>>
>>>
>>> On Thu, Jun 2, 2022 at 5:32 PM Jark Wu  wrote:
>>>
 Thanks Xintong for making this possible!

 Cheers,
 Jark Wu

 On Thu, 2 Jun 2022 at 15:31, Xintong Song 
 wrote:

 > Hi everyone,
 >
 > I'm very happy to announce that the Apache Flink community has
 created a
 > dedicated Slack workspace [1]. Welcome to join us on Slack.
 >
 > ## Join the Slack workspace
 >
 > You can join the Slack workspace by either of the following two ways:
 > 1. Click the invitation link posted on the project website [2].
 > 2. Ask anyone who already joined the Slack workspace to invite you.
 >
 > We recommend 2), if available. Due to Slack limitations, the
 invitation
 > link in 1) expires and needs manual updates after every 100 invites.
 If it
 > is expired, please reach out to the dev / user mailing lists.
 >
 > ## Community rules
 >
 > When using the community Slack workspace, please follow these
 community
 > rules:
 > * *Be respectful* - This is the most important rule!
 > * All important decisions and conclusions *must be reflected back to
 the
 > mailing lists*. "If it didn’t happen on a mailing list, it didn’t
 happen."
 > - The Apache Mottos [3]
 > * Use *Slack threads* to keep parallel conversations from
 overwhelming a
 > channel.
 > * Please *do not direct message* people for troubleshooting, Jira
 assigning
 > and PR review. These should be picked-up voluntarily.
 >
 >
 > ## Maintenance
 >
 >
 > Committers can refer to this wiki page [4] for information needed for
 > maintaining the Slack workspace.
 >
 >
 > Thanks Jark, Martijn and Robert for helping setting up the Slack
 workspace.
 >
 >
 > Best,
 >
 > Xintong
 >
 >
 > [1] https://apache-flink.slack.com/
 >
 > [2] https://flink.apache.org/community.html#slack
 >
 > [3] http://theapacheway.com/on-list/
 >
 > [4]
 https://cwiki.apache.org/confluence/display/FLINK/Slack+Management
 >

>>>


Re: how to connect to the flink-state store and use it as cache to serve APIs.

2022-06-29 Thread Yuan Mei
That's definitely something we want to achieve in the longer term, and your
input is very valuable.

One problem with the current queryable state setup is that the service is
bound to the life cycle of the Flink job, which limits the usage of the
state store/service.

Thanks for your insights.

Best
Yuan

On Wed, Jun 29, 2022 at 3:41 PM laxmi narayan  wrote:

>
> Hi Hangxiang,
>
> I was thinking: since we already store the entire state in the checkpoint
> dir, why can't we expose it as a service through the Flink queryable state?
> In this way I can easily avoid introducing a cache, serve realtime APIs
> via this state itself, and go to the database for the historical data.
>
>
>
> Thank you.
>
>
> On Wed, Jun 29, 2022 at 11:17 AM Hangxiang Yu  wrote:
>
>> Hi, laxmi.
>> There are two ways that users can access the state store currently:
>> 1. Queryable state [1] which you could access states in runtime.
>> 2. State Processor API [2] which you could access states (snapshot)
>> offline.
>>
>> But we have marked the Queryable state as "Reaching End-of-Life".
>> We are also trying to find a graceful and effective way for users to
>> debug and troubleshoot.
>> So could you share your case, i.e., what you want to use it for?
>> Your feedback is important for us to design it in the long term. Thanks!
>>
>> [1]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/fault-tolerance/queryable_state/
>> [2]
>> https://nightlies.apache.org/flink/flink-docs-master/docs/libs/state_processor_api/
>> [3] https://flink.apache.org/roadmap.html
>>
>> Best,
>> Hangxiang.
>>
>> On Tue, Jun 28, 2022 at 8:26 PM laxmi narayan 
>> wrote:
>>
>>> Hi Team,
>>> I am not sure if this is the right use case for the state-store but I
>>> wanted to serve the APIs using queryable-state, what are the different ways
>>> to achieve this ?
>>>
>>> I have come across a version where we can use Job_Id to connect to the
>>> state, but is there any other way to expose a specific rest-endpoint etc ?
>>>
>>> Any sample example/github link would be nice.
>>>
>>>
>>>
>>> Thank you.
>>>
>>


Re: Where will the state be stored in the taskmanager when using rocksdbstatebend?

2022-09-07 Thread Yuan Mei
Hey Hjw,

Under the current Flink architecture (i.e., task states are stored locally
and periodically uploaded to remote durable storage during checkpointing),
there is no way other than scaling out your application to solve the
problem. This is equivalent to making the state size in each task smaller
so that it can fit into a single container.

We have seen similar issues from other users/customers, and we have plans
to solve this problem in a more fundamental way by supporting remote state
as well (when the local quota is used up, the state can also be written
directly to remote storage).

For now, I would suggest increasing the parallelism of your job to solve
this problem.
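As a rough sketch, such a scale-out mostly amounts to configuration changes. All values below are placeholders for illustration (including the hypothetical mount path), not recommendations; tune them for your own workload:

```yaml
# flink-conf.yaml -- illustrative values only
parallelism.default: 12               # more, smaller tasks => less state per task
taskmanager.numberOfTaskSlots: 2
taskmanager.memory.process.size: 4g   # smaller containers, but more of them
state.backend: rocksdb
state.backend.rocksdb.localdir: /data/flink/rocksdb   # hypothetical mount point
```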

Best
Yuan

On Tue, Sep 6, 2022 at 7:59 PM Alexander Fedulov 
wrote:

> Well, in that case, it is similar to the situation of hitting the limits
> of vertical scaling - you'll have to scale out horizontally.
> You could consider sizing down the number of CPU and RAM you allocate to
> each task manager, but instead increase their count (and your job's
> parallelism).
> It might come with its own downsides, so measure as you go. This might
> also be problematic if you have significant key skew for some of your key
> ranges.
>
> Best,
> Alex
>
> On Tue, Sep 6, 2022 at 8:09 AM hjw <1010445...@qq.com> wrote:
>
>> Hi,Alexander
>>
>> When a Flink job is deployed on native K8s, the taskmanager is a Pod. The
>> data directory size of a single container is limited in our company. Are
>> there any ideas for dealing with this?
>>
>> --
>> Best,
>> Hjw
>>
>>
>>
>> -- Original Message --
>> *From:* "Alexander Fedulov" ;
>> *Date:* Tuesday, Sep 6, 2022, 3:19 AM
>> *To:* "hjw"<1010445...@qq.com>;
>> *Cc:* "user";
>> *Subject:* Re: Where will the state be stored in the taskmanager when using
>> rocksdbstatebend?
>>
>>
>> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/deployment/config/#state-backend-rocksdb-localdir
>> Make sure to use a local SSD disk (not NFS/EBS).
>>
>> Best,
>> Alexander Fedulov
>>
>> On Mon, Sep 5, 2022 at 7:24 PM hjw <1010445...@qq.com> wrote:
>>
>>> The EmbeddedRocksDBStateBackend holds in-flight data in a RocksDB
>>> database that is (per default) stored in the
>>> TaskManager local data directories.
>>> Which operating-system path do these local data directories (where the
>>> RocksDB database is stored) point to in the TaskManager?
>>> If the job state is very large, I think I should take some measures to
>>> deal with it (e.g., mount a volume for the local data directories that
>>> store the RocksDB database).
>>>
>>> thx.
>>>
>>> --
>>> Best,
>>> Hjw
>>>
>>


Re: Limiting backpressure during checkpoints

2022-10-19 Thread Yuan Mei
Hey Robin,

Thanks for sharing the detailed information. May I ask: when you say "CPU
usage is around 80% when checkpoints aren't running, and capped at 100%
when they are", do you see a zigzag pattern of CPU usage, or does it stay
capped at 100%?

I think one possibility is that the sync phase of the checkpoint (the
writebuffer flush during the sync phase) triggers a RocksDB compaction; we
have seen this happen on Ververica services as well.

At the moment, maybe you can try making checkpoints less frequent
(increase the checkpoint interval) to reduce the frequency of compaction.
Please let me know whether this helps.
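For example, the interval and a minimum pause between checkpoints can be set in flink-conf.yaml (the values below are illustrative only; tune them for your latency/recovery trade-off):

```yaml
# flink-conf.yaml -- example values only
execution.checkpointing.interval: 10min
execution.checkpointing.min-pause: 5min
```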

In the long term, I think we probably need to separate the compaction
process from the internal DB and control/schedule the compaction process
ourselves (compaction takes a good amount of CPU and reduces TPS).

Best.
Yuan



On Thu, Oct 13, 2022 at 11:39 PM Robin Cassan via user <
user@flink.apache.org> wrote:

> Hello all, hope you're well :)
> We are attempting to build a Flink job with minimal and stable latency (as
> much as possible) that consumes data from Kafka. Currently our main
> limitation happens when our job checkpoints the RocksDB state: backpressure
> is applied on the stream, causing latency. I am wondering if there are ways
> to configure Flink so that the checkpointing process affects the flow of
> data as little as possible?
>
> In our case, backpressure seems to arise from CPU consumption, because:
> - CPU usage is around 80% when checkpoints aren't running, and capped at
> 100% when they are
> - checkpoint alignment time is very low, using unaligned checkpoints
> doesn't appear to help with backpressure
> - network (async) part of the checkpoint should in theory not cause
> backpressure since resources would be used for the main stream during async
> waits, but I might be wrong
>
> What we would really like to achieve is isolating the compute resource
> used for checkpointing from the ones used for task slots. Which would of
> course mean that we need to oversize our cluster for having resources
> available for checkpointing even when it's not running, but also that we
> would get longer checkpoints compared to today where checkpoints seem to
> use CPU cores attributed to task slots. We are ok with that to some degree,
> but we don't know how to achieve this isolation. Do you have any clue?
>
> Lastly, we currently have nodes with 8 cores but allocate 6 task slots,
> and we have set the following settings:
>
> state.backend.rocksdb.thread.num: 6
> state.backend.rocksdb.writebuffer.count: 6
>
>
> Thanks all for your help!
>


Re: [ANNOUNCE] FRocksDB 6.20.3-ververica-2.0 released

2023-01-30 Thread Yuan Mei
Thanks Yanfei for driving the release!

Best
Yuan

On Mon, Jan 30, 2023 at 8:46 PM Jing Ge via user 
wrote:

> Hi Yanfei,
>
> Thanks for your effort. Looking forward to checking it.
>
> Best regards,
> Jing
>
> On Mon, Jan 30, 2023 at 1:42 PM Yanfei Lei  wrote:
>
>> I am very happy to announce the release of FRocksDB 6.20.3-ververica-2.0.
>>
>> Compiled files for Linux x86, Linux arm, Linux ppc64le, MacOS x86,
>> MacOS arm, and Windows are included in FRocksDB 6.20.3-ververica-2.0
>> jar, and the FRocksDB in Flink 1.17 would be updated to
>> 6.20.3-ververica-2.0.
>>
>> Release highlights:
>> - [FLINK-30457] Add periodic_compaction_seconds option to RocksJava[1].
>> - [FLINK-30321] Upgrade ZLIB of FRocksDB to 1.2.13[2].
>> - Avoid expensive ToString() call when not in debug[3].
>> - [FLINK-24932] Support build FRocksDB Java on Apple silicon[4].
>>
>> Maven artifacts for FRocksDB can be found at:
>> https://mvnrepository.com/artifact/com.ververica/frocksdbjni
>>
>> We would like to thank all efforts from the Apache Flink community
>> that made this release possible!
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-30457
>> [2] https://issues.apache.org/jira/browse/FLINK-30321
>> [3] https://github.com/ververica/frocksdb/pull/55
>> [4] https://issues.apache.org/jira/browse/FLINK-24932
>>
>> Best regards,
>> Yanfei
>> Ververica(Alibaba)
>>
>


Re: [ANNOUNCE] Apache Flink has won the 2023 SIGMOD Systems Award

2023-07-06 Thread Yuan Mei
Congrats everyone :-)

Best
Yuan

On Fri, Jul 7, 2023 at 11:29 AM Hang Ruan  wrote:

> Hi, Leonard.
>
> I would like to help to add this page. Please assign this issue to me.
> Thanks.
>
> Best,
> Hang
>
> Leonard Xu wrote on Fri, Jul 7, 2023 at 11:26 AM:
>
>> Congrats to all !
>>
>> It will be helpful to promote Apache Flink if we can add a page to our
>> website like other projects do [2]. I've created an issue [1] to improve this.
>>
>>
>> Best,
>> Leonard
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-32555
>> [2] https://spark.apache.org/news/sigmod-system-award.html
>>
>


Re: [ANNOUNCE] Apache Flink 2.0.0 released

2025-03-24 Thread Yuan Mei
Thanks for driving this & Congrats!

Best
Yuan

On Mon, Mar 24, 2025 at 5:38 PM Leonard Xu  wrote:

> Congratulations!
>
> Thanks Xintong, Jark, Jiangjie and Martijn for the release management, and
> all involved.
>
> Best,
> Leonard
>
>
>
> > On Mar 24, 2025, at 16:24, Xintong Song wrote:
> >
> > The Apache Flink community is very happy to announce the release of
> Apache
> > Flink 2.0.0, which is the first formal release for the Apache Flink 2.x
> > series.
> >
> > Apache Flink® is an open-source stream processing framework for
> > distributed, high-performing, always-available, and accurate data
> streaming
> > applications.
> >
> > The release is available for download at:
> > https://flink.apache.org/downloads.html
> >
> > Please check out the release blog post for an overview of the
> improvements
> > for this release:
> >
> https://flink.apache.org/2025/03/24/apache-flink-2.0.0-a-new-era-of-real-time-data-processing/
> >
> > The full release notes are available in Jira:
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12335344
> >
> > We would like to thank all contributors of the Apache Flink community who
> > made this release possible!
> >
> > Best,
> >
> > Xintong
>
>