+1 on what Zhu Zhu said.
We also override the default to 10 s.
On Fri, Aug 30, 2019 at 8:58 PM Zhu Zhu wrote:
> In our production, we usually override the restart delay to be 10 s.
> We once encountered cases where external services were overwhelmed by
> reconnections from frequently restarted tasks
atter what decision we make this time, I'd
>> suggest making it final and documenting it in our release notes explicitly.
>> Checking the 1.5.0 release notes [1] [2], it seems we didn't mention
>> the change of the default restart delay, and we'd better learn from it thi
Congrats, Alex! Well deserved!
On Wed, Jan 3, 2024 at 2:31 AM David Radley wrote:
> Sorry for my typo.
>
> Many congratulations Alex!
>
> From: David Radley
> Date: Wednesday, 3 January 2024 at 10:23
> To: David Anderson
> Cc: dev@flink.apache.org
> Subject: Re: [EXTERNAL] [ANNOUNCE] New Apache
+1 on allowing a user-defined resourceId for the TaskManager
On Sun, Mar 29, 2020 at 7:24 PM Yang Wang wrote:
> Hi Konstantin,
>
> I think it is a good idea. Currently, our users also report a similar issue
> with
> resourceId of standalone cluster. When we start a standalone cluster now,
> the `TaskM
We do use config like "restart-strategy:
org.foobar.MyRestartStrategyFactoryFactory", mainly to add metrics beyond
the Flink-provided ones.
On Thu, Sep 19, 2019 at 4:50 AM Zhu Zhu wrote:
> Thanks everyone for the input.
>
> The RestartStrategy customization is not recognized as a public
etrics you add in you
> customized restart strategy?
>
> Thanks,
> Zhu Zhu
>
> Steven Wu wrote on Fri, Sep 20, 2019 at 7:11 AM:
>
>> We do use config like "restart-strategy:
>> org.foobar.MyRestartStrategyFactoryFactory". Mainly to add additional
>> metrics than the
tarted. If fine-grained recovery (a feature added in 1.9.0) is enabled, the graph
> would not be restarted when task failures happen and the "fullRestart"
> value will not increment in such cases.
>
> I'd appreciate if you can help with these questions and we can make better
>
t respects
> fine grained recovery.
>
> [1] https://issues.apache.org/jira/browse/FLINK-14164
>
> Thanks,
> Zhu Zhu
>
> Steven Wu wrote on Tue, Sep 24, 2019 at 6:41 AM:
>
>>
>> When we setup alert like "fullRestarts > 1" for some rolling window, we
>> want t
ra/browse/FLINK-14164
>
> Thanks,
> Zhu Zhu
>
> Steven Wu wrote on Wed, Sep 25, 2019 at 2:30 AM:
>
>> Zhu Zhu,
>>
>> Sorry, I was using different terminology. Yes, the Flink meter is what I was
>> talking about regarding "fullRestarts" for threshold based aler
Congratulations, Becket!
On Wed, Oct 30, 2019 at 9:51 PM Shaoxuan Wang wrote:
> Congratulations, Becket!
>
> On Mon, Oct 28, 2019 at 6:08 PM Fabian Hueske wrote:
>
> > Hi everyone,
> >
> > I'm happy to announce that Becket Qin has joined the Flink PMC.
> > Let's congratulate and welcome Becket
Gary, FLIP-27 seems to have been omitted from the 2nd update. Below is the
info from update #1.
- FLIP-27: Refactor Source Interface [20]
- FLIP accepted. Implementation is in progress.
On Fri, Nov 1, 2019 at 7:01 AM Gary Yao wrote:
> Hi community,
>
> Because we have approximately one month of
Becket,
Regarding "UNBOUNDED source that stops at some point", I found it difficult
to grasp what UNBOUNDED really means.
If we want to use a Kafka source with an end/stop time, I guess you would
call it an UNBOUNDED Kafka source that stops (aka BOUNDED-streaming). The
terminology is a little confusing to me
data as
> it
> >> comes.
> >> *
> >> * The source may run forever (until the program is terminated)
> or
> >> might actually end at some point,
> >> * based on some source-specific conditions. Because that is not
> >> transparen
I filed a small issue regarding readability for memory configurations. It
is not a blocking issue. I already attached a PR.
https://issues.apache.org/jira/browse/FLINK-15846
On Fri, Jan 31, 2020 at 9:20 PM Thomas Weise wrote:
> As part of testing the RC, I run into the following issue with a tes
Can someone grant me contributor permission? We are thinking about
contributing to FLINK-9061. My JIRA id is "stevenz3wu".
Thanks,
Steven
> >>
> >> On Thu, Apr 26, 2018 at 4:40 PM, TechnoMage
> wrote:
> >>
> >>> Contributor permission is only granted after significant
> contributions.
> >>> The route to contributing is to use pull requests and have the commits
> >>> rev
Please prioritize a proper long-term fix for this issue. It is a big
scalability problem for high-parallelism jobs (e.g., parallelism over 1,000).
FLINK-10122 KafkaConsumer should use partitionable state over union state
if partition discovery is not active
On Fri, Sep 28, 2018 at 7:20 AM Till Rohrmann wrote:
> And each split has its own (internal) thread for reading from Kafka and
putting messages in an internal queue to pull from. This is similar to how
the current Kafka source is implemented, which has a separate fetcher
thread.
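As a rough, framework-free illustration of the fetcher-thread pattern described above (plain Java, not the actual Kafka source internals): a dedicated thread pushes records into a bounded queue, and the task thread drains it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch of a fetcher thread feeding a bounded queue; the bounded
// capacity provides natural backpressure on the producing side.
public class FetcherSketch {
    static List<String> fetchAll(int n) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);
        Thread fetcher = new Thread(() -> {
            for (int i = 0; i < n; i++) {
                try {
                    queue.put("record-" + i); // blocks when the queue is full
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        fetcher.start();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            out.add(queue.take()); // task thread pulls from the queue
        }
        fetcher.join();
        return out;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(fetchAll(3)); // [record-0, record-1, record-2]
    }
}
```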
Aljoscha, in the Kafka case, one split may contain multiple Kafka partitio
This proposal mentioned that SplitEnumerator might run on the JobManager or
in a single task on a TaskManager.
If the enumerator is a single task on a TaskManager, then the job DAG can
never be embarrassingly parallel anymore. That will nullify the leverage of
fine-grained recovery for embarrassingl
This is an awesome feature.
> The name "Savepoint Connector" might indeed be not that good, as it
doesn't
point out the fact that with the current design, all kinds of snapshots
(savepoint / full or incremental checkpoints) can be read.
@Gordon, can you add the above clarification to the FLIP page?
The long and sometimes unstable build is definitely a pain point.
I suspect the build failure here in flink-connector-kafka is not related to
my change, but there is no easy way to re-run the build in the Travis UI. A
Google search showed a trick: closing and reopening the PR will trigger a
rebuild, but that could add no
Guowei,
Thanks a lot for the proposal and starting the discussion thread. Very
excited.
For the big question of "Is the sink an operator or a topology?", I have a
few related sub-questions.
* Where should we run the committers?
* Is the committer parallel or single parallelism?
* Can a single cho
s by using a fixed pool
> > of transactional IDs and then cancelling all outstanding transactions
> > for the IDs when we restore from a savepoint. In order for this to work
> > we need to recycle the IDs, so there needs to be a back-channel from the
> > Committer to t
Right now, we use UnionState to store the `nextCheckpointId` in the Iceberg
sink use case, because we can't retrieve the checkpointId from
the FunctionInitializationContext during the restore case. But we can move
away from it if the restore context provides the checkpointId.
On Sat, Sep 12, 2020
llect method as well. So far we needed a single method commit(...) and
> > the bookkeeping of the committables could be handled by the framework. I
> > think something like an optional combiner in the GlobalCommitter would
> > be enough. What do you think?
> >
> > GlobalCommitter
stFile" (for the
> > collected thousands data files) as StateT. This allows us to absorb
> > extended commit outages without losing written/uploaded data files, as
> > operator state size is as small as one manifest file per checkpoint cycle
> > [2].
> > ---
heckpointId in API, as long as the internal bookkeeping
groups data files by checkpoints for streaming mode.
On Tue, Sep 15, 2020 at 6:58 AM Steven Wu wrote:
> > images don't make it through to the mailing lists. You would need to
> host the file somewhere and send a link.
>
>
iles by checkpoints for streaming >>mode.
>> >
>> > I think this problem(How to dedupe the combined committed data) also
>> > depends on where to place the agg/combine logic .
>> >
>> > 1. If the agg/combine takes place in the “commit” maybe we need to
Guowei,
Just to add to what Aljoscha said regarding the unique id. Iceberg sink
checkpoints the unique id into state during snapshot. It also inserts the
unique id into the Iceberg snapshot metadata during commit. When a job
restores the state after failure, it needs to know if the restored
transac
Aljoscha,
> Instead the sink would have to check for each set of committables
seperately if they had already been committed. Do you think this is
feasible?
Yes, that is how it works in our internal implementation [1]. We don't use
checkpointId. We generate a manifest file (GlobalCommT) to bundle
s for users.
>
>
> A possible alternative option would be to let the user build the topology
> himself. But considering we have two execution modes, we could only use
> `Writer` and `Committer` to build the sink topology.
>
> ### Build Topology Option
>
> Sink {
>
anks :)
>>
>>
>> >> Is this called when restored from checkpoint/savepoint?
>>
>> Yes.
>>
>>
>> >>Iceberg sink needs to do a dup check here on which GlobalCommT were
>> committed and which weren't. Should it return the filtered/
an AggregateFunction instead of the simple merge() function.
>
> What do you think?
>
> Best,
> Aljoscha
>
> On 21.09.20 10:06, Piotr Nowojski wrote:
> > Hi Guowei,
> >
> >> I believe that we could support such an async sink writer
> >> very
It is fine to leave the CommitResult/RETRY outside the scope of the framework.
Then the framework might need to provide some hooks in the
checkpoint/restore logic, because the commit happens in the post-checkpoint
completion step, and the sink needs to update its internal state when
the commit is successful s
recoveredCommittables(List);
}
The most important need from the framework is to run the GlobalCommitter on
the jobmanager. It involves topology creation, checkpoint handling,
serializing the executions of commit() calls, etc.
Thanks,
Steven
On Tue, Sep 22, 2020 at 6:39 AM Steven Wu wrote:
> It is fine
Guowei,
Thanks a lot for updating the wiki page. It looks great.
I noticed one inconsistency between the wiki and your last summary email for
the GlobalCommitter interface. I think the version in the summary email is
the intended one, because rollover from previous failed commits can
accumulate a list.
C
ly.
> >> 1. The frame can not know which `GlobalCommT` to retry if we use the
> >> List as parameter when the `commit` returns `RETRY`.
> >> 2. Of course we can let the `commit` return more detailed info but it
> >> might be too complicated.
> >> 3. On the oth
sink
implementations.
Thanks,
Steven
On Fri, Sep 25, 2020 at 5:56 AM Steven Wu wrote:
> > 1. The frame can not know which `GlobalCommT` to retry if we use the
> > List as parameter when the `commit` returns `RETRY`.
> > 2. Of course we can let the `commit` return more detailed i
t the
> beginning in 1.12
>
> What do you think?
>
> Best,
> Guowei
>
>
> On Fri, Sep 25, 2020 at 9:30 PM Steven Wu wrote:
>
> > I should clarify my last email a little more.
> >
> > For the example of commits for checkpoints 1-100 failed, the job
+1 (non-binding)
Although I would love to continue the discussion on tweaking the
CommitResult/GlobalCommitter interface, maybe during the implementation phase.
On Fri, Sep 25, 2020 at 5:35 AM Aljoscha Krettek
wrote:
> +1 (binding)
>
> Aljoscha
>
> On 25.09.20 14:26, Guowei Ma wrote:
> > From t
+1 (non-binding).
Some of our users have asked for this tradeoff of consistency over
availability for some cases.
On Mon, Oct 19, 2020 at 8:02 PM Zhijiang
wrote:
> Thanks for driving this effort, Yuan.
>
> +1 (binding) on my side.
>
> Best,
> Zhijiang
>
>
> -
I would love to see this FLIP-27 source interface improvement [1] make it
into 1.11.3.
[1] https://issues.apache.org/jira/browse/FLINK-19698
On Wed, Oct 28, 2020 at 12:32 AM Tzu-Li (Gordon) Tai
wrote:
> Thanks for the replies so far!
>
> Just to provide a brief update on the status of blockers for 1
ingBlockingQueue.
> >> > >
> >> > > commit 4ea95782b4c6a2538153d4d16ad3f4839c7de0fb
> >> > > [FLINK-19223][connectors] Simplify Availability Future Model in Base
> >> > > Connector
> >> > >
> >> > > comm
Basically, it would be great to get the latest code in the
flink-connector-files (FLIP-27).
On Sat, Oct 31, 2020 at 9:57 AM Steven Wu wrote:
> Stephan, it will be great if we can also backport the DataStreamUtils
> related commits that help with collecting output from unbounded streams.
However, with the base connector changes backported, you should be able to
> run the file connector code from master against 1.11.3.
>
> The collect() utils can be picked back, I see no issue with that (it is
> isolated utilities).
>
> Best,
> Stephan
>
>
> On Mon, Nov 2,
; iceberg source and later replace it by a dependency on the Flink file
> source?
>
> On Mon, Nov 2, 2020 at 8:33 PM Steven Wu wrote:
>
> > Stephan, thanks a lot for explaining the file connector. that makes
> sense.
> >
> > I was asking because we were trying to re
ctoring would be used
>> >>>>> down
>> >>>>> the line, e.g. by the ShuffleCoordinator. The OperatorCoordinator
>> API
>> >>>>> is a
>> >>>>> non-public API and before reading the code, I wasn't even aware how
> Additionally, the configurable variables (operator name/id) are
logically not attached to the coordinator, but operators, so to me it
just doesn't make sense to structure it like this.
Chesnay, maybe we should clarify the terminology. To me, operators (like the
FLIP-27 source) can have two parts (co
s good to me to implement the shuffle operator in the Iceberg
> project first.
> We can contribute it to Flink DataStream in the future if other
> projects/connectors also need it.
>
> Best,
> Jark
>
>
> On Wed, 18 Jan 2023 at 02:11, Steven Wu wrote:
>
>> Jark,
Hi,
We had a proposal to add a streaming shuffle stage in the Flink Iceberg
sink to improve data clustering and tame the small-files problem [1].
Here are a couple of common use cases.
* Event time partitioned table where we can get small files problem due to
skewed and long-tail distributio
Regarding the discussion on global committer [1] for sinks with global
transactions, there is no consensus on solving that problem in SinkV2. Will
it require any breaking change in SinkV2?
Also, will SinkV1 be deprecated too? Or should that happen sometime after
the SinkFunction deprecation?
[1] https:/
> > > developers will *have a workable migration path from the old API to
> > the
> > > > > new API*.
> > > >
> > > >
> > > > From a user's perspective, the workable migration path is very
> > important.
> > > > Othe
I think this is a reasonable extension to `DataOutputSerializer`. Although
64 KB is not small, it is still possible to have long strings over that
limit. There are already precedents of extending the `DataOutputSerializer`
API, e.g.:
public void setPosition(int position) {
Preconditions.checkArgume
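For context, `DataOutput#writeUTF` caps a string at 65535 encoded bytes. Below is a minimal, framework-free sketch of the usual length-prefixed workaround; the method names are mine for illustration, not the proposed `DataOutputSerializer` API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Sketch only: writeUTF is limited to 65535 encoded bytes, so longer
// strings can be written as an int length prefix plus raw UTF-8 bytes.
public class LongStringSerde {
    static void writeLongUTF(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length); // 4-byte length prefix instead of writeUTF's 2 bytes
        out.write(bytes);
    }

    static String readLongUTF(DataInputStream in) throws IOException {
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String big = "x".repeat(70_000); // over the 64 KB writeUTF limit
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        writeLongUTF(new DataOutputStream(buf), big);
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(buf.toByteArray()));
        System.out.println(readLongUTF(in).equals(big)); // true
    }
}
```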
Hi,
I am trying to implement a custom range partitioner in the Flink Iceberg
sink and want to publish some counter metrics for certain scenarios, like
the network metrics exposed in `TaskIOMetricGroup`.
This requires adding a new setup method to the custom `Partitioner`
interface. Like to ge
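To make the ask concrete, here is a hypothetical sketch (plain Java, not Flink's actual `Partitioner` interface, which has no such hook today) of a range partitioner counting keys that fall outside the known range; a plain `AtomicLong` stands in for the metric counter that a setup method would register.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: a range partitioner that counts out-of-range keys.
// In a real sink the counter would be registered with the task's metric
// group via a setup hook; AtomicLong is a stand-in for that metric.
public class RangePartitionerSketch {
    private final long[] upperBounds;                 // sorted, one per range
    final AtomicLong outOfRangeCounter = new AtomicLong();

    RangePartitionerSketch(long[] upperBounds) {
        this.upperBounds = upperBounds;
    }

    int partition(long key, int numPartitions) {
        for (int i = 0; i < upperBounds.length; i++) {
            if (key <= upperBounds[i]) {
                return i % numPartitions;
            }
        }
        outOfRangeCounter.incrementAndGet();          // the scenario worth a metric
        return numPartitions - 1;                     // fall back to the last partition
    }

    public static void main(String[] args) {
        RangePartitionerSketch p = new RangePartitionerSketch(new long[] {10, 20});
        System.out.println(p.partition(5, 2));        // 0
        System.out.println(p.partition(15, 2));       // 1
        System.out.println(p.partition(99, 2));       // 1, and the counter increments
        System.out.println(p.outOfRangeCounter.get()); // 1
    }
}
```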
> However, a single source operator may read data from multiple
splits/partitions, e.g., multiple Kafka partitions, such that even with
watermark alignment the source operator may need to buffer an excessive amount
of data if one split emits data faster than another.
For this part from the motivation
ng into the FLIP-182 and ff. threads. The goal of FLIP-182 is
>> to pause readers while consuming a split, while your approach pauses
>> readers before processing another split. So it feels more closely related
>> to the global min watermark - so it could either be part
related comment sometime ago
> > about it [1]. It sounds to me like you also need to solve this problem,
> > otherwise Iceberg users will encounter late records in case of some race
> > conditions between assigning new splits and completions of older.
> >
> > Best,
>
he split, so as long as it also knows
> exactly which splits have finished, it would know which splits to hold back.
>
> Best,
> Piotrek
>
> On Wed, May 4, 2022 at 20:03, Steven Wu wrote:
>
>> Piotr, thanks a lot for your feedback.
>>
>> > I can see
ther framework information to the user space in the future.
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> >
> >
> >> On Fri, May 6, 2022 at 6:15 AM Thomas Weise wrote:
> >>
> >>> On Wed, May 4, 2022 at 11:03 AM Steven W
might be the same as => might NOT be the same as
On Fri, May 6, 2022 at 8:13 PM Steven Wu wrote:
> The conclusion of this discussion could be that we don't see much value in
> leveraging FLIP-182 with Iceberg source. That would totally be fine.
>
> For me, one big s
In the Iceberg source, we have a data generator source that can control the
number of records per checkpoint cycle. Can we support sth like this in the
DataGeneratorSource?
https://github.com/apache/iceberg/blob/master/flink/v1.15/flink/src/test/java/org/apache/iceberg/flink/source/BoundedTestSource.java
public
oing to be tricky since in the new Source API the
> > >> checkpointing
> > >> > > aspects that you based your logic on are pushed further away from
> > the
> > >> > > low-level interfaces responsible for handling data and splits [1].
> > At
Till, thank you for your immense contributions to the project and the
community.
On Mon, Feb 28, 2022 at 9:16 PM Xintong Song wrote:
> Thanks for everything, Till. It has been a great honor working with you.
> Good luck with your new chapter~!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Tue, Mar
Variable substitution (proposed here) is definitely useful.
For us, hierarchical override is more useful. E.g., we may have the
default value of "state.checkpoints.dir=path1" defined in flink-conf.yaml.
But maybe we want to override it to "state.checkpoints.dir=path2" via
environment variable in
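A sketch of the hierarchical lookup I have in mind; the `FLINK_`-prefixed key-mangling convention is made up for illustration, not an existing Flink feature.

```java
import java.util.Map;

// Illustrative only: resolve a config key by checking an environment
// variable first and falling back to the flink-conf.yaml value.
public class ConfigOverrideSketch {
    static String resolve(String key, Map<String, String> fileConf, Map<String, String> env) {
        // Hypothetical convention: state.checkpoints.dir -> FLINK_STATE_CHECKPOINTS_DIR
        String envKey = "FLINK_" + key.toUpperCase().replace('.', '_').replace('-', '_');
        String fromEnv = env.get(envKey);
        return fromEnv != null ? fromEnv : fileConf.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> fileConf = Map.of("state.checkpoints.dir", "path1");
        // No override: the flink-conf.yaml value wins.
        System.out.println(resolve("state.checkpoints.dir", fileConf, Map.of())); // path1
        // The environment variable overrides the file value.
        System.out.println(resolve("state.checkpoints.dir", fileConf,
                Map.of("FLINK_STATE_CHECKPOINTS_DIR", "path2")));                 // path2
    }
}
```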
> > > > >
> > > > > Thanks for your response.
> > > > >
> > > > > 1. Not distinguishing JM/TM is reasonable, but what about the
> client
> > > > side.
> > > > > For Yarn/K8s deployment,
> > > > > th
Congrats, Guowei!
On Wed, Jan 20, 2021 at 10:32 AM Seth Wiesman wrote:
> Congratulations!
>
> On Wed, Jan 20, 2021 at 3:41 AM hailongwang <18868816...@163.com> wrote:
>
> > Congratulations, Guowei!
> >
> > Best,
> > Hailong
> >
> > On 2021-01-20 15:55:24, "Till Rohrmann" wrote:
> > >Congrats, Guowei
Thanks a lot for the proposal, Robert and Till.
> No fixed parallelism for any of the operators
Regarding this limitation, can the scheduler only adjust the default
parallelism? If some operators set parallelism explicitly (like always 1),
just leave them unchanged.
On Fri, Jan 22, 2021 at 8:42
Till, thanks a lot for the proposal.
Even if the initial phase is only to support scale-up, maybe the
"ScaleUpController" interface should be called "RescaleController" so that
in the future scale-down can be added.
On Fri, Jan 22, 2021 at 7:03 AM Till Rohrmann wrote:
> Hi everyone,
>
> I would
I discussed the PR with Thomas offline. Thomas, please correct me if I
missed anything.
Right now, the PR differs from the FLIP-150 doc regarding the converter.
* Current PR uses the enumerator checkpoint state type as the input for the
converter
* FLIP-150 defines a new EndStateT interface.
It see
I am not sure if we should solve this problem in Flink. This is more like a
dynamic config problem that probably should be solved by some configuration
framework. Here is one post from google search:
https://medium.com/twodigits/dynamic-app-configuration-inject-configuration-at-run-time-using-sprin
ts?
>
> Thanks,
> Thomas
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-150%3A+Introduce+Hybrid+Source#FLIP150:IntroduceHybridSource-Prototypeimplementation
> [2]
> https://github.com/apache/flink/pull/15924/files#diff-e07478b3cad9810925ec784b61ec0026396839cc5b
>>>>> starting to join the mainstream, it would be helpful to have an event
>>>>> signaling the finishing of the bootstrap.
>>>>>
>>>>> ## Dynamic REST controlling
>>>>> Back to the specific feature that Jiangang pro
e take a few minutes to think if this is the most intuitive name for
> new users? I'm especially hoping that natives might give some ideas (or
> declare that Hybrid is perfect).
>
> [1] https://github.com/apache/flink/pull/15924#pullrequestreview-677376664
>
> On Sun, Jun 6, 2
;are checkpointed.
>
>
> Steven Wu [via Apache Flink User Mailing List archive.] <
> ml+s2336050n44278...@n4.nabble.com> wrote on Tue, Jun 8, 2021 at 2:15 PM:
>
> >
> > I can see the benefits of control flow. E.g., it might help the old (and
> > inactive) FLIP-17 side input.
I'm wrong, @Yun, @Jark)
>>>>>> * Iteration: When a certain condition is met, we might want to
>>>>>> signal downstream operators with an event
>>>>>> * Mini-batch assembling: Flink currently uses special watermarks
>>>>>>
+1 (non-binding)
On Thu, Jul 1, 2021 at 4:59 AM Thomas Weise wrote:
> +1 (binding)
>
>
> On Thu, Jul 1, 2021 at 8:13 AM Arvid Heise wrote:
>
> > +1 (binding)
> >
> > Thank you and Thomas for driving this
> >
> > On Thu, Jul 1, 2021 at 7:50 AM 蒋晓峰 wrote:
> >
> > > Hi everyone,
> > >
> > >
> > >
Awesome! Congratulations, Guowei!
On Wed, Jul 7, 2021 at 4:25 AM Jingsong Li wrote:
> Congratulations, Guowei!
>
> Best,
> Jingsong
>
> On Wed, Jul 7, 2021 at 6:36 PM Arvid Heise wrote:
>
> > Congratulations!
> >
> > On Wed, Jul 7, 2021 at 11:30 AM Till Rohrmann
> > wrote:
> >
> > > Congratula
- The subtask observes changes in the throughput and adjusts the
buffer size over the whole lifetime of the task.
- The subtask sends buffer size and number of available buffers to the
upstream to the corresponding subpartition.
- Upstream changes the buffer size correspondi
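The steps above boil down to a small piece of arithmetic. This is only my reading of the idea, with made-up constants (Flink's actual defaults may differ): size each buffer so that the in-flight data roughly covers the target latency at the observed throughput, capped at the regular buffer size.

```java
// Illustrative arithmetic for throughput-based buffer sizing; the 32 KB
// cap mirrors a common default network buffer size but is an assumption.
public class BufferSizingSketch {
    static final int MAX_BUFFER_SIZE = 32 * 1024;

    static int desiredBufferSize(long bytesPerSecond, long targetLatencyMs, int buffersInUse) {
        // Total bytes that should be in flight to cover the latency window.
        long inFlightBytes = bytesPerSecond * targetLatencyMs / 1000;
        long perBuffer = Math.max(1, inFlightBytes / Math.max(1, buffersInUse));
        return (int) Math.min(perBuffer, MAX_BUFFER_SIZE);
    }

    public static void main(String[] args) {
        // 1 MB/s throughput, 10 ms target latency, 4 buffers in use:
        System.out.println(desiredBufferSize(1_000_000, 10, 4));     // 2500
        // Very high throughput saturates at the 32 KB cap:
        System.out.println(desiredBufferSize(1_000_000_000, 10, 4)); // 32768
    }
}
```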
I am trying to understand what those two metrics really capture.
> G setPendingBytesGauge(G pendingBytesGauge);
- Using the file source as an example, does it capture the remaining bytes
for the current file split that the reader is processing? How would users
interpret or use this metric? enumera
This is awesome. Thank you, Chesnay!
On Wed, Jul 14, 2021 at 1:50 AM Yun Tang wrote:
> Great news, thanks for Chesnay's work!
>
> Best
> Yun Tang
>
> From: Martijn Visser
> Sent: Wednesday, July 14, 2021 16:05
> To: dev@flink.apache.org
> Subject: Re: [NOTICE]
ot greater than it)
> > >
> > > So apart from adding buffer size information to the `AddCredit`
> message,
> > we
> > > will need to support a case where upstream subpartition has already
> > > produced a buffer with older size (for example 32KB), while the nex
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>
> On Wed, Jul 14, 2021 at 9:00 AM Steven Wu wrote:
>
> > I am trying to understand what those two metrics really capture
> >
> > > G setPendingBytesGaug
h. @Becket Qin
> are you okay with the setLastFetchTimeGauge explanation or do you have
> alternative ideas?
>
> Best,
>
> Arvid
>
> On Fri, Jul 16, 2021 at 8:13 PM Steven Wu wrote:
>
> > To avoid confusion, can we either rename "SourceMetricGroup" to &
> if a failure happens after sequence of finish() -> snapshotState(), but
before notifyCheckpointComplete(), we will restore such a state and we
might end up sending some more records to such an operator.
I probably missed sth here. Isn't this the case today already? Why is it a
concern for the pr
+1 (non-binding)
On Fri, Jul 30, 2021 at 3:55 AM Arvid Heise wrote:
> Dear devs,
>
> I'd like to open a vote on FLIP-179: Expose Standardized Operator Metrics
> [1] which was discussed in this thread [2].
> The vote will be open for at least 72 hours unless there is an objection
> or not enough
Fabian, thanks a lot for the proposal and starting the discussion.
We probably should first describe the different causes of small files and
what problems this proposal is trying to solve. I wrote a data shuffling
proposal [1] for the Flink Iceberg sink (shared with the Iceberg community
[2]). It can add
> although I think only using a customizable shuffle won't address the
generation of small files. One assumption is that at least the sink generates
one file per subtask, which can already be too many. Another problem is
that with low checkpointing intervals, the files do not meet the required
siz
+1 (non-binding)
On Wed, Aug 3, 2022 at 5:47 AM Martijn Visser
wrote:
> +1 (binding)
>
> On Wed, Aug 3, 2022 at 14:33, Piotr Nowojski wrote:
>
> > +1 (binding)
> >
> > On Wed, Aug 3, 2022 at 14:13, Thomas Weise wrote:
> >
> > > +1 (binding)
> > >
> > >
> > > On Sun, Jul 31, 2022 at 10:57 PM Seba
In the V1 sink interface, there is a GlobalCommitter for Iceberg. With the
V2 sink interface, GlobalCommitter has been deprecated in favor of
WithPostCommitTopology. I thought the post-commit stage was mainly for async
maintenance (like compaction).
Are we supposed to do sth similar to the GlobalCommitting
16, 2022 at 9:53 AM Steven Wu wrote:
>
> In the V1 sink interface, there is a GlobalCommitter for Iceberg. With the
> V2 sink interface, GlobalCommitter has been deprecated by
> WithPostCommitTopology. I thought the post commit stage is mainly for async
> maintenance (like compa
we are using the `WithPostCommitTopology`
> for global committer, we would lose the capability of using the post commit
> stage for small files compaction.
> On Tue, Aug 16, 2022 at 9:53 AM Steven Wu wrote:
> >
> > In the V1 sink interface, there is a GlobalCommitter for Iceberg
an actual commit to the Delta Log which should be done from a one
> > > place/instance.
> > >
> > > Currently I'm evaluating V2 for our connector and having, how Steven
> > > described it a "more natural, built-in concept/support of
> GlobalComm
Congrats, Martijn!
On Mon, Sep 12, 2022 at 1:49 PM Alexander Fedulov
wrote:
> Congrats, Martijn!
>
> On Mon, Sep 12, 2022 at 10:06 AM Jing Ge wrote:
>
> > Congrats!
> >
> > On Mon, Sep 12, 2022 at 9:38 AM Daisy Tsang wrote:
> >
> > > Congrats!
> > >
> > > On Mon, Sep 12, 2022 at 9:32 AM Martij
hmielewski [1] from Delta-Flink connector open source
>> > community
>> > > here [2].
>> > >
>> > > I'm totally agree with Steven on this. Sink's V1 GlobalCommitter is
>> > > something exactly what Flink-Delta Sink needs since it is
> https://drive.google.com/file/d/1kU0R9nLZneJBDAkgNiaRc90dLGycyTec/view?usp=sharing
> [4] https://lists.apache.org/thread/otscy199g1l9t3llvo8s2slntyn2r1jc
> [5]
> https://github.com/apache/flink/blob/release-1.15/flink-streaming-java/src/main/java/org/apache/flink/streaming/runtime/opera
[2]
> https://github.com/delta-io/connectors/blob/master/flink/src/main/java/io/delta/flink/sink/internal/committer/DeltaGlobalCommitter.java
> [3]
> https://drive.google.com/file/d/1kU0R9nLZneJBDAkgNiaRc90dLGycyTec/view?usp=sharing
> [4] https://lists.apache.org/thread/otscy199g1l9t3ll
Jing, thanks a lot for your reply. The linked Google doc is not for this
FLIP, which is fully documented in the wiki page. The linked Google doc is
the design doc for introducing shuffling in the Flink Iceberg sink, which
motivated this FLIP proposal so that the shuffle coordinator can leverage
the introd
With the model of externalized Flink connector repos (which I fully
support), there is one challenge: supporting the versions of two upstream
projects (similar to what Peter Vary mentioned earlier).
E.g., today the Flink Iceberg connector lives in Iceberg repo. We have
separate modules 1.13, 1.14, 1.
avour of exposing operator
> coordinators if there is a good reason behind that, but it is a more
> difficult topic and might be a larger effort than it seems at the first
> glance.
>
> Best,
> Piotrek
>
> On Tue, Oct 4, 2022 at 19:41, Steven Wu wrote:
>
> > Jing, th
class for SourceCoordinatorContext.
> But I prefer to use the name `OperatorCoordinatorContextBase` or
> `CoordinatorContextBase` as the format like `SourceReaderBase`.
> I also agree to what Piotr said. Maybe more problems will occur when
> connectors start to use it.
>
> Best,
> Hang