date:20230830

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-30 Thread Hang Ruan

Hi, Xuannan.

Thanks for preparing the FLIP.

After this FLIP, we will have two ways to report isProcessingBacklog: 1.
>From the source; 2. Judged by the watermark lag. What is the priority
between them?
For example, what is the status isProcessingBacklog when the source report
`isProcessingBacklog=false` and the watermark lag exceeds the threshold?

Best,
Hang

Xuannan Su  于2023年8月30日周三 10:06写道：

> Hi Jing,
>
> Thank you for the suggestion.
>
> The definition of watermark lag is the same as the watermarkLag metric in
> FLIP-33[1]. More specifically, the watermark lag calculation is computed at
> the time when a watermark is emitted downstream in the following way:
> watermarkLag = CurrentTime - Watermark. I have added this description to
> the FLIP.
>
> I hope this addresses your concern.
>
> Best,
> Xuannan
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>
>
> > On Aug 28, 2023, at 01:04, Jing Ge  wrote:
> >
> > Hi Xuannan,
> >
> > Thanks for the proposal. +1 for me.
> >
> > There is one tiny thing that I am not sure if I understand it correctly.
> > Since there will be many different WatermarkStrategies and different
> > WatermarkGenerators. Could you please update the FLIP and add the
> > description of how the watermark lag is calculated exactly? E.g.
> Watermark
> > lag = A - B with A is the timestamp of the watermark emitted to the
> > downstream and B is(this is the part I am not really sure after
> reading
> > the FLIP).
> >
> > Best regards,
> > Jing
> >
> >
> > On Mon, Aug 21, 2023 at 9:03 AM Xuannan Su 
> wrote:
> >
> >> Hi Jark,
> >>
> >> Thanks for the comments.
> >>
> >> I agree that the current solution cannot support jobs that cannot define
> >> watermarks. However, after considering the pending-record-based
> solution, I
> >> believe the current solution is superior for the target use case as it
> is
> >> more intuitive for users. The backlog status gives users the ability to
> >> balance between throughput and latency. Making this trade-off decision
> >> based on the watermark lag is more intuitive from the user's
> perspective.
> >> For instance, a user can decide that if the job lags behind the current
> >> time by more than 1 hour, the result is not usable. In that case, we can
> >> optimize for throughput when the data lags behind by more than an hour.
> >> With the pending-record-based solution, it's challenging for users to
> >> determine when to optimize for throughput and when to prioritize
> latency.
> >>
> >> Regarding the limitations of the watermark-based solution:
> >>
> >> 1. The current solution can support jobs with sources that have event
> >> time. Users can always define a watermark at the source operator, even
> if
> >> it's not used by downstream operators, such as streaming join and
> unbounded
> >> aggregate.
> >>
> >> 2.I don't believe it's accurate to say that the watermark lag will keep
> >> increasing if no data is generated in Kafka. The watermark lag and
> backlog
> >> status are determined at the moment when the watermark is emitted to the
> >> downstream operator. If no data is emitted from the source, the
> watermark
> >> lag and backlog status will not be updated. If the WatermarkStrategy
> with
> >> idleness is used, the source becomes non-backlog when it becomes idle.
> >>
> >> 3. I think watermark lag is more intuitive to determine if a job is
> >> processing backlog data. Even when using pending records, it faces a
> >> similar issue. For example, if the source has 1K pending records, those
> >> records can span from 1 day  to 1 hour to 1 second. If the records span
> 1
> >> day, it's probably best to optimize for throughput. If they span 1
> hour, it
> >> depends on the business logic. If they span 1 second, optimizing for
> >> latency is likely the better choice.
> >>
> >> In summary, I believe the watermark-based solution is a superior choice
> >> for the target use case where watermark/event time can be defined.
> >> Additionally, I haven't come across a scenario that requires low-latency
> >> processing and reads from a source that cannot define watermarks. If we
> >> encounter such a use case, we can create another FLIP to address those
> >> needs in the future. What do you think?
> >>
> >>
> >> Best,
> >> Xuannan
> >>
> >>
> >>
> >>> On Aug 20, 2023, at 23:27, Jark Wu  >> imj...@gmail.com>> wrote:
> >>>
> >>> Hi Xuannan,
> >>>
> >>> Thanks for opening this discussion.
> >>>
> >>> This current proposal may work in the mentioned watermark cases.
> >>> However, it seems this is not a general solution for sources to
> determine
> >>> "isProcessingBacklog".
> >>> From my point of view, there are 3 limitations of the current proposal:
> >>> 1. It doesn't cover jobs that don't have watermark/event-time defined,
> >>> for example streaming join and unbounded aggregate. We may still need
> to
> >>> figure out solutions for them.
> >>> 2. Watermark lag can not be trusted, because it increases unlim

[jira] [Created] (FLINK-32995) TPC-DS end-to-end test fails with chmod: cannot access '../target/generator/dsdgen_linux':

2023-08-30 Thread Sergey Nuyanzin (Jira)

Sergey Nuyanzin created FLINK-32995:
---

 Summary: TPC-DS end-to-end test fails with chmod: cannot access 
'../target/generator/dsdgen_linux': 
 Key: FLINK-32995
 URL: https://issues.apache.org/jira/browse/FLINK-32995
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.19.0
Reporter: Sergey Nuyanzin


This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52773&view=logs&j=af184cdd-c6d8-5084-0b69-7e9c67b35f7a&t=0f3adb59-eefa-51c6-2858-3654d9e0749d&l=5504
 fails as
{noformat}
Aug 29 10:03:20 [INFO] 10:03:20 Generating TPC-DS qualification data, this need 
several minutes, please wait...
chmod: cannot access '../target/generator/dsdgen_linux': No such file or 
directory
Aug 29 10:03:20 [FAIL] Test script contains errors.
Aug 29 10:03:20 Checking for errors...

{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32996) CheckpointAfterAllTasksFinishedITCase.testFailoverAfterSomeTasksFinished fails on AZP

2023-08-30 Thread Sergey Nuyanzin (Jira)

Sergey Nuyanzin created FLINK-32996:
---

 Summary:  
CheckpointAfterAllTasksFinishedITCase.testFailoverAfterSomeTasksFinished fails 
on AZP
 Key: FLINK-32996
 URL: https://issues.apache.org/jira/browse/FLINK-32996
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.18.0
Reporter: Sergey Nuyanzin


This build 
https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=52810&view=logs&j=8fd9202e-fd17-5b26-353c-ac1ff76c8f28&t=ea7cf968-e585-52cb-e0fc-f48de023a7ca&l=8366
fails as
{noformat}
Aug 30 02:20:32 02:20:32.726 [ERROR] Failures: 
Aug 30 02:20:32 02:20:32.726 [ERROR]   
CheckpointAfterAllTasksFinishedITCase.testFailoverAfterSomeTasksFinished:162 
Aug 30 02:20:32 expected: 20
Aug 30 02:20:32  but was: 40
Aug 30 02:20:32 02:20:32.726 [INFO] 

{noformat}

it is very likely it is related to FLINK-32907



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32997) [JUnit5 Migration] Module: flink-table-planner (StreamingTestBase)

2023-08-30 Thread Jiabao Sun (Jira)

Jiabao Sun created FLINK-32997:
--

 Summary: [JUnit5 Migration] Module: flink-table-planner 
(StreamingTestBase)
 Key: FLINK-32997
 URL: https://issues.apache.org/jira/browse/FLINK-32997
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.18.0
Reporter: Jiabao Sun
 Fix For: 1.19.0


JUnit5 Migration Module: flink-table-planner (StreamingTestBase)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32998) if function result not correct

2023-08-30 Thread zhou (Jira)

zhou created FLINK-32998:


 Summary: if function result not correct
 Key: FLINK-32998
 URL: https://issues.apache.org/jira/browse/FLINK-32998
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.15.4
Reporter: zhou
 Attachments: image-2023-08-30-18-29-16-277.png, 
image-2023-08-30-18-30-05-568.png

!image-2023-08-30-18-29-16-277.png!

!image-2023-08-30-18-30-05-568.png!

if function result not correct,not result in origin field value, cut off the 
filed(word) value 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-32999) Remove HBase connector from master branch

2023-08-30 Thread Sergey Nuyanzin (Jira)

Sergey Nuyanzin created FLINK-32999:
---

 Summary: Remove HBase connector from master branch
 Key: FLINK-32999
 URL: https://issues.apache.org/jira/browse/FLINK-32999
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / HBase
Reporter: Sergey Nuyanzin


The connector was externalized at FLINK-30061
Once it is released it would make sense to remove it from master branch



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-33000) SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using a ThreadFactory

2023-08-30 Thread Matthias Pohl (Jira)

Matthias Pohl created FLINK-33000:
-

 Summary: SqlGatewayServiceITCase should utilize 
TestExecutorExtension instead of using a ThreadFactory
 Key: FLINK-33000
 URL: https://issues.apache.org/jira/browse/FLINK-33000
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Gateway, Tests
Affects Versions: 1.17.1, 1.16.2, 1.18.0, 1.19.0
Reporter: Matthias Pohl


{{SqlGatewayServiceITCase}} uses a {{ExecutorThreadFactory}} for its 
asynchronous operations. Instead, one should use {{TestExecutorExtension}} to 
ensure proper cleanup of threads.

We might also want to remove the {{AbstractTestBase}} parent class because that 
uses JUnit4 whereas {{SqlGatewayServiceITCase}} is already based on JUnit5



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-33001) KafkaSource in batch mode failing with exception if topic partition is empty

2023-08-30 Thread Abdul (Jira)

Abdul created FLINK-33001:
-

 Summary: KafkaSource in batch mode failing with exception if topic 
partition is empty
 Key: FLINK-33001
 URL: https://issues.apache.org/jira/browse/FLINK-33001
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.17.1, 1.14.6, 1.12.7
 Environment: The only workaround that works fine right now is to 
change the DEBUG level to INFO for logging. 

 
{code:java}
logger.KafkaPartitionSplitReader.name = 
org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader 

logger.KafkaPartitionSplitReader.level = INFO{code}
It is strange that changing this doesn't cause the above exception. 
Reporter: Abdul


If the Kafka topic is empty in Batch mode, there is an exception while 
processing it. This bug was supposedly fixed but unfortunately, the exception 
still occurs. The original bug was reported as this 
https://issues.apache.org/jira/browse/FLINK-27041


We tried to backport it but it still doesn't work. 
 * The problem will occur in case of DEBUG level of logger for class 
org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader
 * The same problems will occur in other versions of Flink, at least in the 
1.15 release branch and tag release-1.15.4
 * Same problem also occur in Flink 1.7.1 and 1.14

 

 

The minimal code to produce this is 

 
final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.getExecutionEnvironment();
env.setRuntimeMode(RuntimeExecutionMode.BATCH);

KafkaSource kafkaSource = KafkaSource
.builder()
.setBootstrapServers("localhost:9092")
.setTopics("test_topic")
.setValueOnlyDeserializer(new 
SimpleStringSchema())
.setBounded(OffsetsInitializer.latest())
.build();

DataStream stream = env.fromSource(
kafkaSource,
WatermarkStrategy.noWatermarks(),   
"Kafka Source"  );
stream.print();

env.execute("Flink KafkaSource test job");
This produces exception: 
{code:java}
Caused by: java.lang.RuntimeException: One or more fetchers have encountered 
exception    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcherManager.checkErrors(SplitFetcherManager.java:199)
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.getNextFetch(SourceReaderBase.java:154)
    at 
org.apache.flink.connector.base.source.reader.SourceReaderBase.pollNext(SourceReaderBase.java:116)
    at 
org.apache.flink.streaming.api.operators.SourceOperator.emitNext(SourceOperator.java:275)
    at 
org.apache.flink.streaming.runtime.io.StreamTaskSourceInput.emitNext(StreamTaskSourceInput.java:67)
    at 
org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:398)
    at 
org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:191)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:619)
    at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:583) 
   at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:758)    
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:573)    at 
java.lang.Thread.run(Thread.java:748)Caused by: java.lang.RuntimeException: 
SplitFetcher thread 0 received unexpected exception while polling the records   
 at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.runOnce(SplitFetcher.java:146)
    at 
org.apache.flink.connector.base.source.reader.fetcher.SplitFetcher.run(SplitFetcher.java:101)
    at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)    
at java.util.concurrent.FutureTask.run(FutureTask.java:266)    at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) 
   at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) 
   ... 1 moreCaused by: java.lang.IllegalStateException: You can only check 
the position for partitions assigned to this consumer.    at 
org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1737)
    at 
org.apache.kafka.clients.consumer.KafkaConsumer.position(KafkaConsumer.java:1704)
    at 
org.apache.flink.connector.kafka.source.reader.KafkaPartitionSplitReader.maybeLogSplitChangesHandlingResult(KafkaPartitionSplitReader.java:375)
    at 
org.apache.flink.connector.kafka.source.reader.K

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-30 Thread Jing Ge

Hi Xuannan,

Thanks for the clarification. That is the part where I am trying to
understand your thoughts. I have some follow-up questions:

1. It depends strongly on the watermarkStrategy and how customized
watermark generation looks like. It mixes business logic with technical
implementation and technical data processing mode. The value of the
watermark lag threshold must be set very carefully. If the value is too
small. any time, when the watermark generation logic is changed(business
logic changes lead to the threshold getting exceeded), the same job might
be running surprisingly in backlog processing mode, i.e. a butterfly
effect. A comprehensive documentation is required to avoid any confusion
for the users.
2. Like Jark already mentioned, use cases that do not have watermarks,
like pure processing-time based stream processing[1] are not covered. It is
more or less a trade-off solution that does not support such use cases and
appropriate documentation is required. Forcing them to explicitly generate
watermarks that are never needed just because of this does not sound like a
proper solution.
3. If I am not mistaken, it only works for use cases where event times are
very close to the processing times, because the wall clock is used to
calculate the watermark lag and the watermark is generated based on the
event time.

Best regards,
Jing

[1]
https://github.com/apache/flink/blob/2c50b4e956305426f478b726d4de4a640a16b810/flink-core/src/main/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.java#L236

On Wed, Aug 30, 2023 at 4:06 AM Xuannan Su  wrote:

> Hi Jing,
>
> Thank you for the suggestion.
>
> The definition of watermark lag is the same as the watermarkLag metric in
> FLIP-33[1]. More specifically, the watermark lag calculation is computed at
> the time when a watermark is emitted downstream in the following way:
> watermarkLag = CurrentTime - Watermark. I have added this description to
> the FLIP.
>
> I hope this addresses your concern.
>
> Best,
> Xuannan
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>
>
> > On Aug 28, 2023, at 01:04, Jing Ge  wrote:
> >
> > Hi Xuannan,
> >
> > Thanks for the proposal. +1 for me.
> >
> > There is one tiny thing that I am not sure if I understand it correctly.
> > Since there will be many different WatermarkStrategies and different
> > WatermarkGenerators. Could you please update the FLIP and add the
> > description of how the watermark lag is calculated exactly? E.g.
> Watermark
> > lag = A - B with A is the timestamp of the watermark emitted to the
> > downstream and B is(this is the part I am not really sure after
> reading
> > the FLIP).
> >
> > Best regards,
> > Jing
> >
> >
> > On Mon, Aug 21, 2023 at 9:03 AM Xuannan Su 
> wrote:
> >
> >> Hi Jark,
> >>
> >> Thanks for the comments.
> >>
> >> I agree that the current solution cannot support jobs that cannot define
> >> watermarks. However, after considering the pending-record-based
> solution, I
> >> believe the current solution is superior for the target use case as it
> is
> >> more intuitive for users. The backlog status gives users the ability to
> >> balance between throughput and latency. Making this trade-off decision
> >> based on the watermark lag is more intuitive from the user's
> perspective.
> >> For instance, a user can decide that if the job lags behind the current
> >> time by more than 1 hour, the result is not usable. In that case, we can
> >> optimize for throughput when the data lags behind by more than an hour.
> >> With the pending-record-based solution, it's challenging for users to
> >> determine when to optimize for throughput and when to prioritize
> latency.
> >>
> >> Regarding the limitations of the watermark-based solution:
> >>
> >> 1. The current solution can support jobs with sources that have event
> >> time. Users can always define a watermark at the source operator, even
> if
> >> it's not used by downstream operators, such as streaming join and
> unbounded
> >> aggregate.
> >>
> >> 2.I don't believe it's accurate to say that the watermark lag will keep
> >> increasing if no data is generated in Kafka. The watermark lag and
> backlog
> >> status are determined at the moment when the watermark is emitted to the
> >> downstream operator. If no data is emitted from the source, the
> watermark
> >> lag and backlog status will not be updated. If the WatermarkStrategy
> with
> >> idleness is used, the source becomes non-backlog when it becomes idle.
> >>
> >> 3. I think watermark lag is more intuitive to determine if a job is
> >> processing backlog data. Even when using pending records, it faces a
> >> similar issue. For example, if the source has 1K pending records, those
> >> records can span from 1 day  to 1 hour to 1 second. If the records span
> 1
> >> day, it's probably best to optimize for throughput. If they span 1
> hour, it
> >> depends on the business logic. If they span 1 second, optimizing for
> >>

[jira] [Created] (FLINK-33002) Bump snappy-java from 1.1.4 to 1.1.10.1

2023-08-30 Thread Martijn Visser (Jira)

Martijn Visser created FLINK-33002:
--

 Summary: Bump snappy-java from 1.1.4 to 1.1.10.1
 Key: FLINK-33002
 URL: https://issues.apache.org/jira/browse/FLINK-33002
 Project: Flink
  Issue Type: Technical Debt
  Components: Stateful Functions
Reporter: Martijn Visser
Assignee: Martijn Visser






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-33003) Flink ML add isolationForest algorithm

2023-08-30 Thread zhaozijun (Jira)

zhaozijun created FLINK-33003:
-

 Summary: Flink ML add isolationForest algorithm
 Key: FLINK-33003
 URL: https://issues.apache.org/jira/browse/FLINK-33003
 Project: Flink
  Issue Type: New Feature
  Components: Library / Machine Learning
Reporter: zhaozijun
 Attachments: IsolationForest.zip

I want to use flink solve some problems related to anomaly detection, but 
currently flink ml lacks algorithms related to anomaly detection, so I want to 
add the isolation forest algorithm to library/flink ml. During the 
implementation process, when IterationBody is used, I try to understand the 
implementation of the Kmeans algorithm, and use iterative behavior to calculate 
the center point of the isolation forest algorithm, but in the test, I found 
that when the parallelism > 1, the number of iterations > 1, and there will be 
sometimes succeed sometimes fail (fail to find the broadcast variable). Please 
teachers help me to review and point out my problem. Thank you 🙏



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Created] (FLINK-33004) Decoupling topology and network memory to support complex job topologies

2023-08-30 Thread dalongliu (Jira)

dalongliu created FLINK-33004:
-

 Summary: Decoupling topology and network memory to support complex 
job topologies
 Key: FLINK-33004
 URL: https://issues.apache.org/jira/browse/FLINK-33004
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Network
Affects Versions: 1.18.0, 1.19.0
Reporter: dalongliu


Currently, the default value of taskmanager.memory.network.fraction option in 
Flink is 0.1, and after the topology of the job is complex enough, it will run 
with an insufficient network buffer. We currently encountered this issue when 
running TPC-DS test set q9, and bypassed it by adjusting 
taskmanager.memory.network.fraction to 0.2. Theoretically, we should have 
network memory decoupled from the job topology so that arbitrarily complex jobs 
can be supported.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

2023-08-30 Thread Xuannan Su

Hi Jing,

Thanks for the reply.

1. You are absolutely right that the watermark lag threshold must be carefully 
set with a thorough understanding of watermark generation. It is crucial for 
users to take into account the WatermarkStrategy when setting the watermark lag 
threshold.

2. Regarding pure processing-time based stream processing jobs, alternative 
strategies will be implemented to determine whether the job is processing 
backlog data. I have outlined two possible strategies below:

- Based on the source operator's state. For example, when MySQL CDC source is 
reading snapshot, it can claim isBacklog=true.
- Based on metrics. For example, when busyTimeMsPerSecond (or 
backPressuredTimeMsPerSecond) > user_specified_threshold, then isBacklog=true.

As of the strategies proposed in this FLIP, it rely on generated watermarks. 
Therefore, if a user intends for the job to detect backlog status based on 
watermark, it is necessary to generate the watermark.

3. I'm afraid I'm not fully grasping your question. From my understanding, it 
should work in both cases. When event times are close to the processing time, 
resulting in watermarks close to the processing time, the job is not processing 
backlog data. On the other hand, when event times are far from processing time, 
causing watermarks to also be distant, if the lag surpasses the defined 
threshold, the job is considered processing backlog data.

Best,
Xuannan


> On Aug 31, 2023, at 02:56, Jing Ge  wrote:
> 
> Hi Xuannan,
> 
> Thanks for the clarification. That is the part where I am trying to
> understand your thoughts. I have some follow-up questions:
> 
> 1. It depends strongly on the watermarkStrategy and how customized
> watermark generation looks like. It mixes business logic with technical
> implementation and technical data processing mode. The value of the
> watermark lag threshold must be set very carefully. If the value is too
> small. any time, when the watermark generation logic is changed(business
> logic changes lead to the threshold getting exceeded), the same job might
> be running surprisingly in backlog processing mode, i.e. a butterfly
> effect. A comprehensive documentation is required to avoid any confusion
> for the users.
> 2. Like Jark already mentioned, use cases that do not have watermarks,
> like pure processing-time based stream processing[1] are not covered. It is
> more or less a trade-off solution that does not support such use cases and
> appropriate documentation is required. Forcing them to explicitly generate
> watermarks that are never needed just because of this does not sound like a
> proper solution.
> 3. If I am not mistaken, it only works for use cases where event times are
> very close to the processing times, because the wall clock is used to
> calculate the watermark lag and the watermark is generated based on the
> event time.
> 
> Best regards,
> Jing
> 
> [1]
> https://github.com/apache/flink/blob/2c50b4e956305426f478b726d4de4a640a16b810/flink-core/src/main/java/org/apache/flink/api/common/eventtime/WatermarkStrategy.java#L236
> 
> On Wed, Aug 30, 2023 at 4:06 AM Xuannan Su  wrote:
> 
>> Hi Jing,
>> 
>> Thank you for the suggestion.
>> 
>> The definition of watermark lag is the same as the watermarkLag metric in
>> FLIP-33[1]. More specifically, the watermark lag calculation is computed at
>> the time when a watermark is emitted downstream in the following way:
>> watermarkLag = CurrentTime - Watermark. I have added this description to
>> the FLIP.
>> 
>> I hope this addresses your concern.
>> 
>> Best,
>> Xuannan
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-33%3A+Standardize+Connector+Metrics
>> 
>> 
>>> On Aug 28, 2023, at 01:04, Jing Ge  wrote:
>>> 
>>> Hi Xuannan,
>>> 
>>> Thanks for the proposal. +1 for me.
>>> 
>>> There is one tiny thing that I am not sure if I understand it correctly.
>>> Since there will be many different WatermarkStrategies and different
>>> WatermarkGenerators. Could you please update the FLIP and add the
>>> description of how the watermark lag is calculated exactly? E.g.
>> Watermark
>>> lag = A - B with A is the timestamp of the watermark emitted to the
>>> downstream and B is(this is the part I am not really sure after
>> reading
>>> the FLIP).
>>> 
>>> Best regards,
>>> Jing
>>> 
>>> 
>>> On Mon, Aug 21, 2023 at 9:03 AM Xuannan Su 
>> wrote:
>>> 
 Hi Jark,
 
 Thanks for the comments.
 
 I agree that the current solution cannot support jobs that cannot define
 watermarks. However, after considering the pending-record-based
>> solution, I
 believe the current solution is superior for the target use case as it
>> is
 more intuitive for users. The backlog status gives users the ability to
 balance between throughput and latency. Making this trade-off decision
 based on the watermark lag is more intuitive from the user's
>> perspective.
 For instance, a user can decide that if the job lags behind the c

Re: 退订

2023-08-30 Thread liu ron

Please send email to user-unsubscr...@flink.apache.org if you want to
unsubscribe the mail from u...@flink.apache.org, and you can refer [1][2]
for more details.

[1]
https://flink.apache.org/zh/community/#%e9%82%ae%e4%bb%b6%e5%88%97%e8%a1%a8
[2] https://flink.apache.org/community.html#mailing-lists

Best,
Ron

喻凯  于2023年8月30日周三 14:17写道：

>
>

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

[jira] [Created] (FLINK-32995) TPC-DS end-to-end test fails with chmod: cannot access '../target/generator/dsdgen_linux':

[jira] [Created] (FLINK-32996) CheckpointAfterAllTasksFinishedITCase.testFailoverAfterSomeTasksFinished fails on AZP

[jira] [Created] (FLINK-32997) [JUnit5 Migration] Module: flink-table-planner (StreamingTestBase)

[jira] [Created] (FLINK-32998) if function result not correct

[jira] [Created] (FLINK-32999) Remove HBase connector from master branch

[jira] [Created] (FLINK-33000) SqlGatewayServiceITCase should utilize TestExecutorExtension instead of using a ThreadFactory

[jira] [Created] (FLINK-33001) KafkaSource in batch mode failing with exception if topic partition is empty

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

[jira] [Created] (FLINK-33002) Bump snappy-java from 1.1.4 to 1.1.10.1

[jira] [Created] (FLINK-33003) Flink ML add isolationForest algorithm

[jira] [Created] (FLINK-33004) Decoupling topology and network memory to support complex job topologies

Re: [DISCUSS] FLIP-328: Allow source operators to determine isProcessingBacklog based on watermark lag

Re: 退订

14 matches

Site Navigation

Mail list logo

Footer information