[jira] [Created] (FLINK-27569) Terminated Flink job restarted from empty state when execution.shutdown-on-application-finish is false

2022-05-11 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27569:
--

 Summary: Terminated Flink job restarted from empty state when 
execution.shutdown-on-application-finish is false
 Key: FLINK-27569
 URL: https://issues.apache.org/jira/browse/FLINK-27569
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes, Runtime / Checkpointing
Affects Versions: 1.15.0
Reporter: Gyula Fora
 Attachments: Screenshot 2022-05-11 at 08.46.51.png, Screenshot 
2022-05-11 at 08.50.03.png

When JobManager HA is enabled and execution.shutdown-on-application-finish = 
false, terminated jobs (failed, cancelled, etc.) will be resubmitted from a 
completely empty state on JobManager failover.

Please see the following situation: Flink 1.15, HA enabled, shutdown on 
application finish disabled:

1. Submit Flink application cluster
2. Call cancel with savepoint -> see logs below

The job successfully finishes with a savepoint:
{noformat}
2022-05-11 06:42:48,562 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job 
 reached terminal state FINISHED.
2022-05-11 06:42:48,624 INFO  
org.apache.flink.runtime.dispatcher.StandaloneDispatcher     [] - Job 
 has been registered for cleanup in the 
JobResultStore after reaching a terminal state.
2022-05-11 06:42:48,626 INFO  org.apache.flink.runtime.jobmaster.JobMaster      
           [] - Stopping the JobMaster for job 'State machine job' 
().
2022-05-11 06:42:48,629 INFO  
org.apache.flink.runtime.checkpoint.DefaultCompletedCheckpointStore [] - 
Shutting down
2022-05-11 06:42:48,647 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesCheckpointIDCounter [] - 
Shutting down.
2022-05-11 06:42:48,647 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesCheckpointIDCounter [] - 
Removing counter from ConfigMap 
basic-checkpoint-ha-example--config-map
2022-05-11 06:42:48,652 INFO  
org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool [] - 
Releasing slot [0cdb18eefcb2133049223214d4716fa0].
2022-05-11 06:42:48,653 INFO  
org.apache.flink.runtime.jobmaster.slotpool.DefaultDeclarativeSlotPool [] - 
Releasing slot [bf5ece74692d786f6ba2b067c76ee1d9].
2022-05-11 06:42:48,653 INFO  org.apache.flink.runtime.jobmaster.JobMaster      
           [] - Close ResourceManager connection 
220ea961c86ea8042fde2151fd05a5c9: Stopping JobMaster for job 'State machine 
job' ().
2022-05-11 06:42:48,653 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Stopping DefaultLeaderRetrievalService.
2022-05-11 06:42:48,653 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] 
- Stopping 
KubernetesLeaderRetrievalDriver{configMapName='basic-checkpoint-ha-example-cluster-config-map'}.
2022-05-11 06:42:48,655 INFO  
org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer
 [] - Stopped to watch for 
default/basic-checkpoint-ha-example-cluster-config-map, watching 
id:9a1bc36b-6a76-4970-96a0-945e9a12b66d
2022-05-11 06:42:48,655 INFO  
org.apache.flink.runtime.leaderretrieval.DefaultLeaderRetrievalService [] - 
Stopping DefaultLeaderRetrievalService.
2022-05-11 06:42:48,655 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesLeaderRetrievalDriver [] 
- Stopping 
KubernetesLeaderRetrievalDriver{configMapName='basic-checkpoint-ha-example-cluster-config-map'}.
2022-05-11 06:42:48,655 INFO  
org.apache.flink.kubernetes.kubeclient.resources.KubernetesConfigMapSharedInformer
 [] - Stopped to watch for 
default/basic-checkpoint-ha-example-cluster-config-map, watching 
id:5facec4c-d888-43b4-88d0-d1f34912d35a
2022-05-11 06:42:48,655 INFO  
org.apache.flink.runtime.resourcemanager.active.ActiveResourceManager [] - 
Disconnect job manager 
969eeac09f5cf4813103003495204...@akka.tcp://flink@172.17.0.6:6123/user/rpc/jobmanager_2
 for job  from the resource manager.
2022-05-11 06:42:48,660 INFO  
org.apache.flink.runtime.leaderelection.DefaultLeaderElectionService [] - 
Stopping DefaultLeaderElectionService.
2022-05-11 06:42:48,723 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesMultipleComponentLeaderElectionHaServices
 [] - Clean up the high availability data for job 
.
2022-05-11 06:42:48,753 INFO  
org.apache.flink.runtime.jobmanager.DefaultJobGraphStore     [] - Removed job 
graph  from 
KubernetesStateHandleStore{configMapName='basic-checkpoint-ha-example-cluster-config-map'}.
2022-05-11 06:42:48,758 INFO  
org.apache.flink.kubernetes.highavailability.KubernetesMultipleComponentLeaderElectionHaServices
 [] - Finished cleaning up the high availability data for job 
.
2022-05-11 06:42:50,321 IN
{noformat}
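The logs above show the job being registered in the JobResultStore after reaching a terminal state, which is exactly the record a failover should consult. The expected recovery rule can be sketched with a simplified model in plain Java; class and method names here are illustrative stand-ins, not Flink's actual dispatcher or JobResultStore code, and the bug is that the recovered application apparently resubmits the job regardless of such a record.

```java
import java.util.HashSet;
import java.util.Set;

// Simplified model of the intended HA failover behavior: a job whose id is
// already registered in the JobResultStore is globally terminated and must
// not be resubmitted. Names are hypothetical, not Flink's actual classes.
public class JobResultStoreSketch {
    private final Set<String> terminatedJobIds = new HashSet<>();

    void markTerminated(String jobId) {
        terminatedJobIds.add(jobId);
    }

    /** Returns true if the job should be (re)submitted after failover. */
    boolean shouldSubmitOnRecovery(String jobId) {
        return !terminatedJobIds.contains(jobId);
    }

    public static void main(String[] args) {
        JobResultStoreSketch store = new JobResultStoreSketch();
        store.markTerminated("job-a");
        // A terminated job must be skipped; an unknown job may be submitted.
        System.out.println(store.shouldSubmitOnRecovery("job-a")); // false
        System.out.println(store.shouldSubmitOnRecovery("job-b")); // true
    }
}
```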

Re: Flink job restarted from empty state when execution.shutdown-on-application-finish is false

2022-05-11 Thread Gyula Fóra
I have opened a JIRA: https://issues.apache.org/jira/browse/FLINK-27569
with more details, logs and screenshots.

Gyula

On Wed, May 11, 2022 at 7:03 AM Gyula Fóra  wrote:

> Hi Devs!
>
> I ran into a concerning situation and would like to hear your thoughts on
> this.
>
> I am running Flink 1.15 on Kubernetes native mode (using the operator but
> that is besides the point here) with Flink Kubernetes HA enabled.
>
> We have set
> *execution.shutdown-on-application-finish = false*
>
> I noticed that after the job failed/finished, if I kill the jobmanager
> pod (triggering a jobmanager failover), the job would be resubmitted from a
> completely empty state (as if starting for the first time).
>
> Has anyone encountered this issue? This makes using this config option
> pretty risky.
>
> Thank you
> Gyula
>


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-11 Thread Becket Qin
Hi Sebastian,

Thanks for the reply and patient discussion. I agree this is a tricky
decision.


> Nevertheless, Piotr has valid concerns about Option c) which I see as
> follows:
> (1) An interface with default NOOP implementation makes the implementation
> optional. And in my opinion, a default implementation is and will remain a
> way of making implementation optional because even in future a developer
> can decide to implement the "old flavor" without support for pausable
> splits.
> (2) It may not be too critical but I also find it suboptimal that with a
> NOOP default implementation there is no way to check at runtime if
> SourceReader or SplitReader actually support pausing. (To do so, one would
> need a supportsX method which makes it again more complicated.)


Based on the last few messages in the mailing list, Piotr and I agreed
that the default implementation should just throw an
UnsupportedOperationException if the source is unpausable. This
basically tells the source developers that this feature is expected to be
supported. Because we cannot prevent end users from putting an unpausable
source into the watermark alignment group, that basically means watermark
alignment is a non-optional feature for the end users. So aligning that
expectation with the source developers seems reasonable. And if a
source does not support this feature, the end users should explicitly
remove that source from the watermark alignment group.
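For illustration, the agreed-upon default could look roughly like the following. This is a hedged sketch in plain Java: the interface name and method signature are simplified stand-ins, not the actual SourceReader/SplitReader signatures from the FLIP.

```java
import java.util.Collection;
import java.util.List;

// Illustrative sketch of option (c) with the agreed default behavior:
// the default implementation throws, signaling that pausable splits are
// expected, instead of silently doing nothing (the earlier NOOP proposal).
interface PausableSplitReader {
    default void pauseOrResumeSplits(Collection<String> splitsToPause,
                                     Collection<String> splitsToResume) {
        throw new UnsupportedOperationException(
                "This reader does not support pausing splits; "
                + "remove the source from the watermark alignment group.");
    }
}

public class PausableSplitReaderDemo {
    public static void main(String[] args) {
        // A legacy reader that does not override the default.
        PausableSplitReader legacy = new PausableSplitReader() {};
        try {
            legacy.pauseOrResumeSplits(List.of("split-0"), List.of());
        } catch (UnsupportedOperationException e) {
            // The limitation surfaces at runtime rather than being ignored.
            System.out.println("unsupported");
        }
    }
}
```

This makes the trade-off discussed above concrete: the feature stays a default method (no new decorative interfaces), but an unpausable source fails loudly instead of silently ignoring alignment requests.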

Personally speaking I think this is a simple and clean solution from both
the end user and source developers' standpoint.

Does this address your concerns?

Thanks,

Jiangjie (Becket) Qin

On Wed, May 11, 2022 at 2:52 PM Sebastian Mattheis 
wrote:

> Hi Piotr, Hi Becket, Hi everybody,
>
> we, Dawid and I, discussed the various suggestions/options and we would be
> okay either way because we find neither solution is perfect just because of
> the already present complexity.
>
> Option c) Adding methods to the interfaces of SourceReader and SplitReader
> Option a) Adding decorative interfaces to be used by SourceReader and
> SplitReader
>
> As of the current status (v. 12) of the FLIP [1], it is based on Option c)
> which we find acceptable because the complexity added is only a single
> method.
>
> Nevertheless, Piotr has valid concerns about Option c) which I see as
> follows:
> (1) An interface with default NOOP implementation makes the implementation
> optional. And in my opinion, a default implementation is and will remain a
> way of making implementation optional because even in future a developer
> can decide to implement the "old flavor" without support for pausable
> splits.
> (2) It may not be too critical but I also find it suboptimal that with a
> NOOP default implementation there is no way to check at runtime if
> SourceReader or SplitReader actually support pausing. (To do so, one would
> need a supportsX method which makes it again more complicated.)
>
> However, we haven't changed it because Option a) is also not optimal or
> straight-forward:
> (1) We need to add two distinct yet similar decorative interfaces since,
> as mentioned, the signatures of the methods are different. For example, we
> would need decorative interfaces like `SplitReaderWithPausableSplits` and
> `SourceReaderWithPausableSplits`.
> (2) As a consequence, we would need to somehow document how/where to
> implement both interfaces and how this relates to each other. This we could
> solve by adding a note in the interface of SourceReader and SplitReader and
> reference to the decorative interfaces but it still increases complexity
> too.
>
> In summary, we see both as acceptable and preferred over other options.
> The question is if we can find a solution or compromise that is acceptable
> for everybody to reach consensus.
>
> Please let us know what you think because we would be happy if we can
> conclude the discussion to avoid dropping the initiative on this FLIP.
>
> Regards,
> Sebastian
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=199540438
> (v. 12)
>
> On Thu, May 5, 2022 at 10:13 AM Piotr Nowojski 
> wrote:
>
>> Hi Guowei,
>>
>> as Dawid wrote a couple of messages back:
>>
>> > This is covered in the previous FLIP[1] which has been already
>> implemented in 1.15. In short, it must be enabled with the watermark
>> strategy which also configures drift and update interval
>>
>> So by default watermark alignment is disabled, regardless if a source
>> supports it or not.
>>
>> Best,
>> Piotrek
>>
>> czw., 5 maj 2022 o 09:56 Guowei Ma  napisał(a):
>>
>>> Hi,
>>>
>>> We know that in the case of Bounded input Flink supports the Batch
>>> execution mode. Currently in Batch execution mode, flink is executed on a
>>> stage-by-stage basis. In this way, perhaps watermark alignment might not
>>> gain much.
>>>
>>> So my question is: Is watermark alignment the default behavior(for
>>> implemented source only)? If so, have you considered evaluating the
>>> impact
>>>

Re: [ANNOUNCE] Apache Flink Table Store 0.1.0 released

2022-05-11 Thread Becket Qin
Really excited to see the very first release of the flink-table-store!

Kudos to everyone who helped with this effort!

Cheers,

Jiangjie (Becket) Qin


On Wed, May 11, 2022 at 1:55 PM Jingsong Lee 
wrote:

> The Apache Flink community is very happy to announce the release of Apache
> Flink Table Store 0.1.0.
>
> Apache Flink Table Store provides storage for building dynamic tables for
> both stream and batch processing in Flink, supporting high speed data
> ingestion and timely data query.
>
> Please check out the release blog post for an overview of the release:
> https://flink.apache.org/news/2022/05/11/release-table-store-0.1.0.html
>
> The release is available for download at:
> https://flink.apache.org/downloads.html
>
> Maven artifacts for Flink Table Store can be found at:
> https://search.maven.org/search?q=g:org.apache.flink%20table-store
>
> The full release notes are available in Jira:
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12351234
>
> We would like to thank all contributors of the Apache Flink community who
> made this release possible!
>
> Best,
> Jingsong Lee
>


Re: Flink job restarted from empty state when execution.shutdown-on-application-finish is enabled

2022-05-11 Thread Yang Wang
I assume this is the responsibility of the job result store [1]. However, it
seems that it does not work as expected.

[1].
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=195726435

Best,
Yang

Gyula Fóra  于2022年5月11日周三 12:55写道:

> Sorry I messed up the email, I meant false .
>
> So when we set it to not shut down … :)
>
> Gyula
>
> On Wed, 11 May 2022 at 05:06, Yun Tang  wrote:
>
> > Hi Gyula,
> >
> > Why are you sure that the configuration of
> > execution.shutdown-on-application-finish leads to this error? I noticed
> > that the default value of this configuration is just "true".
> >
> > From my understanding, the completed checkpoint store should only clear
> > its persisted checkpoint information on shutdown when the job status is
> > globally terminated.
> > Did you ever check the ConfigMap, which is used to store the completed
> > checkpoint store, to see whether its content was empty after you just
> > triggered a jobmanager failure?
> >
> > Best
> > Yun Tang
> >
> > 
> > From: Gyula F?ra 
> > Sent: Wednesday, May 11, 2022 3:41
> > To: dev 
> > Subject: Flink job restarted from empty state when
> > execution.shutdown-on-application-finish is enabled
> >
> > Hi Devs!
> >
> > I ran into a concerning situation and would like to hear your thoughts on
> > this.
> >
> > I am running Flink 1.15 on Kubernetes native mode (using the operator but
> > that is besides the point here) with Flink Kubernetes HA enabled.
> >
> > We have enabled
> > *execution.shutdown-on-application-finish = true*
> >
> > I noticed that after the job failed/finished, if I kill the jobmanager
> > pod (triggering a jobmanager failover), the job would be resubmitted
> > from a
> > completely empty state (as if starting for the first time).
> >
> > Has anyone encountered this issue? This makes using this config option
> > pretty risky.
> >
> > Thank you!
> > Gyula
> >
>


Re: Re: [ANNOUNCE] New Flink PMC member: Yang Wang

2022-05-11 Thread Yang Wang
Thanks for your warm welcome. It is my pleasure to work in such a nice
community.



Best,

Yang

Thomas Weise  于2022年5月11日周三 00:10写道:

> Congratulations, Yang!
>
> On Tue, May 10, 2022 at 3:15 AM Márton Balassi 
> wrote:
> >
> > Congrats, Yang. Well deserved :-)
> >
> > On Tue, May 10, 2022 at 9:16 AM Terry Wang  wrote:
> >
> > > Congrats Yang!
> > >
> > > On Mon, May 9, 2022 at 11:19 AM LuNing Wang 
> wrote:
> > >
> > > > Congrats Yang!
> > > >
> > > > Best,
> > > > LuNing Wang
> > > >
> > > > Dian Fu  于2022年5月7日周六 17:21写道:
> > > >
> > > > > Congrats Yang!
> > > > >
> > > > > Regards,
> > > > > Dian
> > > > >
> > > > > On Sat, May 7, 2022 at 12:51 PM Jacky Lau 
> > > wrote:
> > > > >
> > > > > > Congrats Yang and well Deserved!
> > > > > >
> > > > > > Best,
> > > > > > Jacky Lau
> > > > > >
> > > > > > Yun Gao  于2022年5月7日周六 10:44写道:
> > > > > >
> > > > > > > Congratulations Yang!
> > > > > > >
> > > > > > > Best,
> > > > > > > Yun Gao
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >  --Original Mail --
> > > > > > > Sender:David Morávek 
> > > > > > > Send Date:Sat May 7 01:05:41 2022
> > > > > > > Recipients:Dev 
> > > > > > > Subject:Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > > > > > > Nice! Congrats Yang, well deserved! ;)
> > > > > > >
> > > > > > > On Fri 6. 5. 2022 at 17:53, Peter Huang <
> > > huangzhenqiu0...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Congrats, Yang!
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > Best Regards
> > > > > > > > Peter Huang
> > > > > > > >
> > > > > > > > On Fri, May 6, 2022 at 8:46 AM Yu Li 
> wrote:
> > > > > > > >
> > > > > > > > > Congrats and welcome, Yang!
> > > > > > > > >
> > > > > > > > > Best Regards,
> > > > > > > > > Yu
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Fri, 6 May 2022 at 14:48, Paul Lam <
> paullin3...@gmail.com>
> > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Congrats, Yang! Well Deserved!
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Paul Lam
> > > > > > > > > >
> > > > > > > > > > > 2022年5月6日 14:38,Yun Tang  写道:
> > > > > > > > > > >
> > > > > > > > > > > Congratulations, Yang!
> > > > > > > > > > >
> > > > > > > > > > > Best
> > > > > > > > > > > Yun Tang
> > > > > > > > > > > 
> > > > > > > > > > > From: Jing Ge 
> > > > > > > > > > > Sent: Friday, May 6, 2022 14:24
> > > > > > > > > > > To: dev 
> > > > > > > > > > > Subject: Re: [ANNOUNCE] New Flink PMC member: Yang Wang
> > > > > > > > > > >
> > > > > > > > > > > Congrats Yang and well Deserved!
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Jing
> > > > > > > > > > >
> > > > > > > > > > > On Fri, May 6, 2022 at 7:38 AM Lincoln Lee <
> > > > > > lincoln.8...@gmail.com
> > > > > > > >
> > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> Congratulations Yang!
> > > > > > > > > > >>
> > > > > > > > > > >> Best,
> > > > > > > > > > >> Lincoln Lee
> > > > > > > > > > >>
> > > > > > > > > > >>
> > > > > > > > > > >> Őrhidi Mátyás  于2022年5月6日周五
> > > > 12:46写道:
> > > > > > > > > > >>
> > > > > > > > > > >>> Congrats Yang! Well deserved!
> > > > > > > > > > >>> Best,
> > > > > > > > > > >>> Matyas
> > > > > > > > > > >>>
> > > > > > > > > > >>> On Fri, May 6, 2022 at 5:30 AM huweihua <
> > > > > > huweihua@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > > >>>
> > > > > > > > > >  Congratulations Yang!
> > > > > > > > > > 
> > > > > > > > > >  Best,
> > > > > > > > > >  Weihua
> > > > > > > > > > 
> > > > > > > > > > 
> > > > > > > > > > >>>
> > > > > > > > > > >>
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Terry Wang
> > >
>


[jira] [Created] (FLINK-27570) A checkpoint path error does not cause the job to stop

2022-05-11 Thread Underwood (Jira)
Underwood created FLINK-27570:
-

 Summary: A checkpoint path error does not cause the job to stop
 Key: FLINK-27570
 URL: https://issues.apache.org/jira/browse/FLINK-27570
 Project: Flink
  Issue Type: Bug
  Components: API / Core
Affects Versions: 1.14.4
 Environment: !image-2022-05-11-16-11-11-351.png!

!image-2022-05-11-16-11-15-009.png!

!image-2022-05-11-16-11-29-502.png!
Reporter: Underwood
 Attachments: image-2022-05-11-16-10-03-527.png, 
image-2022-05-11-16-10-19-849.png, image-2022-05-11-16-10-35-413.png, 
image-2022-05-11-16-12-11-818.png, image-2022-05-11-16-12-22-157.png

I configured the wrong checkpoint path when starting the job, and set:

 
{code:java}
conf.set(ExecutionCheckpointingOptions.TOLERABLE_FAILURE_NUMBER, 0);
env.setRestartStrategy(RestartStrategies.noRestart());
{code}
 

The job is expected to stop due to a checkpoint error, but the job is still 
running.
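The expected semantics, where zero tolerable checkpoint failures plus no restart strategy should fail the job on the first checkpoint error, can be modeled with a simple counter. This is an illustrative plain-Java sketch of the configured behavior, not Flink's actual checkpoint failure manager:

```java
// Simplified model of "tolerable checkpoint failure number" semantics:
// once the number of checkpoint failures exceeds the configured tolerance,
// the job should transition to FAILED. Illustrative sketch only.
public class CheckpointFailureSketch {
    private final int tolerableFailures;
    private int failures = 0;

    CheckpointFailureSketch(int tolerableFailures) {
        this.tolerableFailures = tolerableFailures;
    }

    /** Returns true if the job should fail after this checkpoint failure. */
    boolean onCheckpointFailure() {
        failures++;
        return failures > tolerableFailures;
    }

    public static void main(String[] args) {
        CheckpointFailureSketch mgr = new CheckpointFailureSketch(0);
        // With tolerance 0, the very first failure (e.g. from a bad
        // checkpoint path) should fail the job, not leave it running.
        System.out.println(mgr.onCheckpointFailure()); // true
    }
}
```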



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27571) Recognize "less is better" benchmarks in regression detection script

2022-05-11 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-27571:
-

 Summary: Recognize "less is better" benchmarks in regression 
detection script
 Key: FLINK-27571
 URL: https://issues.apache.org/jira/browse/FLINK-27571
 Project: Flink
  Issue Type: Bug
  Components: Benchmarks
Affects Versions: 1.16.0
Reporter: Roman Khachatryan
 Attachments: Screenshot_2022-05-09_10-33-11.png

http://codespeed.dak8s.net:8000/timeline/#/?exe=5&ben=schedulingDownstreamTasks.BATCH&extr=on&quarts=on&equid=off&env=2&revs=200
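The gist of the requested change can be sketched as a direction-aware comparison: for throughput-style benchmarks a lower score is a regression, while for "less is better" (latency-style) benchmarks a higher score is. The following plain-Java sketch is only an illustration of that idea; the actual detection script, metric names, and thresholds may differ.

```java
// Direction-aware regression check: a hypothetical sketch, not the actual
// benchmark script. "lessIsBetter" flips which direction of change counts
// as a regression.
public class RegressionCheckSketch {
    static boolean isRegression(double baseline, double current,
                                double threshold, boolean lessIsBetter) {
        double relativeChange = (current - baseline) / baseline;
        // For less-is-better metrics (e.g. latency), an increase is bad;
        // otherwise (e.g. records/sec), a decrease is bad.
        return lessIsBetter ? relativeChange > threshold
                            : relativeChange < -threshold;
    }

    public static void main(String[] args) {
        // Throughput dropped 20%: a regression.
        System.out.println(isRegression(1000, 800, 0.1, false)); // true
        // Latency dropped 20%: an improvement, not a regression.
        System.out.println(isRegression(100, 80, 0.1, true));    // false
    }
}
```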



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27572) Verify HA Metadata present before performing last-state restore

2022-05-11 Thread Gyula Fora (Jira)
Gyula Fora created FLINK-27572:
--

 Summary: Verify HA Metadata present before performing last-state 
restore
 Key: FLINK-27572
 URL: https://issues.apache.org/jira/browse/FLINK-27572
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Gyula Fora
 Fix For: kubernetes-operator-1.0.0


When we restore a job using the last-state logic, we need to verify that the HA 
metadata has not been deleted. If it is not there, we need to simply throw an 
error, because this requires manual user intervention.

This only applies when the FlinkDeployment is not already in a suspended state 
with recorded last state information.

The problem can be reproduced easily in 1.14 by triggering a fatal job error 
(turn off the restart strategy and kill a TM, for example). In these cases the 
HA metadata will be removed, and the next last-state upgrade should throw an 
error instead of restoring from a completely empty state.
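The proposed guard can be sketched as a simple precondition check. This is an illustrative plain-Java sketch of the rule described above; the names are hypothetical and do not correspond to the operator's actual code.

```java
// Illustrative sketch of the proposed validation: before a last-state
// restore, require HA metadata to be present unless the FlinkDeployment is
// already suspended with recorded last-state information.
public class LastStateRestoreCheck {
    static void validate(boolean haMetadataPresent,
                         boolean suspendedWithRecordedLastState) {
        if (!haMetadataPresent && !suspendedWithRecordedLastState) {
            throw new IllegalStateException(
                    "HA metadata is missing; manual intervention is required "
                    + "instead of restoring from an empty state.");
        }
    }

    public static void main(String[] args) {
        validate(true, false);  // ok: HA metadata exists
        validate(false, true);  // ok: suspended with recorded last state
        try {
            validate(false, false); // must be rejected
        } catch (IllegalStateException e) {
            System.out.println("blocked");
        }
    }
}
```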



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] HybridSource Table & SQL API timeline

2022-05-11 Thread Ran Tao
Thanks, Thomas.
I noticed this issue and asked Jiang for details.
If he hasn't finished this work yet, I will draft the initial FLIP;
otherwise we can wait for Jiang's implementation.



Thomas Weise  于2022年5月11日周三 02:29写道:

> fyi there is a related ticket here:
> https://issues.apache.org/jira/browse/FLINK-22793
>
> On Mon, May 9, 2022 at 11:34 PM Becket Qin  wrote:
> >
> > Cool, I've granted you the permission.
> >
> > Cheers,
> >
> > Jiangjie (Becket) Qin
> >
> > On Tue, May 10, 2022 at 1:14 PM Ran Tao  wrote:
> >
> > > Hi, Becket. Thanks for your suggestions. My id is: Ran Tao
> > > And i will draft this flip in a few days. thanks~
> > >
> > > Becket Qin  于2022年5月10日周二 12:40写道:
> > >
> > > > Hi Ran,
> > > >
> > > > The FLIP process can be found here[1].
> > > >
> > > > You don't need to pass the vote, in fact the vote is based on the
> FLIP
> > > > wiki. So drafting the FLIP wiki would be the first step. After that
> you
> > > may
> > > > start a discussion thread in the mailing list so people can have the
> > > > discussion about the feature based on your FLIP wiki. Note that it is
> > > very
> > > > important to follow the structure of the FLIP template so the
> discussion
> > > > would be more efficient.
> > > >
> > > > In case you don't have the permission to add a FLIP page yet, please
> let
> > > me
> > > > know your Apache Confluence ID so I can grant you the permission.
> > > >
> > > > Thanks,
> > > >
> > > > Jiangjie (Becket) Qin
> > > >
> > > > [1]
> > > >
> > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
> > > >
> > > > On Tue, May 10, 2022 at 12:06 PM Ran Tao 
> wrote:
> > > >
> > > > > Hi, Martijn, Jacky. Thanks for your responding. It indeed need a
> > > designed
> > > > > doc or FLIP to illustrate some details and concrete implementation.
> > > > >
> > > > > And i'm glad to work on this issue. I wonder whether i can create a
> > > FLIP
> > > > > under discussion firstly
> > > > > to write the draft design of the implementation about table & sql
> api
> > > and
> > > > > some details, or we must pass voting?
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best,
> > > Ran Tao
> > >
>


-- 
Best,
Ran Tao


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-11 Thread Konstantin Knauf
I don't think we can maintain two additional channels. Some people already
have concerns about covering one additional channel.

I think a forum provides a better user experience than a mailing list.
Information is better structured, you can edit messages, and signing up and
searching are easier.

To make some progress, maybe we should decide on chat vs. forum vs. none and
then go into a deeper discussion of the implementation. Or is there anything
about Slack that would be a complete blocker for the implementation?



Am Mi., 11. Mai 2022 um 07:35 Uhr schrieb Xintong Song <
tonysong...@gmail.com>:

> I agree with Robert on reworking the "Community" and "Getting Help" pages
> to emphasize how we position the mailing lists and Slack, and on revisiting
> in 6-12 months.
>
> Concerning dedicated Apache Flink Slack vs. ASF Slack, I'm with
> Konstantin. I'd expect it to be easier for having more channels and keeping
> them organized, managing permissions for different roles, adding bots, etc.
>
> IMO, having Slack is about improving the communication efficiency when you
> are already in a discussion, and we expect such improvement would motivate
> users to interact more with each other. From that perspective, forums are
> not much better than mailing lists.
>
> I'm also open to forums as well, but not as an alternative to Slack. I
> definitely see how forums help in keeping information organized and easy to
> find. However, I'm a bit concerned about the maintenance overhead. I'm not
> very familiar with Discourse or Reddit. My impression is that they are not
> as easy to set up and maintain as Slack.
>
> Thank you~
>
> Xintong Song
>
>
> [1] https://asktug.com/
>
> On Tue, May 10, 2022 at 4:50 PM Konstantin Knauf 
> wrote:
>
>> Thanks for starting this discussion again. I am pretty much with Timo
>> here. Slack or Discourse as an alternative for the user community, and
>> mailing list for the contributing, design discussion, etc. I definitely see
>> the friction of joining a mailing list and understand if users are
>> intimidated.
>>
>> I am leaning towards a forum aka Discourse over a chat aka Slack. This is
>> about asking for help, finding information and thoughtful discussion more
>> so than casual chatting, right? For this a forum, where it is easier to
>> find and comment on older threads and topics just makes more sense to me. A
>> well-done Discourse forum is much more inviting and vibrant than a mailing
>> list. Just from a tool perspective, discourse would have the advantage of
>> being Open Source and so we could probably self-host it on an ASF machine.
>> [1]
>>
>> When it comes to Slack, I definitely see the benefit of a dedicated
>> Apache Flink Slack compared to ASF Slack. For example, we could have more
>> channels (e.g. look how many channels Airflow is using
>> http://apache-airflow.slack-archives.org) and we could generally
>> customize the experience more towards Apache Flink.  If we go for Slack,
>> let's definitely try to archive it like Airflow did. If we do this, we
>> don't necessarily need infinite message retention in Slack itself.
>>
>> Cheers,
>>
>> Konstantin
>>
>> [1] https://github.com/discourse/discourse
>>
>>
>> Am Di., 10. Mai 2022 um 10:20 Uhr schrieb Timo Walther <
>> twal...@apache.org>:
>>
>>> I also think that a real-time channel is long overdue. The Flink
>>> community in China has shown that such a platform can be useful for
>>> improving the collaboration within the community. The DingTalk channel of
>>> 10k+ users collectively helping each other is great to see. It could also
>>> reduce the burden from committers for answering frequently asked questions.
>>>
>>> Personally, I'm a mailing list fan esp. when it comes to design
>>> discussions. In my opinion, the dev@ mailing list should definitely
>>> stay where and how it is. However, I understand that users might not want
>>> to subscribe to a mailing list for a single question and get their mailbox
>>> filled with unrelated discussions afterwards. Esp. in a company setting it
>>> might not be easy to setup a dedicated email address for mailing lists and
>>> setting up rules is also not convenient.
>>>
>>> It would be great if we could use the ASF Slack. We should find an
>>> official, accessible channel. I would be open for the right tool. It might
>>> make sense to also look into Discourse or even Reddit? The latter would
>>> definitely be easier to index by a search engine. Discourse is actually
>>> made for modern real-time forums.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> Am 10.05.22 um 09:59 schrieb David Anderson:
>>>
>>> Thank you @Xintong Song  for sharing the
>>> experience of the Flink China community.
>>>
I've become convinced we should give Slack a try, both for discussions
>>> among the core developers, and as a place where the community can reach out
>>> for help. I am in favor of using the ASF slack, as we will need a paid
>>> instance for this to go well, and joining it is easy enough (took me about
>>> 2 minutes). Thanks, 

[jira] [Created] (FLINK-27573) Configuring a new random job result store directory

2022-05-11 Thread Matyas Orhidi (Jira)
Matyas Orhidi created FLINK-27573:
-

 Summary: Configuring a new random job result store directory
 Key: FLINK-27573
 URL: https://issues.apache.org/jira/browse/FLINK-27573
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Matyas Orhidi


Create a random job result store directory to work around:

https://issues.apache.org/jira/browse/FLINK-27569
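The workaround amounts to giving each deployment a fresh, random job result store directory so a restarted cluster cannot pick up stale entries. A minimal sketch of that idea in plain Java follows; the base path and naming scheme are purely illustrative, not the operator's actual configuration.

```java
import java.util.UUID;

// Hypothetical sketch of the workaround: derive a random job result store
// directory per deployment so a restarted cluster does not read stale
// entries. Path layout is illustrative only.
public class RandomJobResultStoreDir {
    static String newJobResultStoreDir(String basePath) {
        return basePath + "/job-result-store/" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        String dir = newJobResultStoreDir("file:///opt/flink/ha");
        // Each call yields a unique directory under the base path.
        System.out.println(dir.startsWith("file:///opt/flink/ha/job-result-store/"));
    }
}
```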



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Created] (FLINK-27574) [QUESTION] In Flink k8s Application mode with HA cannot use the History Server as the history backend

2022-05-11 Thread tanjialiang (Jira)
tanjialiang created FLINK-27574:
---

 Summary: [QUESTION] In Flink k8s Application mode with HA cannot use 
the History Server as the history backend
 Key: FLINK-27574
 URL: https://issues.apache.org/jira/browse/FLINK-27574
 Project: Flink
  Issue Type: Technical Debt
Reporter: tanjialiang


In Flink k8s application mode with high availability, the job id is always 
00, but the history server uses the job's id as the key. Can I modify 
the job id in HA?
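The problem being described is a key collision: if every HA application-mode run uses the same constant job id, any store keyed by job id retains only the latest run. A minimal plain-Java illustration of that collision (the constant id shown is an assumption for the example):

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the collision: successive runs with the same
// constant job id overwrite each other in a store keyed by job id, so a
// history backend keyed that way keeps only the most recent run.
public class HistoryKeyCollision {
    public static void main(String[] args) {
        String constantJobId = "00000000000000000000000000000000";
        Map<String, String> archiveByJobId = new HashMap<>();
        archiveByJobId.put(constantJobId, "run-1 archive");
        archiveByJobId.put(constantJobId, "run-2 archive"); // overwrites run-1
        System.out.println(archiveByJobId.size()); // 1
    }
}
```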



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-222: Support full query lifecycle statements in SQL client

2022-05-11 Thread Paul Lam
Hi Jark,

Thanks a lot for your opinions and suggestions! Please see my replies inline. 

> 1) the display of savepoint_path


Agreed. Adding it to the FLIP.

> 2) Please make a decision on multiple options in the FLIP.

Okay. I’ll keep one and move the other to the rejected alternatives section.

> 4) +1 SHOW QUERIES
> Btw, the displayed column "address" is a little confusing to me. 
> At the first glance, I'm not sure what address it is, JM RPC address? JM REST 
> address? Gateway address?
> If this is a link to the job's web UI URL, how about calling it "web_url" and 
> display in 
> "http://:" format?
> Besides, how about displaying "startTime" or "uptime" as well?

I’m good with these changes. Updating the FLIP according to your suggestions.

> 5) STOP/CANCEL QUERY vs DROP QUERY
> I'm +1 to DROP, because it's more compliant with SQL standard naming, i.e., 
> "SHOW/CREATE/DROP". 
> Separating STOP and CANCEL confuses users a lot what are the differences 
> between them. 
> I'm +1 to add the "PURGE" keyword to the DROP QUERY statement, which 
> indicates to stop query without savepoint. 
> Note that, PURGE doesn't mean stop with --drain flag. The drain flag will 
> flush all the registered timers 
> and windows which could lead to incorrect results when the job is resumed. I 
> think the drain flag is rarely used 
> (please correct me if I'm wrong), therefore, I suggest moving this feature 
> into future work when the needs are clear. 

I’m +1 to representing ungraceful cancel with PURGE. I think the --drain flag 
is not used very often, as you said, and we could just add a table config 
option to enable that flag.

> 7)  and  should be quoted
> All the  and  should be string literal, otherwise 
> it's hard to parse them.
> For example, STOP QUERY '’.

Good point! Adding it to the FLIP.

> 8) Examples
> Could you add an example that consists of all the statements to show how to 
> manage the full lifecycle of queries? 
> Including show queries, create savepoint, remove savepoint, stop query with a 
> savepoint, and restart query with savepoint. 

Agreed. Adding it to the FLIP as well.

Best,
Paul Lam

> 2022年5月7日 18:22,Jark Wu  写道:
> 
> Hi Paul, 
> 
> I think this FLIP has already in a good shape. I just left some additional 
> thoughts: 
> 
> 1) the display of savepoint_path
> Could the displayed savepoint_path include the scheme part? 
> E.g. `hdfs:///flink-savepoints/savepoint-cca7bc-bb1e257f0dab`
> IIUC, the scheme part is omitted when it's a local filesystem. 
> But the behavior would be clearer if including the scheme part in the design 
> doc. 
> 
> 2) Please make a decision on multiple options in the FLIP.
> It might give the impression that we will support all the options. 
> 
> 3) +1 SAVEPOINT and RELEASE SAVEPOINT
> Personally, I also prefer "SAVEPOINT " and "RELEASE SAVEPOINT 
> " 
> to "CREATE/DROP SAVEPOINT", as they have been used in mature databases.
> 
> 4) +1 SHOW QUERIES
> Btw, the displayed column "address" is a little confusing to me. 
> At the first glance, I'm not sure what address it is, JM RPC address? JM REST 
> address? Gateway address?
> If this is a link to the job's web UI URL, how about calling it "web_url" and 
> display in 
> "http://:" format?
> Besides, how about displaying "startTime" or "uptime" as well?
> 
> 5) STOP/CANCEL QUERY vs DROP QUERY
> I'm +1 to DROP, because it's more compliant with SQL standard naming, i.e., 
> "SHOW/CREATE/DROP". 
> Separating STOP and CANCEL confuses users a lot what are the differences 
> between them. 
> I'm +1 to add the "PURGE" keyword to the DROP QUERY statement, which 
> indicates to stop query without savepoint. 
> Note that, PURGE doesn't mean stop with --drain flag. The drain flag will 
> flush all the registered timers 
> and windows which could lead to incorrect results when the job is resumed. I 
> think the drain flag is rarely used 
> (please correct me if I'm wrong), therefore, I suggest moving this feature 
> into future work when the needs are clear. 
> 
> 6) Table API
> I think it makes sense to support the new statements in Table API. 
> We should try to make the Gateway and CLI simple which just forward statement 
> to the underlying TableEnvironemnt. 
> JAR statements are being re-implemented in Table API as well, see FLIP-214[1].
> 
> 7)  and  should be quoted
> All the  and  should be string literal, otherwise 
> it's hard to parse them.
> For example, STOP QUERY ''.
> 
> 8) Examples
> Could you add an example that consists of all the statements to show how to 
> manage the full lifecycle of queries? 
> Including show queries, create savepoint, remove savepoint, stop query with a 
> savepoint, and restart query with savepoint. 
> 
> Best,
> Jark
> 
> [1]: 
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-214+Support+Advanced+Function+DDL?src=contextnavpagetreemode
>  
> 
> 
> 
> On Fri, 6 May 2022 at 1

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-11 Thread Becket Qin
+dev

Hi Sebastian,

Thank you for the summary. Please see the detailed replies inline. As a
recap of my suggestions.

1. Pausable splits API.
  a) Add default implementations to methods "pauseOrResumeSplits" in both
SourceReader and SplitReader where both default implementations throw
 UnsupportedOperationException.

2. User story.
a) We tell users to enable the watermark alignment as they like. This
is exactly what the current Flink API is.
b) We tell the source developers, please implement pausable splits,
otherwise bad things may happen. Think of it like you are expected to
implement SourceReader#snapshotState() properly, otherwise exceptions will
be thrown when users enable checkpointing.

Thanks,

Jiangjie (Becket) Qin
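The default-implementation pattern in 1.a) above can be sketched with a toy interface. The names follow the discussion; the real SourceReader/SplitReader interfaces have many more methods, which are omitted here:

```java
import java.util.Collection;
import java.util.List;

class PausableSplitsSketch {

    // Toy stand-in for the proposed SplitReader change: a default
    // implementation that throws, so sources compiled against the old
    // interface still compile, but fail fast when alignment is enabled.
    interface SplitReader<SplitT> {
        default void pauseOrResumeSplits(
                Collection<SplitT> splitsToPause, Collection<SplitT> splitsToResume) {
            throw new UnsupportedOperationException(
                    "pauseOrResumeSplits is not implemented by this reader.");
        }
    }

    public static void main(String[] args) {
        // A "legacy" reader that predates the new method: no override.
        SplitReader<String> legacyReader = new SplitReader<String>() {};
        try {
            legacyReader.pauseOrResumeSplits(List.of("split-0"), List.of());
        } catch (UnsupportedOperationException e) {
            // This is the error-handling path Flink would take when
            // watermark alignment is enabled on such a source.
            System.out.println("alignment unsupported: " + e.getMessage());
        }
    }
}
```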

On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis 
wrote:

> Hi Becket, Hi everybody,
>
> I'm sorry if I misread the messages but I could not derive an agreement
> from the mailing list. Nevertheless, if I understand you right the
> suggestion is:
>
> * Add default implementations to methods "pauseOrResumeSplits" in both
> SourceReader and SplitReader where both default implementations throw
> UnsupportedOperationException.
>
Yes.

* Add "supportsPauseOrResumeSplits" to the Source interface. (In the
> following, I refer to supporting this as "pausable splits".)
>
We may no longer need this if pausable splits are expected to be
implemented by the source developers, i.e. non-optional. Having this method
would then be somewhat misleading as it looks like the sources that do not
support pausable splits are also acceptable in the long term. So API wise,
I'd say maybe we should remove this for this FLIP, although I believe this
supportXXX pattern itself is still attractive for optional features.


>
> To make the conclusions explicit:
>
> 1. The implementation of pauseOrResumeSplits in both interfaces
> SourceReader and SplitReader are optional where the default is that it
> doesn't support it. (--> This means that the implementation is still
> optional for the source developer.)
>
It is optional for backwards compatibility with existing sources, as they
may still compile without code change. But starting from this FLIP, Flink
will always optimistically assume that all the sources support pausable
splits. If a source does not support pausable splits, it goes to an error
handling path when watermark alignment is enabled on it. This is different
from a usual optional feature, where no error is expected.


> 2. If watermark alignment is enabled in the application code by adding
> withWatermarkAlignment to the WatermarkStrategy while SourceReader or
> SplitReader do not support pausableSplits, we throw an
> UnsupportedOperationException.
>
Yes.


> 3. With regard to your statement:
>
>> [...] basically means watermark alignment is an non-optional feature to
>> the end users.
>
> You actually mean that "pausable splits" are non-optional for the app
> developer if watermark alignment is enabled. However, watermark alignment
> is optional and can be enabled/disabled.
>
Yes, watermark alignment can be enabled/disabled in individual sources in
Flink jobs, which basically means the code supporting watermark alignment
has to already be there. That again means the Source developers are also
expected to support pausable splits by default. So this way we essentially
tell the end users that you may enable / disable this feature as you wish,
and tell the source developers that you SHOULD implement this because the
end users may turn it on/off at will. And if the source does not support
pausable splits, that goes to an error handling path when watermark
alignment is enabled on it. So users know they have to explicitly exclude
this source.


>
> So far it's totally clear to me and I hope this is what you mean. I also
> agree with both statements:
>
> So making that expectation aligned with the source developers seems
>> reasonable.
>>
>
> I think this is a simple and clean solution from both the end user and
>> source developers' standpoint.
>>
>
> However, a last conclusion derives from 3. and is an open question for me:
>
> 4. The feature of "pausable splits" is now tightly bound to watermark
> alignment, i.e., if sources do not support "pausable splits" one can not
> enable watermark alignment for these sources. This dependency is not the
> current status of watermark alignment implementation because it is/was
> implemented without pausable splits. Do we want to introduce this
> dependency? (This is an open question. I cannot judge that.)
>
The watermark alignment basically relies on the pausable splits, right? So
personally I found it quite reasonable that if the source does not support
pausable splits, end users cannot enable watermark alignment on it.


> If something is wrong, please correct me.
>
> Regards,
> Sebastian
>
> On Wed, May 11, 2022 at 9:05 AM Becket Qin  wrote:
>
>> Hi Sebastian,
>>
>> Thanks for the reply and patient discussion. I agree this is a tricky
>> decision.
>>
>>
>>> Ne

[jira] [Created] (FLINK-27575) Support PyFlink in Python3.9

2022-05-11 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-27575:


 Summary: Support PyFlink in Python3.9
 Key: FLINK-27575
 URL: https://issues.apache.org/jira/browse/FLINK-27575
 Project: Flink
  Issue Type: New Feature
  Components: API / Python
Affects Versions: 1.16.0
Reporter: Huang Xingbo
 Fix For: 1.16.0


Currently, PyFlink only supports Python 3.6, 3.7, and 3.8. We need to support 
Python 3.9 in release 1.16.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[QUESTION] In Flink k8s Application mode with HA can not using History Server for history backend

2022-05-11 Thread 谭家良
In Flink k8s application mode with high availability, the job id is always 
00, but the history server uses the job's id as its key. Can I modify 
the job id in HA mode? Maybe the History Server should keep both the ClusterID and the 
JobID as its key?




Best,

tanjialiang.

tanjialiang
tanjl_w...@126.com

[jira] [Created] (FLINK-27576) Flink will request new pod when jm pod is delete, but will remove when TaskExecutor exceeded the idle timeout

2022-05-11 Thread zhisheng (Jira)
zhisheng created FLINK-27576:


 Summary: Flink will request new pod when jm pod is delete, but 
will remove when TaskExecutor exceeded the idle timeout 
 Key: FLINK-27576
 URL: https://issues.apache.org/jira/browse/FLINK-27576
 Project: Flink
  Issue Type: Bug
  Components: Deployment / Kubernetes
Affects Versions: 1.12.0
Reporter: zhisheng
 Attachments: image-2022-05-11-20-06-58-955.png, 
image-2022-05-11-20-08-01-739.png, jobmanager_log.txt

Flink 1.12.0 with HA (zk) and checkpointing enabled: when I use kubectl to delete the jm 
pod, the job requests a new jm pod and fails over from the last checkpoint, which is 
fine. But it also requests new tm pods that are never actually used; the new tm pods 
are closed when the TaskExecutor exceeds the idle timeout. In practice the job reuses 
the old tm, so why does it need to request new tm pods? Will the job fail if 
the cluster has no resources for the new tm? Can we optimize this and reuse the old tm 
directly?

 

[^jobmanager_log.txt]

^!image-2022-05-11-20-06-58-955.png!^

^!image-2022-05-11-20-08-01-739.png!^



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-11 Thread Sebastian Mattheis
Hi Becket,

Thanks a lot for your fast and detailed response. For me, it converges and
dropping the supportsX method sounds very reasonable to me. (Side note:
With "pausable splits" enabled as "default" I think we misunderstood. As
you described now "default" I understand as that it should be the new
recommended way of implementation, and I think that is fully valid. Before,
I understood "default" here as the default implementation, i.e., throwing
UnsupportedOperationException, which is the exact opposite. :) )

Nevertheless: As mentioned, an open question for me is if watermark
alignment should enforce pausable splits. For clarification, the current
documentation [1] says:

*Note:* As of 1.15, Flink supports aligning across tasks of the same source
> and/or different sources. It does not support aligning
> splits/partitions/shards in the same task.
>
> In a case where there are e.g. two Kafka partitions that produce
> watermarks at different pace, that get assigned to the same task watermark
> might not behave as expected. Fortunately, worst case it should not perform
> worse than without alignment.
>
> Given the limitation above, we suggest applying watermark alignment in two
> situations:
>
>1. You have two different sources (e.g. Kafka and File) that produce
>watermarks at different speeds
>2. You run your source with parallelism equal to the number of
>splits/shards/partitions, which results in every subtask being assigned a
>single unit of work.
>
> I personally see no issue in implementing and I see no reason against
implementing this dependency of watermark alignment and pausable splits. (I
think this would even be a good path towards shaping watermark alignment in
1.16.) However, "I don't see" means that I would be happy to hear Dawid's
and Piotrek's opinions as they implemented watermark alignment based on
FLIP-182 [2] and I don't want to miss relevant rationale/background info
from their side.

*@Piotrek* *@Dawid *What do you think?

Regards,
Sebastian

[1]
https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
[2]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
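To make the dependency concrete: watermark alignment needs some way to pause a split whose local watermark runs ahead of the slowest one. A self-contained toy simulation of that decision rule follows; this is illustrative only, and the split names and drift threshold are made up — the actual Flink implementation coordinates through the source operators and is considerably more involved:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AlignmentSketch {

    // Decide which splits to pause: any split whose watermark is more than
    // maxDriftMs ahead of the global minimum watermark gets paused.
    static Map<String, Boolean> pauseDecisions(Map<String, Long> watermarks, long maxDriftMs) {
        long min = watermarks.values().stream()
                .mapToLong(Long::longValue).min().orElse(Long.MIN_VALUE);
        Map<String, Boolean> decisions = new LinkedHashMap<>();
        watermarks.forEach((split, wm) -> decisions.put(split, wm - min > maxDriftMs));
        return decisions;
    }

    public static void main(String[] args) {
        Map<String, Long> watermarks = new LinkedHashMap<>();
        watermarks.put("kafka-partition-0", 1_000L);  // slowest split
        watermarks.put("kafka-partition-1", 31_000L); // 30s ahead of the minimum
        // With a 20s allowed drift, partition-1 should be paused.
        System.out.println(pauseDecisions(watermarks, 20_000L));
        // Without pauseOrResumeSplits in the reader, partition-1 cannot be
        // paused and keeps reading ahead, which is the gap this FLIP closes.
    }
}
```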

On Wed, May 11, 2022 at 1:30 PM Becket Qin  wrote:

> +dev
>
> Hi Sebastian,
>
> Thank you for the summary. Please see the detailed replies inline. As a
> recap of my suggestions.
>
> 1. Pausable splits API.
>   a) Add default implementations to methods "pauseOrResumeSplits" in both
> SourceReader and SplitReader where both default implementations throw
>  UnsupportedOperationException.
>
> 2. User story.
> a) We tell users to enable the watermark alignment as they like. This
> is exactly what the current Flink API is.
> b) We tell the source developers, please implement pausable splits,
> otherwise bad things may happen. Think of it like you are expected to
> implement SourceReader#snapshotState() properly, otherwise exceptions will
> be thrown when users enable checkpointing.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
> sebast...@ververica.com> wrote:
>
>> Hi Becket, Hi everybody,
>>
>> I'm sorry if I misread the messages but I could not derive an agreement
>> from the mailing list. Nevertheless, if I understand you right the
>> suggestion is:
>>
>> * Add default implementations to methods "pauseOrResumeSplits" in both
>> SourceReader and SplitReader where both default implementations throw
>> UnsupportedOperationException.
>>
> Yes.
>
> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
>> following, I refer to supporting this as "pausable splits".)
>>
> We may no longer need this if pausable splits are expected to be
> implemented by the source developers, i.e. non-optional. Having this method
> would then be somewhat misleading as it looks like the sources that do not
> support pausable splits are also acceptable in the long term. So API wise,
> I'd say maybe we should remove this for this FLIP, although I believe this
> supportXXX pattern itself is still attractive for optional features.
>
>
>>
>> To make the conclusions explicit:
>>
>> 1. The implementation of pauseOrResumeSplits in both interfaces
>> SourceReader and SplitReader are optional where the default is that it
>> doesn't support it. (--> This means that the implementation is still
>> optional for the source developer.)
>>
> It is optional for backwards compatibility with existing sources, as they
> may still compile without code change. But starting from this FLIP, Flink
> will always optimistically assume that all the sources support pausable
> splits. If a source does not support pausable splits, it goes to an error
> handling path when watermark alignment is enabled on it. This is different
> from a usual optional feature, where no error is expected.
>
>
>> 2. If watermark alignment is enabled in the

Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-11 Thread Piotr Nowojski
Hi,

Actually previously I thought about having a decorative interface and
whenever watermark alignment is enabled, checking that the source
implements the decorative interface. If not, throwing an exception.

The option with default methods in the source interfaces throwing
`UnsupportedOperationException` I think still suffers from the same
problems I mentioned before. It's still an optional implementation and at
the same time it's clogging the base interface. I think I would still vote
soft -1 on this option, but I wouldn't block it in case I am out-voted.

Best,
Piotrek

On Wed, 11 May 2022 at 14:22, Sebastian Mattheis 
wrote:

> Hi Becket,
>
> Thanks a lot for your fast and detailed response. For me, it converges and
> dropping the supportsX method sounds very reasonable to me. (Side note:
> With "pausable splits" enabled as "default" I think we misunderstood. As
> you described now "default" I understand as that it should be the new
> recommended way of implementation, and I think that is fully valid. Before,
> I understood "default" here as the default implementation, i.e., throwing
> UnsupportedOperationException, which is the exact opposite. :) )
>
> Nevertheless: As mentioned, an open question for me is if watermark
> alignment should enforce pausable splits. For clarification, the current
> documentation [1] says:
>
> *Note:* As of 1.15, Flink supports aligning across tasks of the same
>> source and/or different sources. It does not support aligning
>> splits/partitions/shards in the same task.
>>
>> In a case where there are e.g. two Kafka partitions that produce
>> watermarks at different pace, that get assigned to the same task watermark
>> might not behave as expected. Fortunately, worst case it should not perform
>> worse than without alignment.
>>
>> Given the limitation above, we suggest applying watermark alignment in
>> two situations:
>>
>>1. You have two different sources (e.g. Kafka and File) that produce
>>watermarks at different speeds
>>2. You run your source with parallelism equal to the number of
>>splits/shards/partitions, which results in every subtask being assigned a
>>single unit of work.
>>
>> I personally see no issue in implementing and I see no reason against
> implementing this dependency of watermark alignment and pausable splits. (I
> think this would even be a good path towards shaping watermark alignment in
> 1.16.) However, "I don't see" means that I would be happy to hear Dawid's
> and Piotrek's opinions as they implemented watermark alignment based on
> FLIP-182 [2] and I don't want to miss relevant rationale/background info
> from their side.
>
> *@Piotrek* *@Dawid *What do you think?
>
> Regards,
> Sebastian
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-release-1.15/docs/dev/datastream/event-time/generating_watermarks/#watermark-alignment-_beta_
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources?src=contextnavpagetreemode
>
> On Wed, May 11, 2022 at 1:30 PM Becket Qin  wrote:
>
>> +dev
>>
>> Hi Sebastian,
>>
>> Thank you for the summary. Please see the detailed replies inline. As a
>> recap of my suggestions.
>>
>> 1. Pausable splits API.
>>   a) Add default implementations to methods "pauseOrResumeSplits" in both
>> SourceReader and SplitReader where both default implementations throw
>>  UnsupportedOperationException.
>>
>> 2. User story.
>> a) We tell users to enable the watermark alignment as they like. This
>> is exactly what the current Flink API is.
>> b) We tell the source developers, please implement pausable splits,
>> otherwise bad things may happen. Think of it like you are expected to
>> implement SourceReader#snapshotState() properly, otherwise exceptions will
>> be thrown when users enable checkpointing.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>> On Wed, May 11, 2022 at 4:45 PM Sebastian Mattheis <
>> sebast...@ververica.com> wrote:
>>
>>> Hi Becket, Hi everybody,
>>>
>>> I'm sorry if I misread the messages but I could not derive an agreement
>>> from the mailing list. Nevertheless, if I understand you right the
>>> suggestion is:
>>>
>>> * Add default implementations to methods "pauseOrResumeSplits" in both
>>> SourceReader and SplitReader where both default implementations throw
>>> UnsupportedOperationException.
>>>
>> Yes.
>>
>> * Add "supportsPauseOrResumeSplits" to the Source interface. (In the
>>> following, I refer to supporting this as "pausable splits".)
>>>
>> We may no longer need this if pausable splits are expected to be
>> implemented by the source developers, i.e. non-optional. Having this method
>> would then be somewhat misleading as it looks like the sources that do not
>> support pausable splits are also acceptable in the long term. So API wise,
>> I'd say maybe we should remove this for this FLIP, although I believe this
>> supportXXX pattern itself is still attractive for optional features.
>>
>>
>>>
>>> To make the c

Re: [QUESTION] In Flink k8s Application mode with HA can not using History Server for history backend

2022-05-11 Thread 谭家良
Sorry, I found this is not the right place to ask questions. I'll ask 
my question on the u...@flink.apache.org list.


谭家良
tanjl_w...@126.com
On 5/11/2022 19:52,谭家良 wrote:
In Flink k8s application mode with high-availability, it's job id always 
00, but in history server, it make job's id for the key. Can I modify 
the job id in HA? Maybe the History Server must keep the ClusterID and the 
JobID for it's key?




Best,

tanjialiang.

tanjialiang
tanjl_w...@126.com

[DISCUSS] Release first version of Elasticsearch connector from external repository

2022-05-11 Thread Martijn Visser
Hi everyone,

As mentioned in the previous update [1] we were working on the final step
for moving out the Elasticsearch connector, which was related to the
documentation integration. This has now been completed: the documentation
you see on
https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/elasticsearch/
is generated via the documentation in the Elasticsearch external
repository.

In our plan we outlined that when we moved things out, we would create a
first release out of this repository [2]. I would like to take that step to
complete this first connector and move on to the next one.

If there's any feedback or questions, do let me know. Else I'll reach out
to some PMCs to help facilitate this release and we'll open a VOTE thread
shortly.

Best regards,

Martijn Visser
https://twitter.com/MartijnVisser82
https://github.com/MartijnVisser

[1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
[2] https://lists.apache.org/thread/vqbjoo94wwqcvo32c80dkmp7r5gmy68r


Re: [DISCUSS] Release first version of Elasticsearch connector from external repository

2022-05-11 Thread Konstantin Knauf
Hi Martijn,

+1 to do a release which is compatible with Flink 1.15.x. With the release,
we should add something like a compatibility matrix to the documentation,
but I am sure that's already on your list.

Cheers,

Konstantin




On Wed, 11 May 2022 at 20:10, Martijn Visser <
martijnvis...@apache.org> wrote:

> Hi everyone,
>
> As mentioned in the previous update [1] we were working on the final step
> for moving out the Elasticsearch connector, which was related to the
> documentation integration. This has now been completed: the documentation
> you see on
>
> https://nightlies.apache.org/flink/flink-docs-master/docs/connectors/datastream/elasticsearch/
> is generated via the documentation in the Elasticsearch external
> repository.
>
> In our plan we outlined that when we moved things out, we would create a
> first release out of this repository [2]. I would like to take that step to
> complete this first connector and move on to the next one.
>
> If there's any feedback or questions, do let me know. Else I'll reach out
> to some PMCs to help facilitate this release and we'll open a VOTE thread
> shortly.
>
> Best regards,
>
> Martijn Visser
> https://twitter.com/MartijnVisser82
> https://github.com/MartijnVisser
>
> [1] https://lists.apache.org/thread/8k1xonqt7hn0xldbky1cxfx3fzh6sj7h
> [2] https://lists.apache.org/thread/vqbjoo94wwqcvo32c80dkmp7r5gmy68r
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk


Incompatible data types while using firehose sink

2022-05-11 Thread Zain Haider Nemati
Hi Folks,
I'm getting this error when sinking data to a Firehose sink and would really
appreciate some help!

DataStream<String> inputStream = env.addSource(
        new FlinkKafkaConsumer<>("xxx", new SimpleStringSchema(), properties));

Properties sinkProperties = new Properties();
sinkProperties.put(AWSConfigConstants.AWS_REGION, "xxx");
sinkProperties.put(AWSConfigConstants.AWS_ACCESS_KEY_ID, "xxx");
sinkProperties.put(AWSConfigConstants.AWS_SECRET_ACCESS_KEY, "xxx");

KinesisFirehoseSink<String> kdfSink =
        KinesisFirehoseSink.<String>builder()
                .setFirehoseClientProperties(sinkProperties)
                .setSerializationSchema(new SimpleStringSchema())
                .setDeliveryStreamName("xxx")
                .setMaxBatchSize(350)
                .build();

inputStream.sinkTo(kdfSink);


incompatible types:
org.apache.flink.connector.firehose.sink.KinesisFirehoseSink
cannot be converted to
org.apache.flink.api.connector.sink.Sink


Re: [DISCUSS] FLIP-226: Introduce Schema Evolution on Table Store

2022-05-11 Thread Jingsong Li
Hi all,

If there are no more comments, I'm going to start a vote.

Best,
Jingsong

On Tue, May 10, 2022 at 10:37 AM Jingsong Li  wrote:

> Hi Jark,
>
> Thanks for your feedback.
>
> > 1) Does table-store support evolve schemas multiple times during a
> checkpoint?
>
> In this case this checkpoint is split into multiple commits, e.g.:
> - commit1: write 1 million rows
> - commit1: write 1 million rows
> - commit2: evolve schema1
> - commit3: write 1 million rows
> 
>
> Some works needs to be done on the connector side.
>
> > 2) Does ADD COLUMN support add a NOT-NULL column?
>
> I tend not to support it at this time.
> The other strategy is to support it, but report errors when reading data
> with the new shcema, which ensures that data can be read with the old
> schema.
>
> > 3) What's the matrix of type evolution? Do you support modifying a column
> to any type?
>
> For type evolution, we currently only support types that are supported by
> implicit conversions. (From Flink LogicalTypeCasts)
> Three modes can be supported in future to allow the user to select
> - Default implicit conversions
> - Allow implicit and explicit conversions
> - Throw exceptions when cast fail.
> - Return null when cast fail.
>
> I have updated FLIP.
>
> Best,
> Jingsong
>
> On Mon, May 9, 2022 at 8:14 PM Jark Wu  wrote:
>
>> Thanks for proposing this exciting feature, Jingsong!
>>
>> I only have a few questions:
>>
>> 1) Does table-store support evolve schemas multiple times during a
>> checkpoint?
>> For example, cp1 -> write 1M rows (may flush file store) -> evolve schema1
>> ->
>> write 1M rows (may flush file store again) -> evolve schema2 -> write 1M
>> rows -> cp2
>>
>> That means the schemas of new data files are different in this snapshot.
>> Besides, it may need to register schemas before the checkpoint is
>> complete.
>>
>> 2) Does ADD COLUMN support add a NOT-NULL column?
>>
>> 3) What's the matrix of type evolution? Do you support modifying a column
>> to any type?
>>
>> Best,
>> Jark
>>
>>
>>
>> On Mon, 9 May 2022 at 16:44, Caizhi Weng  wrote:
>>
>> > Hi all!
>> >
>> > +1 for this FLIP. By adding schema information into data files we can
>> not
>> > only support schema evolution, which is a very useful feature for data
>> > storages, but also make it easier for table store to integrate with
>> other
>> > systems.
>> >
>> > For example timestamp type in Hive does not support precision. With this
>> > extra schema information however we can directly deduce the precision
>> of a
>> > schema column.
>> >
>> > On Fri, 29 Apr 2022 at 17:54, Jingsong Li  wrote:
>> >
>> > > Hi devs,
>> > >
>> > > I want to start a discussion about Schema Evolution on the Flink Table
>> > > Store. [1]
>> > >
>> > > In FLINK-21634, We plan to support many schema changes in Flink SQL.
>> > > But for the current Table Store, it may result in wrong data, unclear
>> > > evolutions.
>> > >
>> > > In general, the user has these operations for schema:
>> > > - Add column: Adding a column to a table.
>> > > - Modify column type.
>> > > - Drop column: Drop a column.
>> > > - Rename column: For example, rename the "name_1" column to "name_2".
>> > >
>> > > Another schema change is partition keys, the data is changing over
>> > > time, for example, a table with day partition, as the business
>> > > continues to grow, the new partition of the table by day will become
>> > > larger and the business wants to change to hourly partitions.
>> > >
>> > > A simple approach is to rewrite all the existing data when modifying
>> the
>> > > schema.
>> > > But this expensive way is not acceptable to the user, so we need to
>> > > support and define it clearly.
>> > > Modifying the schema does not rewrite the existing data, when reading
>> > > the original data needs to evolve to the current schema.
>> > >
>> > > Look forward to your feedback!
>> > >
>> > > [1]
>> > >
>> >
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-226%3A+Introduce+Schema+Evolution+on+Table+Store
>> > >
>> > > Best,
>> > > Jingsong
>> > >
>> >
>>
>
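The evolve-on-read approach discussed in this thread (existing data files keep their old schema; rows are adapted to the current schema at read time) can be sketched with a self-contained toy example. Note this is only an illustration of the idea: a real implementation such as the table store would match columns by field id rather than by name so that renames work, and would implement the full implicit-cast matrix rather than the single INT→BIGINT widening shown here:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SchemaEvolutionSketch {

    // Evolve a row written with an old schema to the current schema:
    // columns are matched by name (by id in a real system); columns added
    // after the data was written read as null; INT values are implicitly
    // widened to BIGINT.
    static Map<String, Object> evolve(Map<String, Object> oldRow,
                                      Map<String, String> currentSchema) {
        Map<String, Object> row = new LinkedHashMap<>();
        currentSchema.forEach((column, type) -> {
            Object value = oldRow.get(column); // null if the column was added later
            if (value instanceof Integer && type.equals("BIGINT")) {
                value = ((Integer) value).longValue(); // implicit widening cast
            }
            row.put(column, value);
        });
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> oldRow = new LinkedHashMap<>();
        oldRow.put("id", 42);         // old schema: id INT
        oldRow.put("name", "flink");  // old schema: name STRING

        Map<String, String> currentSchema = new LinkedHashMap<>();
        currentSchema.put("id", "BIGINT");  // type evolved INT -> BIGINT
        currentSchema.put("name", "STRING");
        currentSchema.put("age", "INT");    // column added after the data was written

        System.out.println(evolve(oldRow, currentSchema));
        // -> {id=42, name=flink, age=null}
    }
}
```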


Re: [DISCUSS] FLIP-217 Support watermark alignment of source splits

2022-05-11 Thread Becket Qin
Thanks for the clarification, Piotr and Sebastian.

It looks like the key problem is still whether the implementation of
pausable splits in the Sources should be optional or not.

I think it might be helpful to agree on the definition of optional in our
case. To me:
Optional = "You CAN leave the method unimplemented, and that is fine."
Non-Optional = "You CAN leave the method unimplemented, but you SHOULD NOT,
because people assume this works."

I think one sufficient condition of a Non-Optional feature is that if the
feature is exposed through the framework API, Flink should expect the
pluggables to support this feature by default. Otherwise the availability
of that feature becomes undefined.

Please note that so far we do not assume whether the feature is in
the original API or it is added later. A newly added feature can also be
non-optional, although it might take some time for all the pluggable
developers to catch up, and they should still work if the new feature is
not used until they catch up. In contrast, we may never expect an optional
feature to catch up, because leaving it unimplemented is also blessed.

Let's take the checkpointing as an example. Imagine Flink did not support
checkpointing before release 1.16. And now we are trying to add
checkpointing to Flink. So we exposed the checkpoint configuration to the
end users. In the meantime, will we tell the pluggable (e.g. operators,
connectors) developers that methods like "snapshotState()" is optional? If
we do that, the availability of checkpointing in Flink would be severely
weakened. But apparently we should still allow the existing implementations
to work without checkpointing. It looks to me that adding the method to the
pluggable interfaces with a default implementation throwing
"UnsupportedOperationException" would be the solution here. Please note
that in this case, having the default implementation does not mean this is
optional. It is just the technique to support backwards compatibility in
the feature evolution. The fact that this method is in the base interface
suggests it is not optional, so the developers SHOULD implement it.

When it comes to this FLIP, I think it meets the criteria of non-optional
features, so we should just use the evolution path of non-optional features.

Thanks,

Jiangjie (Becket) Qin



On Wed, May 11, 2022 at 9:14 PM Piotr Nowojski  wrote:

> Hi,
>
> Actually previously I thought about having a decorative interface and
> whenever watermark alignment is enabled, checking that the source
> implements the decorative interface. If not, throwing an exception.
>
> The option with default methods in the source interfaces throwing
> `UnsupportedOperationException` I think still suffers from the same
> problems I mentioned before. It's still an optional implementation and at
> the same time it's clogging the base interface. I think I would still vote
> soft -1 on this option, but I wouldn't block it in case I am out-voted.
>
> Best,
> Piotrek
>
> On Wed, 11 May 2022 at 14:22, Sebastian Mattheis 
> wrote:
>
> > Hi Becket,
> >
> > Thanks a lot for your fast and detailed response. For me, it converges
> and
> > dropping the supportsX method sounds very reasonable to me. (Side note:
> > With "pausable splits" enabled as "default" I think we misunderstood. As
> > you described now "default" I understand as that it should be the new
> > recommended way of implementation, and I think that is fully valid.
> Before,
> > I understood "default" here as the default implementation, i.e., throwing
> > UnsupportedOperationException, which is the exact opposite. :) )
> >
> > Nevertheless: As mentioned, an open question for me is whether watermark
> > alignment should enforce pausable splits. For clarification, the current
> > documentation [1] says:
> >
> > *Note:* As of 1.15, Flink supports aligning across tasks of the same
> >> source and/or different sources. It does not support aligning
> >> splits/partitions/shards in the same task.
> >>
> >> In a case where there are e.g. two Kafka partitions that produce
> >> watermarks at a different pace and get assigned to the same task, the
> >> watermark might not behave as expected. Fortunately, in the worst case
> >> it should not perform worse than without alignment.
> >>
> >> Given the limitation above, we suggest applying watermark alignment in
> >> two situations:
> >>
> >>1. You have two different sources (e.g. Kafka and File) that produce
> >>watermarks at different speeds
> >>2. You run your source with parallelism equal to the number of
> >>splits/shards/partitions, which results in every subtask being
> assigned a
> >>single unit of work.
> >>
> > I personally see no issue in implementing this dependency between
> > watermark alignment and pausable splits, and no reason against it. (I
> > think this would even be a good path towards shaping watermark
> > alignment in 1.16.) However, "I don't see" means that I would be happy
> > to hear Dawid's
> > and Piotre

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-11 Thread Xintong Song
>
> To make some progress, maybe we should decide on chat vs. forum vs. none,
> and then go into a deeper discussion on the implementation. Or is there
> anything about Slack that would be a complete blocker for the
> implementation?
>

Sure, then I'd be +1 for chat. From my side, the initiative is more about
making communication more efficient, rather than making information easier
to find.

Thank you~

Xintong Song



On Wed, May 11, 2022 at 5:39 PM Konstantin Knauf  wrote:

> I don't think we can maintain two additional channels. Some people
> already have concerns about covering even one additional channel.
>
> I think a forum provides a better user experience than a mailing list:
> information is structured better, you can edit messages, and sign-up and
> search are easier.
>
> To make some progress, maybe we should decide on chat vs. forum vs. none,
> and then go into a deeper discussion on the implementation. Or is there
> anything about Slack that would be a complete blocker for the
> implementation?
>
>
>
> On Wed, May 11, 2022 at 07:35, Xintong Song <
> tonysong...@gmail.com> wrote:
>
>> I agree with Robert on reworking the "Community" and "Getting Help" pages
>> to emphasize how we position the mailing lists and Slack, and on revisiting
>> in 6-12 months.
>>
>> Concerning a dedicated Apache Flink Slack vs. the ASF Slack, I'm with
>> Konstantin. I'd expect it to be easier to have more channels and keep
>> them organized, manage permissions for different roles, add bots, etc.
>>
>> IMO, having Slack is about improving the communication efficiency when
>> you are already in a discussion, and we expect such improvement would
>> motivate users to interact more with each other. From that perspective,
>> forums are not much better than mailing lists.
>>
>> I'm open to forums as well, but not as an alternative to Slack. I
>> definitely see how forums help in keeping information organized and easy to
>> find. However, I'm a bit concerned about the maintenance overhead. I'm not
>> very familiar with Discourse or Reddit. My impression is that they are not
>> as easy to set up and maintain as Slack.
>>
>> Thank you~
>>
>> Xintong Song
>>
>>
>> [1] https://asktug.com/
>>
>> On Tue, May 10, 2022 at 4:50 PM Konstantin Knauf 
>> wrote:
>>
>>> Thanks for starting this discussion again. I am pretty much with Timo
>>> here. Slack or Discourse as an alternative for the user community, and
>>> mailing list for the contributing, design discussion, etc. I definitely see
>>> the friction of joining a mailing list and understand if users are
>>> intimidated.
>>>
>>> I am leaning towards a forum (e.g. Discourse) over a chat (e.g. Slack).
>>> This is about asking for help, finding information, and thoughtful
>>> discussion more so than casual chatting, right? For this, a forum, where
>>> it is easier to find and comment on older threads and topics, just makes
>>> more sense to me. A well-done Discourse forum is much more inviting and
>>> vibrant than a mailing list. Just from a tool perspective, Discourse
>>> would have the advantage of being open source, so we could probably
>>> self-host it on an ASF machine. [1]
>>>
>>> When it comes to Slack, I definitely see the benefit of a dedicated
>>> Apache Flink Slack compared to ASF Slack. For example, we could have more
>>> channels (e.g. look how many channels Airflow is using
>>> http://apache-airflow.slack-archives.org) and we could generally
>>> customize the experience more towards Apache Flink. If we go for Slack,
>>> let's definitely try to archive it like Airflow did. If we do this, we
>>> don't necessarily need infinite message retention in Slack itself.
>>>
>>> Cheers,
>>>
>>> Konstantin
>>>
>>> [1] https://github.com/discourse/discourse
>>>
>>>
>>> On Tue, May 10, 2022 at 10:20, Timo Walther <
>>> twal...@apache.org> wrote:
>>>
 I also think that a real-time channel is long overdue. The Flink
 community in China has shown that such a platform can be useful for
 improving collaboration within the community. The DingTalk channel of
 10k+ users collectively helping each other is great to see. It could also
 reduce the burden on committers of answering frequently asked questions.

 Personally, I'm a mailing list fan, especially when it comes to design
 discussions. In my opinion, the dev@ mailing list should definitely
 stay where and how it is. However, I understand that users might not want
 to subscribe to a mailing list for a single question and get their mailbox
 filled with unrelated discussions afterwards. Especially in a company
 setting, it might not be easy to set up a dedicated email address for
 mailing lists, and setting up rules is also not convenient.

 It would be great if we could use the ASF Slack. We should find an
 official, accessible channel. I would be open to the right tool. It might
 make sense to also look into Discourse or even Reddit? The latter would
 definitely be easier for a search engine to index. 

Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-11 Thread Jingsong Li
Hi all,

Regarding using the ASF Slack, I'd like to share the problems I saw in the
Apache Druid community. [1]

> As you may have heard, it’s become increasingly difficult for new users
without an @apache.org email address to join the ASF #druid Slack channel.
ASF Infra disabled the option to publicly provide a link to the workspace
to anyone who wanted it, after encountering issues with spammers.

> Per Infra’s guidance (https://infra.apache.org/slack.html), new community
members should only be invited as single-channel guests. Unfortunately,
single-channel guests are unable to extend invitations to new members,
including their colleagues who are using Druid. Only someone with full
member privileges is able to extend an invitation to new members. This lack
of consistency doesn’t make the community feel inclusive.

> There is a workaround in place (
https://github.com/apache/druid-website-src/pull/278) – users can send an
email to druid-u...@googlegroups.com to request an invite to the Slack
channel from an existing member – but this still poses a barrier to entry,
and isn’t a viable permanent solution. It also creates potential privacy
issues as not everyone is at liberty to announce they’re using Druid nor
wishes to display their email address in a public forum.

[1] https://lists.apache.org/thread/f36tvfwfo2ssf1x3jb4q0v2pftdyo5z5

Best,
Jingsong


Re: [Discuss] Creating an Apache Flink slack workspace

2022-05-11 Thread Yun Tang
Hi all,

I think a forum might be a good choice in terms of search and maintenance.
However, unlike a Slack workspace, it seems no existing popular product
could be leveraged easily.

Thus, I am +1 to creating an Apache Flink Slack channel. If the ASF Slack
cannot be joined easily by most users, I would prefer to set up our own
Slack workspace.

Best
Yun Tang


Re: [DISCUSS] FLIP-221 Abstraction for lookup source cache and metric

2022-05-11 Thread Александр Смирнов
Hi Jark!

Sorry for the late response. I would like to make some comments and
clarify my points.

1) I agree with your first statement. I think we can achieve both
advantages this way: put the Cache interface in flink-table-common,
but have its implementations in flink-table-runtime. Then, if a
connector developer wants to use the existing cache strategies and their
implementations, they can just pass the lookupConfig to the planner; but
if they want their own cache implementation in their TableFunction, they
can use the existing interface for this purpose (we can explicitly
point this out in the documentation). This way, all configs and metrics
will be unified. WDYT?

> If a filter can prune 90% of data in the cache, we will have 90% of lookup 
> requests that can never be cached

2) Let me clarify the logic of the filters optimization in the case of an
LRU cache. It looks like Cache<RowData, List<RowData>>. Here we always
store the response of the dimension table in the cache, even after
applying the calc function. I.e., if no rows remain after applying
filters to the result of the 'eval' method of the TableFunction, we store
an empty list under the lookup keys. The cache line will therefore still
be filled, but will require much less memory (in bytes). I.e., we don't
completely filter out keys whose result was pruned, but we significantly
reduce the memory required to store that result. If the user knows about
this behavior, they can increase the 'max-rows' option before the start
of the job. But actually I came up with the idea that we can do this
automatically by using the 'maximumWeight' and 'weigher' methods of
the Guava cache [1]. The weight can be the size of the collection of rows
(the cache value). Therefore the cache can automatically fit many more
records than before.
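A minimal, hypothetical sketch of this weight-based sizing idea using only the JDK (the class and method names are assumptions; the mail refers to Guava's `maximumWeight`/`weigher`, which behave similarly): the budget is counted in cached rows rather than entries, so an empty result list for a filtered-out key costs almost nothing.

```java
import java.util.*;

// Hypothetical row-weighted LRU cache: each entry's weight is the number of
// cached rows (minimum 1), so empty results for filtered-out keys are cheap.
class RowWeightedLruCache<K, V> {
    private final long maxWeight;
    private long totalWeight = 0;
    private final LinkedHashMap<K, List<V>> map =
            new LinkedHashMap<>(16, 0.75f, true); // access order, for LRU

    RowWeightedLruCache(long maxWeight) { this.maxWeight = maxWeight; }

    private static int weigh(List<?> rows) { return Math.max(1, rows.size()); }

    List<V> get(K key) { return map.get(key); }

    void put(K key, List<V> rows) {
        List<V> old = map.put(key, rows);
        if (old != null) totalWeight -= weigh(old);
        totalWeight += weigh(rows);
        // Evict least-recently-used entries until we are within budget.
        Iterator<Map.Entry<K, List<V>>> it = map.entrySet().iterator();
        while (totalWeight > maxWeight && it.hasNext()) {
            totalWeight -= weigh(it.next().getValue());
            it.remove();
        }
    }

    int entryCount() { return map.size(); }
}
```

With a budget of, say, 10 rows, many empty "no match" entries fit alongside a few full result lists, which is exactly the effect described above.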

> Flink SQL has provided a standard way to do filters and projects pushdown, 
> i.e., SupportsFilterPushDown and SupportsProjectionPushDown.
> Jdbc/hive/HBase haven't implemented the interfaces, don't mean it's hard to 
> implement.

It's debatable how difficult it will be to implement filter pushdown.
But I think the fact that currently no database connector supports
filter pushdown at least means that this feature won't be supported in
connectors soon. Moreover, if we talk about other connectors (not in the
Flink repo), their databases might not support all Flink filters (or
might not support filters at all). I think users are interested in the
cache filters optimization independently of support for other features
that solve more complex (or unsolvable) problems.

3) I agree with your third statement. Actually, in our internal version
I also tried to unify the logic of scanning and reloading data from
connectors. But unfortunately, I didn't find a way to unify the logic
of all ScanRuntimeProviders (InputFormat, SourceFunction, Source, ...)
and reuse it for reloading the ALL cache. As a result I settled on using
InputFormat, because it was used for scanning in all lookup
connectors. (I didn't know that there are plans to deprecate
InputFormat in favor of the FLIP-27 Source.) IMO, using the FLIP-27
source for the ALL cache is not a good idea, because this source was
designed to work in a distributed environment (SplitEnumerator on the
JobManager and SourceReaders on the TaskManagers), not in one operator
(the lookup join operator in our case). There is not even a direct way to
pass splits from the SplitEnumerator to a SourceReader (this logic works
through the SplitEnumeratorContext, which requires an
OperatorCoordinator.SubtaskGateway to send AddSplitEvents). Using
InputFormat for the ALL cache seems much clearer and easier. But if
there are plans to refactor all connectors to FLIP-27, I have the
following idea: maybe we can drop the lookup join ALL cache in favor of a
simple join with repeated scanning of the batch source? The point is that
the only difference between the lookup join ALL cache and a simple join
with a batch source is that in the first case scanning is performed
multiple times, and in between the state (cache) is cleared (correct me
if I'm wrong). So what if we extend simple joins to support state
reloading, and extend batch sources to support being scanned multiple
times (the latter should be easy with the new FLIP-27 source, which
unifies streaming/batch reading; we would only need to change the
SplitEnumerator so that it passes the splits again after some TTL)?
WDYT? I must say that this looks like a long-term goal and would make
the scope of this FLIP even larger than you said. Maybe we can limit
ourselves to a simpler solution now (InputFormats).
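As a rough illustration of the simpler InputFormat-style alternative (everything below is an assumed sketch, not Flink API; the `Supplier` stands in for a full InputFormat scan): an ALL cache boils down to atomically swapping in a freshly scanned snapshot every reload interval.

```java
import java.util.*;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical ALL-cache holder: the lookup operator reads from the current
// snapshot while a timer periodically rebuilds and swaps it.
class AllCache<K, V> {
    private final AtomicReference<Map<K, List<V>>> snapshot =
            new AtomicReference<>(Map.of());
    private final Supplier<Map<K, List<V>>> scanner; // stands in for an InputFormat scan

    AllCache(Supplier<Map<K, List<V>>> scanner) { this.scanner = scanner; }

    // Called every reload TTL: full rescan, then an atomic swap, which is
    // the "clear state and scan again" step described above.
    void reload() { snapshot.set(Map.copyOf(scanner.get())); }

    List<V> lookup(K key) { return snapshot.get().getOrDefault(key, List.of()); }
}
```

In a real operator the `reload()` call would be driven by a scheduled timer with the TTL from the cache config; lookups in between only ever see one consistent snapshot.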

So to sum up, my points are these:
1) There is a way to make both concise and flexible interfaces for
caching in lookup joins.
2) The cache filters optimization is important in both LRU and ALL caches.
3) It is unclear when filter pushdown will be supported in Flink
connectors; some connectors might not even be able to support filter
pushdown, and as far as I know, filter pushdown currently works only for
scanning (not lookup). So cache filters + projections
o