[jira] [Created] (FLINK-16690) Refactor StreamTaskTest to reuse TestTaskBuilder and MockStreamTaskBuilder
Zhijiang created FLINK-16690: Summary: Refactor StreamTaskTest to reuse TestTaskBuilder and MockStreamTaskBuilder Key: FLINK-16690 URL: https://issues.apache.org/jira/browse/FLINK-16690 Project: Flink Issue Type: Task Components: Runtime / Task, Tests Reporter: Zhijiang Fix For: 1.11.0 We can reuse the existing TestTaskBuilder and MockStreamTaskBuilder to construct Task and StreamTask easily in tests, simplifying the StreamTaskTest cases. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16691) Adjust Python UDF document to remind users to install PyFlink on the cluster
Huang Xingbo created FLINK-16691: Summary: Adjust Python UDF document to remind users to install PyFlink on the cluster Key: FLINK-16691 URL: https://issues.apache.org/jira/browse/FLINK-16691 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Huang Xingbo Fix For: 1.11.0 Since FLINK-16304 was merged, PyFlink no longer appends pyflink.zip to the PYTHONPATH of the Python UDF worker. Users need to install PyFlink on the cluster manually. We need to emphasize this in the documentation. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16692) Flink JobListener can be registered from config
jackylau created FLINK-16692: Summary: Flink JobListener can be registered from config Key: FLINK-16692 URL: https://issues.apache.org/jira/browse/FLINK-16692 Project: Flink Issue Type: Improvement Components: API / Core Affects Versions: 1.10.0 Reporter: jackylau Fix For: 1.11.0 We should do as Spark does, which can register listeners from configuration such as "spark.extraListeners". This would be convenient for users who just want to set a hook. -- This message was sent by Atlassian Jira (v8.3.4#803005)
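For comparison, Spark's "spark.extraListeners" mechanism boils down to reflectively instantiating listener classes named in a comma-separated config value. A minimal stand-alone sketch of that pattern (the JobListener interface here is a simplified stand-in, not Flink's org.apache.flink.core.execution.JobListener, and the config wiring is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a job listener interface.
interface JobListener {
    void onJobSubmitted(String jobId);
}

// An example listener a user might name in the config value.
class LoggingListener implements JobListener {
    public void onJobSubmitted(String jobId) {
        System.out.println("submitted: " + jobId);
    }
}

public class ListenerFromConfig {

    // Instantiate listeners reflectively from a comma-separated
    // class-name list, mirroring Spark's "spark.extraListeners".
    static List<JobListener> fromConfig(String classNames) throws Exception {
        List<JobListener> listeners = new ArrayList<>();
        for (String name : classNames.split(",")) {
            Class<?> clazz = Class.forName(name.trim());
            listeners.add((JobListener) clazz.getDeclaredConstructor().newInstance());
        }
        return listeners;
    }

    public static void main(String[] args) throws Exception {
        // In Flink this value would presumably come from flink-conf.yaml.
        String configured = "LoggingListener";
        for (JobListener listener : fromConfig(configured)) {
            listener.onJobSubmitted("job-42");
        }
    }
}
```

A Flink-side implementation would read such a list from the configuration and register each instance, presumably via the environment's registerJobListener call.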
[jira] [Created] (FLINK-16693) Legacy planner incompatible with Timestamp backed by LocalDateTime
Paul Lin created FLINK-16693: Summary: Legacy planner incompatible with Timestamp backed by LocalDateTime Key: FLINK-16693 URL: https://issues.apache.org/jira/browse/FLINK-16693 Project: Flink Issue Type: Bug Components: Table SQL / Legacy Planner Affects Versions: 1.10.0 Reporter: Paul Lin Recently I upgraded a simple application that inserts static data into a table from 1.9.0 to 1.10.0, and encountered a timestamp type incompatibility problem during the table sink validation. The SQL is like: ``` insert into kafka.test.tbl_a # schema: (user_name STRING, user_id INT, login_time TIMESTAMP) select ("ann", 1000, TIMESTAMP "2019-12-30 00:00:00") ``` And the error thrown: ``` Field types of query result and registered TableSink `kafka`.`test`.`tbl_a` do not match. Query result schema: [EXPR$0: String, EXPR$1: Integer, EXPR$2: Timestamp] TableSink schema:[user_name: String, user_id: Integer, login_time: LocalDateTime] ``` After some digging, I found the root cause might be that since FLINK-14645 timestamp fields defined via TableFactory had been bridged to LocalDateTime, but timestamp functions are still backed by java.sql.Timestamp. -- This message was sent by Atlassian Jira (v8.3.4#803005)
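The mismatch above is between two Java classes carrying the same wall-clock value: after FLINK-14645 the sink schema is validated against java.time.LocalDateTime, while the TIMESTAMP literal/function results are still materialized as java.sql.Timestamp. A small stdlib-only sketch of the conversion involved (illustrative only, not the planner's actual code path):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

public class TimestampBridge {

    // Same instant, two incompatible Java classes: a sink validated
    // against LocalDateTime rejects a query producing Timestamp even
    // though the values are losslessly convertible both ways.
    static LocalDateTime toLocal(Timestamp ts) {
        return ts.toLocalDateTime();
    }

    static Timestamp toSql(LocalDateTime ldt) {
        return Timestamp.valueOf(ldt);
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2019-12-30 00:00:00");
        LocalDateTime ldt = toLocal(ts);
        System.out.println(ldt);                    // 2019-12-30T00:00
        System.out.println(toSql(ldt).equals(ts));  // true
    }
}
```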
[jira] [Created] (FLINK-16694) Resuming Externalized Checkpoint end-to-end test failed on travis
Piotr Nowojski created FLINK-16694: -- Summary: Resuming Externalized Checkpoint end-to-end test failed on travis Key: FLINK-16694 URL: https://issues.apache.org/jira/browse/FLINK-16694 Project: Flink Issue Type: Bug Components: Runtime / Metrics, Tests Affects Versions: 1.9.2 Reporter: Piotr Nowojski Running 'Resuming Externalized Checkpoint (rocks, incremental, scale down) end-to-end test' failed on travis with the error: {code:java} The job exceeded the maximum log length, and has been terminated. {code} https://api.travis-ci.org/v3/job/664469537/log.txt Probably because of metrics logging. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16696) Incomplete REST API docs
Roman Khachatryan created FLINK-16696: - Summary: Incomplete REST API docs Key: FLINK-16696 URL: https://issues.apache.org/jira/browse/FLINK-16696 Project: Flink Issue Type: Bug Components: Documentation, Runtime / REST Affects Versions: 1.10.0 Reporter: Roman Khachatryan The REST API docs don't provide enough details about the /jobs/:jobid/savepoints/:triggerid endpoint (e.g. how to get the completed savepoint location): [https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#api] The Javadoc covers it better: [https://ci.apache.org/projects/flink/flink-docs-release-1.10/api/java/index.html?org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.html] [http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Savepoint-Location-from-Flink-REST-API-td33808.html] -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16695) PrometheusReporterEndToEndITCase failed on travis
Piotr Nowojski created FLINK-16695: -- Summary: PrometheusReporterEndToEndITCase failed on travis Key: FLINK-16695 URL: https://issues.apache.org/jira/browse/FLINK-16695 Project: Flink Issue Type: Bug Components: Runtime / Metrics Affects Versions: 1.9.2 Reporter: Piotr Nowojski https://api.travis-ci.org/v3/job/664469523/log.txt {code:java} 03:22:54.057 [INFO] --- 03:22:54.060 [INFO] T E S T S 03:22:54.061 [INFO] --- 03:22:54.373 [INFO] Running org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase 03:22:55.634 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.256 s <<< FAILURE! - in org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase 03:22:55.637 [ERROR] testReporter(org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase) Time elapsed: 0.879 s <<< ERROR! java.io.IOException: Process execution failed due error. Error output: at org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase.testReporter(PrometheusReporterEndToEndITCase.java:118) 03:22:55.964 [INFO] 03:22:55.964 [INFO] Results: 03:22:55.964 [INFO] 03:22:55.964 [ERROR] Errors: 03:22:55.964 [ERROR] PrometheusReporterEndToEndITCase.testReporter:118 » IO Process execution faile... 03:22:55.964 [INFO] 03:22:55.964 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0 {code} -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16697) Disable JMX rebinding
Colm O hEigeartaigh created FLINK-16697: --- Summary: Disable JMX rebinding Key: FLINK-16697 URL: https://issues.apache.org/jira/browse/FLINK-16697 Project: Flink Issue Type: Improvement Reporter: Colm O hEigeartaigh Fix For: 1.11.0 Disable JMX rebinding. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16698) Flink needs catalog listeners for preCreate/PreDrop* and afterCreate/AfterDrop* hooks
jackylau created FLINK-16698: Summary: Flink needs catalog listeners for preCreate/PreDrop* and afterCreate/AfterDrop* hooks Key: FLINK-16698 URL: https://issues.apache.org/jira/browse/FLINK-16698 Project: Flink Issue Type: Improvement Components: Table SQL / Planner Affects Versions: 1.10.0 Reporter: jackylau Fix For: 1.11.0 In order to support other things such as Atlas or authentication, I think Flink needs catalog listeners for preCreate/PreDrop* and afterCreate/AfterDrop* hooks, just like Spark/Hive do. -- This message was sent by Atlassian Jira (v8.3.4#803005)
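As a sketch of what such hooks could look like, here is a toy catalog that invokes registered listeners around create/drop operations, similar in spirit to Hive's metastore event listeners. All names here are hypothetical; Flink has no such interface at the time of this report:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical listener interface with pre/after hooks per operation.
interface CatalogListener {
    default void preCreateTable(String name) {}
    default void afterCreateTable(String name) {}
    default void preDropTable(String name) {}
    default void afterDropTable(String name) {}
}

public class CatalogWithListeners {
    private final Map<String, String> tables = new HashMap<>();
    private final List<CatalogListener> listeners = new ArrayList<>();

    void register(CatalogListener listener) {
        listeners.add(listener);
    }

    // Wrap the actual operation with pre/after hooks: pre-hooks can
    // veto (authorization), after-hooks can notify (e.g. Atlas lineage).
    void createTable(String name, String schema) {
        listeners.forEach(l -> l.preCreateTable(name));
        tables.put(name, schema);
        listeners.forEach(l -> l.afterCreateTable(name));
    }

    void dropTable(String name) {
        listeners.forEach(l -> l.preDropTable(name));
        tables.remove(name);
        listeners.forEach(l -> l.afterDropTable(name));
    }

    boolean exists(String name) {
        return tables.containsKey(name);
    }

    public static void main(String[] args) {
        CatalogWithListeners catalog = new CatalogWithListeners();
        catalog.register(new CatalogListener() {
            public void afterCreateTable(String name) {
                System.out.println("created: " + name);
            }
        });
        catalog.createTable("t1", "a INT");
        catalog.dropTable("t1");
    }
}
```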
[jira] [Created] (FLINK-16699) Support accessing secured services via K8s secrets
Canbin Zheng created FLINK-16699: Summary: Support accessing secured services via K8s secrets Key: FLINK-16699 URL: https://issues.apache.org/jira/browse/FLINK-16699 Project: Flink Issue Type: Improvement Components: Deployment / Kubernetes Reporter: Canbin Zheng Fix For: 1.11.0 Kubernetes [Secrets|https://kubernetes.io/docs/concepts/configuration/secret/] can be used to provide credentials for a Flink application to access secured services. This ticket proposes to:
1. Support mounting user-specified K8s Secrets into the JobManager/TaskManager container.
2. Support using a user-specified K8s Secret through an environment variable.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: Flink YARN app terminated before the client receives the result
Hi Tison & Till, Changing *thenAccept *into *thenAcceptAsync *in the MiniDispatcher#cancelJob does not help to solve the problem in my environment. However, I have found that adding a* Thread.sleep(2000) *before the return of JobCancellationHandler#handleRequest solved the problem (at least the symptom goes away). As this is only a dirty hack, I will try to get a more decent solution to this problem. Sincerely, Weike On Tue, Mar 17, 2020 at 11:11 PM tison wrote: > JIRA created as https://jira.apache.org/jira/browse/FLINK-16637 > > Best, > tison. > > > Till Rohrmann wrote on Tue, Mar 17, 2020 at 5:57 PM: > >> @Tison could you create an issue to track the problem. Please also link >> the uploaded log file for further debugging. >> >> I think the reason why it worked in Flink 1.9 could have been that we had >> a async callback in the longer chain which broke the flow of execution and >> allowed to send the response. This is no longer the case. As an easy fix >> one could change thenAccept into thenAcceptAsync in the >> MiniDispatcher#cancelJob. But this only solves symptoms. Maybe we should >> think about allowing not only StatusHandler to close asynchronously. At the >> moment we say that all other handler shut down immediately (see >> AbstractHandler#closeHandlerAsync). But the problem with this change would >> be that all handler would become stateful because they would need to >> remember whether a request is currently ongoing or not. >> >> Cheers, >> Till >> >> On Tue, Mar 17, 2020 at 9:01 AM DONG, Weike >> wrote: >> >>> Hi Tison & Till and all, >>> >>> I have uploaded the client, taskmanager and jobmanager log to Gist ( >>> https://gist.github.com/kylemeow/500b6567368316ec6f5b8f99b469a49f), and >>> I >>> can reproduce this bug every time when trying to cancel Flink 1.10 jobs >>> on >>> YARN.
>>> >>> Besides, in earlier Flink versions like 1.9, the REST API for *cancelling >>> job with a savepoint *sometimes throws exceptions to the client side due >>> to >>> early shutdown of the server, even though the savepoint was successfully >>> completed by reviewing the log, however when using the newly introduced >>> *stop* API, that bug disappeared, however, *cancel* API seems to be buggy >>> now. >>> >>> Best, >>> Weike >>> >>> On Tue, Mar 17, 2020 at 10:17 AM tison wrote: >>> >>> > edit: previously after the cancellation we have a longer call chain to >>> > #jobReachedGloballyTerminalState which does the archive job & JM >>> graceful >>> > showdown, which might take some time so that ... >>> > >>> > Best, >>> > tison. >>> > >>> > >>> > tison wrote on Tue, Mar 17, 2020 at 10:13 AM: >>> > >>> >> Hi Weike & Till, >>> >> >>> >> I agree with Till and it is also the analysis from my side. However, >>> it >>> >> seems even if we don't have FLINK-15116, it is still possible that we >>> >> complete the cancel future but the cluster got shutdown before it >>> properly >>> >> delivered the response. >>> >> >>> >> There is one thing strange that this behavior almost reproducible, it >>> >> should be a possible order but not always. Maybe previous we have to >>> >> firstly cancel the job which has a long call chain so that it happens >>> we >>> >> have enough time to delivered the response. >>> >> >>> >> But the resolution looks like we introduce some >>> >> synchronization/finalization logics that clear these outstanding >>> future >>> >> with best effort before the cluster(RestServer) down. >>> >> >>> >> Best, >>> >> tison. >>> >> >>> >> >>> >> Till Rohrmann wrote on Tue, Mar 17, 2020 at 4:12 AM: >>> >> >>> >>> Hi Weike, >>> >>> >>> >>> could you share the complete logs with us? Attachments are being >>> >>> filtered out by the Apache mail server but it works if you upload >>> the logs >>> >>> somewhere (e.g. https://gist.github.com/) and then share the link >>> with >>> >>> us.
Ideally you run the cluster with DEBUG log settings. >>> >>> >>> >>> I assume that you are running Flink 1.10, right? >>> >>> >>> >>> My suspicion is that this behaviour has been introduced with >>> FLINK-15116 >>> >>> [1]. It looks as if we complete the shutdown future in >>> >>> MiniDispatcher#cancelJob before we return the response to the >>> >>> RestClusterClient. My guess is that this triggers the shutdown of the >>> >>> RestServer which then is not able to serve the response to the >>> client. I'm >>> >>> pulling in Aljoscha and Tison who introduced this change. They might >>> be >>> >>> able to verify my theory and propose a solution for it. >>> >>> >>> >>> [1] https://issues.apache.org/jira/browse/FLINK-15116 >>> >>> >>> >>> Cheers, >>> >>> Till >>> >>> >>> >>> On Fri, Mar 13, 2020 at 7:50 AM DONG, Weike >> > >>> >>> wrote: >>> >>> >>> Hi Yangze and all, >>> >>> I have tried numerous times, and this behavior persists. >>> >>> Below is the tail log of taskmanager.log: >>> >>> 2020-03-13 12:06:14.240 [flink-akka.actor.default-dispatcher-3] INFO >>> org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl - >>> Free slot >>> >>>
Re: Flink YARN app terminated before the client receives the result
Yes you are right that `thenAcceptAsync` only breaks the control flow but it does not guarantee that the `RestServer` has actually sent the response to the client. Maybe we also need something similar to FLINK-10309 [1]. The problem I see with this approach is that it makes all RestHandlers stateful. [1] https://issues.apache.org/jira/browse/FLINK-10309 Cheers, Till On Fri, Mar 20, 2020 at 2:26 PM DONG, Weike wrote: > Hi Tison & Till, > > Changing *thenAccept *into *thenAcceptAsync *in the > MiniDispatcher#cancelJob does not help to solve the problem in my > environment. However, I have found that adding a* Thread.sleep(2000) *before > the return of JobCancellationHandler#handleRequest solved the problem (at > least the symptom goes away). As this is only a dirty hack, I will try to > get a more decent solution to this problem. > > Sincerely, > Weike > > On Tue, Mar 17, 2020 at 11:11 PM tison wrote: > >> JIRA created as https://jira.apache.org/jira/browse/FLINK-16637 >> >> Best, >> tison. >> >> >> Till Rohrmann wrote on Tue, Mar 17, 2020 at 5:57 PM: >> >>> @Tison could you create an issue to track the problem. Please also link >>> the uploaded log file for further debugging. >>> >>> I think the reason why it worked in Flink 1.9 could have been that we >>> had a async callback in the longer chain which broke the flow of execution >>> and allowed to send the response. This is no longer the case. As an easy >>> fix one could change thenAccept into thenAcceptAsync in the >>> MiniDispatcher#cancelJob. But this only solves symptoms. Maybe we should >>> think about allowing not only StatusHandler to close asynchronously. At the >>> moment we say that all other handler shut down immediately (see >>> AbstractHandler#closeHandlerAsync). But the problem with this change would >>> be that all handler would become stateful because they would need to >>> remember whether a request is currently ongoing or not.
>>> >>> Cheers, >>> Till >>> >>> On Tue, Mar 17, 2020 at 9:01 AM DONG, Weike >>> wrote: Hi Tison & Till and all, I have uploaded the client, taskmanager and jobmanager log to Gist ( https://gist.github.com/kylemeow/500b6567368316ec6f5b8f99b469a49f), and I can reproduce this bug every time when trying to cancel Flink 1.10 jobs on YARN. Besides, in earlier Flink versions like 1.9, the REST API for *cancelling job with a savepoint *sometimes throws exceptions to the client side due to early shutdown of the server, even though the savepoint was successfully completed by reviewing the log, however when using the newly introduced *stop* API, that bug disappeared, however, *cancel* API seems to be buggy now. Best, Weike On Tue, Mar 17, 2020 at 10:17 AM tison wrote: > edit: previously after the cancellation we have a longer call chain to > #jobReachedGloballyTerminalState which does the archive job & JM graceful > showdown, which might take some time so that ... > > Best, > tison. > > > tison wrote on Tue, Mar 17, 2020 at 10:13 AM: > >> Hi Weike & Till, >> >> I agree with Till and it is also the analysis from my side. However, it >> seems even if we don't have FLINK-15116, it is still possible that we >> complete the cancel future but the cluster got shutdown before it properly >> delivered the response. >> >> There is one thing strange that this behavior almost reproducible, it >> should be a possible order but not always. Maybe previous we have to >> firstly cancel the job which has a long call chain so that it happens we >> have enough time to delivered the response. >> >> But the resolution looks like we introduce some >> synchronization/finalization logics that clear these outstanding future >> with best effort before the cluster(RestServer) down. >> >> Best, >> tison. >> >> >> Till Rohrmann wrote on Tue, Mar 17, 2020 at 4:12 AM: >> >>> Hi Weike, >>> >>> could you share the complete logs with us?
Attachments are being >>> filtered out by the Apache mail server but it works if you upload the logs >>> somewhere (e.g. https://gist.github.com/) and then share the link with >>> us. Ideally you run the cluster with DEBUG log settings. >>> >>> I assume that you are running Flink 1.10, right? >>> >>> My suspicion is that this behaviour has been introduced with FLINK-15116 >>> [1]. It looks as if we complete the shutdown future in >>> MiniDispatcher#cancelJob before we return the response to the >>> RestClusterClient. My guess is that this triggers the shutdown of the >>> RestServer which then is not able to serve the response to the client. I'm >>> pulling in Aljoscha and Tison who introduced this change. They might be >>> able to verify my theory
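The thenAccept vs. thenAcceptAsync distinction discussed in this thread can be seen with plain CompletableFuture: a dependent registered via thenAccept typically runs on the thread that completes the future (so shutdown logic chained there runs before the completing code can go on to send the response), while thenAcceptAsync hands it off to a pool thread. A minimal stand-alone sketch; note the thread placement in the non-async case is OpenJDK behavior, not a hard spec guarantee:

```java
import java.util.concurrent.CompletableFuture;

public class CallbackThreadSketch {

    // Returns true when the registered callback ran on the same thread
    // that called complete(); with thenAccept that is the typical case,
    // with thenAcceptAsync the callback runs on a pool thread instead.
    static boolean callbackOnCompletingThread(boolean async) throws Exception {
        CompletableFuture<String> response = new CompletableFuture<>();
        CompletableFuture<Thread> callbackThread = new CompletableFuture<>();
        if (async) {
            response.thenAcceptAsync(v -> callbackThread.complete(Thread.currentThread()));
        } else {
            response.thenAccept(v -> callbackThread.complete(Thread.currentThread()));
        }
        response.complete("response");
        return callbackThread.get() == Thread.currentThread();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("thenAccept on completing thread:      "
                + callbackOnCompletingThread(false));
        System.out.println("thenAcceptAsync on completing thread: "
                + callbackOnCompletingThread(true));
    }
}
```

This is why switching MiniDispatcher#cancelJob to thenAcceptAsync breaks the chain, but, as noted above, still cannot guarantee the RestServer flushed the response before shutdown proceeds.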
Re: [DISCUSS] Link Stateful Functions from the Flink Website
+1 for the list of proposed changes - What is "Stateful Functions"? link - Documentation -> Stateful Functions Docs - StateFun on GitHub Cheers, Till On Thu, Mar 12, 2020 at 12:22 PM Stephan Ewen wrote: > Thank you all, for chiming in! > > I like the ideas suggested to update the website, I am also +1 for an > update of the main image and make sure we show. Maybe something like [1], > which would also help to increase SQL visibility. > I guess that would be a bit of a bigger effort, and I would kick off that > bigger discussion in a few days or so. > Until then, as an immediate "band aid", I would then add these here (unless > there are concerns): > - What is "Stateful Functions"? link > - Documentation -> Stateful Functions Docs > - StateFun on GitHub > > Best, > Stephan > > [1] > > https://onedrive.live.com/?authkey=%21ABxm14umFhJOXFQ&cid=E2B4A3561C93FA7B&id=E2B4A3561C93FA7B%213535&parId=E2B4A3561C93FA7B%21342&o=OneUp > > > On Thu, Mar 12, 2020 at 7:39 AM Zhijiang .invalid> > wrote: > > > Thanks for launching this discussion @Stephan! > > > > +1 for the idea of linking Stateful Functions on the Flink website. It is > time > > to increase the exposure of this secret weapon for attracting more > > attention. > > It is beneficial for interested users to try it out in desired scenarios and > > also beneficial for developing it well. > > > > > > Best, > > Zhijiang > > -- > > From:Hequn Cheng > > Send Time:2020 Mar. 11 (Wed.) 15:36 > > To:dev > > Subject:Re: [DISCUSS] Link Stateful Functions from the Flink Website > > > > Hi, > > > > Thanks a lot for raising the discussion @Stephan. > > +1 to increase the visibility of Stateful Functions. > > > > Another option I'm thinking of is adding a section (named Stateful Functions or > > Flink Projects?) > > under the "Latest Blog Posts". The advantage is we can add a picture and > > some descriptions here. > > A picture may attract more attention from the users when he/she visits the > > website.
> > The picture can be the same one in [1]. > > > > In the future, if we have multiple Flink individual projects, we can also > > turn the section into a Table list > > to expose all of them. > > > > What do you think? > > > > Best, > > Hequn > > > > [1] https://ci.apache.org/projects/flink/flink-statefun-docs-master/ > > > > On Tue, Mar 10, 2020 at 11:13 PM Tzu-Li (Gordon) Tai < > tzuli...@apache.org> > > wrote: > > > > > +1 on the suggestion to add "What is Stateful Functions" to the left > > > navigation bar. > > > That might also mean it would be nice to have a slight rework to the > main > > > image on the website, illustrating the use cases of Flink (this one > [1]). > > > On the image it does mention "Event-Driven Applications", but there's > > > somewhat missing a more direct connection from that term to the > Stateful > > > Functions project. > > > > > > As for what the "What is Stateful Functions?" button directs to, maybe > > that > > > should point to a general concepts page. Initially, we can begin with > the > > > README contents on the project repo [2]. > > > As for the actual Statefun documentation link [3], I think we should > link > > > that from an item in the "Documentation" pull-down list. > > > > > > One last thing to increase visibility of the Statefun project just a > bit > > > more: > > > There's a "Flink on Github" button on the very bottom of the navigation > > > bar. > > > What do you think about adding a "Flink Stateful Functions on Github" > > > button there as well? 
> > > > > > Cheers, > > > Gordon > > > > > > [1] > > > > > > > > > https://github.com/apache/flink-web/blob/asf-site/img/flink-home-graphic.png > > > [2] https://github.com/apache/flink-statefun/blob/master/README.md > > > [3] https://ci.apache.org/projects/flink/flink-statefun-docs-master/ > > > > > > > > > On Tue, Mar 10, 2020 at 8:29 PM Yu Li wrote: > > > > > > > +1 on adding a "What is Stateful Functions" link below the "What is > > > Apache > > > > Flink" entry and integrating into the Flink docs gradually (instead > of > > > > hiding it behind until fully integrated). > > > > > > > > Best Regards, > > > > Yu > > > > > > > > > > > > On Tue, 10 Mar 2020 at 19:33, Stephan Ewen wrote: > > > > > > > > > Hi all! > > > > > > > > > > I think it would be nice to mention Stateful Function on the Flink > > > > website. > > > > > At the moment, Stateful Functions is very hard to discover, and > with > > > the > > > > > first release of it under Apache Flink, it would be a good time to > > > change > > > > > that. > > > > > > > > > > My proposal would be to add a "What is Stateful Functions?" below > the > > > > "What > > > > > is Apache Flink" entry in the sidenav, and point it to > > > > > https://ci.apache.org/projects/flink/flink-statefun-docs-master/ > > > > > It is not ideal, yet, but it may serve as an intermediate solution > > > until > > > > we > > > > > can make more involved attempt to rethink the website
[jira] [Created] (FLINK-16700) Document Kinesis I/O Module
Seth Wiesman created FLINK-16700: Summary: Document Kinesis I/O Module Key: FLINK-16700 URL: https://issues.apache.org/jira/browse/FLINK-16700 Project: Flink Issue Type: Improvement Components: Stateful Functions Reporter: Seth Wiesman Fix For: statefun-1.1 -- This message was sent by Atlassian Jira (v8.3.4#803005)
Re: [DISCUSS] JMX remote monitoring integration with Flink
Hi Rong Rong, you are right that JMX is quite hard to use in production due to the mentioned problems with discovering the port. There is actually already a JIRA ticket [1] discussing this problem. It just never gained enough traction to be tackled. In general, I agree that it would be nice to have a REST API with which one can obtain JVM specific information about a Flink process. This information could also contain a potentially open JMX port. [1] https://issues.apache.org/jira/browse/FLINK-5552 Cheers, Till On Fri, Mar 13, 2020 at 2:02 PM Forward Xu wrote: > Hi RongRong, > Thank you for bringing this discussion, it is indeed not appropriate to > occupy additional ports in the production environment to provide jmxrmi > services. I think [2] RestApi or JobManager/TaskManager UI is a good idea. > > Best, > Forward > > Rong Rong wrote on Fri, Mar 13, 2020 at 8:54 PM: > > > Hi All, > > > > Has anyone tried to manage production Flink applications through JMX > remote > > monitoring & management[1]? > > > > We were experimenting to enable JMXRMI on Flink by default in production > > and would like to share some of our thoughts: > > ** Is there any straightforward way to dynamically allocate JMXRMI remote > > ports?* > > - It is unrealistic to use JMXRMI static port in production > environment, > > however we have to go all around the logging system to make the dynamic > > remote port number printed out in the log files - this seems very > > inconvenient. > > - I think it would be very handy if we can show the JMXRMI remote > > information on JobManager/TaskManager UI, or via REST API. (I am thinking > > about something similar to [2]) > > > > ** Is there any performance overhead enabling JMX for a Flink > application?* > > - We haven't seen any significant performance impact in our > experiments. > > However the experiment is not that well-rounded and the observation is > > inconclusive.
> > - I was wondering would it be a good idea to put some benchmark in the > > regression tests[3] to see what's the overhead would be? > > > > It would be highly appreciated if anyone could share some experiences or > > provide any suggestions in how we can improve the JMX remote integration > > with Flink. > > > > > > Thanks, > > Rong > > > > > > [1] > > > > > https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html > > [2] > > > https://samza.apache.org/learn/documentation/0.14/jobs/web-ui-rest-api.html > > [3] http://codespeed.dak8s.net:8000/ > > >
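The per-process JVM facts such a REST endpoint could expose are already available in-process via the platform MXBeans; a minimal stdlib sketch (the output field names are made up for illustration, and the JMX port itself would still have to be looked up from wherever the agent recorded it):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;

public class JvmInfoSketch {

    // Collect the kind of process-level facts a REST endpoint could
    // serve alongside a dynamically allocated JMX port.
    static String describe() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        long heapUsed = memory.getHeapMemoryUsage().getUsed();
        // RuntimeMXBean#getName conventionally returns "pid@hostname".
        return "process=" + runtime.getName() + ", heapUsedBytes=" + heapUsed;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```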
[jira] [Created] (FLINK-16701) Elasticsearch sink support alias for indices.
Leonard Xu created FLINK-16701: -- Summary: Elasticsearch sink support alias for indices. Key: FLINK-16701 URL: https://issues.apache.org/jira/browse/FLINK-16701 Project: Flink Issue Type: Improvement Components: Connectors / ElasticSearch Affects Versions: 1.11.0 Reporter: Leonard Xu Fix For: 1.11.0 This is related to [FLINK-15400|https://issues.apache.org/jira/browse/FLINK-15400]. FLINK-15400 will only support dynamic indices and will not support aliases. Because alias support is needed in both the Streaming API and the Table API, I think splitting the original design into two PRs makes sense. PR for FLINK-15400: support dynamic index for ElasticsearchTableSink. PR for this issue: support alias for Streaming API and Table API. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16702) develop JDBCCatalogFactory for service discovery
Bowen Li created FLINK-16702: Summary: develop JDBCCatalogFactory for service discovery Key: FLINK-16702 URL: https://issues.apache.org/jira/browse/FLINK-16702 Project: Flink Issue Type: Sub-task Components: Connectors / JDBC Reporter: Bowen Li Assignee: Bowen Li Fix For: 1.11.0 -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (FLINK-16703) AkkaRpcActor state machine does not record transition to terminating state.
Dmitri Chmelev created FLINK-16703: -- Summary: AkkaRpcActor state machine does not record transition to terminating state. Key: FLINK-16703 URL: https://issues.apache.org/jira/browse/FLINK-16703 Project: Flink Issue Type: Bug Components: Runtime / Coordination Affects Versions: 1.10.0, 1.9.0, 1.8.0, 1.11.0, 2.0.0 Reporter: Dmitri Chmelev As part of FLINK-11551, the state machine of AkkaRpcActor was updated to include 'terminating' and 'terminated' states. However, when an actor termination request is handled, the resulting 'terminating' state is not stored by the FSM. [https://github.com/apache/flink/blame/master/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L175] As a side-effect, the {{isRunning()}} predicate can report that the actor is still running after termination was initiated, and the actor will still handle messages. I believe the fix is trivial: the private field {{state}} should be updated with the return value of the call to {{state.terminate()}}. Feel free to adjust the priority accordingly. -- This message was sent by Atlassian Jira (v8.3.4#803005)
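The reported bug and its suggested fix can be illustrated with a toy state machine (names are illustrative, not the actual AkkaRpcActor code): calling terminate() on the state object computes the successor state, but unless the caller stores the return value the FSM never leaves RUNNING.

```java
public class FsmSketch {

    enum State {
        RUNNING {
            State terminate() { return TERMINATING; }
            boolean isRunning() { return true; }
        },
        TERMINATING {
            State terminate() { return this; }
            boolean isRunning() { return false; }
        };

        abstract State terminate();
        abstract boolean isRunning();
    }

    private State state = State.RUNNING;

    // Buggy variant: the transition result is ignored, so the FSM
    // stays in RUNNING and isRunning() keeps returning true.
    void terminateBuggy() {
        state.terminate();
    }

    // Fixed variant: stores the transition result, mirroring the
    // suggested fix `state = state.terminate()`.
    void terminateFixed() {
        state = state.terminate();
    }

    boolean isRunning() {
        return state.isRunning();
    }

    public static void main(String[] args) {
        FsmSketch buggy = new FsmSketch();
        buggy.terminateBuggy();
        System.out.println("buggy still running: " + buggy.isRunning());  // true

        FsmSketch fixed = new FsmSketch();
        fixed.terminateFixed();
        System.out.println("fixed still running: " + fixed.isRunning());  // false
    }
}
```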
Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource
+1. I would suggest taking a step even further and seeing what users really need to test/try/play with the Table API and Flink SQL. Besides this one, here are some more sources and sinks that I have developed or used previously to facilitate building Flink table/SQL pipelines.

1. Random input data source
   - should generate random data at a specified rate according to the schema
   - purposes: test that a Flink pipeline's data ends up in external storage correctly; stress test a Flink sink as well as tune up the external storage

2. Print data sink
   - should print data in row format to the console
   - purposes: make it easier to test a Flink SQL job e2e in the IDE; test a Flink pipeline and ensure the output data format/values are correct

3. No-output data sink
   - just swallows output data without doing anything
   - purpose: evaluate and tune the performance of a Flink source and the whole pipeline; users don't need to worry about sink back pressure

These may be taken into consideration all together as an effort to lower the threshold of running Flink SQL/Table API, and to facilitate users' daily work.

Cheers, Bowen

On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li wrote: > Hi all, > > I heard some users complain that the Table API is difficult to test. Now with the SQL > client, users are more and more inclined to use it to test rather than > program. > The most common example is the Kafka source. If users need to test their SQL > output and checkpointing, they need to: > > - 1. Launch a standalone Kafka and create a Kafka topic. > - 2. Write a program, mock input records, and produce records to the Kafka > topic. > - 3. Then test in Flink. > > Steps 1 and 2 are annoying, although this test is E2E. > > Then I found StatefulSequenceSource. It is very good because it already deals > with checkpointing, which matters because users usually > have checkpointing turned on in production. > > With computed columns, users can easily create a sequence source DDL similar > to the Kafka DDL.
Then they can test inside Flink, without needing to launch other > things. > > Have you considered this? What do you think? > > CC: @Aljoscha Krettek, the author > of StatefulSequenceSource. > > Best, > Jingsong Lee >
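As a rough stand-alone illustration of the random source and print sink ideas above (all names are made up; a real implementation would be a Flink SourceFunction or TableSource with checkpointing support, not a blocking loop):

```java
import java.util.Random;
import java.util.function.LongFunction;

public class RandomRowSource {

    // Emit `count` rows produced by `rowFactory`, sleeping between rows
    // to approximate `rowsPerSecond`; each row goes to a "print sink"
    // (stdout). Returns the number of rows emitted.
    static long emit(long count, long rowsPerSecond, LongFunction<String> rowFactory)
            throws InterruptedException {
        long emitted = 0;
        long sleepMillis = 1000 / Math.max(1, rowsPerSecond);
        for (long i = 0; i < count; i++) {
            System.out.println(rowFactory.apply(i));  // print sink: row to console
            emitted++;
            Thread.sleep(sleepMillis);
        }
        return emitted;
    }

    public static void main(String[] args) throws InterruptedException {
        Random random = new Random(42);  // seeded for reproducible test data
        emit(3, 100, i -> "row-" + i + "," + random.nextInt(1000));
    }
}
```

A "no-output" sink variant would simply drop the row instead of printing it, which is useful for isolating source and pipeline performance from sink back pressure.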
Re: [DISCUSS] Releasing Flink 1.10.1
Hi All,

Here is the status update of issues in the 1.10.1 watch list:

* Blockers (7)
  - [Under Discussion] FLINK-16018 Improve error reporting when submitting batch job (instead of AskTimeoutException)
  - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM error on repeated job submission
  - [PR Submitted] FLINK-16170 SearchTemplateRequest ClassNotFoundException when use flink-sql-connector-elasticsearch7
  - [PR Submitted] FLINK-16262 Class loader problem with FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
  - [Closed] FLINK-16406 Increase default value for JVM Metaspace to minimise its OutOfMemoryError
  - [Closed] FLINK-16454 Update the copyright year in NOTICE files
  - [Open] FLINK-16576 State inconsistency on restore with memory state backends

* Critical (4)
  - [Closed] FLINK-16047 Blink planner produces wrong aggregate results with state clean up
  - [PR Submitted] FLINK-16070 Blink planner can not extract correct unique key for UpsertStreamTableSink
  - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be handled as Fatal Error in TaskManager
  - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot

Please let me know if you see any new blockers to add to the list. Thanks.

Best Regards,
Yu

On Wed, 18 Mar 2020 at 00:11, Yu Li wrote: > Thanks for the updates Till! > > For FLINK-16018, maybe we could create two sub-tasks for easy and complete > fix separately, and only include the easy one in 1.10.1? Or please just > feel free to postpone the whole task to 1.10.2 if "all or nothing" policy > is preferred (smile). Thanks. > > Best Regards, > Yu > > > On Tue, 17 Mar 2020 at 23:33, Till Rohrmann wrote: > >> +1 for a soonish bug fix release. Thanks for volunteering as our release >> manager Yu. >> >> I think we can soon merge the increase of metaspace size and improving the >> error message. The assumption is that we currently don't have too many >> small Flink 1.10 deployments with a process size <= 1GB.
Of course, the >> sooner we release the bug fix release, the fewer deployments will be >> affected by this change. >> >> For FLINK-16018, I think there would be an easy solution which we could >> include in the bug fix release. The proper fix will most likely take a bit >> longer, though. >> >> Cheers, >> Till >> >> On Fri, Mar 13, 2020 at 8:08 PM Andrey Zagrebin >> wrote: >> >> > > @Andrey and @Xintong - could we have a quick poll on the user mailing >> > list >> > > about increasing the metaspace size in Flink 1.10.1? Specifically >> asking >> > > for who has very small TM setups? >> > >> > There has been a survey about this topic since 10 days: >> > >> > `[Survey] Default size for the new JVM Metaspace limit in 1.10` >> > I can ask there about the small TM setups. >> > >> > On Fri, Mar 13, 2020 at 5:19 PM Yu Li wrote: >> > >> > > Another blocker for 1.10.1: FLINK-16576 State inconsistency on restore >> > with >> > > memory state backends >> > > >> > > Let me recompile the watching list with recent feedbacks. 
There're >> > totally >> > > 45 issues with Blocker/Critical priority for 1.10.1, out of which 14 >> > > already resolved and 31 left, and the below ones are watched (meaning >> > that >> > > once the below ones got fixed, we will kick of the releasing with left >> > ones >> > > postponed to 1.10.2 unless objections): >> > > >> > > * Blockers (7) >> > > - [Under Discussion] FLINK-16018 Improve error reporting when >> > submitting >> > > batch job (instead of AskTimeoutException) >> > > - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM >> error >> > > on repeated job submission >> > > - [PR Submitted] FLINK-16170 SearchTemplateRequest >> > ClassNotFoundException >> > > when use flink-sql-connector-elasticsearch7 >> > > - [PR Submitted] FLINK-16262 Class loader problem with >> > > FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory >> > > - [Closed] FLINK-16406 Increase default value for JVM Metaspace to >> > > minimise its OutOfMemoryError >> > > - [Open] FLINK-16454 Update the copyright year in NOTICE files >> > > - [Open] FLINK-16576 State inconsistency on restore with memory >> state >> > > backends >> > > >> > > * Critical (4) >> > > - [Open] FLINK-16047 Blink planner produces wrong aggregate results >> > with >> > > state clean up >> > > - [PR Submitted] FLINK-16070 Blink planner can not extract correct >> > unique >> > > key for UpsertStreamTableSink >> > > - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be >> > > handled as Fatal Error in TaskManager >> > > - [Open] FLINK-16408 Bind user code class loader to lifetime of a >> slot >> > > >> > > Please raise your hand if you find any other issues should be put into >> > this >> > > list, thanks. >> > > >> > > Please also note that 1.10.2 version is already created (thanks for >> the >> > > help @jincheng), and feel free to adjust/assign fix version to it when >> > > necessary. >> > > >> > > Best Regards,