[jira] [Created] (FLINK-16690) Refactor StreamTaskTest to reuse TestTaskBuilder and MockStreamTaskBuilder

2020-03-20 Thread Zhijiang (Jira)
Zhijiang created FLINK-16690:


 Summary: Refactor StreamTaskTest to reuse TestTaskBuilder and 
MockStreamTaskBuilder
 Key: FLINK-16690
 URL: https://issues.apache.org/jira/browse/FLINK-16690
 Project: Flink
  Issue Type: Task
  Components: Runtime / Task, Tests
Reporter: Zhijiang
 Fix For: 1.11.0


We can reuse the existing TestTaskBuilder and MockStreamTaskBuilder to 
construct Task and StreamTask instances easily in tests, simplifying the 
StreamTaskTest cases.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-16691) Adjust Python UDF document to remind users to install PyFlink on the cluster

2020-03-20 Thread Huang Xingbo (Jira)
Huang Xingbo created FLINK-16691:


 Summary: Adjust Python UDF document to remind users to install 
PyFlink on the cluster
 Key: FLINK-16691
 URL: https://issues.apache.org/jira/browse/FLINK-16691
 Project: Flink
  Issue Type: Improvement
  Components: Documentation
Reporter: Huang Xingbo
 Fix For: 1.11.0


Since FLINK-16304 was merged, PyFlink no longer appends pyflink.zip to the 
PYTHONPATH of the Python UDF worker. Users need to install PyFlink on the 
cluster manually. We need to emphasize this in the documentation.





[jira] [Created] (FLINK-16692) Flink JobListener can be registered from config

2020-03-20 Thread jackylau (Jira)
jackylau created FLINK-16692:


 Summary: Flink JobListener can be registered from config
 Key: FLINK-16692
 URL: https://issues.apache.org/jira/browse/FLINK-16692
 Project: Flink
  Issue Type: Improvement
  Components: API / Core
Affects Versions: 1.10.0
Reporter: jackylau
 Fix For: 1.11.0


We should do as Spark does, which can register listeners from configuration 
such as "spark.extraListeners". This would be convenient for users who just 
want to set a hook.
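
A minimal sketch of what config-driven listener registration could look like, using plain reflection. The `JobListener` interface and the config value below are stand-ins for illustration, not Flink's actual API or config key:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: instantiate listener implementations whose class names
// come from a comma-separated config value, similar to Spark's
// "spark.extraListeners". Interface and names here are illustrative only.
public class ListenerLoader {

    // Stand-in for a job listener interface such as Flink's JobListener.
    public interface JobListener {
        void onJobSubmitted(String jobId);
    }

    public static List<JobListener> loadListeners(String configValue) throws Exception {
        List<JobListener> listeners = new ArrayList<>();
        if (configValue == null || configValue.trim().isEmpty()) {
            return listeners;
        }
        for (String className : configValue.split(",")) {
            // Reflectively create each listener via its no-arg constructor.
            Class<?> clazz = Class.forName(className.trim());
            listeners.add((JobListener) clazz.getDeclaredConstructor().newInstance());
        }
        return listeners;
    }

    // Example listener that just logs submissions.
    public static class LoggingListener implements JobListener {
        public void onJobSubmitted(String jobId) {
            System.out.println("submitted: " + jobId);
        }
    }

    public static void main(String[] args) throws Exception {
        List<JobListener> listeners =
                loadListeners("ListenerLoader$LoggingListener");
        System.out.println(listeners.size()); // prints 1
        listeners.get(0).onJobSubmitted("job-42");
    }
}
```

A real implementation would additionally need to handle constructor arguments and classloading from the user code classloader.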





[jira] [Created] (FLINK-16693) Legacy planner incompatible with Timestamp backed by LocalDateTime

2020-03-20 Thread Paul Lin (Jira)
Paul Lin created FLINK-16693:


 Summary: Legacy planner incompatible with Timestamp backed by 
LocalDateTime
 Key: FLINK-16693
 URL: https://issues.apache.org/jira/browse/FLINK-16693
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Legacy Planner
Affects Versions: 1.10.0
Reporter: Paul Lin


Recently I upgraded a simple application that inserts static data into a table 
from 1.9.0 to 1.10.0, and 
encountered a timestamp type incompatibility problem during the table sink 
validation.

The SQL is like:
```
insert into kafka.test.tbl_a # schema: (user_name STRING, user_id INT, 
login_time TIMESTAMP)
select ("ann", 1000, TIMESTAMP "2019-12-30 00:00:00")
```

And the error thrown:
```
Field types of query result and registered TableSink `kafka`.`test`.`tbl_a` do 
not match.
  Query result schema: [EXPR$0: String, EXPR$1: Integer, EXPR$2: Timestamp]
  TableSink schema:[user_name: String, user_id: Integer, login_time: 
LocalDateTime]
```

After some digging, I found the root cause might be that, since FLINK-14645, 
timestamp fields defined via TableFactory have been bridged to LocalDateTime, 
while timestamp functions are still backed by java.sql.Timestamp.
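
For reference, the two bridging classes in question convert losslessly via the JDK itself; a small standalone sketch (plain JDK, no Flink APIs involved):

```java
import java.sql.Timestamp;
import java.time.LocalDateTime;

// Illustrates the two bridging classes behind the reported mismatch:
// java.sql.Timestamp (legacy) and java.time.LocalDateTime (what DDL-defined
// TIMESTAMP columns were switched to). The JDK converts between them
// losslessly, which is why bridging the two in the planner is feasible.
public class TimestampBridge {

    public static LocalDateTime toLocalDateTime(Timestamp ts) {
        return ts.toLocalDateTime();
    }

    public static Timestamp toTimestamp(LocalDateTime ldt) {
        return Timestamp.valueOf(ldt);
    }

    public static void main(String[] args) {
        Timestamp ts = Timestamp.valueOf("2019-12-30 00:00:00");
        LocalDateTime ldt = toLocalDateTime(ts);
        // Round trip preserves the value, including nanoseconds.
        System.out.println(toTimestamp(ldt).equals(ts)); // prints true
    }
}
```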





[jira] [Created] (FLINK-16694) Resuming Externalized Checkpoint end-to-end test failed on travis

2020-03-20 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-16694:
--

 Summary: Resuming Externalized Checkpoint end-to-end test failed 
on travis
 Key: FLINK-16694
 URL: https://issues.apache.org/jira/browse/FLINK-16694
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics, Tests
Affects Versions: 1.9.2
Reporter: Piotr Nowojski


Running 'Resuming Externalized Checkpoint (rocks, incremental, scale down) 
end-to-end test' failed on travis with the error:

{code:java}
The job exceeded the maximum log length, and has been terminated.
{code}

https://api.travis-ci.org/v3/job/664469537/log.txt

Probably because of metrics logging.






[jira] [Created] (FLINK-16696) Incomplete REST API docs

2020-03-20 Thread Roman Khachatryan (Jira)
Roman Khachatryan created FLINK-16696:
-

 Summary: Incomplete REST API docs
 Key: FLINK-16696
 URL: https://issues.apache.org/jira/browse/FLINK-16696
 Project: Flink
  Issue Type: Bug
  Components: Documentation, Runtime / REST
Affects Versions: 1.10.0
Reporter: Roman Khachatryan


The REST API docs don't provide enough details about the 
/jobs/:jobid/savepoints/:triggerid endpoint (how to get the completed savepoint 
location):

[https://ci.apache.org/projects/flink/flink-docs-stable/monitoring/rest_api.html#api]

 

Javadoc covers it better:

[https://ci.apache.org/projects/flink/flink-docs-release-1.10/api/java/index.html?org/apache/flink/runtime/rest/handler/job/savepoints/SavepointHandlers.html]

 

[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Savepoint-Location-from-Flink-REST-API-td33808.html]

 





[jira] [Created] (FLINK-16695) PrometheusReporterEndToEndITCase failed on travis

2020-03-20 Thread Piotr Nowojski (Jira)
Piotr Nowojski created FLINK-16695:
--

 Summary: PrometheusReporterEndToEndITCase failed on travis
 Key: FLINK-16695
 URL: https://issues.apache.org/jira/browse/FLINK-16695
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Metrics
Affects Versions: 1.9.2
Reporter: Piotr Nowojski


https://api.travis-ci.org/v3/job/664469523/log.txt

{code:java}
03:22:54.057 [INFO] ---
03:22:54.060 [INFO]  T E S T S
03:22:54.061 [INFO] ---
03:22:54.373 [INFO] Running 
org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase
03:22:55.634 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 1.256 s <<< FAILURE! - in 
org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase
03:22:55.637 [ERROR] 
testReporter(org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase)
  Time elapsed: 0.879 s  <<< ERROR!
java.io.IOException: Process execution failed due error. Error output:
at 
org.apache.flink.metrics.prometheus.tests.PrometheusReporterEndToEndITCase.testReporter(PrometheusReporterEndToEndITCase.java:118)

03:22:55.964 [INFO] 
03:22:55.964 [INFO] Results:
03:22:55.964 [INFO] 
03:22:55.964 [ERROR] Errors: 
03:22:55.964 [ERROR]   PrometheusReporterEndToEndITCase.testReporter:118 » IO 
Process execution faile...
03:22:55.964 [INFO] 
03:22:55.964 [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0
{code}






[jira] [Created] (FLINK-16697) Disable JMX rebinding

2020-03-20 Thread Colm O hEigeartaigh (Jira)
Colm O hEigeartaigh created FLINK-16697:
---

 Summary: Disable JMX rebinding
 Key: FLINK-16697
 URL: https://issues.apache.org/jira/browse/FLINK-16697
 Project: Flink
  Issue Type: Improvement
Reporter: Colm O hEigeartaigh
 Fix For: 1.11.0


Disable JMX rebinding.





[jira] [Created] (FLINK-16698) Flink needs a catalog listener to do preCreate/PreDrop* and afterCreate/AfterDrop* things

2020-03-20 Thread jackylau (Jira)
jackylau created FLINK-16698:


 Summary: Flink needs a catalog listener to do preCreate/PreDrop* 
and afterCreate/AfterDrop* things
 Key: FLINK-16698
 URL: https://issues.apache.org/jira/browse/FLINK-16698
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.10.0
Reporter: jackylau
 Fix For: 1.11.0


In order to support integrations such as Atlas or authentication, I think Flink 
needs a catalog listener to do preCreate/PreDrop* and afterCreate/AfterDrop* 
things, just like Spark/Hive do.





[jira] [Created] (FLINK-16699) Support accessing secured services via K8s secrets

2020-03-20 Thread Canbin Zheng (Jira)
Canbin Zheng created FLINK-16699:


 Summary: Support accessing secured services via K8s secrets
 Key: FLINK-16699
 URL: https://issues.apache.org/jira/browse/FLINK-16699
 Project: Flink
  Issue Type: Improvement
  Components: Deployment / Kubernetes
Reporter: Canbin Zheng
 Fix For: 1.11.0


Kubernetes [Secrets|https://kubernetes.io/docs/concepts/configuration/secret/] 
can be used to provide credentials for a Flink application to access secured 
services. This ticket proposes to:
 # Support mounting user-specified K8s Secrets into the JobManager/TaskManager 
containers.
 # Support using a user-specified K8s Secret through environment variables.





Re: Flink YARN app terminated before the client receives the result

2020-03-20 Thread DONG, Weike
Hi Tison & Till,

Changing *thenAccept *into *thenAcceptAsync *in the
MiniDispatcher#cancelJob does not help to solve the problem in my
environment. However, I have found that adding a* Thread.sleep(2000) *before
the return of JobCancellationHandler#handleRequest solved the problem (at
least the symptom goes away). As this is only a dirty hack, I will try to
get a more decent solution to this problem.

Sincerely,
Weike

On Tue, Mar 17, 2020 at 11:11 PM tison  wrote:

> JIRA created as https://jira.apache.org/jira/browse/FLINK-16637
>
> Best,
> tison.
>
>
> Till Rohrmann  于2020年3月17日周二 下午5:57写道:
>
>>  @Tison could you create an issue to track the problem. Please also link
>> the uploaded log file for further debugging.
>>
>> I think the reason why it worked in Flink 1.9 could have been that we had
>> a async callback in the longer chain which broke the flow of execution and
>> allowed to send the response. This is no longer the case. As an easy fix
>> one could change thenAccept into thenAcceptAsync in the
>> MiniDispatcher#cancelJob. But this only solves symptoms. Maybe we should
>> think about allowing not only StatusHandler to close asynchronously. At the
>> moment we say that all other handler shut down immediately (see
>> AbstractHandler#closeHandlerAsync). But the problem with this change would
>> be that all handler would become stateful because they would need to
>> remember whether a request is currently ongoing or not.
>>
>> Cheers,
>> Till
>>
>> On Tue, Mar 17, 2020 at 9:01 AM DONG, Weike 
>> wrote:
>>
>>> Hi Tison & Till and all,
>>>
>>> I have uploaded the client, taskmanager and jobmanager log to Gist (
>>> https://gist.github.com/kylemeow/500b6567368316ec6f5b8f99b469a49f), and
>>> I
>>> can reproduce this bug every time when trying to cancel Flink 1.10 jobs
>>> on
>>> YARN.
>>>
>>> Besides, in earlier Flink versions like 1.9, the REST API for *cancelling
>>> job with a savepoint *sometimes throws exceptions to the client side due
>>> to
>>> early shutdown of the server, even though the savepoint was successfully
>>> completed by reviewing the log, however when using the newly introduced
>>> *stop* API, that bug disappeared, however, *cancel* API seems to be buggy
>>> now.
>>>
>>> Best,
>>> Weike
>>>
>>> On Tue, Mar 17, 2020 at 10:17 AM tison  wrote:
>>>
>>> > edit: previously after the cancellation we have a longer call chain to
>>> > #jobReachedGloballyTerminalState which does the archive job & JM
>>> graceful
>>> > showdown, which might take some time so that ...
>>> >
>>> > Best,
>>> > tison.
>>> >
>>> >
>>> > tison  于2020年3月17日周二 上午10:13写道:
>>> >
>>> >> Hi Weike & Till,
>>> >>
>>> >> I agree with Till and it is also the analysis from my side. However,
>>> it
>>> >> seems even if we don't have FLINK-15116, it is still possible that we
>>> >> complete the cancel future but the cluster got shutdown before it
>>> properly
>>> >> delivered the response.
>>> >>
>>> >> There is one thing strange that this behavior almost reproducible, it
>>> >> should be a possible order but not always. Maybe previous we have to
>>> >> firstly cancel the job which has a long call chain so that it happens
>>> we
>>> >> have enough time to delivered the response.
>>> >>
>>> >> But the resolution looks like we introduce some
>>> >> synchronization/finalization logics that clear these outstanding
>>> future
>>> >> with best effort before the cluster(RestServer) down.
>>> >>
>>> >> Best,
>>> >> tison.
>>> >>
>>> >>
>>> >> Till Rohrmann  于2020年3月17日周二 上午4:12写道:
>>> >>
>>> >>> Hi Weike,
>>> >>>
>>> >>> could you share the complete logs with us? Attachments are being
>>> >>> filtered out by the Apache mail server but it works if you upload
>>> the logs
>>> >>> somewhere (e.g. https://gist.github.com/) and then share the link
>>> with
>>> >>> us. Ideally you run the cluster with DEBUG log settings.
>>> >>>
>>> >>> I assume that you are running Flink 1.10, right?
>>> >>>
>>> >>> My suspicion is that this behaviour has been introduced with
>>> FLINK-15116
>>> >>> [1]. It looks as if we complete the shutdown future in
>>> >>> MiniDispatcher#cancelJob before we return the response to the
>>> >>> RestClusterClient. My guess is that this triggers the shutdown of the
>>> >>> RestServer which then is not able to serve the response to the
>>> client. I'm
>>> >>> pulling in Aljoscha and Tison who introduced this change. They might
>>> be
>>> >>> able to verify my theory and propose a solution for it.
>>> >>>
>>> >>> [1] https://issues.apache.org/jira/browse/FLINK-15116
>>> >>>
>>> >>> Cheers,
>>> >>> Till
>>> >>>
>>> >>> On Fri, Mar 13, 2020 at 7:50 AM DONG, Weike >> >
>>> >>> wrote:
>>> >>>
>>>  Hi Yangze and all,
>>> 
>>>  I have tried numerous times, and this behavior persists.
>>> 
>>>  Below is the tail log of taskmanager.log:
>>> 
>>>  2020-03-13 12:06:14.240 [flink-akka.actor.default-dispatcher-3] INFO
>>>   org.apache.flink.runtime.taskexecutor.slot.TaskSlotTableImpl  -
>>> Free slot
>>> >>>

Re: Flink YARN app terminated before the client receives the result

2020-03-20 Thread Till Rohrmann
Yes you are right that `thenAcceptAsync` only breaks the control flow but
it does not guarantee that the `RestServer` has actually sent the response
to the client. Maybe we also need something similar to FLINK-10309 [1]. The
problem I see with this approach is that it makes all RestHandlers stateful.

[1] https://issues.apache.org/jira/browse/FLINK-10309

Cheers,
Till
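
A standalone illustration (plain JDK, not Flink code) of the point above: `thenAccept` may run its callback on the thread that completes the future, while `thenAcceptAsync` hands it off to another executor — and neither waits for any external effect, such as a response actually reaching the client:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicReference;

public class ThenAcceptDemo {

    // Returns {thread that ran thenAccept, thread that ran thenAcceptAsync}.
    public static String[] run() {
        AtomicReference<String> syncThread = new AtomicReference<>();
        AtomicReference<String> asyncThread = new AtomicReference<>();

        CompletableFuture<String> future = new CompletableFuture<>();
        CompletableFuture<Void> sync =
                future.thenAccept(v -> syncThread.set(Thread.currentThread().getName()));
        CompletableFuture<Void> async =
                future.thenAcceptAsync(v -> asyncThread.set(Thread.currentThread().getName()));

        future.complete("done"); // the thenAccept callback runs right here, inline
        sync.join();
        async.join();            // the async callback ran on an executor thread
        return new String[] {syncThread.get(), asyncThread.get()};
    }

    public static void main(String[] args) {
        String[] threads = run();
        System.out.println("thenAccept ran on:      " + threads[0]);
        System.out.println("thenAcceptAsync ran on: " + threads[1]);
    }
}
```

This is why `thenAcceptAsync` only breaks the control flow out of the completing thread; it gives no guarantee about when (or whether) the RestServer has flushed the response.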

On Fri, Mar 20, 2020 at 2:26 PM DONG, Weike  wrote:

> Hi Tison & Till,
>
> Changing *thenAccept *into *thenAcceptAsync *in the
> MiniDispatcher#cancelJob does not help to solve the problem in my
> environment. However, I have found that adding a* Thread.sleep(2000) *before
> the return of JobCancellationHandler#handleRequest solved the problem (at
> least the symptom goes away). As this is only a dirty hack, I will try to
> get a more decent solution to this problem.
>
> Sincerely,
> Weike
>
> On Tue, Mar 17, 2020 at 11:11 PM tison  wrote:
>
>> JIRA created as https://jira.apache.org/jira/browse/FLINK-16637
>>
>> Best,
>> tison.
>>
>>
>> Till Rohrmann  于2020年3月17日周二 下午5:57写道:
>>
>>>  @Tison could you create an issue to track the problem. Please also link
>>> the uploaded log file for further debugging.
>>>
>>> I think the reason why it worked in Flink 1.9 could have been that we
>>> had a async callback in the longer chain which broke the flow of execution
>>> and allowed to send the response. This is no longer the case. As an easy
>>> fix one could change thenAccept into thenAcceptAsync in the
>>> MiniDispatcher#cancelJob. But this only solves symptoms. Maybe we should
>>> think about allowing not only StatusHandler to close asynchronously. At the
>>> moment we say that all other handler shut down immediately (see
>>> AbstractHandler#closeHandlerAsync). But the problem with this change would
>>> be that all handler would become stateful because they would need to
>>> remember whether a request is currently ongoing or not.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Tue, Mar 17, 2020 at 9:01 AM DONG, Weike 
>>> wrote:
>>>
 Hi Tison & Till and all,

 I have uploaded the client, taskmanager and jobmanager log to Gist (
 https://gist.github.com/kylemeow/500b6567368316ec6f5b8f99b469a49f),
 and I
 can reproduce this bug every time when trying to cancel Flink 1.10 jobs
 on
 YARN.

 Besides, in earlier Flink versions like 1.9, the REST API for
 *cancelling
 job with a savepoint *sometimes throws exceptions to the client side
 due to
 early shutdown of the server, even though the savepoint was successfully
 completed by reviewing the log, however when using the newly introduced
 *stop* API, that bug disappeared, however, *cancel* API seems to be
 buggy
 now.

 Best,
 Weike

 On Tue, Mar 17, 2020 at 10:17 AM tison  wrote:

 > edit: previously after the cancellation we have a longer call chain to
 > #jobReachedGloballyTerminalState which does the archive job & JM
 graceful
 > showdown, which might take some time so that ...
 >
 > Best,
 > tison.
 >
 >
 > tison  于2020年3月17日周二 上午10:13写道:
 >
 >> Hi Weike & Till,
 >>
 >> I agree with Till and it is also the analysis from my side. However,
 it
 >> seems even if we don't have FLINK-15116, it is still possible that we
 >> complete the cancel future but the cluster got shutdown before it
 properly
 >> delivered the response.
 >>
 >> There is one thing strange that this behavior almost reproducible, it
 >> should be a possible order but not always. Maybe previous we have to
 >> firstly cancel the job which has a long call chain so that it
 happens we
 >> have enough time to delivered the response.
 >>
 >> But the resolution looks like we introduce some
 >> synchronization/finalization logics that clear these outstanding
 future
 >> with best effort before the cluster(RestServer) down.
 >>
 >> Best,
 >> tison.
 >>
 >>
 >> Till Rohrmann  于2020年3月17日周二 上午4:12写道:
 >>
 >>> Hi Weike,
 >>>
 >>> could you share the complete logs with us? Attachments are being
 >>> filtered out by the Apache mail server but it works if you upload
 the logs
 >>> somewhere (e.g. https://gist.github.com/) and then share the link
 with
 >>> us. Ideally you run the cluster with DEBUG log settings.
 >>>
 >>> I assume that you are running Flink 1.10, right?
 >>>
 >>> My suspicion is that this behaviour has been introduced with
 FLINK-15116
 >>> [1]. It looks as if we complete the shutdown future in
 >>> MiniDispatcher#cancelJob before we return the response to the
 >>> RestClusterClient. My guess is that this triggers the shutdown of
 the
 >>> RestServer which then is not able to serve the response to the
 client. I'm
 >>> pulling in Aljoscha and Tison who introduced this change. They
 might be
 >>> able to verify my theory

Re: [DISCUSS] Link Stateful Functions from the Flink Website

2020-03-20 Thread Till Rohrmann
+1 for the list of proposed changes

 - What is "Stateful Functions"? link
  - Documentation -> Stateful Functions Docs
  - StateFun on GitHub

Cheers,
Till

On Thu, Mar 12, 2020 at 12:22 PM Stephan Ewen  wrote:

> Thank you all, for chiming in!
>
> I like the ideas suggested to update the website; I am also +1 for an
> update of the main image. Maybe something like [1],
> which would also help to increase SQL visibility.
> I guess that would be a bit of a bigger effort, and I would kick off that
> bigger discussion in a few days or so.
>
> Until then, as an immediate "band aid", I would then add these here (unless
> there are concerns):
>   - What is "Stateful Functions"? link
>   - Documentation -> Stateful Functions Docs
>   - StateFun on GitHub
>
> Best,
> Stephan
>
> [1]
>
> https://onedrive.live.com/?authkey=%21ABxm14umFhJOXFQ&cid=E2B4A3561C93FA7B&id=E2B4A3561C93FA7B%213535&parId=E2B4A3561C93FA7B%21342&o=OneUp
>
>
> On Thu, Mar 12, 2020 at 7:39 AM Zhijiang  .invalid>
> wrote:
>
> > Thanks for launching this discussion @Stephan!
> >
> > +1 for the idea of linking stateful functions in Flink website. It is
> time
> > to increase the exposure of this secret weapon for attracting more
> > attentions.
> > It is benefit for interested users to try it out in desired scenarios and
> > also benefit for developing it well.
> >
> >
> > Best,
> > Zhijiang
> > --
> > From:Hequn Cheng 
> > Send Time:2020 Mar. 11 (Wed.) 15:36
> > To:dev 
> > Subject:Re: [DISCUSS] Link Stateful Functions from the Flink Website
> >
> > Hi,
> >
> > Thanks a lot for raising the discussion @Stephan.
> > +1 to increase the visibilities of the Stateful Functions.
> >
> > Another option I'm think is adding a section(named Stateful Functions or
> > Flink Projects?)
> > under the "Latest Blog Posts". The advantage is we can add a picture and
> > some descriptions here.
> > A picture may attract more attention from the users when he/she visit the
> > website.
> > The picture can be the same one in [1].
> >
> > In the future, if we have multiple Flink individual projects, we can also
> > turn the section into a Table list
> > to expose all of them.
> >
> > What do you think?
> >
> > Best,
> > Hequn
> >
> > [1] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> >
> > On Tue, Mar 10, 2020 at 11:13 PM Tzu-Li (Gordon) Tai <
> tzuli...@apache.org>
> > wrote:
> >
> > > +1 on the suggestion to add "What is Stateful Functions" to the left
> > > navigation bar.
> > > That might also mean it would be nice to have a slight rework to the
> main
> > > image on the website, illustrating the use cases of Flink (this one
> [1]).
> > > On the image it does mention "Event-Driven Applications", but there's
> > > somewhat missing a more direct connection from that term to the
> Stateful
> > > Functions project.
> > >
> > > As for what the "What is Stateful Functions?" button directs to, maybe
> > that
> > > should point to a general concepts page. Initially, we can begin with
> the
> > > README contents on the project repo [2].
> > > As for the actual Statefun documentation link [3], I think we should
> link
> > > that from an item in the "Documentation" pull-down list.
> > >
> > > One last thing to increase visibility of the Statefun project just a
> bit
> > > more:
> > > There's a "Flink on Github" button on the very bottom of the navigation
> > > bar.
> > > What do you think about adding a "Flink Stateful Functions on Github"
> > > button there as well?
> > >
> > > Cheers,
> > > Gordon
> > >
> > > [1]
> > >
> > >
> >
> https://github.com/apache/flink-web/blob/asf-site/img/flink-home-graphic.png
> > > [2] https://github.com/apache/flink-statefun/blob/master/README.md
> > > [3] https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> > >
> > >
> > > On Tue, Mar 10, 2020 at 8:29 PM Yu Li  wrote:
> > >
> > > > +1 on adding a "What is Stateful Functions" link below the "What is
> > > Apache
> > > > Flink" entry and integrating into the Flink docs gradually (instead
> of
> > > > hiding it behind until fully integrated).
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Tue, 10 Mar 2020 at 19:33, Stephan Ewen  wrote:
> > > >
> > > > > Hi all!
> > > > >
> > > > > I think it would be nice to mention Stateful Function on the Flink
> > > > website.
> > > > > At the moment, Stateful Functions is very hard to discover, and
> with
> > > the
> > > > > first release of it under Apache Flink, it would be a good time to
> > > change
> > > > > that.
> > > > >
> > > > > My proposal would be to add a "What is Stateful Functions?" below
> the
> > > > "What
> > > > > is Apache Flink" entry in the sidenav, and point it to
> > > > > https://ci.apache.org/projects/flink/flink-statefun-docs-master/
> > > > > It is not ideal, yet, but it may serve as an intermediate solution
> > > until
> > > > we
> > > > > can make more involved attempt to rethink the website 

[jira] [Created] (FLINK-16700) Document Kinesis I/O Module

2020-03-20 Thread Seth Wiesman (Jira)
Seth Wiesman created FLINK-16700:


 Summary: Document Kinesis I/O Module
 Key: FLINK-16700
 URL: https://issues.apache.org/jira/browse/FLINK-16700
 Project: Flink
  Issue Type: Improvement
  Components: Stateful Functions
Reporter: Seth Wiesman
 Fix For: statefun-1.1








Re: [DISCUSS] JMX remote monitoring integration with Flink

2020-03-20 Thread Till Rohrmann
Hi Rong Rong,

you are right that JMX is quite hard to use in production due to the
mentioned problems with discovering the port. There is actually already a
JIRA ticket [1] discussing this problem. It just never gained enough
traction to be tackled.

In general, I agree that it would be nice to have a REST API with which one
can obtain JVM specific information about a Flink process. This information
could also contain a potentially open JMX port.

[1] https://issues.apache.org/jira/browse/FLINK-5552

Cheers,
Till

On Fri, Mar 13, 2020 at 2:02 PM Forward Xu  wrote:

> Hi  RongRong,
> Thank you for bringing this discussion, it is indeed not appropriate to
> occupy additional ports in the production environment to provide jmxrmi
> services. I think [2] RestApi or JobManager/TaskManager UI is a good idea.
>
> Best,
> Forward
>
> Rong Rong  于2020年3月13日周五 下午8:54写道:
>
> > Hi All,
> >
> > Has anyone tried to manage production Flink applications through JMX
> remote
> > monitoring & management[1]?
> >
> > We were experimenting to enable JMXRMI on Flink by default in production
> > and would like to share some of our thoughts:
> > ** Is there any straightforward way to dynamically allocate JMXRMI remote
> > ports?*
> >   - It is unrealistic to use JMXRMI static port in production
> environment,
> > however we have to go all around the logging system to make the dynamic
> > remote port number printed out in the log files - this seems very
> > inconvenient.
> >   - I think it would be very handy if we can show the JMXRMI remote
> > information on JobManager/TaskManager UI, or via REST API. (I am thinking
> > about something similar to [2])
> >
> > ** Is there any performance overhead enabling JMX for a Flink
> application?*
> >   - We haven't seen any significant performance impact in our
> experiments.
> > However the experiment is not that well-rounded and the observation is
> > inconclusive.
> >   - I was wondering would it be a good idea to put some benchmark in the
> > regression tests[3] to see what's the overhead would be?
> >
> > It would be highly appreciated if anyone could share some experiences or
> > provide any suggestions in how we can improve the JMX remote integration
> > with Flink.
> >
> >
> > Thanks,
> > Rong
> >
> >
> > [1]
> >
> >
> https://docs.oracle.com/javase/8/docs/technotes/guides/management/agent.html
> > [2]
> >
> https://samza.apache.org/learn/documentation/0.14/jobs/web-ui-rest-api.html
> > [3] http://codespeed.dak8s.net:8000/
> >
>
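
The port-discovery problem discussed above can be sketched in plain JDK code (this is not Flink's actual JMX setup): probe for a free port, start a JMXRMI connector server on it, and return the port so it could be logged or exposed, e.g. via a REST API:

```java
import java.lang.management.ManagementFactory;
import java.net.ServerSocket;
import java.rmi.registry.LocateRegistry;
import javax.management.MBeanServer;
import javax.management.remote.JMXConnectorServer;
import javax.management.remote.JMXConnectorServerFactory;
import javax.management.remote.JMXServiceURL;

public class DynamicJmxPort {

    // Pick a free port, start an RMI registry and a JMXRMI connector on it,
    // and return the port so the caller can report it. Note: there is a small
    // race between releasing the probe socket and binding the registry; a
    // production version should retry over a configured port range.
    public static int startJmxServer() throws Exception {
        int port;
        try (ServerSocket probe = new ServerSocket(0)) {
            port = probe.getLocalPort(); // OS-assigned free port
        }
        LocateRegistry.createRegistry(port);
        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:" + port + "/jmxrmi");
        JMXConnectorServer server =
                JMXConnectorServerFactory.newJMXConnectorServer(url, null, mbs);
        server.start();
        return port;
    }

    public static void main(String[] args) throws Exception {
        int port = startJmxServer();
        System.out.println("JMXRMI available on port " + port);
    }
}
```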


[jira] [Created] (FLINK-16701) Elasticsearch sink support alias for indices.

2020-03-20 Thread Leonard Xu (Jira)
Leonard Xu created FLINK-16701:
--

 Summary: Elasticsearch sink support alias for indices.
 Key: FLINK-16701
 URL: https://issues.apache.org/jira/browse/FLINK-16701
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / ElasticSearch
Affects Versions: 1.11.0
Reporter: Leonard Xu
 Fix For: 1.11.0


This is related to 
[FLINK-15400|https://issues.apache.org/jira/browse/FLINK-15400]. FLINK-15400 
will only support dynamic indices and will not support aliases. Because alias 
support is needed in both the Streaming API and the Table API, I think 
splitting the original design into two PRs makes sense.

PR for FLINK-15400: support dynamic index for ElasticsearchTableSink

PR for this issue: support alias for Streaming API and Table API





[jira] [Created] (FLINK-16702) develop JDBCCatalogFactory for service discovery

2020-03-20 Thread Bowen Li (Jira)
Bowen Li created FLINK-16702:


 Summary: develop JDBCCatalogFactory for service discovery
 Key: FLINK-16702
 URL: https://issues.apache.org/jira/browse/FLINK-16702
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / JDBC
Reporter: Bowen Li
Assignee: Bowen Li
 Fix For: 1.11.0








[jira] [Created] (FLINK-16703) AkkaRpcActor state machine does not record transition to terminating state.

2020-03-20 Thread Dmitri Chmelev (Jira)
Dmitri Chmelev created FLINK-16703:
--

 Summary: AkkaRpcActor state machine does not record transition to 
terminating state.
 Key: FLINK-16703
 URL: https://issues.apache.org/jira/browse/FLINK-16703
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.10.0, 1.9.0, 1.8.0, 1.11.0, 2.0.0
Reporter: Dmitri Chmelev


As part of FLINK-11551, the state machine of AkkaRpcActor has been updated to 
include 'terminating' and 'terminated' states. However, when the actor 
termination request is handled, the resulting 'terminating' state is not 
recorded by the FSM.

[https://github.com/apache/flink/blame/master/flink-runtime/src/main/java/org/apache/flink/runtime/rpc/akka/AkkaRpcActor.java#L175]

As a side effect, the {{isRunning()}} predicate can report that the actor is 
still running after termination was initiated, so the actor continues to handle 
messages.

I believe the fix is trivial: the private field {{state}} should be updated 
with the return value of the call to {{state.terminate()}}.

Feel free to adjust the priority accordingly.
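
The bug pattern can be modeled standalone (this is illustrative Java, not the actual AkkaRpcActor code): a transition method returns the new state, and dropping the return value leaves the FSM stuck in the old state:

```java
// Standalone model of the bug pattern described above: a transition method
// returns the new state, and forgetting to assign its return value means the
// state machine never leaves the old state.
public class RpcActorStateDemo {

    enum State {
        STARTED, TERMINATING;

        State terminate() {
            return TERMINATING; // immutable-style transition: returns the new state
        }

        boolean isRunning() {
            return this == STARTED;
        }
    }

    private State state = State.STARTED;

    // Buggy version: the return value of terminate() is dropped.
    void handleTerminationBuggy() {
        state.terminate(); // result ignored -> state stays STARTED
    }

    // Fixed version, as proposed in the ticket: store the returned state.
    void handleTerminationFixed() {
        state = state.terminate();
    }

    boolean isRunning() {
        return state.isRunning();
    }

    public static void main(String[] args) {
        RpcActorStateDemo buggy = new RpcActorStateDemo();
        buggy.handleTerminationBuggy();
        System.out.println(buggy.isRunning()); // prints true (still "running"!)

        RpcActorStateDemo fixed = new RpcActorStateDemo();
        fixed.handleTerminationFixed();
        System.out.println(fixed.isRunning()); // prints false
    }
}
```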





Re: [DISCUSS] Introduce TableFactory for StatefulSequenceSource

2020-03-20 Thread Bowen Li
+1.

I would suggest taking a step even further and seeing what users really need
to test/try/play with the Table API and Flink SQL. Besides this one, here are
some more sources and sinks that I have developed or used previously to
facilitate building Flink table/SQL pipelines.


   1. random input data source
  - should generate random data at a specified rate according to schema
  - purposes
 - test Flink pipeline and data can end up in external storage
 correctly
 - stress test Flink sink as well as tuning up external storage
  2. print data sink
  - should print data in row format in console
  - purposes
 - make it easier to test Flink SQL job e2e in IDE
 - test Flink pipeline and ensure output data format/value is
 correct
  3. no output data sink
  - just swallow output data without doing anything
  - purpose
 - evaluate and tune performance of Flink source and the whole
  pipeline. Users don't need to worry about sink back pressure

These may be taken into consideration all together as an effort to lower
the threshold of running Flink SQL/table API, and facilitate users' daily
work.
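
Item 1 above can be sketched in plain Java (no Flink APIs; the schema and names are illustrative) to show the rate-limited generation loop a real SourceFunction-based implementation would contain:

```java
import java.util.Random;
import java.util.function.Consumer;

// Plain-Java sketch of a random input source: emit random rows at a fixed
// rate into an arbitrary sink. A real Flink version would implement a
// SourceFunction and cooperate with checkpointing; this standalone version
// only demonstrates the generation and rate-limiting logic.
public class RandomRowSource {

    public static void emit(int rows, long perSecond, Consumer<String> sink)
            throws InterruptedException {
        Random random = new Random();
        long sleepMillis = 1000L / perSecond;
        for (int i = 0; i < rows; i++) {
            // Generate a row matching a simple (user_name STRING, user_id INT) schema.
            String row = "user-" + random.nextInt(1000) + "," + random.nextInt(100000);
            sink.accept(row);
            Thread.sleep(sleepMillis); // crude rate limit
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // Print sink (item 2 above): just write each row to the console.
        emit(5, 100, row -> System.out.println(row));
    }
}
```

Swapping the `Consumer` for one that discards its argument gives the "no output" sink of item 3.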

Cheers,
Bowen


On Thu, Mar 19, 2020 at 10:32 PM Jingsong Li  wrote:

> Hi all,
>
> I heard some users complain that the Table API is difficult to test. Now with
> the SQL client, users are more and more inclined to use it for testing rather
> than writing programs.
> The most common example is Kafka source. If users need to test their SQL
> output and checkpoint, they need to:
>
> - 1.Launch a Kafka standalone, create a Kafka topic .
> - 2.Write a program, mock input records, and produce records to Kafka
> topic.
> - 3.Then test in Flink.
>
> The step 1 and 2 are annoying, although this test is E2E.
>
> Then I found StatefulSequenceSource; it is very good because it already deals
> with checkpointing, so it fits the checkpoint mechanism well. Usually, users
> have checkpointing turned on in production.
>
> With computed columns, users can easily create a sequence-source DDL similar
> to a Kafka DDL. Then they can test inside Flink without launching anything
> else.
>
> Have you considered this? What do you think?
>
> CC: @Aljoscha Krettek  the author
> of StatefulSequenceSource.
>
> Best,
> Jingsong Lee
>


Re: [DISCUSS] Releasing Flink 1.10.1

2020-03-20 Thread Yu Li
Hi All,

Here is the status update of issues in 1.10.1 watch list:

* Blockers (7)

  - [Under Discussion] FLINK-16018 Improve error reporting when submitting
batch job (instead of AskTimeoutException)

  - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM error
on repeated job submission

  - [PR Submitted] FLINK-16170 SearchTemplateRequest ClassNotFoundException
when use flink-sql-connector-elasticsearch7

  - [PR Submitted] FLINK-16262 Class loader problem with
FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory

  - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
minimise its OutOfMemoryError

  - [Closed] FLINK-16454 Update the copyright year in NOTICE files

  - [Open] FLINK-16576 State inconsistency on restore with memory state
backends


* Critical (4)

  - [Closed] FLINK-16047 Blink planner produces wrong aggregate results
with state clean up

  - [PR Submitted] FLINK-16070 Blink planner can not extract correct unique
key for UpsertStreamTableSink

  - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be
handled as Fatal Error in TaskManager

  - [Open] FLINK-16408 Bind user code class loader to lifetime of a slot

Please let me know if you see any new blockers to add to the list. Thanks.

Best Regards,
Yu


On Wed, 18 Mar 2020 at 00:11, Yu Li  wrote:

> Thanks for the updates Till!
>
> For FLINK-16018, maybe we could create two sub-tasks, for the easy fix
> and the complete fix separately, and only include the easy one in 1.10.1?
> Or please just feel free to postpone the whole task to 1.10.2 if an "all
> or nothing" policy is preferred (smile). Thanks.
>
> Best Regards,
> Yu
>
>
> On Tue, 17 Mar 2020 at 23:33, Till Rohrmann  wrote:
>
>> +1 for a soonish bug fix release. Thanks for volunteering as our release
>> manager Yu.
>>
>> I think we can soon merge the metaspace size increase and the error
>> message improvement. The assumption is that we currently don't have too
>> many small Flink 1.10 deployments with a process size <= 1GB. Of course,
>> the sooner we release the bug fix release, the fewer deployments will be
>> affected by this change.
>>
>> For FLINK-16018, I think there would be an easy solution which we could
>> include in the bug fix release. The proper fix will most likely take a bit
>> longer, though.
>>
>> Cheers,
>> Till
>>
>> On Fri, Mar 13, 2020 at 8:08 PM Andrey Zagrebin 
>> wrote:
>>
>> > > @Andrey and @Xintong - could we have a quick poll on the user mailing
>> > list
>> > > about increasing the metaspace size in Flink 1.10.1? Specifically
>> asking
>> > > for who has very small TM setups?
>> >
>> > There has been a survey on this topic for the last 10 days:
>> >
>> > `[Survey] Default size for the new JVM Metaspace limit in 1.10`
>> > I can ask there about the small TM setups.
>> >
>> > On Fri, Mar 13, 2020 at 5:19 PM Yu Li  wrote:
>> >
>> > > Another blocker for 1.10.1: FLINK-16576 State inconsistency on restore
>> > with
>> > > memory state backends
>> > >
>> > > Let me recompile the watch list with the recent feedback. There are
>> > > 45 issues in total with Blocker/Critical priority for 1.10.1, of
>> > > which 14 are already resolved and 31 remain. The ones below are
>> > > watched (meaning that once they are fixed, we will kick off the
>> > > release, with the remaining ones postponed to 1.10.2 unless there
>> > > are objections):
>> > >
>> > > * Blockers (7)
>> > >   - [Under Discussion] FLINK-16018 Improve error reporting when
>> > submitting
>> > > batch job (instead of AskTimeoutException)
>> > >   - [Under Discussion] FLINK-16142 Memory Leak causes Metaspace OOM
>> error
>> > > on repeated job submission
>> > >   - [PR Submitted] FLINK-16170 SearchTemplateRequest
>> > ClassNotFoundException
>> > > when use flink-sql-connector-elasticsearch7
>> > >   - [PR Submitted] FLINK-16262 Class loader problem with
>> > > FlinkKafkaProducer.Semantic.EXACTLY_ONCE and usrlib directory
>> > >   - [Closed] FLINK-16406 Increase default value for JVM Metaspace to
>> > > minimise its OutOfMemoryError
>> > >   - [Open] FLINK-16454 Update the copyright year in NOTICE files
>> > >   - [Open] FLINK-16576 State inconsistency on restore with memory
>> state
>> > > backends
>> > >
>> > > * Critical (4)
>> > >   - [Open] FLINK-16047 Blink planner produces wrong aggregate results
>> > with
>> > > state clean up
>> > >   - [PR Submitted] FLINK-16070 Blink planner can not extract correct
>> > unique
>> > > key for UpsertStreamTableSink
>> > >   - [Under Discussion] FLINK-16225 Metaspace Out Of Memory should be
>> > > handled as Fatal Error in TaskManager
>> > >   - [Open] FLINK-16408 Bind user code class loader to lifetime of a
>> slot
>> > >
>> > > Please raise your hand if you find any other issues that should be
>> > > put into this list, thanks.
>> > >
>> > > Please also note that the 1.10.2 version has already been created
>> > > (thanks for the help @jincheng); feel free to adjust/assign fix
>> > > versions to it when necessary.
>> > >
>> > > Best Regards,