Re: [DISCUSS] Flink SQL DDL Design

2018-12-13 Thread Jark Wu
Hi all,

Here are a bunch of my thoughts:

8) Support ROW/MAP/ARRAY data types
That's fine with me if we want to support them in the MVP. In my mind, we
can have a field type syntax like this:

```
fieldType ::=
  {
    simpleType
  | MAP
  | ARRAY
  | ROW
  }
```
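
For illustration only, here is a hypothetical CREATE TABLE statement using such
nested types; the exact syntax and the connector properties are placeholders
and not part of the proposal itself:

```
CREATE TABLE user_events (
  user_id    BIGINT,
  tags       ARRAY<VARCHAR>,
  properties MAP<VARCHAR, VARCHAR>,
  address    ROW<city VARCHAR, zip INT>
) WITH (
  'connector.type' = 'kafka'
);
```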

I have included this in @Shuyi's summary doc [1]. Please leave feedback
there!

[1]
https://docs.google.com/document/d/1ug1-aVBSCxZQk58kR-yaK2ETCgL3zg0eDUVGCnW2V9E/edit

3) SOURCE / SINK / BOTH
@Timo, a CREATE TABLE statement registers a virtual table in the session
or catalog. I don't think it is immutable, as we might also want to support
CREATE INDEX statements in the future. On the other hand, ACL is not a part
of the table definition; it belongs to the permission system, which is
usually stored somewhere else. So GRANT/REVOKE sounds like the more
standard option.

7) Table Update Mode
I agree with @Shuyi that the table update mode can be left out of the MVP,
because IMO the update mode will not break the current MVP design. It
should be something we can add later, like the CHANGE_FLAG you proposed. We
can continue this discussion when we finalize the MVP.

Meanwhile, the update mode is a big topic which may take several weeks to
discuss. For example: (a) do we support CHANGE_FLAG when the table supports
upsert (or when the table defines a primary key)? (b) the CHANGE_FLAG should
support both writing and reading; (c) currently we only support the true
(add) and false (retract) flag types, are they enough? (d) how do we connect
to an external storage system that also exposes insert/delete flags, like a
MySQL binlog?

Regarding the CHANGE_FLAG @Timo proposed, I think this is a good
direction. But should isRetraction be a physical field, with CHANGE_FLAG
acting like a constraint on it? If yes, what would the type of isRetraction
be?

4.b) Ingesting and writing timestamps to systems.
@Shuyi, PERSISTED can solve the problem of the field not being physically
stored. However, it doesn't solve the problem of how to write a field back
to the computed column, because "A computed column cannot be the
target of an INSERT or UPDATE statement" even if the computed column is
persisted. If we want to write a rowtime back to the external system, the
DML should look like this: "INSERT INTO sink SELECT a, rowtime FROM
source". The point is that `rowtime` must be specified in the INSERT
statement, which is why I hope the `rowtime` field in the Table is not a
computed column. See more information about PERSISTED in [2] [3].
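
For reference, a minimal sketch of the PERSISTED behavior described in [2] [3]
(SQL Server syntax) and of the DML we would like to support; the table and
column names are made up:

```
-- SQL Server: a PERSISTED computed column is physically stored, but it still
-- cannot be the target of an INSERT or UPDATE.
CREATE TABLE orders (
  price    DECIMAL(10, 2),
  quantity INT,
  total    AS price * quantity PERSISTED
);
INSERT INTO orders (price, quantity) VALUES (9.99, 3);        -- OK
-- INSERT INTO orders (price, quantity, total) VALUES (...);  -- rejected

-- Flink: the DML we want to be able to write, which requires `rowtime` to be
-- a regular (non-computed) field of the source table.
INSERT INTO sink SELECT a, rowtime FROM source;
```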

Another point to consider is that SYSTEMROWTIME() only solves reading the
timestamp from the message header. There are many similar requirements,
such as reading `topic`, `partition`, `offset` or custom properties from
message headers. Do we plan to support a bunch of built-in functions like
SYSTEMROWTIME()? Or do we have some cleaner and easier way for this?

[2]:
https://docs.microsoft.com/en-us/sql/t-sql/statements/alter-table-computed-column-definition-transact-sql?view=sql-server-2017
[3]:
https://stackoverflow.com/questions/51390531/sql-server-persisted-computed-columns-versus-actual-normal-column

Looking forward to collaborating with you all!

Best,
Jark


On Thu, 13 Dec 2018 at 01:38, Rong Rong  wrote:

> Thanks for the summary effort @shuyi. Sorry for jumping in the discussion
> so late.
>
> As of the scope of MVP, I think we might want to consider adding "table
> update mode" problem to it. I agree with @timo that might not be easily
> changed in the future if the flags has to be part of the schema/column
> definition.
>
> Regarding the components under discussion.
> 4) Event-Time Attributes and Watermarks
> b, c) I actually like the special indicator way @fabian suggested to hint
> Flink to read time attributes directly from the system not the data `(ts AS
> SYSTEMROWTIME())`. It should also address the "compute field not emitted"
> problem by carrying the "virtual column" concept like @shuyi suggested.
> However if I understand correctly, this also required to be defined as part
> of the schema/column definition.
>
> 3) SOURCE / SINK / BOTH
> +1 on not adding properties to `CREATE TABLE` to manage ACL/permission.
>
> On a higher level, I think one question I have is whether we can
> definitively come to an agreement that the features under discussion (and
> potential solutions) can be cleanly adjusted/added from what we are
> providing on MVP (e.g. the schema/column definition might be hard to
> achieve but if we all agree ACL/permission should not be part of the
> `CREATE TABLE` and a decision can be made later). @shuyi I can also help in
> drafting the FLIP doc by summarizing the features under discussion and the
> concerns to whether included in the MVP, so that we can carry on the
> discussions alongside with the MVP implementation effort. I think each one
> of these features deserves a subsection dedicated for it.
>
> Many thanks,
> Rong
>
>
> On Wed, Dec 12, 2018 at 1:14 AM Shuyi Chen  wrot

Re: [SURVEY] Usage of flink-python and flink-streaming-python

2018-12-13 Thread Xianda Ke
Hi Folks,
To avoid polluting the survey thread with discussions, we have started a
separate thread, and maybe we can continue the discussion over there.

Regards,
Xianda

On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen  wrote:

> I like that we are having a general discussion about how to use Python and
> Flink together in the future.
> The current python support has some shortcomings that were mentioned
> before, so we clearly need something better.
>
> Parts of the community have worked together with the Apache Beam project,
> which is pretty far in adding a portability layer to support Python.
> Before we dive deep into a design proposal for a new Python API in Flink, I
> think we should figure out in which general direction Python support should
> go.
>
> *Option (1): Language portability via Apache Beam*
>
> Pro:
>   - already exists to a large extend and already has users
>   - portability layer offers other languages in addition to python. Go is
> in the making, NodeJS has been speculated, etc.
>   - collaboration with another project / community which means more
> manpower and exposure. Beam currently has a strong focus on Flink as a
> runner for Python.
>   - Python API is used for existing ML libraries from the TensorFlow
> ecosystem
>
> Con:
>   - Not Flink's API. Python users need to learn the syntax of another API
> (Python API is inherently different, but even more different here).
>
> *Option (2): Implement own Python API*
>
> Pro:
>   - Python API will be closer to Flink Java / Scala APIs
>
> Con:
>   - We will only have Python.
>   - Need to to rebuild the Python language bridge (significant work to get
> stable)
>   - might lose tight collaboration with Beam and the other parties in Beam
>   - not benefiting from Beam's ecosystem
>
> *Option (3): **Implement own portability layer*
>
> Pro
>   - Flexibility to align APIs across languages within Flink ecosystem
>
> Con
>   - A lot of work (for context, to get this feature complete, Beam has
> worked on that for a year now)
>   - Replicating work that already exists
>   - good chance to lose tight collaboration with Beam and parties in that
> project
>   - not benefiting from Beam's ecosystem
>
> Best,
> Stephan
>
>
> On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise  wrote:
>
> > Did you take a look at Apache Beam? It already provides a comprehensive
> > Python SDK and can be used with Flink:
> > https://beam.apache.org/roadmap/portability/#python-on-flink
> >
> > We are using it at Lyft for Python streaming pipelines.
> >
> > Thomas
> >
> > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke  wrote:
> >
> > > Hi Till,
> > >
> > > 1. So far as I know, most of the users at Alibaba are using SQL.  Some
> of
> > > users at Alibaba want integrated python libraries with Flink for
> > streaming
> > > processing, and Jython is unusable.
> > >
> > > 2. Python UDFs for SQL:
> > > * declaring python UDF based on Alibaba's internal DDL syntax.
> > > * start a Python process in open()
> > > * communicate with JVM process via Socket.
> > > * Yes, it support python libraries, users can upload virutalenv/conda
> > > Python runtime
> > >
> > > 3. We've draft a design doc for Python API
> > >  [DISCUSS] Flink Python API
> > > <
> > >
> >
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> > > >
> > >
> > > Python UDF for SQL is not discussed in this documentation, we'll
> create a
> > > new proposal when the SQL DDL is ready.
> > >
> > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann 
> > > wrote:
> > >
> > > > Hi Xianda,
> > > >
> > > > thanks for sharing this detailed feedback. Do I understand you
> > correctly
> > > > that flink-python and flink-streaming-python are not usable for the
> use
> > > > cases at Alibaba atm?
> > > >
> > > > Could you share a bit more details about the Python UDFs for SQL? How
> > do
> > > > you execute the Python code? Will it work with any Python library? If
> > you
> > > > are about to publish the design document then feel free to refer me
> to
> > > this
> > > > document.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke 
> wrote:
> > > >
> > > > > Xianda Ke 
> > > > > 9:47 AM (11 minutes ago)
> > > > > to dev, user
> > > > > After communicating with some of the internal users at Alibaba, my
> > > > > impression is that:
> > > > >
> > > > >- Most of them need C extensions support, they want to
> integrated
> > > > their
> > > > >algorithms with stream processing,but Jython is unacceptable for
> > > them.
> > > > >- For some users, who are only familiar with SQL/Python,
> > developing
> > > > Java
> > > > >API application/UDF is too complex. Writing Python UDF and
> > declaring
> > > > it
> > > > > in
> > > > >SQL is preferred.
> > > > >- Machine Learning users needs richer Python APIs, such as Table
> > API
> > > > >Python support.
> > > > >
> > > > >
> > > > > From my point of view, currently Python support has a few caveats
>

[DISCUSS] Python (and Non-JVM) Language Support in Flink

2018-12-13 Thread Xianda Ke
Currently there is an ongoing survey about Python usage in Flink [1]. Some
discussion was also brought up there regarding the non-JVM language support
strategy in general. To avoid polluting the survey thread, we are starting
this discussion thread and would like to move the discussions here.

In the interest of facilitating the discussion, we would like to first
share the following design doc, which describes what we have done at Alibaba
for the Python API for Flink. It could serve as a good reference for the
discussion.

 [DISCUSS] Flink Python API


As of now, we've implemented and delivered Python UDFs for SQL to the
internal users at Alibaba.
We are starting to implement the Python API.

To recap and continue the discussion from the survey thread, I agree with
@Stephan that we should figure out in which general direction Python
support should go. Stephan also listed three options there:
* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

From my perspective:
(1) Flink's language APIs and Beam's language support are not mutually
exclusive.
It is nice that Beam has Python/NodeJS/Go APIs and supports Flink as a
runner.
Flink's own Python (or NodeJS/Go) APIs will benefit Flink's ecosystem.

(2) Python API / portability layer
To support non-JVM languages in Flink:
 * on the client side, Flink would provide language interfaces, which
translate the user's application into a Flink StreamGraph;
 * on the server side, Flink would execute the user's UDF code at runtime
(a minimal sketch of this pattern follows below).
The non-JVM languages communicate with the JVM via RPC (or a low-level
socket, an embedded interpreter, and so on). What the portability layer can
do, perhaps, is abstract the RPC layer. Even when the portability layer is
ready, there is still a lot of work to do for each specific language. Take
Python: we may still have to write the interface classes by hand for users,
because generated code without detailed documentation is unacceptable, and
we have to handle the serialization of lambdas/closures, which is not a
built-in feature in Python. Maybe we can start with the Python API, then
extend to other languages and abstract the common logic into the
portability layer.
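
To make the server-side part above more concrete, here is a minimal,
self-contained Java sketch (not Flink code) of the pattern: the JVM launches a
Python worker process and exchanges UDF calls with it over a local socket. The
worker script name and the line-based protocol are hypothetical.

```
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class PythonUdfBridgeSketch {

    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) { // pick a free port
            // Start the (hypothetical) Python worker and tell it which port to connect to.
            Process worker = new ProcessBuilder(
                    "python", "python_worker.py", String.valueOf(server.getLocalPort()))
                    .inheritIO()
                    .start();
            try (Socket socket = server.accept();
                 BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                         socket.getOutputStream(), StandardCharsets.UTF_8));
                 BufferedReader in = new BufferedReader(new InputStreamReader(
                         socket.getInputStream(), StandardCharsets.UTF_8))) {
                // Send one UDF invocation (a single string argument) and read back the result.
                out.write("hello flink\n");
                out.flush();
                System.out.println("UDF result from Python: " + in.readLine());
            } finally {
                worker.destroy();
            }
        }
    }
}
```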

---
References:
[1] [SURVEY] Usage of flink-python and flink-streaming-python

Regards,
Xianda


[jira] [Created] (FLINK-11147) Add document for TableAggregate Function

2018-12-13 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-11147:
---

 Summary: Add document for TableAggregate Function
 Key: FLINK-11147
 URL: https://issues.apache.org/jira/browse/FLINK-11147
 Project: Flink
  Issue Type: Sub-task
  Components: Table API & SQL
Reporter: Hequn Cheng
Assignee: Hequn Cheng


Add documentation for {{TableAggregateFunction}}, similar to the documentation of 
{{AggregateFunction}}: 
https://ci.apache.org/projects/flink/flink-docs-release-1.7/dev/table/udfs.html#aggregation-functions

Most parts of {{TableAggregateFunction}} would be the same as 
{{AggregateFunction}}, except for the way outputs are handled. 
{{AggregateFunction}} outputs a scalar value, while {{TableAggregateFunction}} 
outputs a table with multiple rows and columns.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11148) Rescaling operator regression in Flink 1.7

2018-12-13 Thread Truong Duc Kien (JIRA)
Truong Duc Kien created FLINK-11148:
---

 Summary: Rescaling operator regression in Flink 1.7
 Key: FLINK-11148
 URL: https://issues.apache.org/jira/browse/FLINK-11148
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.7.0
Reporter: Truong Duc Kien


We have a job using 20 TaskManagers with 3 slots each.

Using Flink 1.4, when we rescale a data stream from 60 to 20, each TaskManager 
will only have one downstream slot, which receives the data from the 3 upstream 
slots in the same TaskManager.

Using Flink 1.7, this behaviour no longer holds true: multiple downstream slots 
are being assigned to the same TaskManager. This change is causing an imbalance 
in our TaskManager load.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [SURVEY] Usage of flink-python and flink-streaming-python

2018-12-13 Thread Stephan Ewen
You are right. Let's refocus this on the python user survey and spin out
another thread.

On Thu, Dec 13, 2018 at 9:56 AM Xianda Ke  wrote:

> Hi Folks,
> To avoid polluting the survey thread with discussions, we started separate
> thread and maybe we can continue the discussion over there.
>
> Regards,
> Xianda
>
> On Wed, Dec 12, 2018 at 3:34 AM Stephan Ewen  wrote:
>
> > I like that we are having a general discussion about how to use Python
> and
> > Flink together in the future.
> > The current python support has some shortcomings that were mentioned
> > before, so we clearly need something better.
> >
> > Parts of the community have worked together with the Apache Beam project,
> > which is pretty far in adding a portability layer to support Python.
> > Before we dive deep into a design proposal for a new Python API in
> Flink, I
> > think we should figure out in which general direction Python support
> should
> > go.
> >
> > *Option (1): Language portability via Apache Beam*
> >
> > Pro:
> >   - already exists to a large extend and already has users
> >   - portability layer offers other languages in addition to python. Go is
> > in the making, NodeJS has been speculated, etc.
> >   - collaboration with another project / community which means more
> > manpower and exposure. Beam currently has a strong focus on Flink as a
> > runner for Python.
> >   - Python API is used for existing ML libraries from the TensorFlow
> > ecosystem
> >
> > Con:
> >   - Not Flink's API. Python users need to learn the syntax of another API
> > (Python API is inherently different, but even more different here).
> >
> > *Option (2): Implement own Python API*
> >
> > Pro:
> >   - Python API will be closer to Flink Java / Scala APIs
> >
> > Con:
> >   - We will only have Python.
> >   - Need to to rebuild the Python language bridge (significant work to
> get
> > stable)
> >   - might lose tight collaboration with Beam and the other parties in
> Beam
> >   - not benefiting from Beam's ecosystem
> >
> > *Option (3): **Implement own portability layer*
> >
> > Pro
> >   - Flexibility to align APIs across languages within Flink ecosystem
> >
> > Con
> >   - A lot of work (for context, to get this feature complete, Beam has
> > worked on that for a year now)
> >   - Replicating work that already exists
> >   - good chance to lose tight collaboration with Beam and parties in that
> > project
> >   - not benefiting from Beam's ecosystem
> >
> > Best,
> > Stephan
> >
> >
> > On Tue, Dec 11, 2018 at 3:38 PM Thomas Weise  wrote:
> >
> > > Did you take a look at Apache Beam? It already provides a comprehensive
> > > Python SDK and can be used with Flink:
> > > https://beam.apache.org/roadmap/portability/#python-on-flink
> > >
> > > We are using it at Lyft for Python streaming pipelines.
> > >
> > > Thomas
> > >
> > > On Tue, Dec 11, 2018 at 5:54 AM Xianda Ke  wrote:
> > >
> > > > Hi Till,
> > > >
> > > > 1. So far as I know, most of the users at Alibaba are using SQL.
> Some
> > of
> > > > users at Alibaba want integrated python libraries with Flink for
> > > streaming
> > > > processing, and Jython is unusable.
> > > >
> > > > 2. Python UDFs for SQL:
> > > > * declaring python UDF based on Alibaba's internal DDL syntax.
> > > > * start a Python process in open()
> > > > * communicate with JVM process via Socket.
> > > > * Yes, it support python libraries, users can upload virutalenv/conda
> > > > Python runtime
> > > >
> > > > 3. We've draft a design doc for Python API
> > > >  [DISCUSS] Flink Python API
> > > > <
> > > >
> > >
> >
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> > > > >
> > > >
> > > > Python UDF for SQL is not discussed in this documentation, we'll
> > create a
> > > > new proposal when the SQL DDL is ready.
> > > >
> > > > On Mon, Dec 10, 2018 at 9:52 PM Till Rohrmann 
> > > > wrote:
> > > >
> > > > > Hi Xianda,
> > > > >
> > > > > thanks for sharing this detailed feedback. Do I understand you
> > > correctly
> > > > > that flink-python and flink-streaming-python are not usable for the
> > use
> > > > > cases at Alibaba atm?
> > > > >
> > > > > Could you share a bit more details about the Python UDFs for SQL?
> How
> > > do
> > > > > you execute the Python code? Will it work with any Python library?
> If
> > > you
> > > > > are about to publish the design document then feel free to refer me
> > to
> > > > this
> > > > > document.
> > > > >
> > > > > Cheers,
> > > > > Till
> > > > >
> > > > > On Mon, Dec 10, 2018 at 3:08 AM Xianda Ke 
> > wrote:
> > > > >
> > > > > > Xianda Ke 
> > > > > > 9:47 AM (11 minutes ago)
> > > > > > to dev, user
> > > > > > After communicating with some of the internal users at Alibaba,
> my
> > > > > > impression is that:
> > > > > >
> > > > > >- Most of them need C extensions support, they want to
> > integrated
> > > > > their
> > > > > >algorithms with stream processing,but Jython is unacceptable
> for
> > > > them.
> > > > > 

Re: Connection leak with flink elastic Sink

2018-12-13 Thread Chesnay Schepler

Specifically which connector are you using, and which Flink version?

On 12.12.2018 13:31, Vijay Bhaskar wrote:

Hi
We are using flink elastic sink which streams at the rate of 1000 
events/sec, as described in 
https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
We are observing connection leak of elastic connections. After few 
minutes all the open connections are exceeding the process limits of 
the max open descriptors and Job is getting terminated. But the  http 
connections with the elastic search server remain open forever. Am i 
missing any specific configuration setting to close the open 
connection, after serving the request?
But there is no such setting is described in the above documentation 
of elastic sink


Regards
Bhaskar





Re: Connection leak with flink elastic Sink

2018-12-13 Thread Tzu-Li (Gordon) Tai
Hi,

Besides the information that Chesnay requested, could you also provide a stack 
trace of the exception that caused the job to terminate in the first place?

The Elasticsearch sink does indeed close the internally used Elasticsearch 
client, which should in turn properly release all resources [1].
I would like to double check whether or not the case here is that that part of 
the code was never reached.

Cheers,
Gordon

[1] 
https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L334

On 13 December 2018 at 5:59:34 PM, Chesnay Schepler (ches...@apache.org) wrote:

Specifically which connector are you using, and which Flink version?  

On 12.12.2018 13:31, Vijay Bhaskar wrote:  
> Hi  
> We are using flink elastic sink which streams at the rate of 1000  
> events/sec, as described in  
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
>   
> We are observing connection leak of elastic connections. After few  
> minutes all the open connections are exceeding the process limits of  
> the max open descriptors and Job is getting terminated. But the http  
> connections with the elastic search server remain open forever. Am i  
> missing any specific configuration setting to close the open  
> connection, after serving the request?  
> But there is no such setting is described in the above documentation  
> of elastic sink  
>  
> Regards  
> Bhaskar  




Re: [DISCUSS] Python (and Non-JVM) Language Support in Flink

2018-12-13 Thread Shaoxuan Wang
RE: Stephan's options (
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/SURVEY-Usage-of-flink-python-and-flink-streaming-python-td25793.html
)
* Option (1): Language portability via Apache Beam
* Option (2): Implement own Python API
* Option (3): Implement own portability layer

Hi Stephan,
Eventually, I think we should support both option 1 and option 3. IMO, these
two options are orthogonal. I agree with you that we can leverage the
existing work and ecosystem in Beam by supporting option 1. But the problem
with Beam is that (to the best of my knowledge) it skips the natural
Table/SQL optimization framework provided by Flink. We should spend all the
needed effort on option 1 (as it is a better alternative to the current
Flink Python API), but we cannot bet on it alone. Option 3 is the ideal
choice for Flink to support all non-JVM languages, and we should plan to
achieve it. We have done some preliminary prototyping for options 2/3, and
it does not seem overly complex or difficult to accomplish.

Regards,
Shaoxuan


On Thu, Dec 13, 2018 at 4:58 PM Xianda Ke  wrote:

> Currently there is an ongoing survey about Python usage of Flink [1]. Some
> discussion was also brought up there regarding non-jvm language support
> strategy in general. To avoid polluting the survey thread, we are starting
> this discussion thread and would like to move the discussions here.
>
> In the interest of facilitating the discussion, we would like to first
> share the following design doc which describes what we have done at Alibaba
> about Python API for Flink. It could serve as a good reference to the
> discussion.
>
>  [DISCUSS] Flink Python API
> <
> https://docs.google.com/document/d/1JNGWdLwbo_btq9RVrc1PjWJV3lYUgPvK0uEWDIfVNJI/edit?usp=drive_web
> >
>
> As of now, we've implemented and delivered Python UDF for SQL for the
> internal users at Alibaba.
> We are starting to implement Python API.
>
> To recap and continue the discussion from the survey thread, I agree with
> @Stephan that we should figure out in which general direction Python
> support should go. Stephan also list three options there:
> * Option (1): Language portability via Apache Beam
> * Option (2): Implement own Python API
> * Option (3): Implement own portability layer
>
> From my perspective,
> (1). Flink language APIs and Beam's languages support are not mutually
> exclusive.
> It is nice that Beam has Python/NodeJS/Go APIs, and support Flink as the
> runner.
> Flink's own Python(or NodeJS/Go) APIs will benefit Flink's ecosystem.
>
> (2). Python API / portability layer
> To support non-JVM languages in Flink,
>  * at client side, Flink would provide language interfaces, which will
> translate user's application to Flink StreamGraph.
> * at server side, Flink would execute user's UDF code at runtime
> The non-JVM languages communicate with JVM via RPC(or low-level socket,
> embedded interpreter and so on). What the portability layer can do maybe is
> abstracting the RPC layer. When the portability layer is ready, still there
> are lots of stuff to do for a specified language. Say, Python, we may still
> have to write the interface classes by hand for the users because generated
> code without detailed documentation is unacceptable for users, or handle
> the serialization issue of lambda/closure which is not a built-in feature
> in Python.  Maybe, we can start with Python API, then extend to other
> languages and abstract the logic in common as the portability layer.
>
> ---
> References:
> [1] [SURVEY] Usage of flink-python and flink-streaming-python
>
> Regards,
> Xianda
>


[jira] [Created] (FLINK-11149) Flink will request more containers than it actually needs

2018-12-13 Thread Fan Xinpu (JIRA)
Fan Xinpu created FLINK-11149:
-

 Summary: Flink will request more containers than it actually needs
 Key: FLINK-11149
 URL: https://issues.apache.org/jira/browse/FLINK-11149
 Project: Flink
  Issue Type: Improvement
  Components: YARN
Affects Versions: 1.7.0
Reporter: Fan Xinpu


  As is known, Flink requests new containers when it is notified that an 
allocated container has completed. Say one container fails and Flink tries to 
request one replacement container from the RM; in fact Flink will request n+1 
containers, where n is the number of containers ever requested since the cluster 
was created. This is not graceful.

  When requesting a container, Flink sends a ContainerRequest to the RM through 
the AMRMClient. The AMRMClient keeps the ContainerRequest internally and expects 
it to be removed later, but Flink never removes the ContainerRequest, so one by 
one the number of pending ContainerRequests accumulates to an unexpected value.

  In our environment, a cluster initially allocated 100 containers. Later on, 
when it requested one container from the RM, the RM returned more than 2000 
containers, because the request actually contained more than 2000 
ContainerRequests. Although Flink returns the excess containers, this request 
behavior wastes time and resources on YARN.

  So maybe Flink can remove the ContainerRequest after the request has been sent 
to the RM; then Flink will get exactly the number of containers it explicitly 
asked for (a sketch of one variant, removing the request upon allocation, 
follows below).
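
  For illustration, a hedged sketch (not Flink's actual YARN resource manager 
code) of this idea using the Hadoop AMRMClientAsync API: the pending 
ContainerRequest is removed as soon as a matching container has been allocated, 
so it is not re-sent on later heartbeats. The resource/priority values and the 
matching logic are simplified assumptions.

{code:java}
import java.util.Collection;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.async.AMRMClientAsync;

class ContainerAllocationSketch {

  private final AMRMClientAsync<ContainerRequest> amrmClient;
  private final Resource containerResource = Resource.newInstance(1024, 1);
  private final Priority priority = Priority.newInstance(0);

  ContainerAllocationSketch(AMRMClientAsync<ContainerRequest> amrmClient) {
    this.amrmClient = amrmClient;
  }

  void requestContainer() {
    amrmClient.addContainerRequest(
        new ContainerRequest(containerResource, null, null, priority));
  }

  /** Called from the AMRMClientAsync callback when containers are allocated. */
  void onContainersAllocated(List<Container> containers) {
    for (Container container : containers) {
      // Remove one matching pending request per allocated container, so it is
      // not kept inside the AMRMClient and re-sent to the RM on the next heartbeat.
      List<? extends Collection<ContainerRequest>> matching =
          amrmClient.getMatchingRequests(priority, "*", containerResource);
      if (!matching.isEmpty() && !matching.get(0).isEmpty()) {
        amrmClient.removeContainerRequest(matching.get(0).iterator().next());
      }
      // ... launch a TaskManager in 'container' ...
    }
  }
}
{code}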



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Connection leak with flink elastic Sink

2018-12-13 Thread Vijay Bhaskar
Hi Gordon,
We are using a Flink 1.6.1 cluster with the Elasticsearch connector
flink-connector-elasticsearch6_2.11.
I have attached the stack trace.

Following are the max open file descriptor count of the Task Manager
process and the number of open connections to the Elasticsearch
cluster:

Regards
Bhaskar
#lsof -p 62041 | wc -l
65583

All the connections to the elastic cluster reached:

#netstat -aln | grep 9200 | wc -l
2333




On Thu, Dec 13, 2018 at 4:12 PM Tzu-Li (Gordon) Tai 
wrote:

> Hi,
>
> Besides the information that Chesnay requested, could you also provide a
> stack trace of the exception that caused the job to terminate in the first
> place?
>
> The Elasticsearch sink does indeed close the internally used Elasticsearch
> client, which should in turn properly release all resources [1].
> I would like to double check whether or not the case here is that that
> part of the code was never reached.
>
> Cheers,
> Gordon
>
> [1]
> https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-elasticsearch-base/src/main/java/org/apache/flink/streaming/connectors/elasticsearch/ElasticsearchSinkBase.java#L334
>
> On 13 December 2018 at 5:59:34 PM, Chesnay Schepler (ches...@apache.org)
> wrote:
>
> Specifically which connector are you using, and which Flink version?
>
> On 12.12.2018 13:31, Vijay Bhaskar wrote:
> > Hi
> > We are using flink elastic sink which streams at the rate of 1000
> > events/sec, as described in
> >
> https://ci.apache.org/projects/flink/flink-docs-stable/dev/connectors/elasticsearch.html.
>
> > We are observing connection leak of elastic connections. After few
> > minutes all the open connections are exceeding the process limits of
> > the max open descriptors and Job is getting terminated. But the http
> > connections with the elastic search server remain open forever. Am i
> > missing any specific configuration setting to close the open
> > connection, after serving the request?
> > But there is no such setting is described in the above documentation
> > of elastic sink
> >
> > Regards
> > Bhaskar
>
>
>
2018-12-13 06:14:42,120 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 

2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Starting 
StandaloneSessionClusterEntrypoint (Version: 1.6.1, Rev:23e2636, 
Date:14.09.2018 @ 19:56:46 UTC)
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  OS current 
user: root
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Current 
Hadoop/Kerberos user: 
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM: OpenJDK 
64-Bit Server VM - Oracle Corporation - 1.8/25.181-b13
2018-12-13 06:14:42,122 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Maximum heap 
size: 981 MiBytes
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JAVA_HOME: 
(not set)
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  No Hadoop 
Dependency available
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  JVM Options:
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xms1024m
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - -Xmx1024m
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
-Dlog.file=/root/flink-1.6.1/log/flink-root-standalonesession-0-contrail-eng-raisa-cloudsecure-eng-raisa-flink.log
2018-12-13 06:14:42,123 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
-Dlog4j.configuration=file:/root/flink-1.6.1/conf/log4j.properties
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
-Dlogback.configurationFile=file:/root/flink-1.6.1/conf/logback.xml
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Program 
Arguments:
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - --configDir
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
/root/flink-1.6.1/conf
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - 
--executionMode
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint - cluster
2018-12-13 06:14:42,124 INFO  
org.apache.flink.runtime.entrypoint.ClusterEntrypoint -  Classpath: 
/root/flink-1.6.1/lib/flink-python_2.11-1.6.1.jar:/root/flink-1.6.1/lib/log4j-1.2.17.jar:/root/flink-1.6.1/lib/log4j-api-2.9.1.jar:/root/flink-1.6.1/lib/slf4j-log4j12-1.7.7.jar:/root/flink-1.6.1/lib/flin

[jira] [Created] (FLINK-11150) Check exception messages in ValidationTest of flink-table

2018-12-13 Thread Hequn Cheng (JIRA)
Hequn Cheng created FLINK-11150:
---

 Summary: Check exception messages in ValidationTest of flink-table
 Key: FLINK-11150
 URL: https://issues.apache.org/jira/browse/FLINK-11150
 Project: Flink
  Issue Type: Improvement
  Components: Table API & SQL
Reporter: Hequn Cheng
Assignee: Hequn Cheng


Problem

Currently, there are a lot of {{ValidationTests}} in flink-table. These tests 
are used to check whether exceptions are thrown correctly. However, the detailed 
exception messages are not checked, which makes the tests fragile. Take the 
following test as an example:
{code:java}
class TableSinkValidationTest extends TableTestBase {

  @Test(expected = classOf[TableException])
  def testAppendSinkOnUpdatingTable(): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val t = StreamTestData.get3TupleDataStream(env).toTable(tEnv, 'id, 'num, 'text)
    tEnv.registerTableSink("testSink", new TestAppendSink)

    t.groupBy('text)
      .select('text, 'id.count, 'num.sum)
      .insertInto("testSink")

    // must fail because table is not append-only
    env.execute()
  }
}
{code}
The test is meant to check the validation for an AppendSink on an updating 
table. The test passes as-is, but if we remove the {{(expected = 
classOf[TableException])}}, we can see the following exception:
{code:java}
org.apache.flink.table.api.TableException: Table sink is not configured.
{code}
However, the correct exception should be:
{code:java}
org.apache.flink.table.api.TableException: AppendStreamTableSink requires that 
Table has only insert changes.
{code}
Since the two exceptions share the same exception class name, we also have to 
check the exception messages.

 

Proposal

To make our tests more rigorous, I think it is better to use 
{{ExpectedException}} to check both the exception class and the exception 
message. The previous test would then become:
{code:java}
  @Test
  def testAppendSinkOnUpdatingTable(): Unit = {
    expectedException.expect(classOf[TableException])
    expectedException.expectMessage(
      "AppendStreamTableSink requires that Table has only insert changes.")

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val tEnv = TableEnvironment.getTableEnvironment(env)

    val t = StreamTestData.get3TupleDataStream(env).toTable(tEnv, 'id, 'num, 'text)
    tEnv.registerTableSink("testSink", new TestAppendSink()
      .configure(Array("text", "id", "num"), Array(Types.STRING, Types.LONG, Types.LONG)))

    t.groupBy('text)
      .select('text, 'id.count, 'num.sum)
      .insertInto("testSink")

    // must fail because table is not append-only
    env.execute()
  }
{code}
which adds two more lines to the test:
{code:java}
expectedException.expect(classOf[TableException])
expectedException.expectMessage(
  "AppendStreamTableSink requires that Table has only insert changes.")
{code}
 

Any suggestions are greatly appreciated!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11151) FileUploadHandler stops working if the upload directory is removed

2018-12-13 Thread Chesnay Schepler (JIRA)
Chesnay Schepler created FLINK-11151:


 Summary: FileUploadHandler stops working if the upload directory 
is removed
 Key: FLINK-11151
 URL: https://issues.apache.org/jira/browse/FLINK-11151
 Project: Flink
  Issue Type: Bug
  Components: Job-Submission, REST
Affects Versions: 1.7.0, 1.6.2
Reporter: Chesnay Schepler
Assignee: Chesnay Schepler
 Fix For: 1.6.3, 1.8.0, 1.7.2


A user has reported on the ML that the FileUploadHandler does not accept any 
files anymore if the upload directory was deleted after the cluster has been 
started.
A cursory glance at the code shows that it currently uses 
{{Files.createDirectory(...)}} to create a temporary directory for the current 
request to store uploaded files in.
Changing this to use {{Files.createDirectories(...)}} instead should prevent 
this from happening again.
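
For illustration, a minimal sketch of the difference (the paths are made up):

{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UploadDirSketch {
  public static void main(String[] args) throws IOException {
    Path requestDir = Paths.get("/tmp/flink-web-upload/request-4711");

    // Files.createDirectory(requestDir) throws NoSuchFileException if the parent
    // directory "/tmp/flink-web-upload" has been deleted in the meantime.

    // Files.createDirectories recreates all missing parent directories, so a
    // deleted upload directory does not break subsequent uploads.
    Files.createDirectories(requestDir);
  }
}
{code}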



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11152) ClosureCleanerTest fails with IllegalStateException

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11152:


 Summary: ClosureCleanerTest fails with IllegalStateException
 Key: FLINK-11152
 URL: https://issues.apache.org/jira/browse/FLINK-11152
 Project: Flink
  Issue Type: Sub-task
  Components: Java API, Tests
Affects Versions: 1.8.0
Reporter: Gary Yao
Assignee: Gary Yao
 Fix For: 1.8.0


{{ClosureCleanerTest}} fails with the exception below:

{noformat}
java.lang.IllegalArgumentException
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.api.java.ClosureCleaner.getClassReader(ClosureCleaner.java:148)
at 
org.apache.flink.api.java.ClosureCleaner.cleanThis0(ClosureCleaner.java:115)
at 
org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:75)
at 
org.apache.flink.api.java.functions.ClosureCleanerTest.testCleanedNonSerializable(ClosureCleanerTest.java:51)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11153) UdfAnalyzerTest fails with CodeAnalyzerException

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11153:


 Summary: UdfAnalyzerTest fails with CodeAnalyzerException
 Key: FLINK-11153
 URL: https://issues.apache.org/jira/browse/FLINK-11153
 Project: Flink
  Issue Type: Sub-task
  Components: Java API, Tests
Affects Versions: 1.8.0
Reporter: Gary Yao
 Fix For: 1.8.0


{noformat}
org.apache.flink.api.java.sca.CodeAnalyzerException: Exception occurred during 
code analysis.

at 
org.apache.flink.api.java.sca.UdfAnalyzer.analyze(UdfAnalyzer.java:341)
at 
org.apache.flink.api.java.sca.UdfAnalyzerTest.compareAnalyzerResultWithAnnotationsSingleInputWithKeys(UdfAnalyzerTest.java:1339)
at 
org.apache.flink.api.java.sca.UdfAnalyzerTest.compareAnalyzerResultWithAnnotationsSingleInput(UdfAnalyzerTest.java:1322)
at 
org.apache.flink.api.java.sca.UdfAnalyzerTest.testForwardWithArrayModification(UdfAnalyzerTest.java:695)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at 
org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at 
com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at 
com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:47)
at 
com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at 
com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
Caused by: java.lang.IllegalArgumentException
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.shaded.asm5.org.objectweb.asm.ClassReader.(Unknown 
Source)
at 
org.apache.flink.api.java.sca.UdfAnalyzerUtils.findMethodNode(UdfAnalyzerUtils.java:131)
at 
org.apache.flink.api.java.sca.UdfAnalyzerUtils.findMethodNode(UdfAnalyzerUtils.java:115)
at 
org.apache.flink.api.java.sca.UdfAnalyzer.analyze(UdfAnalyzer.java:290)
... 25 more
{noformat}




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11154) Upgrade Netty to 4.1.32

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11154:
---

 Summary: Upgrade Netty to 4.1.32
 Key: FLINK-11154
 URL: https://issues.apache.org/jira/browse/FLINK-11154
 Project: Flink
  Issue Type: Improvement
  Components: Network
Affects Versions: 1.8.0
Reporter: Nico Kruber


Notable changes since 4.1.24 (currently used in Flink 1.6-1.7):
- big (performance, feature set) improvements for using openSSL based SSL 
engine (useful for FLINK-9816)
- allow multiple shaded versions of the same netty artifact (as long as the 
shaded prefix is different)
- Ensure ByteToMessageDecoder.Cumulator implementations always release.
- Don't re-arm timerfd each epoll_wait
- Use a non-volatile read for ensureAccessible() whenever possible to reduce 
overhead and allow better inlining.
- Do not fail on runtime when an older version of Log4J2 is on the classpath
- Fix leak and corruption bugs in CompositeByteBuf
- Add support for TLSv1.3
- Harden ref-counting concurrency semantics
- bug fixes
- Java 9-12 related fixes



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11155) HadoopFreeFsFactoryTest fails with ClassCastException

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11155:


 Summary: HadoopFreeFsFactoryTest fails with ClassCastException
 Key: FLINK-11155
 URL: https://issues.apache.org/jira/browse/FLINK-11155
 Project: Flink
  Issue Type: Sub-task
Reporter: Gary Yao


{{HadoopFreeFsFactoryTest#testHadoopFactoryInstantiationWithoutHadoop}} fails 
with a ClassCastException because we try to cast the result of 
{{getClassLoader}} to {{URLClassLoader}} (also see 
https://stackoverflow.com/questions/46694600/java-9-compatability-issue-with-classloader-getsystemclassloader)
{noformat}
java.lang.ClassCastException: 
java.base/jdk.internal.loader.ClassLoaders$AppClassLoader cannot be cast to 
java.base/java.net.URLClassLoader

at 
org.apache.flink.runtime.fs.hdfs.HadoopFreeFsFactoryTest.testHadoopFactoryInstantiationWithoutHadoop(HadoopFreeFsFactoryTest.java:46)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
{noformat}
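
For context, a minimal sketch (not the actual test code) of the cast that fails 
on Java 9+ and a defensive variant:

{code:java}
import java.net.URL;
import java.net.URLClassLoader;

public class ClassLoaderCastSketch {
  public static void main(String[] args) {
    ClassLoader loader = ClassLoader.getSystemClassLoader();

    // On Java 8 the system class loader happens to be a URLClassLoader, so a cast
    // like ((URLClassLoader) loader).getURLs() works; on Java 9+ the loader is
    // jdk.internal.loader.ClassLoaders$AppClassLoader and the cast throws.

    // Defensive variant: only inspect the URLs if the loader really is a URLClassLoader.
    if (loader instanceof URLClassLoader) {
      for (URL url : ((URLClassLoader) loader).getURLs()) {
        System.out.println(url);
      }
    } else {
      System.out.println("System class loader is not a URLClassLoader: " + loader.getClass());
    }
  }
}
{code}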



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11156) ZooKeeperCompletedCheckpointStoreMockitoTest fails with IncompatibleClassChangeError

2018-12-13 Thread Gary Yao (JIRA)
Gary Yao created FLINK-11156:


 Summary: ZooKeeperCompletedCheckpointStoreMockitoTest fails with 
IncompatibleClassChangeError
 Key: FLINK-11156
 URL: https://issues.apache.org/jira/browse/FLINK-11156
 Project: Flink
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 1.8.0
Reporter: Gary Yao
 Fix For: 1.8.0



{noformat}
java.lang.IncompatibleClassChangeError: Method 
java.util.Comparator.comparing(Ljava/util/function/Function;)Ljava/util/Comparator;
 must be InterfaceMethodref constant

at 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStore.(ZooKeeperCompletedCheckpointStore.java:74)
at java.base/java.lang.Class.forName0(Native Method)
at java.base/java.lang.Class.forName(Class.java:292)
at javassist.runtime.Desc.getClassObject(Desc.java:43)
at javassist.runtime.Desc.getClassType(Desc.java:152)
at javassist.runtime.Desc.getType(Desc.java:122)
at javassist.runtime.Desc.getType(Desc.java:78)
at 
org.apache.flink.runtime.checkpoint.ZooKeeperCompletedCheckpointStoreMockitoTest.testCheckpointRecovery(ZooKeeperCompletedCheckpointStoreMockitoTest.java:175)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:564)
at org.junit.internal.runners.TestMethod.invoke(TestMethod.java:68)
at 
org.powermock.modules.junit4.internal.impl.PowerMockJUnit44RunnerDelegateImpl$PowerMockJUnit44MethodRunner.runTestMethod(PowerMockJUnit44RunnerDelegateImpl.java:326)
at 
org.junit.internal.runners.MethodRoadie$1$1.call(MethodRoadie.java:64)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1167)
at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:641)
at java.base/java.lang.Thread.run(Thread.java:844)
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11157) Web UI should show timestamps relative to server time zone

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11157:
---

 Summary: Web UI should show timestamps relative to server time zone
 Key: FLINK-11157
 URL: https://issues.apache.org/jira/browse/FLINK-11157
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Affects Versions: 1.7.0
Reporter: Nico Kruber


Currently, it seems the web UI fetches timestamps, e.g. for checkpoints, as 
milliseconds since the epoch and simply converts them using the browser's clock 
and time zone. They should be based on the server's time zone instead.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11158) Do not show empty page for unknown jobs

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11158:
---

 Summary: Do not show empty page for unknown jobs
 Key: FLINK-11158
 URL: https://issues.apache.org/jira/browse/FLINK-11158
 Project: Flink
  Issue Type: Improvement
  Components: Webfrontend
Affects Versions: 1.7.0, 1.6.2, 1.5.5
Reporter: Nico Kruber


If you try to access the Web UI using an old/non-existing job ID, e.g. after a 
cluster restart, you are currently presented with a white page with no further 
info. It should at least show a "job not found" message to indicate that there 
was no other error.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Creating last bug fix release for 1.5 branch

2018-12-13 Thread Thomas Weise
Hi,

I would be interested to try my hand at being the release manager for this.

There are currently still 5 in-progress issues [1], all except [2] with an
open PR.

Nico, Chesnay, Till: Can you please take a look and see if these can be
completed?

Thanks,
Thomas


[1]
https://issues.apache.org/jira/issues/?jql=statusCategory%20%3D%20indeterminate%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012344315%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
[2] https://issues.apache.org/jira/browse/FLINK-9010






On Mon, Dec 10, 2018 at 3:15 PM Thomas Weise  wrote:

> Thanks Till and my belated +1 for a final patch release :)
>
> On Mon, Dec 10, 2018 at 5:47 AM Till Rohrmann 
> wrote:
>
>> Thanks for the feedback! I conclude that the community is in favour of a
>> last 1.5.6 release. I'll try to make the arrangements in the next two
>> weeks.
>>
>> Cheers,
>> Till
>>
>> On Mon, Dec 10, 2018 at 2:40 AM jincheng sun 
>> wrote:
>>
>> > +1. There are incompatible improvements between 1.5.x and 1.6/1.7, so
>> many
>> > 1.5.x users may not be willing to upgrade to 1.6 or 1.7 due to migration
>> > costs, so it makes sense to creating last bug fix release for 1.5
>> branch.
>> >
>> > Bests,
>> > Jincheng
>> >
>> > Jeff Zhang  于2018年12月10日周一 上午9:24写道:
>> >
>> > > +1, I think very few people would use 1.6 or 1.7 in their production
>> in
>> > > near future, so I expect they would use 1.5 in production for a long
>> > > period,it makes sense to provide a stable version for production
>> usage.
>> > >
>> > > Ufuk Celebi  于2018年12月9日周日 下午6:07写道:
>> > >
>> > > > +1. This seems reasonable to me. Since the fixes are already in and
>> > > > also part of other releases, the release overhead should be
>> > > > manageable.
>> > > >
>> > > > @Vino: I agree with your assessment.
>> > > >
>> > > > @Qi: As Till mentioned, the official project guideline is to support
>> > > > the last two minor releases, e.g. currently 1.7 and 1.6.
>> > > >
>> > > > Best,
>> > > >
>> > > > Ufuk
>> > > >
>> > > > On Sun, Dec 9, 2018 at 3:48 AM qi luo  wrote:
>> > > > >
>> > > > > Hi Till,
>> > > > >
>> > > > > Does Flink has an agreement on how long will a major version be
>> > > > supported? Some companies may need a long time to upgrade Flink
>> major
>> > > > versions in production. If Flink terminates support for a major
>> version
>> > > too
>> > > > quickly, it may be a concern for companies.
>> > > > >
>> > > > > Best,
>> > > > > Qi
>> > > > >
>> > > > > > On Dec 8, 2018, at 10:57 AM, vino yang 
>> > > wrote:
>> > > > > >
>> > > > > > Hi Till,
>> > > > > >
>> > > > > > I think it makes sense to release a bug fix version (especially
>> > some
>> > > > > > serious bug fixes) for flink 1.5.
>> > > > > > Consider that some companies' production environments are more
>> > > cautious
>> > > > > > about upgrading large versions.
>> > > > > > I think some organizations are still using 1.5.x or even 1.4.x.
>> > > > > >
>> > > > > > Best,
>> > > > > > Vino
>> > > > > >
>> > > > > > Till Rohrmann  于2018年12月7日周五 下午11:39写道:
>> > > > > >
>> > > > > >> Dear community,
>> > > > > >>
>> > > > > >> I wanted to reach out to you and discuss whether we should
>> > release a
>> > > > last
>> > > > > >> bug fix release for the 1.5 branch.
>> > > > > >>
>> > > > > >> Since we have already released Flink 1.7.0, we only need to
>> > support
>> > > > the
>> > > > > >> 1.6.x and 1.7.x branches (last two major releases). However,
>> the
>> > > > current
>> > > > > >> release-1.5 branch contains 45 unreleased fixes. Some of the
>> fixes
>> > > > address
>> > > > > >> serializer duplication problems (FLINK-10839, FLINK-10693),
>> fixing
>> > > > > >> retractions (FLINK-10674) or prevent a deadlock in the
>> > > > > >> SpillableSubpartition (FLINK-10491). I think it would be nice
>> for
>> > > our
>> > > > users
>> > > > > >> if we officially terminated the Flink 1.5.x support with a last
>> > > 1.5.6
>> > > > > >> release. What do you think?
>> > > > > >>
>> > > > > >> Cheers,
>> > > > > >> Till
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Best Regards
>> > >
>> > > Jeff Zhang
>> > >
>> >
>>
>


[jira] [Created] (FLINK-11159) Allow configuration whether to fall back to savepoints for restore

2018-12-13 Thread Nico Kruber (JIRA)
Nico Kruber created FLINK-11159:
---

 Summary: Allow configuration whether to fall back to savepoints 
for restore
 Key: FLINK-11159
 URL: https://issues.apache.org/jira/browse/FLINK-11159
 Project: Flink
  Issue Type: Improvement
  Components: State Backends, Checkpointing
Affects Versions: 1.7.0, 1.6.2, 1.5.5
Reporter: Nico Kruber


Ever since FLINK-3397, upon failure, Flink restarts from the latest 
checkpoint/savepoint, whichever is more recent. With the introduction of local 
recovery and the knowledge that a RocksDB checkpoint restore just copies the 
files, it may be time to reconsider this / make it configurable:
In certain situations, it may be faster to restore from the latest checkpoint 
only (even if there is a more recent savepoint) and reprocess the data in 
between. On the downside, though, that may not be correct, because it might 
break side effects if the savepoint was the latest restore point, e.g. consider 
this chain: {{chk1 -> chk2 -> sp … restore chk2 -> …}}. Then all side effects 
between {{chk2 -> sp}} would be reproduced.

Making this configurable will allow users to choose whatever they need / can to 
get the lowest recovery time in Flink.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11160) Confluent Avro Serialization Schema

2018-12-13 Thread Zhenhao Li (JIRA)
Zhenhao Li created FLINK-11160:
--

 Summary: Confluent Avro Serialization Schema 
 Key: FLINK-11160
 URL: https://issues.apache.org/jira/browse/FLINK-11160
 Project: Flink
  Issue Type: Improvement
  Components: Kafka Connector
Affects Versions: 1.7.0
Reporter: Zhenhao Li
 Fix For: 1.8.0, 1.7.2


Currently, Flink is missing a serialization schema that works with the Confluent 
Avro format and the Confluent schema registry.

I wrote something that solves this problem for the company I currently work at, 
and I think it would be nice to contribute it back to the community. It has been 
used in a Scala project, and that project has been deployed to production.

The new serialization schemas only serialize GenericRecord, and users have to 
pass the Avro schema files to the constructors. It might not be flexible enough 
to cover a broader set of use cases. The keyed serialization schema works only 
for Scala key-value pairs.
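
This is not the contributed code itself, just a rough sketch of one possible 
shape for such a schema, assuming Confluent's KafkaAvroSerializer handles the 
wire format and the registry interaction; the class name and configuration keys 
shown here are illustrative:

{code:java}
import java.util.Collections;

import io.confluent.kafka.serializers.KafkaAvroSerializer;
import org.apache.avro.generic.GenericRecord;
import org.apache.flink.api.common.serialization.SerializationSchema;

public class ConfluentAvroSerializationSchema implements SerializationSchema<GenericRecord> {

  private final String topic;
  private final String schemaRegistryUrl;
  private transient KafkaAvroSerializer serializer;

  public ConfluentAvroSerializationSchema(String topic, String schemaRegistryUrl) {
    this.topic = topic;
    this.schemaRegistryUrl = schemaRegistryUrl;
  }

  @Override
  public byte[] serialize(GenericRecord record) {
    if (serializer == null) {
      // Lazily create the Confluent serializer on the task side.
      serializer = new KafkaAvroSerializer();
      serializer.configure(
          Collections.singletonMap("schema.registry.url", schemaRegistryUrl), false);
    }
    return serializer.serialize(topic, record);
  }
}
{code}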



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] Creating last bug fix release for 1.5 branch

2018-12-13 Thread Chesnay Schepler
FLINK-11023: will not be fixed for 1.5.6; this would take significantly 
longer to implement, and TBH I'm not really keen on doing that for a 
final bugfix release.
FLINK-7991: This is just a minor cleanup; the issue doesn't affect users 
in any way. It is thus not particularly important to have for this 
release and can be omitted IMO; I would also have to double-check 
whether the open PR applies properly to 1.5.6, and frankly I don't have 
the time for that right now anyway.
FLINK-10251: has been in review for a while, but will likely not be 
merged this year from what I know.
FLINK-9253: appears to require additional changes and is also quite 
outdated (it is from May after all), and looks more like a general 
improvement than a bug fix from the JIRA description. I would omit this 
from the release, unless Nico objects.


On 13.12.2018 17:08, Thomas Weise wrote:

Hi,

I would be interested to try my hand at being the release manger for this.

There are currently still 5 in-progress issues [1], all except [2] with an
open PR.

Nico, Chesnay, Till: Can you please take a look and see if these can be
completed?

Thanks,
Thomas


[1]
https://issues.apache.org/jira/issues/?jql=statusCategory%20%3D%20indeterminate%20AND%20project%20%3D%2012315522%20AND%20fixVersion%20%3D%2012344315%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC
[2] https://issues.apache.org/jira/browse/FLINK-9010







Create version 1.7.1 in JIRA

2018-12-13 Thread Thomas Weise
Hi,

Can a PMC member please create the version number 1.7.1. There are already
some JIRAs with 1.7.2 version that may have to updated as well.

https://issues.apache.org/jira/projects/FLINK?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page&status=released-unreleased

Thanks


Re: Create version 1.7.1 in JIRA

2018-12-13 Thread Chesnay Schepler
The 1.7.1 version already exists. As I've already started the release 
process for 1.7.1, I have archived this version temporarily to save myself the 
hassle of updating JIRAs that people now mark as fixed for 1.7.1 even 
though they aren't included.


On 13.12.2018 21:17, Thomas Weise wrote:

Hi,

Can a PMC member please create the version number 1.7.1. There are already
some JIRAs with 1.7.2 version that may have to updated as well.

https://issues.apache.org/jira/projects/FLINK?selectedItem=com.atlassian.jira.jira-projects-plugin:release-page&status=released-unreleased

Thanks





Thanks for hiding ASF GitHub Bot logs on JIRA

2018-12-13 Thread Tzu-Li Chen
Not sure how we arrived here, but the somewhat noisy comments made by the ASF
GitHub Bot are now hidden in the Work Log. Thank you!

Best,
tison.


Re: Thanks for hiding ASF GitHub Bot logs on JIRA

2018-12-13 Thread Chesnay Schepler

ehh... from what I can tell this is not the case.

On 13.12.2018 22:00, Tzu-Li Chen wrote:

Not sure how we arrive here but comments a bit noisy made by ASF GitHub Bot
are now hidden to Work Log. Thank you!

Best,
tison.





Re: Thanks for hiding ASF GitHub Bot logs on JIRA

2018-12-13 Thread Tzu-Li Chen
hmm..then what is it


Chesnay Schepler  于2018年12月14日周五 上午5:07写道:

> ehhfrom what I can tell this is not the case.
>
> On 13.12.2018 22:00, Tzu-Li Chen wrote:
> > Not sure how we arrive here but comments a bit noisy made by ASF GitHub
> Bot
> > are now hidden to Work Log. Thank you!
> >
> > Best,
> > tison.
> >
>
>


[jira] [Created] (FLINK-11161) Unable to import java packages in scala-shell

2018-12-13 Thread Jeff Zhang (JIRA)
Jeff Zhang created FLINK-11161:
--

 Summary: Unable to import java packages in scala-shell
 Key: FLINK-11161
 URL: https://issues.apache.org/jira/browse/FLINK-11161
 Project: Flink
  Issue Type: Improvement
  Components: Scala Shell
Affects Versions: 1.8.0
Reporter: Jeff Zhang
 Attachments: 09_33_31__12_14_2018.jpg, 
image-2018-12-14-09-55-29-523.png, image-2018-12-14-09-56-18-275.png





--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Apply for flink contributor permission

2018-12-13 Thread 宋辛童(五藏)
Hi there,

Could anyone kindly give me the contributor permission? My JIRA id is: wuzang

Thank You,
Tony Xintong Song



[jira] [Created] (FLINK-11162) Provide a rest API to list all logical operators

2018-12-13 Thread vinoyang (JIRA)
vinoyang created FLINK-11162:


 Summary: Provide a rest API to list all logical operators
 Key: FLINK-11162
 URL: https://issues.apache.org/jira/browse/FLINK-11162
 Project: Flink
  Issue Type: New Feature
  Components: REST
Reporter: vinoyang
Assignee: vinoyang


The context of this issue:
We are using the indicator (metric) variables of the operator.
We have customized the display of the indicator. For query purposes, we 
currently lack an interface to get all the logical operators of a job. The 
current REST API only provides the chained node information.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (FLINK-11163) RestClusterClientTest#testSendIsNotRetriableIfHttpNotFound with BindException

2018-12-13 Thread TisonKun (JIRA)
TisonKun created FLINK-11163:


 Summary: 
RestClusterClientTest#testSendIsNotRetriableIfHttpNotFound with BindException
 Key: FLINK-11163
 URL: https://issues.apache.org/jira/browse/FLINK-11163
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 1.8.0
Reporter: TisonKun
 Fix For: 1.8.0


{quote}
03:14:22.321 [ERROR] Tests run: 10, Failures: 0, Errors: 1, Skipped: 0, Time 
elapsed: 3.189 s <<< FAILURE! - in 
org.apache.flink.client.program.rest.RestClusterClientTest
03:14:22.322 [ERROR] 
testSendIsNotRetriableIfHttpNotFound(org.apache.flink.client.program.rest.RestClusterClientTest)
 Time elapsed: 0.043 s <<< ERROR!
java.net.BindException: Address already in use
{quote}

https://api.travis-ci.org/v3/job/467812798/log.txt



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)