Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread Aljoscha Krettek

Hi,

I think that's a good idea, but we will also soon have Flink 1.10 anyways.

Best,
Aljoscha

On 04.02.20 07:25, Hequn Cheng wrote:

Hi Jincheng,

+1 for this proposal.
From the perspective of users, I think it would be nice to have PyFlink on
PyPI, which makes it much easier to install PyFlink.

Best, Hequn

On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:


+1


Xingbo Huang  于2020年2月4日周二 下午1:07写道:


Hi Jincheng,

Thanks for driving this.
+1 for this proposal.

Compared to building from source, downloading directly from PyPI will
greatly reduce the development cost for Python users.

Best,
Xingbo



Wei Zhong  于2020年2月4日周二 下午12:43写道:


Hi Jincheng,

Thanks for bringing up this discussion!

+1 for this proposal. Building from source takes a long time and requires
a good network environment. Some users may not have such an environment.
Uploading to PyPI will greatly improve the user experience.

Best,
Wei

jincheng sun  于2020年2月4日周二 上午11:49写道:


Hi folks,

I am very happy to have received some user inquiries about the use of the
Flink Python API (PyFlink) recently. One of the more common questions is
whether it is possible to install PyFlink without building it from source.
The most convenient and natural way for users is `pip install apache-flink`.
We originally planned to support `pip install apache-flink` starting with
Flink 1.10; the reason for this decision was that when Flink 1.9 was
released on August 22, 2019 [1], Flink's PyPI account was not yet ready.
Our PyPI account has been available since October 09, 2019 [2] (only PMC
members can access it), so for the convenience of users I propose:

- Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
- Update Flink 1.9 documentation to add support for `pip install`.

As we all know, the Flink 1.9.2 release was just completed on January 31,
2020 [3]. There are still at least 1 to 2 months before the release of
1.9.3, so my proposal is made purely from the perspective of user
convenience. Although the proposed work is not large, we have not yet set a
precedent for an independent release of the Flink Python API (PyFlink) in
the previous release process. I hereby initiate this discussion and look
forward to your feedback!

Best,
Jincheng

[1]

https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
[2]

https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
[3]

https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E





--
Best Regards

Jeff Zhang





[jira] [Created] (FLINK-15898) Make Stateful Functions release-ready

2020-02-04 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15898:
---

 Summary: Make Stateful Functions release-ready
 Key: FLINK-15898
 URL: https://issues.apache.org/jira/browse/FLINK-15898
 Project: Flink
  Issue Type: Task
  Components: Release System, Stateful Functions
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


In preparation for the first release of Apache Flink Stateful Functions, 
there are a few things that need to be done:

*Legal*
* Missing NOTICE files for the canonical source distribution in the project root
* Missing NOTICE / LICENSE files under the {{META-INF}} directory of the 
to-be-released Maven artifact jars.

*Tooling*
We need utility scripts for creating release branches, creating the source 
distribution, building and deploying staging artifacts, etc. Most of this can 
probably be based on the existing tooling in {{apache/flink}}.

Stateful Functions releases would also include a base Docker image for building 
applications, so we also need to look into how we want to do that. Most likely 
this will be blocked by the ongoing efforts in releasing an official Flink 
image endorsed by the Apache Flink PMC.

*Documentation*
All information about releasing a Stateful Functions release should also be 
documented in the Flink community project wiki.

This is an umbrella issue covering all necessary subtasks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15897) Deserialize the Python UDF execution results when sending them out

2020-02-04 Thread Dian Fu (Jira)
Dian Fu created FLINK-15897:
---

 Summary: Deserialize the Python UDF execution results when sending 
them out
 Key: FLINK-15897
 URL: https://issues.apache.org/jira/browse/FLINK-15897
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.11.0


Currently, the Python UDF execution results are deserialized and then buffered 
in a collection when they are received from the Python worker. The 
deserialization could instead be deferred until the execution results are sent 
to the downstream operator. That is to say, the serialized bytes would be 
buffered instead of the deserialized Java objects. This could reduce the memory 
footprint of the Java operator.
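
To illustrate the idea, here is a minimal sketch (the class and method names 
are made up for illustration and are not the actual PyFlink operator code): 
the operator keeps the raw bytes received from the Python worker and 
deserializes them only while emitting to the downstream operator.

import java.io.IOException;
import java.util.ArrayDeque;
import org.apache.flink.api.common.typeutils.TypeSerializer;
import org.apache.flink.core.memory.DataInputDeserializer;
import org.apache.flink.types.Row;
import org.apache.flink.util.Collector;

// Sketch only: buffer the serialized bytes and defer deserialization.
class DeferredResultBuffer {

    private final ArrayDeque<byte[]> buffer = new ArrayDeque<>();

    // Called when a serialized result row arrives from the Python worker.
    void add(byte[] serializedRow) {
        buffer.add(serializedRow); // no deserialization here
    }

    // Called when emitting results to the downstream operator.
    void emitTo(TypeSerializer<Row> serializer, Collector<Row> out) throws IOException {
        byte[] bytes;
        while ((bytes = buffer.poll()) != null) {
            // deserialization happens only now, one row at a time
            out.collect(serializer.deserialize(new DataInputDeserializer(bytes)));
        }
    }
}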



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15899) Stateful Functions root POM should use the ASF parent POM

2020-02-04 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15899:
---

 Summary: Stateful Functions root POM should use the ASF parent POM
 Key: FLINK-15899
 URL: https://issues.apache.org/jira/browse/FLINK-15899
 Project: Flink
  Issue Type: Sub-task
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


Stateful Functions should use the ASF parent POM: 
https://maven.apache.org/pom/asf/.

The ASF parent POM covers a few things that are required for building and 
releasing code for any Apache project, such as:
* pulling in license / notice files for Maven artifacts
* setting up staged releases to the Apache Nexus instance
* etc.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15900) JoinITCase#testRightJoinWithPk failed on Travis

2020-02-04 Thread Gary Yao (Jira)
Gary Yao created FLINK-15900:


 Summary: JoinITCase#testRightJoinWithPk failed on Travis
 Key: FLINK-15900
 URL: https://issues.apache.org/jira/browse/FLINK-15900
 Project: Flink
  Issue Type: Bug
Affects Versions: 1.10.0
Reporter: Gary Yao


{noformat}
org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
at 
org.apache.flink.table.planner.runtime.stream.sql.JoinITCase.testRightJoinWithPk(JoinITCase.scala:672)
Caused by: org.apache.flink.runtime.JobException: Recovery is suppressed by 
FixedDelayRestartBackoffTimeStrategy(maxNumberRestartAttempts=1, 
backoffTimeMS=0)
Caused by: java.lang.Exception: Exception while creating 
StreamOperatorStateContext.
Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state 
backend for KeyedProcessOperator_17aecc34cf8aa256be6fe4836cbdf29a_(2/4) from 
any of the 1 provided restore options.
Caused by: java.io.IOException: Failed to acquire shared cache resource for 
RocksDB
Caused by: org.apache.flink.runtime.memory.MemoryAllocationException: Could not 
created the shared memory resource of size 20971520. Not enough memory left to 
reserve from the slot's managed memory.
Caused by: org.apache.flink.runtime.memory.MemoryReservationException: Could 
not allocate 20971520 bytes. Only 0 bytes are remaining.

{noformat}

https://api.travis-ci.org/v3/job/645466432/log.txt



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15901) Support partitioned generic table in HiveCatalog

2020-02-04 Thread Rui Li (Jira)
Rui Li created FLINK-15901:
--

 Summary: Support partitioned generic table in HiveCatalog
 Key: FLINK-15901
 URL: https://issues.apache.org/jira/browse/FLINK-15901
 Project: Flink
  Issue Type: Task
  Components: Connectors / Hive
Reporter: Rui Li






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-04 Thread Jingsong Li
Hi all,

For your information, we have documented the dependency details [1]. I think
it's a lot clearer than before, but it's still worse than Presto and Spark
(they either avoid the Hive dependency or build it in).

I thought about Stephan's suggestion:
- hive/lib contains 200+ jars, but we only need hive-exec.jar plus two or
three others; if that many jars are introduced, there may be big conflicts.
- Also, hive/lib is not available on every machine, so we would need to
upload a lot of jars.
- A separate classloader may also be hard to make work: our
flink-connector-hive needs the Hive jars, so we may need to handle the
flink-connector-hive jar specially as well.
CC: Rui Li

I think the system that integrates best with Hive is Presto, which only
connects to the Hive metastore through the thrift protocol. But I understand
that it would cost a lot to rewrite the code.

[1]
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies

Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 1:44 AM Stephan Ewen  wrote:

> We have had much trouble in the past from "too deep too custom"
> integrations that everyone got out of the box, i.e., Hadoop.
> Flink has such a broad spectrum of use cases, if we have custom build
> for every other framework in that spectrum, we'll be in trouble.
>
> So I would also be -1 for custom builds.
>
> Couldn't we do something similar as we started doing for Hadoop? Moving
> away from convenience downloads to allowing users to "export" their setup
> for Flink?
>
>   - We can have a "hive module (loader)" in flink/lib by default
>   - The module loader would look for an environment variable like
> "HIVE_CLASSPATH" and load these classes (ideally in a separate
> classloader).
>   - The loader can search for certain classes and instantiate catalog /
> functions / etc. when finding them instantiates the hive module referencing
> them
>   - That way, we use exactly what users have installed, without needing to
> build our own bundles.
>
> Could that work?
>
> Best,
> Stephan
>
>
> On Wed, Dec 18, 2019 at 9:43 AM Till Rohrmann 
> wrote:
>
> > Couldn't it simply be documented which jars are in the convenience jars
> > which are pre built and can be downloaded from the website? Then people
> who
> > need a custom version know which jars they need to provide to Flink?
> >
> > Cheers,
> > Till
> >
> > On Tue, Dec 17, 2019 at 6:49 PM Bowen Li  wrote:
> >
> > > I'm not sure providing an uber jar would be possible.
> > >
> > > Different from kafka and elasticsearch connector who have dependencies
> > for
> > > a specific kafka/elastic version, or the kafka universal connector that
> > > provides good compatibilities, hive connector needs to deal with hive
> > jars
> > > in all 1.x, 2.x, 3.x versions (let alone all the HDP/CDH distributions)
> > > with incompatibility even between minor versions, different versioned
> > > hadoop and other extra dependency jars for each hive version.
> > >
> > > Besides, users usually need to be able to easily see which individual
> > jars
> > > are required, which is invisible from an uber jar. Hive users already
> > have
> > > their hive deployments. They usually have to use their own hive jars
> > > because, unlike hive jars on mvn, their own jars contain changes
> in-house
> > > or from vendors. They need to easily tell which jars Flink requires for
> > > corresponding open sourced hive version to their own hive deployment,
> and
> > > copy in-hosue jars over from hive deployments as replacements.
> > >
> > > Providing a script to download all the individual jars for a specified
> > hive
> > > version can be an alternative.
> > >
> > > The goal is we need to provide a *product*, not a technology, to make
> it
> > > less hassle for Hive users. Afterall, it's Flink embracing Hive
> community
> > > and ecosystem, not the other way around. I'd argue Hive connector can
> be
> > > treat differently because its community/ecosystem/userbase is much
> larger
> > > than the other connectors, and it's way more important than other
> > > connectors to Flink on the mission of becoming a batch/streaming
> unified
> > > engine and get Flink more widely adopted.
> > >
> > >
> > > On Sun, Dec 15, 2019 at 10:03 PM Danny Chan 
> > wrote:
> > >
> > > > Also -1 on separate builds.
> > > >
> > > > After referencing some other BigData engines for distribution[1], i
> > > didn't
> > > > find strong needs to publish a separate build
> > > > for just a separate Hive version, indeed there are builds for
> different
> > > > Hadoop version.
> > > >
> > > > Just like Seth and Aljoscha said, we could push a
> > > > flink-hive-version-uber.jar to use as a lib of SQL-CLI or other use
> > > cases.
> > > >
> > > > [1] https://spark.apache.org/downloads.html
> > > > [2]
> > > https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html
> > > >
> > > > Best,
> > > > Danny Chan
> > > > 在 2019年12月14日 +0800 AM3:03,dev@flink.apache.org,写道:
> > > > >
> > > > >
> > > >
> > >
> >
> https://ci.apache.org/projects/flink/flink

Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-04 Thread Becket Qin
Thanks for the suggestion, Till.

I am curious about how we usually decide when to put jars into the opt folder.

Technically speaking, it seems that `flink-ml-api` should be put into the
opt directory because it is actually an API rather than a library, just like
CEP and Table.

`flink-ml-lib` seems to be on the border. On one hand, it is a library. On
the other hand, unlike the SQL formats and Hadoop, whose major code lives
outside of Flink, the algorithm code is in Flink. So `flink-ml-lib` is more
like the built-in SQL UDFs, and it seems fine to either put it in the opt
folder or on the downloads page.

From the user experience perspective, it might be better to have both
`flink-ml-lib` and `flink-ml-api` in the opt folder so users needn't go to
two places for the required dependencies.

Thanks,

Jiangjie (Becket) Qin

On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng  wrote:

> Hi Till,
>
> Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
> libraries as optional dependencies on the download page which can make the
> dist smaller.
>
> But I also have some concerns for it, e.g., the download page now only
> includes the latest 3 releases. We may need to find ways to support more
> versions.
> On the other hand, the size of the flink-ml libraries now is very
> small(about 246K), so it would not bring much impact on the size of dist.
>
> What do you think?
>
> Best,
> Hequn
>
> On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann  wrote:
>
>> An alternative solution would be to offer the flink-ml libraries as
>> optional dependencies on the download page. Similar to how we offer the
>> different SQL formats and Hadoop releases [1].
>>
>> [1] https://flink.apache.org/downloads.html
>>
>> Cheers,
>> Till
>>
>> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng  wrote:
>>
>> > Thank you all for your feedback and suggestions!
>> >
>> > Best, Hequn
>> >
>> > On Mon, Feb 3, 2020 at 5:07 PM Becket Qin  wrote:
>> >
>> > > Thanks for bringing up the discussion, Hequn.
>> > >
>> > > +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would
>> make
>> > > it much easier for the users to try out some simple ml tasks.
>> > >
>> > > Thanks,
>> > >
>> > > Jiangjie (Becket) Qin
>> > >
>> > > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun > >
>> > > wrote:
>> > >
>> > >> Thank you for pushing forward @Hequn Cheng  !
>> > >>
>> > >> Hi  @Becket Qin  , Do you have any concerns on
>> > >> this ?
>> > >>
>> > >> Best,
>> > >> Jincheng
>> > >>
>> > >> Hequn Cheng  于2020年2月3日周一 下午2:09写道:
>> > >>
>> > >>> Hi everyone,
>> > >>>
>> > >>> Thanks for the feedback. As there are no objections, I've opened a
>> JIRA
>> > >>> issue(FLINK-15847[1]) to address this issue.
>> > >>> The implementation details can be discussed in the issue or in the
>> > >>> following PR.
>> > >>>
>> > >>> Best,
>> > >>> Hequn
>> > >>>
>> > >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
>> > >>>
>> > >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng 
>> > wrote:
>> > >>>
>> > >>> > Hi Jincheng,
>> > >>> >
>> > >>> > Thanks a lot for your feedback!
>> > >>> > Yes, I agree with you. There are cases that multi jars need to be
>> > >>> > uploaded. I will prepare another discussion later. Maybe with a
>> > simple
>> > >>> > design doc.
>> > >>> >
>> > >>> > Best, Hequn
>> > >>> >
>> > >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <
>> > sunjincheng...@gmail.com>
>> > >>> > wrote:
>> > >>> >
>> > >>> >> Thanks for bring up this discussion Hequn!
>> > >>> >>
>> > >>> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
>> > >>> >>
>> > >>> >> BTW: I think would be great if bring up a discussion for upload
>> > >>> multiple
>> > >>> >> Jars at the same time. as PyFlink JOB also can have the benefit
>> if
>> > we
>> > >>> do
>> > >>> >> that improvement.
>> > >>> >>
>> > >>> >> Best,
>> > >>> >> Jincheng
>> > >>> >>
>> > >>> >>
>> > >>> >> Hequn Cheng  于2020年1月8日周三 上午11:50写道:
>> > >>> >>
>> > >>> >> > Hi everyone,
>> > >>> >> >
>> > >>> >> > FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI which
>> > moves
>> > >>> >> Flink
>> > >>> >> > ML a step further. Base on it, users can develop their ML jobs
>> and
>> > >>> more
>> > >>> >> and
>> > >>> >> > more machine learning platforms are providing ML services.
>> > >>> >> >
>> > >>> >> > However, the problem now is the jars of flink-ml-api and
>> > >>> flink-ml-lib
>> > >>> >> are
>> > >>> >> > only exist on maven repo. Whenever users want to submit ML
>> jobs,
>> > >>> they
>> > >>> >> can
>> > >>> >> > only depend on the ml modules and package a fat jar. This
>> would be
>> > >>> >> > inconvenient especially for the machine learning platforms on
>> > which
>> > >>> >> nearly
>> > >>> >> > all jobs depend on Flink ML modules and have to package a fat
>> jar.
>> > >>> >> >
>> > >>> >> > Given this, it would be better to include jars of flink-ml-api
>> and
>> > >>> >> > flink-ml-lib in the `opt` folder, so that users can directly
>> use
>> > the
>> > >>> >> jars
>> > >>> >> > with the

Re: [DISCUSS] have separate Flink distributions with built-in Hive dependencies

2020-02-04 Thread Rui Li
Hi Stephan,

As Jingsong stated, in our documentation the recommended way to add Hive
dependencies is to use exactly what users have installed. It's just that we
ask users to add those jars manually, instead of finding them automatically
based on env variables. I prefer to keep it this way for a while and see if
there are real concerns/complaints in user feedback.

Please also note that the Hive jars are not the only ones needed to integrate
with Hive; users have to make sure the flink-connector-hive and Hadoop jars
are on the classpath too. So I'm afraid a single "HIVE" env variable wouldn't
save all the manual work for our users.

On Tue, Feb 4, 2020 at 5:54 PM Jingsong Li  wrote:

> Hi all,
>
> For your information, we have document the dependencies detailed
> information [1]. I think it's a lot clearer than before, but it's worse
> than presto and spark (they avoid or have built-in hive dependency).
>
> I thought about Stephan's suggestion:
> - The hive/lib has 200+ jars, but we only need hive-exec.jar or plus two
> or three jars, if so many jars are introduced, maybe will there be a big
> conflict.
> - And hive/lib is not available on every machine. We need to upload so
> many jars.
> - A separate classloader maybe hard to work too, our flink-connector-hive
> need hive jars, we may need to deal with flink-connector-hive jar spacial
> too.
> CC: Rui Li
>
> I think the best system to integrate with hive is presto, which only
> connects hive metastore through thrift protocol. But I understand that it
> costs a lot to rewrite the code.
>
> [1]
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/hive/#dependencies
>
> Best,
> Jingsong Lee
>
> On Tue, Feb 4, 2020 at 1:44 AM Stephan Ewen  wrote:
>
>> We have had much trouble in the past from "too deep too custom"
>> integrations that everyone got out of the box, i.e., Hadoop.
>> Flink has such a broad spectrum of use cases, if we have custom build
>> for every other framework in that spectrum, we'll be in trouble.
>>
>> So I would also be -1 for custom builds.
>>
>> Couldn't we do something similar as we started doing for Hadoop? Moving
>> away from convenience downloads to allowing users to "export" their setup
>> for Flink?
>>
>>   - We can have a "hive module (loader)" in flink/lib by default
>>   - The module loader would look for an environment variable like
>> "HIVE_CLASSPATH" and load these classes (ideally in a separate
>> classloader).
>>   - The loader can search for certain classes and instantiate catalog /
>> functions / etc. when finding them instantiates the hive module
>> referencing
>> them
>>   - That way, we use exactly what users have installed, without needing to
>> build our own bundles.
>>
>> Could that work?
>>
>> Best,
>> Stephan
>>
>>
>> On Wed, Dec 18, 2019 at 9:43 AM Till Rohrmann 
>> wrote:
>>
>> > Couldn't it simply be documented which jars are in the convenience jars
>> > which are pre built and can be downloaded from the website? Then people
>> who
>> > need a custom version know which jars they need to provide to Flink?
>> >
>> > Cheers,
>> > Till
>> >
>> > On Tue, Dec 17, 2019 at 6:49 PM Bowen Li  wrote:
>> >
>> > > I'm not sure providing an uber jar would be possible.
>> > >
>> > > Different from kafka and elasticsearch connector who have dependencies
>> > for
>> > > a specific kafka/elastic version, or the kafka universal connector
>> that
>> > > provides good compatibilities, hive connector needs to deal with hive
>> > jars
>> > > in all 1.x, 2.x, 3.x versions (let alone all the HDP/CDH
>> distributions)
>> > > with incompatibility even between minor versions, different versioned
>> > > hadoop and other extra dependency jars for each hive version.
>> > >
>> > > Besides, users usually need to be able to easily see which individual
>> > jars
>> > > are required, which is invisible from an uber jar. Hive users already
>> > have
>> > > their hive deployments. They usually have to use their own hive jars
>> > > because, unlike hive jars on mvn, their own jars contain changes
>> in-house
>> > > or from vendors. They need to easily tell which jars Flink requires
>> for
>> > > corresponding open sourced hive version to their own hive deployment,
>> and
>> > > copy in-hosue jars over from hive deployments as replacements.
>> > >
>> > > Providing a script to download all the individual jars for a specified
>> > hive
>> > > version can be an alternative.
>> > >
>> > > The goal is we need to provide a *product*, not a technology, to make
>> it
>> > > less hassle for Hive users. Afterall, it's Flink embracing Hive
>> community
>> > > and ecosystem, not the other way around. I'd argue Hive connector can
>> be
>> > > treat differently because its community/ecosystem/userbase is much
>> larger
>> > > than the other connectors, and it's way more important than other
>> > > connectors to Flink on the mission of becoming a batch/streaming
>> unified
>> > > engine and get Flink more widely adopted.
>> > >
>> > >
>> > > On Sun, Dec 15, 2019 at 10:03 

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread jincheng sun
Hi Aljoscha,

I agree that the coming PyFlink package for 1.10 is enough for users who
just want to try out the latest version of PyFlink. However, I think there
are also cases where the old versions are useful. Usually users tend to
prepare their development environment according to the version of the
cluster. Suppose the Flink cluster version is 1.9; then users may want to
install the PyFlink 1.9 package. Even if they install the 1.10 package from
PyPI, it will fail when they try to submit a Python job to the 1.9 cluster
using the Python shell shipped with the 1.10 package. (The reason is that
the JobGraph of 1.10 is incompatible with the JobGraph of 1.9.)

As 1.9.2 is already released, and considering that publishing it to PyPI
requires little effort (I have already prepared the package), personally I
think it is worth doing.

What's your thought? :)

Best,
Jincheng

Aljoscha Krettek  wrote on Tue, Feb 4, 2020 at 4:00 PM:

> Hi,
>
> I think that's a good idea, but we will also soon have Flink 1.10 anyways.
>
> Best,
> Aljoscha
>
> On 04.02.20 07:25, Hequn Cheng wrote:
> > Hi Jincheng,
> >
> > +1 for this proposal.
> >  From the perspective of users, I think it would nice to have PyFlink on
> > PyPI which makes it much easier to install PyFlink.
> >
> > Best, Hequn
> >
> > On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:
> >
> >> +1
> >>
> >>
> >> Xingbo Huang  于2020年2月4日周二 下午1:07写道:
> >>
> >>> Hi Jincheng,
> >>>
> >>> Thanks for driving this.
> >>> +1 for this proposal.
> >>>
> >>> Compared to building from source, downloading directly from PyPI will
> >>> greatly save the development cost of Python users.
> >>>
> >>> Best,
> >>> Xingbo
> >>>
> >>>
> >>>
> >>> Wei Zhong  于2020年2月4日周二 下午12:43写道:
> >>>
>  Hi Jincheng,
> 
>  Thanks for bring up this discussion!
> 
>  +1 for this proposal. Building from source takes long time and
> requires
>  a good network environment. Some users may not have such an
> environment.
>  Uploading to PyPI will greatly improve the user experience.
> 
>  Best,
>  Wei
> 
>  jincheng sun  于2020年2月4日周二 上午11:49写道:
> 
> > Hi folks,
> >
> > I am very happy to receive some user inquiries about the use of Flink
> > Python API (PyFlink) recently. One of the more common questions is
> > whether
> > it is possible to install PyFlink without using source code build.
> The
> > most
> > convenient and natural way for users is to use `pip install
> > apache-flink`.
> > We originally planned to support the use of `pip install
> apache-flink`
> > in
> > Flink 1.10, but the reason for this decision was that when Flink 1.9
> was
> > released at August 22, 2019[1], Flink's PyPI account system was not
> > ready.
> > At present, our PyPI account is available at October 09, 2019
> [2](Only
> > PMC
> > can access), So for the convenience of users I propose:
> >
> > - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
> > - Update Flink 1.9 documentation to add support for `pip install`.
> >
> > As we all know, Flink 1.9.2 was just completed released at January
> 31,
> > 2020
> > [3]. There is still at least 1 to 2 months before the release of
> 1.9.3,
> > so
> > my proposal is completely considered from the perspective of user
> > convenience. Although the proposed work is not large, we have not
> set a
> > precedent for independent release of the Flink Python API(PyFlink) in
> > the
> > previous release process. I hereby initiate the current discussion
> and
> > look
> > forward to your feedback!
> >
> > Best,
> > Jincheng
> >
> > [1]
> >
> >
> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
> > [2]
> >
> >
> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
> > [3]
> >
> >
> https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E
> >
> 
> >>
> >> --
> >> Best Regards
> >>
> >> Jeff Zhang
> >>
> >
>


Re: [DISCUSS] Improve TableFactory

2020-02-04 Thread Timo Walther

Hi Jingsong,

some last-minute changes from my side:

1. Rename `getTableIdentifier` to `getObjectIdentifier` to keep the API 
obvious. Otherwise people expect a `TableIdentifier` class to be returned 
here.

2. Rename `getTableConfig` to `getConfiguration()`; in the future this will 
not only be a "table" config but might give access to the full Flink config.
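
With those two renames applied, the Context interface would look roughly like 
this (only a sketch of the renamed methods; everything else stays as in the 
proposal below):

interface Context {

    /**
     * @return full identifier of the given {@link CatalogTable}.
     */
    ObjectIdentifier getObjectIdentifier();

    /**
     * @return table {@link CatalogTable} instance.
     */
    CatalogTable getTable();

    /**
     * @return readable config of this table environment.
     */
    ReadableConfig getConfiguration();
}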


Thanks,
Timo


On 04.02.20 06:27, Jingsong Li wrote:

So the interface will be:

public interface TableSourceFactory<T> extends TableFactory {

    ..

    /**
     * Creates and configures a {@link TableSource} based on the given {@link Context}.
     *
     * @param context context of this table source.
     * @return the configured table source.
     */
    default TableSource<T> createTableSource(Context context) {
        ObjectIdentifier tableIdentifier = context.getTableIdentifier();
        return createTableSource(
            new ObjectPath(tableIdentifier.getDatabaseName(), tableIdentifier.getObjectName()),
            context.getTable());
    }

    /**
     * Context of table source creation. Contains table information and environment information.
     */
    interface Context {

        /**
         * @return full identifier of the given {@link CatalogTable}.
         */
        ObjectIdentifier getTableIdentifier();

        /**
         * @return table {@link CatalogTable} instance.
         */
        CatalogTable getTable();

        /**
         * @return readable config of this table environment.
         */
        ReadableConfig getTableConfig();
    }
}

public interface TableSinkFactory<T> extends TableFactory {

    ..

    /**
     * Creates and configures a {@link TableSink} based on the given {@link Context}.
     *
     * @param context context of this table sink.
     * @return the configured table sink.
     */
    default TableSink<T> createTableSink(Context context) {
        ObjectIdentifier tableIdentifier = context.getTableIdentifier();
        return createTableSink(
            new ObjectPath(tableIdentifier.getDatabaseName(), tableIdentifier.getObjectName()),
            context.getTable());
    }

    /**
     * Context of table sink creation. Contains table information and environment information.
     */
    interface Context {

        /**
         * @return full identifier of the given {@link CatalogTable}.
         */
        ObjectIdentifier getTableIdentifier();

        /**
         * @return table {@link CatalogTable} instance.
         */
        CatalogTable getTable();

        /**
         * @return readable config of this table environment.
         */
        ReadableConfig getTableConfig();
    }
}


Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 1:22 PM Jingsong Li  wrote:


Hi all,

After rethinking and discussion with Kurt, I'd like to remove "isBounded".
We can delay this is bounded message to TableSink.
With TableSink refactor, we need consider "consumeDataStream"
and "consumeBoundedStream".

Best,
Jingsong Lee

On Mon, Feb 3, 2020 at 4:17 PM Jingsong Li  wrote:


Hi Jark,

Thanks involving, yes, it's hard to understand to add isBounded on the
source.
I recommend adding only to sink at present, because sink has upstream.
Its upstream is either bounded or unbounded.

Hi all,

Let me summarize with your suggestions.

public interface TableSourceFactory extends TableFactory {

..


/**
 * Creates and configures a {@link TableSource} based on the given {@link 
Context}.
 *
 * @param context context of this table source.
 * @return the configured table source.
 */
default TableSource createTableSource(Context context) {
   ObjectIdentifier tableIdentifier = context.getTableIdentifier();
   return createTableSource(
 new ObjectPath(tableIdentifier.getDatabaseName(), 
tableIdentifier.getObjectName()),
 context.getTable());
}

/**
 * Context of table source creation. Contains table information and 
environment information.
 */
interface Context {

   /**
* @return full identifier of the given {@link CatalogTable}.
*/
   ObjectIdentifier getTableIdentifier();

   /**
* @return table {@link CatalogTable} instance.
*/
   CatalogTable getTable();

   /**
* @return readable config of this table environment.
*/
   ReadableConfig getTableConfig();
}
}

public interface TableSinkFactory extends TableFactory {

..

/**
 * Creates and configures a {@link TableSink} based on the given {@link 
Context}.
 *
 * @param context context of this table sink.
 * @return the configured table sink.
 */
default TableSink createTableSink(Context context) {
   ObjectIdentifier tableIdentifier = context.getTableIdentifier();
   return createTableSink(
 new ObjectPath(tableIdentifier.getDatabaseName(), 
tableIdentifier.getObjectName()),
 context.getTable());
}

/**
 * Context of table sink creation. Contains table informatio

Re: [DISCUSS] Support User-Defined Table Function in PyFlink

2020-02-04 Thread jincheng sun
Thanks for bringing up this discussion Xingbo!

The design is pretty nice to me! This feature is really needed, as mentioned
in FLIP-58. So I think it is better to create the JIRA and open the PR, then
more details can be reviewed. :)

Best,
Jincheng



Xingbo Huang  wrote on Mon, Feb 3, 2020 at 3:02 PM:

> Hi all,
>
> The scalar Python UDF has already been supported in coming release of 1.10,
> we’d like to introduce Python UDTF now. FLIP-58[1] has already introduced
> some content about Python UDTF. However, the implementation details are
> still not touched. I have drafted a design doc[2]. It includes the
> following items:
>
> - How to define Python UDTF.
>
> - The introduced rules for Python UDTF.
>
> - How to execute Python UDTF.
>
> Because the implementation relies on Beam's portability framework for
> Python user-defined table function execution and not all the contributors
> are familiar with it, I have done a prototype[3].
>
> Welcome any feedback.
>
> Best,
>
> Xingbo
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
>
> [2]
>
> https://docs.google.com/document/d/1Pkv5S0geoYQ2ySS5YTTBivJ3hoi-uzLXVQkDVIaR0cE/edit#heading=h.pzeztvig3kg1
> [3] https://github.com/HuangXingBo/flink/commits/FLINK-UDTF
>


[jira] [Created] (FLINK-15902) Improve the Python API doc of the version they are added

2020-02-04 Thread Dian Fu (Jira)
Dian Fu created FLINK-15902:
---

 Summary: Improve the Python API doc of the version they are added
 Key: FLINK-15902
 URL: https://issues.apache.org/jira/browse/FLINK-15902
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
 Fix For: 1.11.0, 1.10.1


Currently it's not possible to know in which version a Python API was added. 
This information is very useful for users and should be added.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15903) Make REST port configurable in end to end test scripts

2020-02-04 Thread Robert Metzger (Jira)
Robert Metzger created FLINK-15903:
--

 Summary: Make REST port configurable in end to end test scripts
 Key: FLINK-15903
 URL: https://issues.apache.org/jira/browse/FLINK-15903
 Project: Flink
  Issue Type: Improvement
  Components: Test Infrastructure
Reporter: Robert Metzger
Assignee: Robert Metzger


The end to end tests have the default port (8081) hardcoded. Unfortunately, my 
employer has asked me to run a daemon on my system, which uses port 8081. 
This makes executing end to end tests on my machine a cumbersome experience.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-04 Thread Gary Yao
Hi Thomas,

> 2) Was there a change in how job recovery reflects in the uptime metric?
> Didn't uptime previously reset to 0 on recovery (now it just keeps
> increasing)?

The uptime is the difference between the current time and the time when the
job transitioned to RUNNING state. By default we no longer transition the
job
out of the RUNNING state when restarting. This has something to do with the
new scheduler which enables pipelined region failover by default [1].
Actually
we enabled pipelined region failover already in the binary distribution of
Flink 1.9 by setting:

jobmanager.execution.failover-strategy: region

in the default flink-conf.yaml. Unless you have removed this config option
or
you are using a custom yaml, you should be seeing this behavior in Flink
1.9.
If you do not want region failover, set

jobmanager.execution.failover-strategy: full


> 1) Is the low watermark display in the UI still broken?

I was not aware that this is broken. Is there an issue tracking this bug?

Best,
Gary

[1] https://issues.apache.org/jira/browse/FLINK-14651

On Tue, Feb 4, 2020 at 2:56 AM Thomas Weise  wrote:

> I opened a PR for FLINK-15868
> :
> https://github.com/apache/flink/pull/11006
>
> With that change, I was able to run an application that consumes from
> Kinesis.
>
> I should have data tomorrow regarding the performance.
>
> Two questions/observations:
>
> 1) Is the low watermark display in the UI still broken?
> 2) Was there a change in how job recovery reflects in the uptime metric?
> Didn't uptime previously reset to 0 on recovery (now it just keeps
> increasing)?
>
> Thanks,
> Thomas
>
>
>
>
> On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise  wrote:
>
> > I found another issue with the Kinesis connector:
> >
> > https://issues.apache.org/jira/browse/FLINK-15868
> >
> >
> > On Mon, Feb 3, 2020 at 3:35 AM Gary Yao  wrote:
> >
> >> Hi everyone,
> >>
> >> I am hereby canceling the vote due to:
> >>
> >> FLINK-15837
> >> FLINK-15840
> >>
> >> Another RC will be created later today.
> >>
> >> Best,
> >> Gary
> >>
> >> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao  wrote:
> >>
> >> > Hi everyone,
> >> > Please review and vote on the release candidate #1 for the version
> >> 1.10.0,
> >> > as follows:
> >> > [ ] +1, Approve the release
> >> > [ ] -1, Do not approve the release (please provide specific comments)
> >> >
> >> >
> >> > The complete staging area is available for your review, which
> includes:
> >> > * JIRA release notes [1],
> >> > * the official Apache source release and binary convenience releases
> to
> >> be
> >> > deployed to dist.apache.org [2], which are signed with the key with
> >> > fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
> >> > * all artifacts to be deployed to the Maven Central Repository [4],
> >> > * source code tag "release-1.10.0-rc1" [5],
> >> >
> >> > The announcement blog post is in the works. I will update this voting
> >> > thread with a link to the pull request soon.
> >> >
> >> > The vote will be open for at least 72 hours. It is adopted by majority
> >> > approval, with at least 3 PMC affirmative votes.
> >> >
> >> > Thanks,
> >> > Yu & Gary
> >> >
> >> > [1]
> >> >
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
> >> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> >> > [4]
> >> https://repository.apache.org/content/repositories/orgapacheflink-1325
> >> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
> >> >
> >>
> >
>


[jira] [Created] (FLINK-15904) Make Kafka Consumer work with activated "disableGenericTypes()"

2020-02-04 Thread Aljoscha Krettek (Jira)
Aljoscha Krettek created FLINK-15904:


 Summary: Make Kafka Consumer work with activated 
"disableGenericTypes()"
 Key: FLINK-15904
 URL: https://issues.apache.org/jira/browse/FLINK-15904
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Reporter: Aljoscha Krettek


A user reported that the Kafka Consumer doesn't work when 
{{disableGenericTypes()}} is activated: 
https://lists.apache.org/thread.html/r462a854e8a0ab3512e2906b40411624f3164ea3af7cba61ee94cd760%40%3Cuser.flink.apache.org%3E.
 We should use the {{ListStateDescriptor}} constructor that takes a 
{{TypeSerializer}} here: 
https://github.com/apache/flink/blob/68cc21e4af71505efa142110e35a1f8b1c25fe6e/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L860.
 This will circumvent the check.
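
A rough sketch of what that change could look like (the OFFSETS_STATE_NAME and 
executionConfig identifiers and the exact field serializers below are 
illustrative assumptions, not the final patch):

// Build the state serializer explicitly instead of going through a
// TypeInformation, so the "generic types disabled" check is never consulted.
// Uses org.apache.flink.api.java.typeutils.runtime.TupleSerializer,
// org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer and
// org.apache.flink.api.common.typeutils.base.LongSerializer.
TypeSerializer<Tuple2<KafkaTopicPartition, Long>> stateSerializer =
        new TupleSerializer<>(
                (Class<Tuple2<KafkaTopicPartition, Long>>) (Class<?>) Tuple2.class,
                new TypeSerializer<?>[] {
                        new KryoSerializer<>(KafkaTopicPartition.class, executionConfig),
                        LongSerializer.INSTANCE
                });

ListStateDescriptor<Tuple2<KafkaTopicPartition, Long>> descriptor =
        new ListStateDescriptor<>(OFFSETS_STATE_NAME, stateSerializer);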

My full analysis from the email thread:

{quote}
Unfortunately, the fact that the Kafka Sources use Kryo for state serialization 
is a very early design misstep that we cannot get rid of for now. We will get 
rid of that when the new source interface lands ([1]) and when we have a new 
Kafka Source based on that.

As a workaround, we should change the Kafka Consumer to go through a different 
constructor of ListStateDescriptor which directly takes a TypeSerializer 
instead of a TypeInformation here: [2]. This should sidestep the "no generic 
types" check.

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-27%3A+Refactor+Source+Interface
[2] 
https://github.com/apache/flink/blob/68cc21e4af71505efa142110e35a1f8b1c25fe6e/flink-connectors/flink-connector-kafka-base/src/main/java/org/apache/flink/streaming/connectors/kafka/FlinkKafkaConsumerBase.java#L860
{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread Aljoscha Krettek

Yes, as I said I think it's still good to release.

On 04.02.20 13:49, jincheng sun wrote:

Hi Aljoscha,

I agree that the coming PyFlink package of 1.10 is enough for users who
just want to try out the latest version of PyFlink. However, I think that
there are also cases where the old versions are useful. Usually users tend
to prepare their development environment according to the version of the
cluster. Suppose that the Flink cluster version is 1.9, then users may want
to install the package of PyFlink 1.9. Even if he installs the 1.10 package
from the PyPI, it will fail in cases that he tries to submit the Python job
to the 1.9 cluster using the python shell shipped with the 1.10 package.
(The reason is that the JobGraph of 1.10 is incompatible with the JobGraph
of 1.9)

As the 1.9.2 is already released, considering that publishing it to PyPI
requires not too much efforts(I have already prepared the package),
personally I think it worths to do that.

What's your thought? :)

Best,
Jincheng

Aljoscha Krettek  于2020年2月4日周二 下午4:00写道:


Hi,

I think that's a good idea, but we will also soon have Flink 1.10 anyways.

Best,
Aljoscha

On 04.02.20 07:25, Hequn Cheng wrote:

Hi Jincheng,

+1 for this proposal.
  From the perspective of users, I think it would nice to have PyFlink on
PyPI which makes it much easier to install PyFlink.

Best, Hequn

On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:


+1


Xingbo Huang  于2020年2月4日周二 下午1:07写道:


Hi Jincheng,

Thanks for driving this.
+1 for this proposal.

Compared to building from source, downloading directly from PyPI will
greatly save the development cost of Python users.

Best,
Xingbo



Wei Zhong  于2020年2月4日周二 下午12:43写道:


Hi Jincheng,

Thanks for bring up this discussion!

+1 for this proposal. Building from source takes long time and

requires

a good network environment. Some users may not have such an

environment.

Uploading to PyPI will greatly improve the user experience.

Best,
Wei

jincheng sun  于2020年2月4日周二 上午11:49写道:


Hi folks,

I am very happy to receive some user inquiries about the use of Flink
Python API (PyFlink) recently. One of the more common questions is
whether
it is possible to install PyFlink without using source code build.

The

most
convenient and natural way for users is to use `pip install
apache-flink`.
We originally planned to support the use of `pip install

apache-flink`

in
Flink 1.10, but the reason for this decision was that when Flink 1.9

was

released at August 22, 2019[1], Flink's PyPI account system was not
ready.
At present, our PyPI account is available at October 09, 2019

[2](Only

PMC
can access), So for the convenience of users I propose:

- Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
- Update Flink 1.9 documentation to add support for `pip install`.

As we all know, Flink 1.9.2 was just completed released at January

31,

2020
[3]. There is still at least 1 to 2 months before the release of

1.9.3,

so
my proposal is completely considered from the perspective of user
convenience. Although the proposed work is not large, we have not

set a

precedent for independent release of the Flink Python API(PyFlink) in
the
previous release process. I hereby initiate the current discussion

and

look
forward to your feedback!

Best,
Jincheng

[1]



https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E

[2]



https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E

[3]



https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E






--
Best Regards

Jeff Zhang









Re: [DISCUSS] Support User-Defined Table Function in PyFlink

2020-02-04 Thread Hequn Cheng
Hi Xingbo,

Thanks a lot for bringing up the discussion. Looks good from my side.
One suggestion beyond the document: it would be nice to avoid Scala code in
the flink-table module since we would like to get rid of Scala in the
long-term[1][2].

Best, Hequn

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
[2] https://flink.apache.org/contributing/code-style-and-quality-scala.html


On Tue, Feb 4, 2020 at 9:09 PM jincheng sun 
wrote:

> Thanks for bring up this discussion Xingbo!
>
> The the design is pretty nice for me! This feature is really need which
> mentioned in FLIP-58. So, I think is better to create the JIRA and open the
> PR, then more detail can be reviewed. :)
>
> Best,
> Jincheng
>
>
>
> Xingbo Huang  于2020年2月3日周一 下午3:02写道:
>
> > Hi all,
> >
> > The scalar Python UDF has already been supported in coming release of
> 1.10,
> > we’d like to introduce Python UDTF now. FLIP-58[1] has already introduced
> > some content about Python UDTF. However, the implementation details are
> > still not touched. I have drafted a design doc[2]. It includes the
> > following items:
> >
> > - How to define Python UDTF.
> >
> > - The introduced rules for Python UDTF.
> >
> > - How to execute Python UDTF.
> >
> > Because the implementation relies on Beam's portability framework for
> > Python user-defined table function execution and not all the contributors
> > are familiar with it, I have done a prototype[3].
> >
> > Welcome any feedback.
> >
> > Best,
> >
> > Xingbo
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >
> > [2]
> >
> >
> https://docs.google.com/document/d/1Pkv5S0geoYQ2ySS5YTTBivJ3hoi-uzLXVQkDVIaR0cE/edit#heading=h.pzeztvig3kg1
> > [3] https://github.com/HuangXingBo/flink/commits/FLINK-UDTF
> >
>


Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-04 Thread Till Rohrmann
I think there is no such rule that APIs go automatically into opt/ and
"libraries" not. The contents of opt/ have mainly grown over time w/o
following a strict rule.

I think the decisive factor for what goes into Flink's binary distribution
should be how core it is to Flink. Of course another important
consideration is which use cases Flink should promote "out of the box" (not
sure whether this is actual true for content shipped in opt/ because you
also have to move it to lib).

For example, Gelly would be an example which I would rather see as an
optional component than shipping it with every Flink binary distribution.

Cheers,
Till

On Tue, Feb 4, 2020 at 11:24 AM Becket Qin  wrote:

> Thanks for the suggestion, Till.
>
> I am curious about how do we usually decide when to put the jars into the
> opt folder?
>
> Technically speaking, it seems that `flink-ml-api` should be put into the
> opt directory because they are actually API instead of libraries, just like
> CEP and Table.
>
> `flink-ml-lib` seems to be on the border. On one hand, it is a library. On
> the other hand, unlike SQL formats and Hadoop whose major code are outside
> of Flink, the algorithm codes are in Flink. So `flink-ml-lib` is more like
> those of built-in SQL UDFs. So it seems fine to either put it in the opt
> folder or in the downloads page.
>
> From the user experience perspective, it might be better to have both
> `flink-ml-lib` and `flink-ml-api` in opt folder so users needn't go to two
> places for the required dependencies.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng  wrote:
>
> > Hi Till,
> >
> > Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
> > libraries as optional dependencies on the download page which can make
> the
> > dist smaller.
> >
> > But I also have some concerns for it, e.g., the download page now only
> > includes the latest 3 releases. We may need to find ways to support more
> > versions.
> > On the other hand, the size of the flink-ml libraries now is very
> > small(about 246K), so it would not bring much impact on the size of dist.
> >
> > What do you think?
> >
> > Best,
> > Hequn
> >
> > On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann 
> wrote:
> >
> >> An alternative solution would be to offer the flink-ml libraries as
> >> optional dependencies on the download page. Similar to how we offer the
> >> different SQL formats and Hadoop releases [1].
> >>
> >> [1] https://flink.apache.org/downloads.html
> >>
> >> Cheers,
> >> Till
> >>
> >> On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng  wrote:
> >>
> >> > Thank you all for your feedback and suggestions!
> >> >
> >> > Best, Hequn
> >> >
> >> > On Mon, Feb 3, 2020 at 5:07 PM Becket Qin 
> wrote:
> >> >
> >> > > Thanks for bringing up the discussion, Hequn.
> >> > >
> >> > > +1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would
> >> make
> >> > > it much easier for the users to try out some simple ml tasks.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jiangjie (Becket) Qin
> >> > >
> >> > > On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <
> sunjincheng...@gmail.com
> >> >
> >> > > wrote:
> >> > >
> >> > >> Thank you for pushing forward @Hequn Cheng  !
> >> > >>
> >> > >> Hi  @Becket Qin  , Do you have any concerns
> on
> >> > >> this ?
> >> > >>
> >> > >> Best,
> >> > >> Jincheng
> >> > >>
> >> > >> Hequn Cheng  于2020年2月3日周一 下午2:09写道:
> >> > >>
> >> > >>> Hi everyone,
> >> > >>>
> >> > >>> Thanks for the feedback. As there are no objections, I've opened a
> >> JIRA
> >> > >>> issue(FLINK-15847[1]) to address this issue.
> >> > >>> The implementation details can be discussed in the issue or in the
> >> > >>> following PR.
> >> > >>>
> >> > >>> Best,
> >> > >>> Hequn
> >> > >>>
> >> > >>> [1] https://issues.apache.org/jira/browse/FLINK-15847
> >> > >>>
> >> > >>> On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng 
> >> > wrote:
> >> > >>>
> >> > >>> > Hi Jincheng,
> >> > >>> >
> >> > >>> > Thanks a lot for your feedback!
> >> > >>> > Yes, I agree with you. There are cases that multi jars need to
> be
> >> > >>> > uploaded. I will prepare another discussion later. Maybe with a
> >> > simple
> >> > >>> > design doc.
> >> > >>> >
> >> > >>> > Best, Hequn
> >> > >>> >
> >> > >>> > On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <
> >> > sunjincheng...@gmail.com>
> >> > >>> > wrote:
> >> > >>> >
> >> > >>> >> Thanks for bring up this discussion Hequn!
> >> > >>> >>
> >> > >>> >> +1 for include `flink-ml-api` and `flink-ml-lib` in opt.
> >> > >>> >>
> >> > >>> >> BTW: I think would be great if bring up a discussion for upload
> >> > >>> multiple
> >> > >>> >> Jars at the same time. as PyFlink JOB also can have the benefit
> >> if
> >> > we
> >> > >>> do
> >> > >>> >> that improvement.
> >> > >>> >>
> >> > >>> >> Best,
> >> > >>> >> Jincheng
> >> > >>> >>
> >> > >>> >>
> >> > >>> >> Hequn Cheng  于2020年1月8日周三 上午11:50写道:
> >> > >>> >>
> >> > >>> >> > Hi everyone,
> >> > >>> >> >
> >> > >>> >> > FLIP-39[1] rebuil

Re: [DISCUSS] Include flink-ml-api and flink-ml-lib in opt

2020-02-04 Thread Chesnay Schepler
Around a year ago I started a discussion on reducing the number of jars we 
ship with the distribution.


While there was no definitive conclusion, there was a shared sentiment that 
APIs should be shipped with the distribution.


On 04/02/2020 17:25, Till Rohrmann wrote:

I think there is no such rule that APIs go automatically into opt/ and
"libraries" not. The contents of opt/ have mainly grown over time w/o
following a strict rule.

I think the decisive factor for what goes into Flink's binary distribution
should be how core it is to Flink. Of course another important
consideration is which use cases Flink should promote "out of the box" (not
sure whether this is actual true for content shipped in opt/ because you
also have to move it to lib).

For example, Gelly would be an example which I would rather see as an
optional component than shipping it with every Flink binary distribution.

Cheers,
Till

On Tue, Feb 4, 2020 at 11:24 AM Becket Qin  wrote:


Thanks for the suggestion, Till.

I am curious about how do we usually decide when to put the jars into the
opt folder?

Technically speaking, it seems that `flink-ml-api` should be put into the
opt directory because they are actually API instead of libraries, just like
CEP and Table.

`flink-ml-lib` seems to be on the border. On one hand, it is a library. On
the other hand, unlike SQL formats and Hadoop whose major code are outside
of Flink, the algorithm codes are in Flink. So `flink-ml-lib` is more like
those of built-in SQL UDFs. So it seems fine to either put it in the opt
folder or in the downloads page.

 From the user experience perspective, it might be better to have both
`flink-ml-lib` and `flink-ml-api` in opt folder so users needn't go to two
places for the required dependencies.

Thanks,

Jiangjie (Becket) Qin

On Tue, Feb 4, 2020 at 2:32 PM Hequn Cheng  wrote:


Hi Till,

Thanks a lot for your suggestion. It's a good idea to offer the flink-ml
libraries as optional dependencies on the download page which can make

the

dist smaller.

But I also have some concerns for it, e.g., the download page now only
includes the latest 3 releases. We may need to find ways to support more
versions.
On the other hand, the size of the flink-ml libraries now is very
small(about 246K), so it would not bring much impact on the size of dist.

What do you think?

Best,
Hequn

On Mon, Feb 3, 2020 at 6:24 PM Till Rohrmann 

wrote:

An alternative solution would be to offer the flink-ml libraries as
optional dependencies on the download page. Similar to how we offer the
different SQL formats and Hadoop releases [1].

[1] https://flink.apache.org/downloads.html

Cheers,
Till

On Mon, Feb 3, 2020 at 10:19 AM Hequn Cheng  wrote:


Thank you all for your feedback and suggestions!

Best, Hequn

On Mon, Feb 3, 2020 at 5:07 PM Becket Qin 

wrote:

Thanks for bringing up the discussion, Hequn.

+1 on adding `flink-ml-api` and `flink-ml-lib` into opt. This would

make

it much easier for the users to try out some simple ml tasks.

Thanks,

Jiangjie (Becket) Qin

On Mon, Feb 3, 2020 at 4:34 PM jincheng sun <

sunjincheng...@gmail.com

wrote:


Thank you for pushing forward @Hequn Cheng  !

Hi  @Becket Qin  , Do you have any concerns

on

this ?

Best,
Jincheng

Hequn Cheng  于2020年2月3日周一 下午2:09写道:


Hi everyone,

Thanks for the feedback. As there are no objections, I've opened a

JIRA

issue(FLINK-15847[1]) to address this issue.
The implementation details can be discussed in the issue or in the
following PR.

Best,
Hequn

[1] https://issues.apache.org/jira/browse/FLINK-15847

On Wed, Jan 8, 2020 at 9:15 PM Hequn Cheng 

wrote:

Hi Jincheng,

Thanks a lot for your feedback!
Yes, I agree with you. There are cases that multi jars need to

be

uploaded. I will prepare another discussion later. Maybe with a

simple

design doc.

Best, Hequn

On Wed, Jan 8, 2020 at 3:06 PM jincheng sun <

sunjincheng...@gmail.com>

wrote:


Thanks for bring up this discussion Hequn!

+1 for include `flink-ml-api` and `flink-ml-lib` in opt.

BTW: I think would be great if bring up a discussion for upload

multiple

Jars at the same time. as PyFlink JOB also can have the benefit

if

we

do

that improvement.

Best,
Jincheng


Hequn Cheng  于2020年1月8日周三 上午11:50写道:


Hi everyone,

FLIP-39[1] rebuilds Flink ML pipeline on top of TableAPI

which

moves

Flink

ML a step further. Base on it, users can develop their ML

jobs

and

more

and

more machine learning platforms are providing ML services.

However, the problem now is the jars of flink-ml-api and

flink-ml-lib

are

only exist on maven repo. Whenever users want to submit ML

jobs,

they

can

only depend on the ml modules and package a fat jar. This

would be

inconvenient especially for the machine learning platforms on

which

nearly

all jobs depend on Flink ML modules and have to package a fat

jar.

Given this, it would be

[jira] [Created] (FLINK-15905) Fix Race Condition when releasing shared memory resource

2020-02-04 Thread Stephan Ewen (Jira)
Stephan Ewen created FLINK-15905:


 Summary: Fix Race Condition when releasing shared memory resource
 Key: FLINK-15905
 URL: https://issues.apache.org/jira/browse/FLINK-15905
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Task
Reporter: Stephan Ewen
 Fix For: 1.10.0






--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread jincheng sun
I see, thanks for the confirmation Aljoscha!

Best,
Jincheng



Aljoscha Krettek  wrote on Tue, Feb 4, 2020 at 9:58 PM:

> Yes, as I said I think it's still good to release.
>
> On 04.02.20 13:49, jincheng sun wrote:
> > Hi Aljoscha,
> >
> > I agree that the coming PyFlink package of 1.10 is enough for users who
> > just want to try out the latest version of PyFlink. However, I think that
> > there are also cases where the old versions are useful. Usually users
> tend
> > to prepare their development environment according to the version of the
> > cluster. Suppose that the Flink cluster version is 1.9, then users may
> want
> > to install the package of PyFlink 1.9. Even if he installs the 1.10
> package
> > from the PyPI, it will fail in cases that he tries to submit the Python
> job
> > to the 1.9 cluster using the python shell shipped with the 1.10 package.
> > (The reason is that the JobGraph of 1.10 is incompatible with the
> JobGraph
> > of 1.9)
> >
> > As the 1.9.2 is already released, considering that publishing it to PyPI
> > requires not too much efforts(I have already prepared the package),
> > personally I think it worths to do that.
> >
> > What's your thought? :)
> >
> > Best,
> > Jincheng
> >
> > Aljoscha Krettek  于2020年2月4日周二 下午4:00写道:
> >
> >> Hi,
> >>
> >> I think that's a good idea, but we will also soon have Flink 1.10
> anyways.
> >>
> >> Best,
> >> Aljoscha
> >>
> >> On 04.02.20 07:25, Hequn Cheng wrote:
> >>> Hi Jincheng,
> >>>
> >>> +1 for this proposal.
> >>>   From the perspective of users, I think it would nice to have PyFlink
> on
> >>> PyPI which makes it much easier to install PyFlink.
> >>>
> >>> Best, Hequn
> >>>
> >>> On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:
> >>>
>  +1
> 
> 
>  Xingbo Huang  于2020年2月4日周二 下午1:07写道:
> 
> > Hi Jincheng,
> >
> > Thanks for driving this.
> > +1 for this proposal.
> >
> > Compared to building from source, downloading directly from PyPI will
> > greatly save the development cost of Python users.
> >
> > Best,
> > Xingbo
> >
> >
> >
> > Wei Zhong  于2020年2月4日周二 下午12:43写道:
> >
> >> Hi Jincheng,
> >>
> >> Thanks for bring up this discussion!
> >>
> >> +1 for this proposal. Building from source takes long time and
> >> requires
> >> a good network environment. Some users may not have such an
> >> environment.
> >> Uploading to PyPI will greatly improve the user experience.
> >>
> >> Best,
> >> Wei
> >>
> >> jincheng sun  于2020年2月4日周二 上午11:49写道:
> >>
> >>> Hi folks,
> >>>
> >>> I am very happy to receive some user inquiries about the use of
> Flink
> >>> Python API (PyFlink) recently. One of the more common questions is
> >>> whether
> >>> it is possible to install PyFlink without using source code build.
> >> The
> >>> most
> >>> convenient and natural way for users is to use `pip install
> >>> apache-flink`.
> >>> We originally planned to support the use of `pip install
> >> apache-flink`
> >>> in
> >>> Flink 1.10, but the reason for this decision was that when Flink
> 1.9
> >> was
> >>> released at August 22, 2019[1], Flink's PyPI account system was not
> >>> ready.
> >>> At present, our PyPI account is available at October 09, 2019
> >> [2](Only
> >>> PMC
> >>> can access), So for the convenience of users I propose:
> >>>
> >>> - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
> >>> - Update Flink 1.9 documentation to add support for `pip install`.
> >>>
> >>> As we all know, Flink 1.9.2 was just completed released at January
> >> 31,
> >>> 2020
> >>> [3]. There is still at least 1 to 2 months before the release of
> >> 1.9.3,
> >>> so
> >>> my proposal is completely considered from the perspective of user
> >>> convenience. Although the proposed work is not large, we have not
> >> set a
> >>> precedent for independent release of the Flink Python API(PyFlink)
> in
> >>> the
> >>> previous release process. I hereby initiate the current discussion
> >> and
> >>> look
> >>> forward to your feedback!
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> [1]
> >>>
> >>>
> >>
> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
> >>> [2]
> >>>
> >>>
> >>
> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
> >>> [3]
> >>>
> >>>
> >>
> https://lists.apache.org/thread.html/ra27644a4e111476b6041e8969def4322f47d5e0aae8da3ef30cd2926%40%3Cdev.flink.apache.org%3E
> >>>
> >>
> 
>  --
>  Best Regards
> 
>  Jeff Zhang
> 
> >>>
> >>
> >
>


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread jincheng sun
Thanks everyone for your positive feedback. I will continue to push forward
with the subsequent release process.

Best, Jincheng


jincheng sun  于2020年2月5日周三 上午8:56写道:

> I see, thanks for the confirmation Aljoscha!
>
> Best,
> Jincheng
>
>
>
> Aljoscha Krettek  于2020年2月4日周二 下午9:58写道:
>
>> Yes, as I said I think it's still good to release.
>>
>> On 04.02.20 13:49, jincheng sun wrote:
>> > Hi Aljoscha,
>> >
>> > I agree that the coming PyFlink package of 1.10 is enough for users who
>> > just want to try out the latest version of PyFlink. However, I think
>> that
>> > there are also cases where the old versions are useful. Usually users
>> tend
>> > to prepare their development environment according to the version of the
>> > cluster. Suppose that the Flink cluster version is 1.9, then users may
>> want
>> > to install the package of PyFlink 1.9. Even if he installs the 1.10
>> package
>> > from the PyPI, it will fail in cases that he tries to submit the Python
>> job
>> > to the 1.9 cluster using the python shell shipped with the 1.10 package.
>> > (The reason is that the JobGraph of 1.10 is incompatible with the
>> JobGraph
>> > of 1.9)
>> >
>> > As the 1.9.2 is already released, considering that publishing it to PyPI
>> > requires not too much efforts(I have already prepared the package),
>> > personally I think it worths to do that.
>> >
>> > What's your thought? :)
>> >
>> > Best,
>> > Jincheng
>> >
>> > Aljoscha Krettek  于2020年2月4日周二 下午4:00写道:
>> >
>> >> Hi,
>> >>
>> >> I think that's a good idea, but we will also soon have Flink 1.10
>> anyways.
>> >>
>> >> Best,
>> >> Aljoscha
>> >>
>> >> On 04.02.20 07:25, Hequn Cheng wrote:
>> >>> Hi Jincheng,
>> >>>
>> >>> +1 for this proposal.
>> >>>   From the perspective of users, I think it would nice to have
>> PyFlink on
>> >>> PyPI which makes it much easier to install PyFlink.
>> >>>
>> >>> Best, Hequn
>> >>>
>> >>> On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:
>> >>>
>>  +1
>> 
>> 
>>  Xingbo Huang  于2020年2月4日周二 下午1:07写道:
>> 
>> > Hi Jincheng,
>> >
>> > Thanks for driving this.
>> > +1 for this proposal.
>> >
>> > Compared to building from source, downloading directly from PyPI
>> will
>> > greatly save the development cost of Python users.
>> >
>> > Best,
>> > Xingbo
>> >
>> >
>> >
>> > Wei Zhong  于2020年2月4日周二 下午12:43写道:
>> >
>> >> Hi Jincheng,
>> >>
>> >> Thanks for bring up this discussion!
>> >>
>> >> +1 for this proposal. Building from source takes long time and
>> >> requires
>> >> a good network environment. Some users may not have such an
>> >> environment.
>> >> Uploading to PyPI will greatly improve the user experience.
>> >>
>> >> Best,
>> >> Wei
>> >>
>> >> jincheng sun  于2020年2月4日周二 上午11:49写道:
>> >>
>> >>> Hi folks,
>> >>>
>> >>> I am very happy to receive some user inquiries about the use of
>> Flink
>> >>> Python API (PyFlink) recently. One of the more common questions is
>> >>> whether
>> >>> it is possible to install PyFlink without using source code build.
>> >> The
>> >>> most
>> >>> convenient and natural way for users is to use `pip install
>> >>> apache-flink`.
>> >>> We originally planned to support the use of `pip install
>> >> apache-flink`
>> >>> in
>> >>> Flink 1.10, but the reason for this decision was that when Flink
>> 1.9
>> >> was
>> >>> released at August 22, 2019[1], Flink's PyPI account system was
>> not
>> >>> ready.
>> >>> At present, our PyPI account is available at October 09, 2019
>> >> [2](Only
>> >>> PMC
>> >>> can access), So for the convenience of users I propose:
>> >>>
>> >>> - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
>> >>> - Update Flink 1.9 documentation to add support for `pip install`.
>> >>>
>> >>> As we all know, Flink 1.9.2 was just completed released at January
>> >> 31,
>> >>> 2020
>> >>> [3]. There is still at least 1 to 2 months before the release of
>> >> 1.9.3,
>> >>> so
>> >>> my proposal is completely considered from the perspective of user
>> >>> convenience. Although the proposed work is not large, we have not
>> >> set a
>> >>> precedent for independent release of the Flink Python
>> API(PyFlink) in
>> >>> the
>> >>> previous release process. I hereby initiate the current discussion
>> >> and
>> >>> look
>> >>> forward to your feedback!
>> >>>
>> >>> Best,
>> >>> Jincheng
>> >>>
>> >>> [1]
>> >>>
>> >>>
>> >>
>> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
>> >>> [2]
>> >>>
>> >>>
>> >>
>> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
>> >>> [3]
>> >>>
>> >>>
>> >>
>> https://lists.apache.org/thread.html/r

Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread dianfu
Hi Jincheng,

Thanks for the proposal. I think this is a good idea. Usually Python users will 
firstly try to install a Python package from PyPI when they are trying it out. 
This would definitely benefit Python users.

Regards,
Dian

> 在 2020年2月5日,上午8:56,jincheng sun  写道:
> 
> I see, thanks for the confirmation Aljoscha!
> 
> Best,
> Jincheng
> 
> 
> 
> Aljoscha Krettek  于2020年2月4日周二 下午9:58写道:
> 
>> Yes, as I said I think it's still good to release.
>> 
>> On 04.02.20 13:49, jincheng sun wrote:
>>> Hi Aljoscha,
>>> 
>>> I agree that the coming PyFlink package of 1.10 is enough for users who
>>> just want to try out the latest version of PyFlink. However, I think that
>>> there are also cases where the old versions are useful. Usually users
>> tend
>>> to prepare their development environment according to the version of the
>>> cluster. Suppose that the Flink cluster version is 1.9, then users may
>> want
>>> to install the package of PyFlink 1.9. Even if he installs the 1.10
>> package
>>> from the PyPI, it will fail in cases that he tries to submit the Python
>> job
>>> to the 1.9 cluster using the python shell shipped with the 1.10 package.
>>> (The reason is that the JobGraph of 1.10 is incompatible with the
>> JobGraph
>>> of 1.9)
>>> 
>>> As the 1.9.2 is already released, considering that publishing it to PyPI
>>> requires not too much efforts(I have already prepared the package),
>>> personally I think it worths to do that.
>>> 
>>> What's your thought? :)
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> Aljoscha Krettek  于2020年2月4日周二 下午4:00写道:
>>> 
 Hi,
 
 I think that's a good idea, but we will also soon have Flink 1.10
>> anyways.
 
 Best,
 Aljoscha
 
 On 04.02.20 07:25, Hequn Cheng wrote:
> Hi Jincheng,
> 
> +1 for this proposal.
>  From the perspective of users, I think it would nice to have PyFlink
>> on
> PyPI which makes it much easier to install PyFlink.
> 
> Best, Hequn
> 
> On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:
> 
>> +1
>> 
>> 
>> Xingbo Huang  于2020年2月4日周二 下午1:07写道:
>> 
>>> Hi Jincheng,
>>> 
>>> Thanks for driving this.
>>> +1 for this proposal.
>>> 
>>> Compared to building from source, downloading directly from PyPI will
>>> greatly save the development cost of Python users.
>>> 
>>> Best,
>>> Xingbo
>>> 
>>> 
>>> 
>>> Wei Zhong  于2020年2月4日周二 下午12:43写道:
>>> 
 Hi Jincheng,
 
 Thanks for bring up this discussion!
 
 +1 for this proposal. Building from source takes long time and
 requires
 a good network environment. Some users may not have such an
 environment.
 Uploading to PyPI will greatly improve the user experience.
 
 Best,
 Wei
 
 jincheng sun  于2020年2月4日周二 上午11:49写道:
 
> Hi folks,
> 
> I am very happy to receive some user inquiries about the use of
>> Flink
> Python API (PyFlink) recently. One of the more common questions is
> whether
> it is possible to install PyFlink without using source code build.
 The
> most
> convenient and natural way for users is to use `pip install
> apache-flink`.
> We originally planned to support the use of `pip install
 apache-flink`
> in
> Flink 1.10, but the reason for this decision was that when Flink
>> 1.9
 was
> released at August 22, 2019[1], Flink's PyPI account system was not
> ready.
> At present, our PyPI account is available at October 09, 2019
 [2](Only
> PMC
> can access), So for the convenience of users I propose:
> 
> - Publish the latest release version (1.9.2) of Flink 1.9 to PyPI.
> - Update Flink 1.9 documentation to add support for `pip install`.
> 
> As we all know, Flink 1.9.2 was just completed released at January
 31,
> 2020
> [3]. There is still at least 1 to 2 months before the release of
 1.9.3,
> so
> my proposal is completely considered from the perspective of user
> convenience. Although the proposed work is not large, we have not
 set a
> precedent for independent release of the Flink Python API(PyFlink)
>> in
> the
> previous release process. I hereby initiate the current discussion
 and
> look
> forward to your feedback!
> 
> Best,
> Jincheng
> 
> [1]
> 
> 
 
>> https://lists.apache.org/thread.html/4a4d23c449f26b66bc58c71cc1a5c6079c79b5049c6c6744224c5f46%40%3Cdev.flink.apache.org%3E
> [2]
> 
> 
 
>> https://lists.apache.org/thread.html/8273a5e8834b788d8ae552a5e177b69e04e96c0446bb90979444deee%40%3Cprivate.flink.apache.org%3E
> [3]
> 
> 
 
>> https://lists.apache

Re: [DISCUSS] Improve TableFactory

2020-02-04 Thread Jingsong Li
Hi Timo,

Good catch!

I really love idea 2; a full Flink config looks very good to me.

Regarding your first point: actually we don't have a `TableIdentifier`
class now, but TableFactory already indicates a table, so I am OK with it.

New Context should be:

   /**
    * Context of table source creation. Contains table information and
    * environment information.
    */
   interface Context {

      /**
       * @return full identifier of the given {@link CatalogTable}.
       */
      ObjectIdentifier getObjectIdentifier();

      /**
       * @return table {@link CatalogTable} instance.
       */
      CatalogTable getTable();

      /**
       * @return readable config of this table environment.
       */
      ReadableConfig getConfiguration();
   }


Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 8:51 PM Timo Walther  wrote:

> Hi Jingsong,
>
> some last minute changes from my side:
>
> 1. rename `getTableIdentifier` to `getObjectIdentifier` to keep the API
> obvious. Otherwise people expect a `TableIdentifier` class being
> returned here.
>
> 2. rename `getTableConfig` to `getConfiguration()` in the future this
> will not only be a "table" config but might give access to the full
> Flink config
>
> Thanks,
> Timo
>
>
> On 04.02.20 06:27, Jingsong Li wrote:
> > So the interface will be:
> >
> > public interface TableSourceFactory extends TableFactory {
> > ..
> >
> > /**
> >  * Creates and configures a {@link TableSource} based on the given
> > {@link Context}.
> >  *
> >  * @param context context of this table source.
> >  * @return the configured table source.
> >  */
> > default TableSource createTableSource(Context context) {
> >ObjectIdentifier tableIdentifier = context.getTableIdentifier();
> >return createTableSource(
> >  new ObjectPath(tableIdentifier.getDatabaseName(),
> > tableIdentifier.getObjectName()),
> >  context.getTable());
> > }
> > /**
> >  * Context of table source creation. Contains table information and
> > environment information.
> >  */
> > interface Context {
> >/**
> > * @return full identifier of the given {@link CatalogTable}.
> > */
> >ObjectIdentifier getTableIdentifier();
> >/**
> > * @return table {@link CatalogTable} instance.
> > */
> >CatalogTable getTable();
> >/**
> > * @return readable config of this table environment.
> > */
> >ReadableConfig getTableConfig();
> > }
> > }
> >
> > public interface TableSinkFactory extends TableFactory {
> > ..
> > /**
> >  * Creates and configures a {@link TableSink} based on the given
> > {@link Context}.
> >  *
> >  * @param context context of this table sink.
> >  * @return the configured table sink.
> >  */
> > default TableSink createTableSink(Context context) {
> >ObjectIdentifier tableIdentifier = context.getTableIdentifier();
> >return createTableSink(
> >  new ObjectPath(tableIdentifier.getDatabaseName(),
> > tableIdentifier.getObjectName()),
> >  context.getTable());
> > }
> > /**
> >  * Context of table sink creation. Contains table information and
> > environment information.
> >  */
> > interface Context {
> >/**
> > * @return full identifier of the given {@link CatalogTable}.
> > */
> >ObjectIdentifier getTableIdentifier();
> >/**
> > * @return table {@link CatalogTable} instance.
> > */
> >CatalogTable getTable();
> >/**
> > * @return readable config of this table environment.
> > */
> >ReadableConfig getTableConfig();
> > }
> > }
> >
> >
> > Best,
> > Jingsong Lee
> >
> > On Tue, Feb 4, 2020 at 1:22 PM Jingsong Li 
> wrote:
> >
> >> Hi all,
> >>
> >> After rethinking and discussion with Kurt, I'd like to remove
> "isBounded".
> >> We can delay this is bounded message to TableSink.
> >> With TableSink refactor, we need consider "consumeDataStream"
> >> and "consumeBoundedStream".
> >>
> >> Best,
> >> Jingsong Lee
> >>
> >> On Mon, Feb 3, 2020 at 4:17 PM Jingsong Li 
> wrote:
> >>
> >>> Hi Jark,
> >>>
> >>> Thanks involving, yes, it's hard to understand to add isBounded on the
> >>> source.
> >>> I recommend adding only to sink at present, because sink has upstream.
> >>> Its upstream is either bounded or unbounded.
> >>>
> >>> Hi all,
> >>>
> >>> Let me summarize with your suggestions.
> >>>
> >>> public interface TableSourceFactory extends TableFactory {
> >>>
> >>> ..
> >>>
> >>>
> >>> /**
> >>>  * Creates and configures a {@link TableSource} based on the given
> {@link Context}.
> >>>  *
> >>>  * @param context context of this table source.
> >>>  * @return the configured table source.
> >>>  */
> >>> default TableSource createTableSource(Context context) {
> >>>ObjectIdentifier tableIdentifier 

Re: [VOTE] Improve TableFactory to add Context

2020-02-04 Thread Jingsong Li
Hi all,

Interface updated.
Please re-vote.

Best,
Jingsong Lee

On Tue, Feb 4, 2020 at 1:28 PM Jingsong Li  wrote:

> Hi all,
>
> I would like to start the vote for the improve of
> TableFactory, which is discussed and
> reached a consensus in the discussion thread[2].
>
> The vote will be open for at least 72 hours. I'll try to close it
> unless there is an objection or not enough votes.
>
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html
>
> Best,
> Jingsong Lee
>


-- 
Best, Jingsong Lee


[jira] [Created] (FLINK-15906) physical memory exceeded causing being killed by yarn

2020-02-04 Thread liupengcheng (Jira)
liupengcheng created FLINK-15906:


 Summary: physical memory exceeded causing being killed by yarn
 Key: FLINK-15906
 URL: https://issues.apache.org/jira/browse/FLINK-15906
 Project: Flink
  Issue Type: Bug
Reporter: liupengcheng


Recently, we encountered this issue when testing TPC-DS queries with 100g of data.

I first met this issue when I only set `taskmanager.memory.total-process.size` 
to `4g` via the `-tm` option. Then I tried to increase the JVM overhead size with 
the following settings, but it still failed.
{code:java}
taskmanager.memory.jvm-overhead.min: 640m
taskmanager.memory.jvm-metaspace: 128m
taskmanager.memory.task.heap.size: 1408m
taskmanager.memory.framework.heap.size: 128m
taskmanager.memory.framework.off-heap.size: 128m
taskmanager.memory.managed.size: 1408m
taskmanager.memory.shuffle.max: 256m
{code}
{code:java}
java.lang.Exception: [2020-02-05 11:31:32.345]Container 
[pid=101677,containerID=container_e08_1578903621081_4785_01_51] is running 
46342144B beyond the 'PHYSICAL' memory limit. Current usage: 4.04 GB of 4 GB 
physical memory used; 17.68 GB of 40 GB virtual memory used. Killing 
container.java.lang.Exception: [2020-02-05 11:31:32.345]Container 
[pid=101677,containerID=container_e08_1578903621081_4785_01_51] is running 
46342144B beyond the 'PHYSICAL' memory limit. Current usage: 4.04 GB of 4 GB 
physical memory used; 17.68 GB of 40 GB virtual memory used. Killing 
container.Dump of the process-tree for 
container_e08_1578903621081_4785_01_51 : |- PID PPID PGRPID SESSID CMD_NAME 
USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) 
RSSMEM_USAGE(PAGES) FULL_CMD_LINE |- 101938 101677 101677 101677 (java) 25762 
3571 18867417088 1059157 /opt/soft/openjdk1.8.0/bin/java 
-Dhadoop.root.logfile=syslog -Xmx1610612736 -Xms1610612736 
-XX:MaxDirectMemorySize=402653184 -XX:MaxMetaspaceSize=134217728 
-Dlog.file=/home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_51/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner -D 
taskmanager.memory.shuffle.max=268435456b -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=1476395008b -D taskmanager.cpu.cores=1.0 -D 
taskmanager.memory.task.heap.size=1476395008b -D 
taskmanager.memory.task.off-heap.size=0b -D 
taskmanager.memory.shuffle.min=268435456b --configDir . 
-Djobmanager.rpc.address=zjy-hadoop-prc-st2805.bj -Dweb.port=0 
-Dweb.tmpdir=/tmp/flink-web-4bf6cd3a-a6e1-4b46-b140-b8ac7bdffbeb 
-Djobmanager.rpc.port=36769 -Dtaskmanager.memory.managed.size=1476395008b 
-Drest.address=zjy-hadoop-prc-st2805.bj |- 101677 101671 101677 101677 (bash) 1 
1 118030336 733 /bin/bash -c /opt/soft/openjdk1.8.0/bin/java 
-Dhadoop.root.logfile=syslog -Xmx1610612736 -Xms1610612736 
-XX:MaxDirectMemorySize=402653184 -XX:MaxMetaspaceSize=134217728 
-Dlog.file=/home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_51/taskmanager.log
 -Dlog4j.configuration=file:./log4j.properties 
org.apache.flink.yarn.YarnTaskExecutorRunner -D 
taskmanager.memory.shuffle.max=268435456b -D 
taskmanager.memory.framework.off-heap.size=134217728b -D 
taskmanager.memory.framework.heap.size=134217728b -D 
taskmanager.memory.managed.size=1476395008b -D taskmanager.cpu.cores=1.0 -D 
taskmanager.memory.task.heap.size=1476395008b -D 
taskmanager.memory.task.off-heap.size=0b -D 
taskmanager.memory.shuffle.min=268435456b --configDir . 
-Djobmanager.rpc.address=zjy-hadoop-prc-st2805.bj -Dweb.port=0 
-Dweb.tmpdir=/tmp/flink-web-4bf6cd3a-a6e1-4b46-b140-b8ac7bdffbeb 
-Djobmanager.rpc.port=36769 -Dtaskmanager.memory.managed.size=1476395008b 
-Drest.address=zjy-hadoop-prc-st2805.bj 1> 
/home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_51/taskmanager.out
 2> 
/home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_51/taskmanager.err
{code}
I suspect there are some leaks or unexpected off-heap memory usage.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] Support scalar vectorized Python UDF in PyFlink

2020-02-04 Thread dianfu
Hi all,

Scalar Python UDFs are already supported in the coming release 1.10 
(FLIP-58[1]). They operate one row at a time: the Java operator serializes one 
input row to bytes and sends it to the Python worker; the Python worker 
deserializes the input row and evaluates the Python UDF with it; the result row 
is serialized and sent back to the Java operator.

This approach suffers from the following problems:
1) High serialization/deserialization overhead.
2) It’s difficult to leverage the popular Python libraries used by data 
scientists, such as Pandas, Numpy, etc., which provide high-performance data 
structures and functions.

Jincheng and I have discussed offline and we want to introduce vectorized 
Python UDFs to address the above problems. This feature has also been mentioned 
in the discussion thread about the Python API plan[2]. For vectorized Python 
UDFs, a batch of rows is transferred between the JVM and the Python VM in 
columnar format. The batch of rows is converted to a collection of pandas.Series 
and given to the vectorized Python UDF, which can then leverage popular Python 
libraries such as Pandas, Numpy, etc. for its implementation.
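
To make the difference concrete, here is a minimal sketch contrasting the two. 
The first function uses the existing FLIP-58 scalar UDF API of 1.10; the 
`udf_type="pandas"` parameter in the second one is only an assumed name for 
illustration, the actual API is up to the design doc:

    from pyflink.table import DataTypes
    from pyflink.table.udf import udf

    # Existing scalar UDF (1.10): invoked once per row, each row is
    # serialized/deserialized individually between the JVM and the Python worker.
    @udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
         result_type=DataTypes.BIGINT())
    def add(i, j):
        return i + j

    # Vectorized variant (sketch of the proposal): the whole batch arrives as
    # pandas.Series, so the body can use vectorized Pandas/Numpy operations.
    # The "udf_type" parameter name is an assumption for illustration only.
    @udf(input_types=[DataTypes.BIGINT(), DataTypes.BIGINT()],
         result_type=DataTypes.BIGINT(), udf_type="pandas")
    def vectorized_add(i, j):
        # i and j are pandas.Series of equal length; '+' is element-wise and
        # one pandas.Series is returned per batch.
        return i + j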

Please refer to the design doc[3] for more details. Any feedback is welcome.

Regards,
Dian

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
[2] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-What-parts-of-the-Python-API-should-we-focus-on-next-tt36119.html
[3] 
https://docs.google.com/document/d/1ls0mt96YV1LWPHfLOh8v7lpgh1KsCNx8B9RrW1ilnoE/edit#heading=h.qivada1u8hwd



[jira] [Created] (FLINK-15907) expose private method in Configuration so that user can extend from it

2020-02-04 Thread Steven Zhen Wu (Jira)
Steven Zhen Wu created FLINK-15907:
--

 Summary: expose private method in Configuration so that user can 
extend from it
 Key: FLINK-15907
 URL: https://issues.apache.org/jira/browse/FLINK-15907
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Configuration
Affects Versions: 1.9.2
Reporter: Steven Zhen Wu


We use Archaius for configuration internally: 
https://github.com/Netflix/archaius

It would be nice to expose this method as protected so that we can override it 
and forward to Archaius.

{code}
private Optional getRawValue(String key) 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Improve TableFactory to add Context

2020-02-04 Thread Kurt Young
+1 from my side.

Best,
Kurt


On Wed, Feb 5, 2020 at 10:59 AM Jingsong Li  wrote:

> Hi all,
>
> Interface updated.
> Please re-vote.
>
> Best,
> Jingsong Lee
>
> On Tue, Feb 4, 2020 at 1:28 PM Jingsong Li  wrote:
>
> > Hi all,
> >
> > I would like to start the vote for the improve of
> > TableFactory, which is discussed and
> > reached a consensus in the discussion thread[2].
> >
> > The vote will be open for at least 72 hours. I'll try to close it
> > unless there is an objection or not enough votes.
> >
> > [1]
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html
> >
> > Best,
> > Jingsong Lee
> >
>
>
> --
> Best, Jingsong Lee
>


Re: [VOTE] Release 1.10.0, release candidate #1

2020-02-04 Thread Thomas Weise
Hi Gary,

Thanks for the reply.

-->

On Tue, Feb 4, 2020 at 5:20 AM Gary Yao  wrote:

> Hi Thomas,
>
> > 2) Was there a change in how job recovery reflects in the uptime metric?
> > Didn't uptime previously reset to 0 on recovery (now it just keeps
> > increasing)?
>
> The uptime is the difference between the current time and the time when the
> job transitioned to RUNNING state. By default we no longer transition the
> job
> out of the RUNNING state when restarting. This has something to do with the
> new scheduler which enables pipelined region failover by default [1].
> Actually
> we enabled pipelined region failover already in the binary distribution of
> Flink 1.9 by setting:
>
> jobmanager.execution.failover-strategy: region
>
> in the default flink-conf.yaml. Unless you have removed this config option
> or
> you are using a custom yaml, you should be seeing this behavior in Flink
> 1.9.
> If you do not want region failover, set
>
> jobmanager.execution.failover-strategy: full
>
>
We are using the default (the jobmanager.execution.failover-strategy
setting is not present in our flink config).

The change in behavior I see is between the 1.9 based deployment and the
1.10 RC.

Our 1.9 branch is here: https://github.com/lyft/flink/tree/release-1.9-lyft

I also notice that the exception causing a restart is no longer displayed
in the UI, which is probably related?


>
> > 1) Is the low watermark display in the UI still broken?
>
> I was not aware that this is broken. Is there an issue tracking this bug?
>

The watermark issue was https://issues.apache.org/jira/browse/FLINK-14470

(I don't have a good way to verify it is fixed at the moment.)

Another problem with this 1.10 RC is that the checkpointAlignmentTime
metric is missing. (I have not been able to investigate this further yet.)


>
> Best,
> Gary
>
> [1] https://issues.apache.org/jira/browse/FLINK-14651
>
> On Tue, Feb 4, 2020 at 2:56 AM Thomas Weise  wrote:
>
>> I opened a PR for FLINK-15868
>> :
>> https://github.com/apache/flink/pull/11006
>>
>> With that change, I was able to run an application that consumes from
>> Kinesis.
>>
>> I should have data tomorrow regarding the performance.
>>
>> Two questions/observations:
>>
>> 1) Is the low watermark display in the UI still broken?
>> 2) Was there a change in how job recovery reflects in the uptime metric?
>> Didn't uptime previously reset to 0 on recovery (now it just keeps
>> increasing)?
>>
>> Thanks,
>> Thomas
>>
>>
>>
>>
>> On Mon, Feb 3, 2020 at 10:55 AM Thomas Weise  wrote:
>>
>> > I found another issue with the Kinesis connector:
>> >
>> > https://issues.apache.org/jira/browse/FLINK-15868
>> >
>> >
>> > On Mon, Feb 3, 2020 at 3:35 AM Gary Yao  wrote:
>> >
>> >> Hi everyone,
>> >>
>> >> I am hereby canceling the vote due to:
>> >>
>> >> FLINK-15837
>> >> FLINK-15840
>> >>
>> >> Another RC will be created later today.
>> >>
>> >> Best,
>> >> Gary
>> >>
>> >> On Mon, Jan 27, 2020 at 10:06 PM Gary Yao  wrote:
>> >>
>> >> > Hi everyone,
>> >> > Please review and vote on the release candidate #1 for the version
>> >> 1.10.0,
>> >> > as follows:
>> >> > [ ] +1, Approve the release
>> >> > [ ] -1, Do not approve the release (please provide specific comments)
>> >> >
>> >> >
>> >> > The complete staging area is available for your review, which
>> includes:
>> >> > * JIRA release notes [1],
>> >> > * the official Apache source release and binary convenience releases
>> to
>> >> be
>> >> > deployed to dist.apache.org [2], which are signed with the key with
>> >> > fingerprint BB137807CEFBE7DD2616556710B12A1F89C115E8 [3],
>> >> > * all artifacts to be deployed to the Maven Central Repository [4],
>> >> > * source code tag "release-1.10.0-rc1" [5],
>> >> >
>> >> > The announcement blog post is in the works. I will update this voting
>> >> > thread with a link to the pull request soon.
>> >> >
>> >> > The vote will be open for at least 72 hours. It is adopted by
>> majority
>> >> > approval, with at least 3 PMC affirmative votes.
>> >> >
>> >> > Thanks,
>> >> > Yu & Gary
>> >> >
>> >> > [1]
>> >> >
>> >>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345845
>> >> > [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.10.0-rc1/
>> >> > [3] https://dist.apache.org/repos/dist/release/flink/KEYS
>> >> > [4]
>> >> https://repository.apache.org/content/repositories/orgapacheflink-1325
>> >> > [5] https://github.com/apache/flink/releases/tag/release-1.10.0-rc1
>> >> >
>> >>
>> >
>>
>


[jira] [Created] (FLINK-15908) Add Description of support 'pip install' to 1.9.x documents

2020-02-04 Thread sunjincheng (Jira)
sunjincheng created FLINK-15908:
---

 Summary: Add Description of support 'pip install' to 1.9.x 
documents
 Key: FLINK-15908
 URL: https://issues.apache.org/jira/browse/FLINK-15908
 Project: Flink
  Issue Type: Bug
  Components: Documentation
Reporter: sunjincheng
 Fix For: 1.9.3


Add a description of 'pip install' support to the 1.9.x documents.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (FLINK-15909) Add PyPI release process into the subsequent release of 1.9.x

2020-02-04 Thread sunjincheng (Jira)
sunjincheng created FLINK-15909:
---

 Summary: Add PyPI release process into the subsequent release of 
1.9.x 
 Key: FLINK-15909
 URL: https://issues.apache.org/jira/browse/FLINK-15909
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: sunjincheng
 Fix For: 1.9.3


Add the PyPI release process to the subsequent releases of 1.9.x, i.e., improve 
the `create-binary-release.sh` script.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-04 Thread Kurt Young
Hi all,

I'd like to bring up a discussion about removing registration of
TableSource and
TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
affected
method would be:

TableEnvironment::registerTableSource
TableEnvironment::fromTableSource
TableEnvironment::registerTableSink
ConnectTableDescriptor::registerTableSource
ConnectTableDescriptor::registerTableSink
ConnectTableDescriptor::registerTableSourceAndSink

(Most of them are already deprecated, except for
TableEnvironment::fromTableSource,
which was intended to be deprecated but was missed by accident).

FLIP-64 [1] already explained why we want to deprecate TableSource &
TableSink in
the user-facing interface. In short, these interfaces should only read &
write the physical
representation of the table, and they no longer fit well now that we have
introduced
logical table fields such as computed columns and watermarks.

Another reason is that exposing registerTableSource in the Table Env turns
the whole
SQL protocol upside down. A TableSource should be used as a reader of a table;
it should rely on
the metadata held by the framework, which ultimately comes from
DDL or a
ConnectTableDescriptor. But if we register a TableSource with the Table Env, we
have
no choice but to rely on TableSource::getTableSchema. This makes the design
obscure: sometimes
the TableSource should trust the information coming from the framework, but
sometimes it should
also generate its own schema information.
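
As a small illustration of the direction (using PyFlink, where the equivalent
registration methods also exist), here is a minimal sketch of the DDL-based path
that would replace registerTableSource; the connector properties, schema and
path are illustrative only:

    from pyflink.datastream import StreamExecutionEnvironment
    from pyflink.table import StreamTableEnvironment

    env = StreamExecutionEnvironment.get_execution_environment()
    t_env = StreamTableEnvironment.create(env)

    # Instead of constructing a TableSource in user code and calling
    # t_env.register_table_source(...), declare the table via DDL so that the
    # framework owns the schema (and can later carry computed columns,
    # watermarks, etc.).
    t_env.sql_update("""
        CREATE TABLE orders (
            user_id BIGINT,
            amount DOUBLE
        ) WITH (
            'connector.type' = 'filesystem',
            'connector.path' = '/tmp/orders.csv',
            'format.type' = 'csv'
        )
    """)

    orders = t_env.sql_query("SELECT user_id, amount FROM orders")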

Furthermore, if the authority over schema information is not clear, things
will become much
more complicated when we want to improve Table API usability, for example by
introducing automatic
schema inference in the near future.

Since this is an API-breaking change, I've also included the user mailing list to
gather more feedback.

Best,
Kurt

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module


[VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #0

2020-02-04 Thread jincheng sun
Hi everyone,

Please review and vote on the release candidate #0 for the PyFlink version
1.9.2, as follows:

[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)

The complete staging area is available for your review, which includes:
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [1], which are signed with the key with
fingerprint 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2],
* source code tag "release-1.9.2" [3],
* the JIRA created for adding a description of 'pip install' support to the 1.9.x
documents [4],
* the JIRA created for adding the PyPI release process to subsequent 1.9.x
releases, i.e. improving the `create-binary-release.sh` script [5].

The vote will be open for at least 72 hours. It is adopted by majority
approval, with at least 3 PMC affirmative votes.

Thanks,
Jincheng

[1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc0/
[2] https://dist.apache.org/repos/dist/release/flink/KEYS
[3] https://github.com/apache/flink/tree/release-1.9.2
[4] https://issues.apache.org/jira/browse/FLINK-15908
[5] https://issues.apache.org/jira/browse/FLINK-15909


Re: [DISCUSS] Upload the Flink Python API 1.9.x to PyPI for user convenience.

2020-02-04 Thread jincheng sun
Hi all,

Thanks Dian and everyone for your feedback! I have brought up the VOTE thread for
the first RC [1].

I would appreciate it if you could check it.

Best,
Jincheng

[1]
https://lists.apache.org/thread.html/r4b43a94db6731de4d450cee828ac2effd5a47bb8a8374e31af79b08a%40%3Cdev.flink.apache.org%3E


dianfu  于2020年2月5日周三 上午9:00写道:

> Hi Jincheng,
>
> Thanks for the proposal. I think this is a good idea. Usually Python users
> will firstly try to install a Python package from PyPI when they are trying
> it out. This would definitely benefit Python users.
>
> Regards,
> Dian
>
> > 在 2020年2月5日,上午8:56,jincheng sun  写道:
> >
> > I see, thanks for the confirmation Aljoscha!
> >
> > Best,
> > Jincheng
> >
> >
> >
> > Aljoscha Krettek  于2020年2月4日周二 下午9:58写道:
> >
> >> Yes, as I said I think it's still good to release.
> >>
> >> On 04.02.20 13:49, jincheng sun wrote:
> >>> Hi Aljoscha,
> >>>
> >>> I agree that the coming PyFlink package of 1.10 is enough for users who
> >>> just want to try out the latest version of PyFlink. However, I think
> that
> >>> there are also cases where the old versions are useful. Usually users
> >> tend
> >>> to prepare their development environment according to the version of
> the
> >>> cluster. Suppose that the Flink cluster version is 1.9, then users may
> >> want
> >>> to install the package of PyFlink 1.9. Even if he installs the 1.10
> >> package
> >>> from the PyPI, it will fail in cases that he tries to submit the Python
> >> job
> >>> to the 1.9 cluster using the python shell shipped with the 1.10
> package.
> >>> (The reason is that the JobGraph of 1.10 is incompatible with the
> >> JobGraph
> >>> of 1.9)
> >>>
> >>> As the 1.9.2 is already released, considering that publishing it to
> PyPI
> >>> requires not too much efforts(I have already prepared the package),
> >>> personally I think it worths to do that.
> >>>
> >>> What's your thought? :)
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> Aljoscha Krettek  于2020年2月4日周二 下午4:00写道:
> >>>
>  Hi,
> 
>  I think that's a good idea, but we will also soon have Flink 1.10
> >> anyways.
> 
>  Best,
>  Aljoscha
> 
>  On 04.02.20 07:25, Hequn Cheng wrote:
> > Hi Jincheng,
> >
> > +1 for this proposal.
> >  From the perspective of users, I think it would nice to have PyFlink
> >> on
> > PyPI which makes it much easier to install PyFlink.
> >
> > Best, Hequn
> >
> > On Tue, Feb 4, 2020 at 1:09 PM Jeff Zhang  wrote:
> >
> >> +1
> >>
> >>
> >> Xingbo Huang  于2020年2月4日周二 下午1:07写道:
> >>
> >>> Hi Jincheng,
> >>>
> >>> Thanks for driving this.
> >>> +1 for this proposal.
> >>>
> >>> Compared to building from source, downloading directly from PyPI
> will
> >>> greatly save the development cost of Python users.
> >>>
> >>> Best,
> >>> Xingbo
> >>>
> >>>
> >>>
> >>> Wei Zhong  于2020年2月4日周二 下午12:43写道:
> >>>
>  Hi Jincheng,
> 
>  Thanks for bring up this discussion!
> 
>  +1 for this proposal. Building from source takes long time and
>  requires
>  a good network environment. Some users may not have such an
>  environment.
>  Uploading to PyPI will greatly improve the user experience.
> 
>  Best,
>  Wei
> 
>  jincheng sun  于2020年2月4日周二 上午11:49写道:
> 
> > Hi folks,
> >
> > I am very happy to receive some user inquiries about the use of
> >> Flink
> > Python API (PyFlink) recently. One of the more common questions
> is
> > whether
> > it is possible to install PyFlink without using source code
> build.
>  The
> > most
> > convenient and natural way for users is to use `pip install
> > apache-flink`.
> > We originally planned to support the use of `pip install
>  apache-flink`
> > in
> > Flink 1.10, but the reason for this decision was that when Flink
> >> 1.9
>  was
> > released at August 22, 2019[1], Flink's PyPI account system was
> not
> > ready.
> > At present, our PyPI account is available at October 09, 2019
>  [2](Only
> > PMC
> > can access), So for the convenience of users I propose:
> >
> > - Publish the latest release version (1.9.2) of Flink 1.9 to
> PyPI.
> > - Update Flink 1.9 documentation to add support for `pip
> install`.
> >
> > As we all know, Flink 1.9.2 was just completed released at
> January
>  31,
> > 2020
> > [3]. There is still at least 1 to 2 months before the release of
>  1.9.3,
> > so
> > my proposal is completely considered from the perspective of user
> > convenience. Although the proposed work is not large, we have not
>  set a
> > precedent for independent release of the Flink Python
> API(PyFlink)
> >> in
> > the
> >

[jira] [Created] (FLINK-15910) Add NOTICE files for Stateful Functions

2020-02-04 Thread Tzu-Li (Gordon) Tai (Jira)
Tzu-Li (Gordon) Tai created FLINK-15910:
---

 Summary: Add NOTICE files for Stateful Functions
 Key: FLINK-15910
 URL: https://issues.apache.org/jira/browse/FLINK-15910
 Project: Flink
  Issue Type: Sub-task
  Components: Build System / Stateful Functions, Stateful Functions
Affects Versions: statefun-1.1
Reporter: Tzu-Li (Gordon) Tai
Assignee: Tzu-Li (Gordon) Tai


The Stateful Functions repo is still lacking NOTICE files.

As of now, we need the following:

*For canonical source distribution*:
A NOTICE file should be located at the root directory

*For Maven artifacts*
The only artifact that currently bundles extra dependencies is 
{{statefun-flink-distribution}}. It should explicitly contain a NOTICE file 
under the {{META-INF}} folder.

All other Maven artifacts, as far as I can tell, do not bundle extra 
dependencies, therefore the NOTICE pulled in by the Apache Parent POM is 
sufficient.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [VOTE] Improve TableFactory to add Context

2020-02-04 Thread Jark Wu
+1 form my side.
Thanks for driving this.

Btw, could you also attach a JIRA issue with the changes described in it,
so that users can find the issue through the mailing list in the future.

Best,
Jark

On Wed, 5 Feb 2020 at 13:38, Kurt Young  wrote:

> +1 from my side.
>
> Best,
> Kurt
>
>
> On Wed, Feb 5, 2020 at 10:59 AM Jingsong Li 
> wrote:
>
> > Hi all,
> >
> > Interface updated.
> > Please re-vote.
> >
> > Best,
> > Jingsong Lee
> >
> > On Tue, Feb 4, 2020 at 1:28 PM Jingsong Li 
> wrote:
> >
> > > Hi all,
> > >
> > > I would like to start the vote for the improve of
> > > TableFactory, which is discussed and
> > > reached a consensus in the discussion thread[2].
> > >
> > > The vote will be open for at least 72 hours. I'll try to close it
> > > unless there is an objection or not enough votes.
> > >
> > > [1]
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html
> > >
> > > Best,
> > > Jingsong Lee
> > >
> >
> >
> > --
> > Best, Jingsong Lee
> >
>


[jira] [Created] (FLINK-15911) Flink does not work over NAT

2020-02-04 Thread Till Rohrmann (Jira)
Till Rohrmann created FLINK-15911:
-

 Summary: Flink does not work over NAT
 Key: FLINK-15911
 URL: https://issues.apache.org/jira/browse/FLINK-15911
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.10.0
Reporter: Till Rohrmann
 Fix For: 1.11.0


Currently, it is not possible to run Flink over network address translation. 
The problem is that the Flink processes do not allow specifying separate bind 
and external ports. Moreover, the {{TaskManager}} tries to resolve the given 
{{taskmanager.host}}, which might not be resolvable. This breaks NAT or Docker 
setups where the external address is not resolvable from within the 
container/internal network.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Support User-Defined Table Function in PyFlink

2020-02-04 Thread Xingbo Huang
Hi Jincheng,

Thanks for your feedback. More details can be discussed in the JIRA
and the PR. :)

Best,
Xingbo

jincheng sun  于2020年2月4日周二 下午9:09写道:

> Thanks for bring up this discussion Xingbo!
>
> The the design is pretty nice for me! This feature is really need which
> mentioned in FLIP-58. So, I think is better to create the JIRA and open the
> PR, then more detail can be reviewed. :)
>
> Best,
> Jincheng
>
>
>
> Xingbo Huang  于2020年2月3日周一 下午3:02写道:
>
> > Hi all,
> >
> > The scalar Python UDF has already been supported in coming release of
> 1.10,
> > we’d like to introduce Python UDTF now. FLIP-58[1] has already introduced
> > some content about Python UDTF. However, the implementation details are
> > still not touched. I have drafted a design doc[2]. It includes the
> > following items:
> >
> > - How to define Python UDTF.
> >
> > - The introduced rules for Python UDTF.
> >
> > - How to execute Python UDTF.
> >
> > Because the implementation relies on Beam's portability framework for
> > Python user-defined table function execution and not all the contributors
> > are familiar with it, I have done a prototype[3].
> >
> > Welcome any feedback.
> >
> > Best,
> >
> > Xingbo
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> >
> > [2]
> >
> >
> https://docs.google.com/document/d/1Pkv5S0geoYQ2ySS5YTTBivJ3hoi-uzLXVQkDVIaR0cE/edit#heading=h.pzeztvig3kg1
> > [3] https://github.com/HuangXingBo/flink/commits/FLINK-UDTF
> >
>


Re: [DISCUSS] Support User-Defined Table Function in PyFlink

2020-02-04 Thread Xingbo Huang
Hi Hequn,

Thanks for your feedback. Good suggestion. I will avoid Scala code in the
flink-table module.

Best,
Xingbo

Hequn Cheng  于2020年2月4日周二 下午10:14写道:

> Hi Xingbo,
>
> Thanks a lot for bringing up the discussion. Looks good from my side.
> One suggestion beyond the document: it would be nice to avoid Scala code in
> the flink-table module since we would like to get rid of Scala in the
> long-term[1][2].
>
> Best, Hequn
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
> [2]
> https://flink.apache.org/contributing/code-style-and-quality-scala.html
>
>
> On Tue, Feb 4, 2020 at 9:09 PM jincheng sun 
> wrote:
>
> > Thanks for bring up this discussion Xingbo!
> >
> > The the design is pretty nice for me! This feature is really need which
> > mentioned in FLIP-58. So, I think is better to create the JIRA and open
> the
> > PR, then more detail can be reviewed. :)
> >
> > Best,
> > Jincheng
> >
> >
> >
> > Xingbo Huang  于2020年2月3日周一 下午3:02写道:
> >
> > > Hi all,
> > >
> > > The scalar Python UDF has already been supported in coming release of
> > 1.10,
> > > we’d like to introduce Python UDTF now. FLIP-58[1] has already
> introduced
> > > some content about Python UDTF. However, the implementation details are
> > > still not touched. I have drafted a design doc[2]. It includes the
> > > following items:
> > >
> > > - How to define Python UDTF.
> > >
> > > - The introduced rules for Python UDTF.
> > >
> > > - How to execute Python UDTF.
> > >
> > > Because the implementation relies on Beam's portability framework for
> > > Python user-defined table function execution and not all the
> contributors
> > > are familiar with it, I have done a prototype[3].
> > >
> > > Welcome any feedback.
> > >
> > > Best,
> > >
> > > Xingbo
> > >
> > > [1]
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Stateless+Function+for+Table
> > >
> > > [2]
> > >
> > >
> >
> https://docs.google.com/document/d/1Pkv5S0geoYQ2ySS5YTTBivJ3hoi-uzLXVQkDVIaR0cE/edit#heading=h.pzeztvig3kg1
> > > [3] https://github.com/HuangXingBo/flink/commits/FLINK-UDTF
> > >
> >
>


Re: [VOTE] Release Flink Python API(PyFlink) 1.9.2 to PyPI, release candidate #0

2020-02-04 Thread Wei Zhong
Hi,

Thanks for driving this, Jincheng.

+1 (non-binding) 

- Verified signatures and checksums.
- `pip install apache-flink-1.9.2.tar.gz` works successfully.
- Start the local pyflink shell via `pyflink-shell.sh local` and try the examples 
in the help message; they run well and throw no exception.
- Try a word count example in the IDE; it runs well and throws no exception (a 
minimal sketch of such a job is included below).
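
For reference, a minimal sketch of the kind of word count job used for that 
check, based on the 1.9 batch Table API; the in-memory input, field names and 
output path are illustrative:

    from pyflink.dataset import ExecutionEnvironment
    from pyflink.table import BatchTableEnvironment, DataTypes
    from pyflink.table.sinks import CsvTableSink

    exec_env = ExecutionEnvironment.get_execution_environment()
    t_env = BatchTableEnvironment.create(exec_env)

    # Illustrative in-memory input; a real check could read from a file instead.
    t = t_env.from_elements([("hello",), ("pyflink",), ("hello",)], ["word"])

    # Write the per-word counts to a CSV file.
    t_env.register_table_sink(
        "results",
        CsvTableSink(["word", "cnt"],
                     [DataTypes.STRING(), DataTypes.BIGINT()],
                     "/tmp/word_count_result.csv"))

    t.group_by("word") \
        .select("word, count(1) as cnt") \
        .insert_into("results")

    t_env.execute("word_count")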

In addition I'm willing to take these JIRAs. Could you assign them to me? :)

Best,
Wei


> 在 2020年2月5日,14:49,jincheng sun  写道:
> 
> Hi everyone,
> 
> Please review and vote on the release candidate #0 for the PyFlink version
> 1.9.2, as follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which includes:
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [1], which are signed with the key with
> fingerprint 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [2],
> * source code tag "release-1.9.2" [3],
> * create JIRA. for add description of support 'pip install' to 1.9.x
> documents[4]
> * create JIRA. for add PyPI release process for subsequent version release
> of 1.9.x . i.e. improve the script of `create-binary-release. sh`.[5]
> 
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Jincheng
> 
> [1] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.2-rc0/
> [2] https://dist.apache.org/repos/dist/release/flink/KEYS
> [3] https://github.com/apache/flink/tree/release-1.9.2
> [4] https://issues.apache.org/jira/browse/FLINK-15908
> [5] https://issues.apache.org/jira/browse/FLINK-15909



[jira] [Created] (FLINK-15912) Add Context to improve TableSourceFactory and TableSinkFactory

2020-02-04 Thread Jingsong Lee (Jira)
Jingsong Lee created FLINK-15912:


 Summary: Add Context to improve TableSourceFactory and 
TableSinkFactory
 Key: FLINK-15912
 URL: https://issues.apache.org/jira/browse/FLINK-15912
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / API
Reporter: Jingsong Lee
 Fix For: 1.11.0


Discussion in: 
[http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Improve-TableFactory-td36647.html]

Vote in: 
[http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Improve-TableFactory-to-add-Context-td37211.html]

Motivation:
Now the main needs and problems are:
 * Connectors can't get the TableConfig[1], and some behaviors really need to be
controlled by the user's table configuration. In the era of catalogs, we
can't put these configs in connector properties, which is too inconvenient.
 * A context class also allows for future modifications without touching the 
TableFactory interface again.

Interface:
{code:java}
  public interface TableSourceFactory extends TableFactory {
   ..

   /**
    * Creates and configures a {@link TableSource} based on the given
{@link Context}.
    *
    * @param context context of this table source.
    * @return the configured table source.
    */
   default TableSource createTableSource(Context context) {
      return createTableSource(
            context.getObjectIdentifier().toObjectPath(),
            context.getTable());
   }
   /**
    * Context of table source creation. Contains table information and
environment information.
    */
   interface Context {
      /**
       * @return full identifier of the given {@link CatalogTable}.
       */
      ObjectIdentifier getObjectIdentifier();
      /**
       * @return table {@link CatalogTable} instance.
       */
      CatalogTable getTable();
      /**
       * @return readable config of this table environment.
       */
      ReadableConfig getConfiguration();
   }
}

public interface TableSinkFactory extends TableFactory {
   ..
   /**
    * Creates and configures a {@link TableSink} based on the given
{@link Context}.
    *
    * @param context context of this table sink.
    * @return the configured table sink.
    */
   default TableSink createTableSink(Context context) {
      return createTableSink(
            context.getObjectIdentifier().toObjectPath(),
            context.getTable());
   }
   /**
    * Context of table sink creation. Contains table information and
environment information.
    */
   interface Context {
      /**
       * @return full identifier of the given {@link CatalogTable}.
       */
      ObjectIdentifier getObjectIdentifier();
      /**
       * @return table {@link CatalogTable} instance.
       */
      CatalogTable getTable();
      /**
       * @return readable config of this table environment.
       */
      ReadableConfig getConfiguration();
   }
}
{code}
Add the inner Context class to both TableSourceFactory and TableSinkFactory; the 
reason it is defined repeatedly is that source and sink may need different 
properties in the future.

 

[1] https://issues.apache.org/jira/browse/FLINK-15290



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] Remove registration of TableSource/TableSink in Table Env and ConnectTableDescriptor

2020-02-04 Thread Dawid Wysakowicz
Hi Kurt,

I fully agree with the proposal. Yes it was an omission that we did not
deprecate the TableEnvironment#fromTableSource in the previous version.

I would vote to remove all those methods altogether.

Best,

Dawid

On 05/02/2020 07:36, Kurt Young wrote:
> Hi all,
>
> I'd like to bring up a discussion about removing registration of
> TableSource and
> TableSink in TableEnvironment as well as in ConnectTableDescriptor. The
> affected
> method would be:
>
> TableEnvironment::registerTableSource
> TableEnvironment::fromTableSource
> TableEnvironment::registerTableSink
> ConnectTableDescriptor::registerTableSource
> ConnectTableDescriptor::registerTableSink
> ConnectTableDescriptor::registerTableSourceAndSink
>
> (Most of them are already deprecated, except for
> TableEnvironment::fromTableSource,
> which was intended to deprecate but missed by accident).
>
> FLIP-64 [1] already explained why we want to deprecate TableSource &
> TableSink from
> user's interface. In a short word, these interfaces should only read &
> write the physical
> representation of the table, and they are not fitting well after we already
> introduced some
> logical table fields such as computed column, watermarks.
>
> Another reason is the exposure of registerTableSource in Table Env just
> make the whole
> SQL protocol opposite. TableSource should be used as a reader of table, it
> should rely on
> other metadata information held by framework, which eventually comes from
> DDL or
> ConnectDescriptor. But if we register a TableSource to Table Env, we have
> no choice but
> have to rely on TableSource::getTableSchema. It will make the design
> obscure, sometimes
> TableSource should trust the information comes from framework, but
> sometimes it should
> also generate its own schema information.
>
> Furthermore, if the authority about schema information is not clear, it
> will make things much
> more complicated if we want to improve the table api usability such as
> introducing automatic
> schema inference in the near future.
>
> Since this is an API break change, I've also included user mailing list to
> gather more feedbacks.
>
> Best,
> Kurt
>
> [1]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>


