Re: How to specify dependencies for an application that needs to use modified version of Flink

2016-05-12 Thread Jark
Hi Saiph, 
You can enter the flink directory and run `mvn clean install -DskipTest=true` 
to install all the modules (including flink-streaming-java) into your local .m2 
repository. After that, change your app's dependency version to the version 
of your Flink build, such as "1.1-SNAPSHOT". Finally, reimport your app project.
  
- Jark Wu

> On May 12, 2016, at 2:33 AM, Saiph Kappa wrote:
> 
> Hi,
> 
> I'm performing some modifications on Flink (current trunk version). I want
> a Scala app (sbt-based) to use that modified version. I'm only modifying
> the flink-streaming-java module. What is the typical way to specify the
> dependencies for my application in this case? Should I copy all jars to the
> lib folder of my app, or build a big fat jar? How do the devs here do it?
> 
> Thanks.



Re: How to specify dependencies for an application that needs to use modified version of Flink

2016-05-12 Thread Jark
Sorry, I mistyped the command. You can go into flink/flink-streaming-java 
and run `mvn clean package install -DskipTests=true`. It will install only 
the flink-streaming-java module.
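
As a side note for the sbt side of the question, here is a minimal build.sbt
sketch, assuming Scala 2.11 artifacts and the locally installed 1.1-SNAPSHOT
version from this thread (the "_2.11" suffix and module list are assumptions;
adjust them to your build):

    // Resolve Flink from the local ~/.m2 repository where `mvn install`
    // put the snapshot artifacts.
    resolvers += Resolver.mavenLocal

    libraryDependencies ++= Seq(
      // the module modified in this thread; "_2.11" is an assumed Scala suffix
      "org.apache.flink" % "flink-streaming-java_2.11" % "1.1-SNAPSHOT",
      "org.apache.flink" % "flink-clients_2.11"        % "1.1-SNAPSHOT"
    )

After reimporting, sbt will pick the snapshot up from ~/.m2 instead of Maven
Central.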

> On May 12, 2016, at 10:02 AM, Jark wrote:
> 
> Hi Saiph, 
> You can enter the flink directory and run `mvn clean install -DskipTest=true` 
> to install all the modules (including flink-streaming-java) into your local 
> .m2 repository. After that, change your app's dependency version to the 
> version of your Flink build, such as "1.1-SNAPSHOT". Finally, reimport your app 
> project.
> 
> - Jark Wu
> 
>> On May 12, 2016, at 2:33 AM, Saiph Kappa wrote:
>> 
>> Hi,
>> 
>> I'm performing some modifications on Flink (current trunk version). I want
>> a Scala app (sbt-based) to use that modified version. I'm only modifying
>> the flink-streaming-java module. What is the typical way to specify the
>> dependencies for my application in this case? Should I copy all jars to the
>> lib folder of my app, or build a big fat jar? How do the devs here do it?
>> 
>> Thanks.
> 



Re: [DISCUSS] API breaking change in DataStream Windows

2016-08-09 Thread Jark
As a user, I don't like the "casting option", because people who need to set 
parallelism after CoGroup will certainly fall into this issue. They will 
subconsciously think Flink does not support this feature. We can't assume most 
users will read the JavaDocs and documentation carefully. 

Maybe we can post this mail to the user list and hear whether most users can 
accept the "casting" work-around, and which work-around they prefer. 
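
To make the "casting option" concrete, here is a small self-contained Scala
sketch of the idea; the type names are made up and only mirror the situation
where coGroup()/join() return a type that hides setParallelism:

    // The public return type of the windowed coGroup/join hides the setter...
    trait WindowedResult

    // ...while the concrete operator class actually carries it.
    class ConcreteOperator extends WindowedResult {
      def setParallelism(p: Int): this.type = { println(s"parallelism = $p"); this }
    }

    val result: WindowedResult = new ConcreteOperator

    // The clumsy work-around under discussion: an unchecked downcast in
    // user code, which only works as long as the runtime type matches.
    result.asInstanceOf[ConcreteOperator].setParallelism(4)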


> On Aug 8, 2016, at 10:45 PM, Robert Metzger wrote:
> 
> Thank you for bringing this discussion to the mailing list.
> 
> I agree with Chesnay's comment on GitHub that we should consider the
> "casting option" as well. I don't think we should break the API if there is
> an (admittedly ugly) work-around for those who run into the problem.
> If we put the work-around into the JavaDocs and the DataStream API
> documentation, everybody who is seriously blocked by this should find it.
> 
> For 2.0, we can break the API by changing the method signature.
> 
> 
> On Mon, Aug 8, 2016 at 4:11 PM, Stephan Ewen  wrote:
> 
>> Hi all!
>> 
>> We have a problem in the *DataStream API* around Windows for *CoGroup* and
>> *Join*.
>> These operations currently do not allow setting a parallelism, which is a
>> pretty heavy problem.
>> 
>> To fix it properly, we need to change the return types of the coGroup() and
>> join() operations, which *breaks the binary compatibility* - it* retains
>> source compatibility*, though.
>> 
>> The pull request with the change is:
>> https://github.com/apache/flink/pull/2305
>> 
>> There are very clumsy ways to work around this (custom casts in the user
>> code or making the join() / coGroup() behave differently than the other
>> operators) which we did not really think of as viable, because they would
>> need to be changed again in the future once we pull the API straight
>> (breaking even source compatibility then).
>> 
>> *I would suggest to actually break the API* at that point (binary, not
>> source) for *Flink 1.2* and add a big note in the release docs. An
>> uncomfortable step, but the alternatives are quite bad, too.
>> 
>> Have a look at what has been suggested in the pull request discussion and
>> please let us know what you think about that so we can proceed.
>> 
>> Greetings,
>> Stephan
>> 



Re: Add MapState for keyed streams

2016-10-19 Thread Jark
That makes sense! Maybe we can make MapState implement the Iterable interface.
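
A minimal, self-contained Scala sketch of what that could look like, based
only on the method list from this thread (a discussion sketch, not the final
Flink API):

    import java.util.{Iterator => JIterator, Map => JMap}

    // The proposed keyed MapState, extending Iterable as suggested above so
    // that users can loop over its entries directly.
    trait MapState[K, V] extends java.lang.Iterable[JMap.Entry[K, V]] {
      def put(key: K, value: V): Unit
      def get(key: K): V
      def contains(key: K): Boolean
      def keys: java.lang.Iterable[K]
      def values: java.lang.Iterable[V]
      override def iterator(): JIterator[JMap.Entry[K, V]]
    }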

> On Oct 19, 2016, at 5:48 PM, SHI Xiaogang wrote:
> 
> Hi Jark
> 
> If the state is very big, it may occupy a lot of memory if we return
> Set<Map.Entry<K, V>>.
> 
> By wrapping the returned iterator, we can easily implement a method
> returning Iterable<Map.Entry<K, V>>.
> Users can use that returned Iterable in a foreach loop.
> 
> Regards
> Xiaogang
> 
> 
> 
> 2016-10-19 17:43 GMT+08:00 Jark Wu :
> 
>> Hi Xiaogang,
>> 
>> I think maybe returning Set<Map.Entry<K, V>> is better than
>> Iterator<Map.Entry<K, V>>,
>> because users can use foreach on a Set but not on an Iterator, and can
>> still get iterator access via set.iterator().
>> Maybe Map.entrySet() is a more familiar way to users.
>> 
>> 
>> - Jark Wu
>> 
>>> On Oct 19, 2016, at 5:18 PM, SHI Xiaogang wrote:
>>> 
>>> Agreed.
>>> 
>>> contains(K key) should be provided.
>>> The iterator() method should return Iterator<Map.Entry<K, V>> instead of
>>> Iterator<Tuple2<K, V>>.
>>> 
>>> Besides, size() may also be provided.
>>> 
>>> With these methods, MapStates appear very similar to Java Maps. Users
>> will
>>> be very happy to use them.
>>> 
>>> Regards,
>>> Xiaogang
>>> 
>>> 
>>> 2016-10-19 16:55 GMT+08:00 Till Rohrmann :
>>> 
>>>> Hi Xiaogang,
>>>> 
>>>> I really like your proposal and think that this would be a valuable
>>>> addition to Flink :-)
>>>> 
>>>> For convenience we could maybe add contains(K key), too.
>>>> 
>>>> Java's Map interface returns a Set of Entry when calling entrySet()
>>>> (which is the equivalent of iterator() in our interface). The Entry
>>>> interface not only allows access to the key and value of the map entry
>>>> but also allows setting a value for the respective key via setValue
>>>> (even though it's an optional operation). Maybe we want to offer
>>>> something similar when getting access to the entry set via the iterator
>>>> method.
>>>> 
>>>> Cheers,
>>>> Till
>>>> 
>>>> On Wed, Oct 19, 2016 at 4:18 AM, SHI Xiaogang 
>>>> wrote:
>>>> 
>>>>> Hi, all. I created the JIRA
>>>>> https://issues.apache.org/jira/browse/FLINK-4856 to
>>>>> propose adding MapStates into Flink.
>>>>> 
>>>>> MapStates are very useful in our daily jobs. For example, when
>>>>> implementing
>>>>> DistinctCount, we store the values into a MapState and the result of
>>>>> each
>>>>> group (key) is exactly the number of entries in the MapState.
>>>>> 
>>>>> In my opinion, the methods provided by the MapState may include:
>>>>> * void put(K key, V value)
>>>>> * V get(K key)
>>>>> * Iterable<K> keys()
>>>>> * Iterable<V> values()
>>>>> * Iterator<Tuple2<K, V>> iterator()
>>>>> 
>>>>> Do you have any comments? Any comment is appreciated.
>>>>> 
>>>>> Xiaogang
>>>>> 
>>>> 
>> 
>> 



Re: [ANNOUNCE] New Flink committer Jincheng Sun

2017-07-10 Thread Jark
Welcome on board and congratulations Jincheng!


> On Jul 11, 2017, at 12:15 AM, Matthias J. Sax wrote:
> 
> 
> Congrats!
> 
> On 7/10/17 6:42 AM, Ted Yu wrote:
>> Congratulations, Jincheng.
>> 
>> On Mon, Jul 10, 2017 at 6:17 AM, Fabian Hueske 
>> wrote:
>> 
>>> Hi everybody,
>>> 
>>> On behalf of the PMC, I'm very happy to announce that Jincheng
>>> Sun has accepted the invitation of the PMC to become a Flink
>>> committer.
>>> 
>>> For more than nine months, Jincheng has been one of the most active
>>> contributors to the Table API / SQL module. He has contributed
>>> several major features, reported and fixed many bugs, and also
>>> spent a lot of time reviewing pull requests.
>>> 
>>> Please join me in congratulating Jincheng for becoming a Flink
>>> committer.
>>> 
>>> Thanks, Fabian
>>> 
>> 



Re: [ANNOUNCE] New Flink PMC member: Tzu-Li (Gordon) Tai

2017-07-10 Thread Jark
Congrats Gordon! 


> On Jul 11, 2017, at 9:37 AM, Zhuoluo Yang wrote:
> 
> Congrats.
> 
> Thanks,
> 
> Zhuoluo 😀
> 
> 
> 
> 
> 
>> On Jul 11, 2017, at 3:34 AM, Henry Saputra wrote:
>> 
>> Welcome and congrats!
>> 
>> On Mon, Jul 10, 2017 at 9:24 AM, Fabian Hueske wrote:
>> 
>>> Congrats Gordon!
>>> 
>>> Cheers, Fabian
>>> 
>>> 2017-07-10 17:03 GMT+02:00 jincheng sun:
>>> 
 Hi Gordon, Congratulations !!!
 
 Cheers,
 Jincheng
 
 2017-07-10 22:44 GMT+08:00 Robert Metzger:
 
> Hi Everybody,
> 
> On behalf of the Flink PMC, I'm very excited to announce Gordon as the
> latest addition to the Flink PMC.
> 
> Gordon is a very active community member, helping with a lot of the
 release
> tasks, project discussions and of course work on the codebase itself.
> 
> 
> Regards,
> Robert
> 
 
>>> 
> 



Re: [ANNOUNCE] Apache Flink 1.8.1 released

2019-07-02 Thread Jark Wu
Thanks for being the release manager, and great job!

Cheers,
Jark

On Wed, 3 Jul 2019 at 10:16, Dian Fu  wrote:

> Awesome! Thanks a lot for being the release manager. Great job! @Jincheng
>
> Regards,
> Dian
>
> On Jul 3, 2019, at 10:08 AM, jincheng sun wrote:
>
> I've also tweeted about it from my Twitter account:
> https://twitter.com/sunjincheng121/status/1146236834344648704
> It will also be tweeted later from @ApacheFlink!
>
> Best, Jincheng
>
> On Wed, Jul 3, 2019 at 9:48 AM, Hequn Cheng wrote:
>
>> Thanks for being the release manager and for the great work, Jincheng!
>> Also thanks to Gordon and the community for making this release possible!
>>
>> Best, Hequn
>>
>> On Wed, Jul 3, 2019 at 9:40 AM jincheng sun 
>> wrote:
>>
>>> Hi,
>>>
>>> The Apache Flink community is very happy to announce the release of
>>> Apache Flink 1.8.1, which is the first bugfix release for the Apache Flink
>>> 1.8 series.
>>>
>>> Apache Flink® is an open-source stream processing framework for
>>> distributed, high-performing, always-available, and accurate data streaming
>>> applications.
>>>
>>> The release is available for download at:
>>> https://flink.apache.org/downloads.html
>>>
>>> Please check out the release blog post for an overview of the
>>> improvements for this bugfix release:
>>> https://flink.apache.org/news/2019/07/02/release-1.8.1.html
>>>
>>> The full release notes are available in Jira:
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345164
>>>
>>> We would like to thank all contributors of the Apache Flink community
>>> who made this release possible!
>>>
>>> Great thanks to @Tzu-Li (Gordon) Tai for his kind offline
>>> help!
>>>
>>> Regards,
>>> Jincheng
>>>
>>
>


Re: [DISCUSS] Support Upsert mode for Streaming Non-window FlatAggregate

2019-07-04 Thread Jark Wu
Hi Jincheng,

I agree with Kurt's point. As you said, "the user must know the keys of the
output of UDTAGG clearly".
If I understand correctly, the key information is strongly related to the
UDTAGG implementation.
Users may call `flatAggregate` on a UDTAGG instance with different keys,
which may produce wrong results.
So I think it would be better to couple the key information with the UDTAGG
interface (i.e. "Approach 3" in your design doc).
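
For illustration, a self-contained Scala sketch of what coupling the keys
with the function could look like; every name here is hypothetical and only
mirrors the "Approach 3" idea from the design doc:

    // Hypothetical base trait: the UDTAGG itself declares the unique key of
    // its output, so the key cannot be chosen wrongly at the flatAggregate
    // call site.
    trait TableAggregateFunctionWithKeys[OUT] {
      def keyFields: Array[String]
    }

    // A Top1 function emitting (rankid, sellerName, value), keyed by rankid.
    class Top1 extends TableAggregateFunctionWithKeys[(Int, String, Long)] {
      override def keyFields: Array[String] = Array("rankid")
    }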

Regards,
Jark

On Thu, 4 Jul 2019 at 18:06, Kurt Young  wrote:

> Hi Jincheng,
>
> Thanks for the clarification. I think you just pointed out my concern
> yourself:
>
> > When a user uses a user-defined table aggregate function (UDTAGG), he
> must understand the behavior of the UDTAGG, including the return type and
> the characteristics of the returned data, such as the key fields.
>
> This indicates that UDTAGGs are somehow classified into different types:
> one without key information, one with it. So the developer of the UDTAGG
> should choose which type the function should be. In this case,
> my question would be: why don't we have explicit information about keys,
> such as splitting UDTAGG into keyed UDTAGG and non-keyed UDTAGG? Then the
> user and the framework would have a better understanding of
> the UDTAGG. The `withKeys` solution lets the user choose the key, and it
> seems it will work correctly only if the user chooses the *right* key for
> this UDTAGG.
>
> Let me know if this makes sense to you.
>
> Best,
> Kurt
>
>
> On Thu, Jul 4, 2019 at 4:32 PM jincheng sun 
> wrote:
>
> > Hi All,
> >
> > @Kurt Young: one user-defined table aggregate function
> > can be used in both the with-keys and without-keys cases, and we do not
> > introduce any other aggregations, just like the explanation from @Hequn.
> >
> > @Hequn Cheng  thanks for your explanation!
> >
> > One thing should be mentioned here:
> >
> > When a user uses a user-defined table aggregate function (UDTAGG), he
> > must understand the behavior of the UDTAGG, including the return type and
> > the characteristics of the returned data, such as the key fields. So
> > although the `withKeys` approach is not rigorous enough (which we do not
> > need), it is intuitive enough: considering that if `flatAggregate` is
> > followed by an `upsertSink`, the user must know the keys of the output of
> > UDTAGG clearly, otherwise the keys of `upsertSink` cannot be defined. So I
> > still prefer the `withKeys` solution by now.
> >
> > Looking forward to any feedback from all of you!
> >
> > Best,
> > Jincheng
> >
> >
> >
> > On Mon, Jul 1, 2019 at 5:35 PM, Hequn Cheng wrote:
> >
> >> Hi Kurt,
> >>
> >> Thanks for your questions. Here are my thoughts.
> >>
> >> > If I want to write such a kind of function, should I make sure that this
> >> function is used with some keys?
> >> The key information may not be used. We can also use a RetractSink to emit
> >> the table directly.
> >>
> >> > If I need a use case to calculate TopN without a key, should I write
> >> another function, or can I reuse the previous one?
> >> For the TopN example, you can reuse the previous function if you don't
> >> care
> >> about the key information.
> >>
> >> So, the key information is only an indicator (or a description), not an
> >> operator, as Jincheng mentioned above.
> >> We do not need to change the function logic, and it will not add any
> >> other
> >> aggregations.
> >>
> >> BTW, we have three approaches in the document. Approach 1 defines keys
> >> on the API level, as we think it's more common to define keys on a Table.
> >> Approach 3 defines keys in the TableAggregateFunction, which is more
> >> precise but not very clear for Table users. So, we should take all
> >> these into consideration and make the decision in this discussion
> >> thread.
> >>
> >> You can take a look at the document and welcome any suggestions or other
> >> better solutions.
> >>
> >> Best, Hequn
> >>
> >>
> >> On Mon, Jul 1, 2019 at 12:13 PM Kurt Young  wrote:
> >>
> >> > Hi Jincheng,
> >> >
> >> > Thanks for the clarification. Take 'TopN' for example: if I want to
> >> > write such a kind of function,
> >> > should I make sure that this function is used with some keys? If I
> >> > need a
> >> > use case to calculate
> >> > TopN without a key, should I write another function or can I reuse
> >> > the previous one?

Re: Flink 1.9 blink planner

2019-07-04 Thread Jark Wu
Hi Kaka,

Thanks for trying to use blink planner.

We are working on bridging the blink planner to the public API (i.e.
TableEnvironment), and this is well on its way.
As part of that, we will remove blink's own TableEnvironment and use the API's
TableEnvironment, and your problem will be fixed.
Ideally, we can finish this at the end of this week.
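
For reference, the unified entry point being worked on looks roughly like the
following sketch (based on the 1.9 unified API that this bridging work
targets; the sink and query names are placeholders):

    import org.apache.flink.table.api.{EnvironmentSettings, TableEnvironment}

    // Select the blink planner through the unified TableEnvironment instead
    // of blink's own TableEnvironment.
    val settings = EnvironmentSettings.newInstance()
      .useBlinkPlanner()
      .inStreamingMode()
      .build()
    val tEnv = TableEnvironment.create(settings)

    // sqlUpdate() is then available on the unified environment.
    tEnv.sqlUpdate("INSERT INTO sink_table SELECT a, b FROM source_table")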

Best,
Jark

On Thu, 4 Jul 2019 at 19:20, kaka chen  wrote:

> Hi All:
>
> We found that the TableEnvironment in the Flink 1.9 blink planner didn't
> support methods such as sqlUpdate() and registerTableSink(). When I used the
> Scala API, it threw an error:
>
> *Error:(61, 20) value sqlUpdate is not a member of
> org.apache.flink.table.api.scala.StreamTableEnvironment.*
>
> Is this normal? Could someone help? Thanks.
>
> Thanks,
> Kaka Chen
>


Re: [VOTE] Migrate to sponsored Travis account

2019-07-04 Thread Jark Wu
+1 for the migration and great thanks to Chesnay and Bowen for pushing this!

Cheers,
Jark

On Fri, 5 Jul 2019 at 09:34, Congxian Qiu  wrote:

> +1 for the migration.
>
> Best,
> Congxian
>
>
> On Thu, Jul 4, 2019 at 9:42 PM, Hequn Cheng wrote:
>
> > +1.
> >
> > And thanks a lot to Chesnay for pushing this.
> >
> > Best, Hequn
> >
> > On Thu, Jul 4, 2019 at 8:07 PM Chesnay Schepler 
> > wrote:
> >
> > > Note that the Flinkbot approach isn't that trivial either; we can't
> > > _just_ trigger builds for a branch in the apache repo, but would first
> > > have to clone the branch/pr into a separate repository (that is owned
> by
> > > the github account that the travis account would be tied to).
> > >
> > > One roadblock after the next showing up...
> > >
> > > On 04/07/2019 11:59, Chesnay Schepler wrote:
> > > > Small update with mostly bad news:
> > > >
> > > > INFRA doesn't know whether it is possible, and referred me to Travis
> > > > support.
> > > > They did point out that it could be problematic in regards to
> > > > read/write permissions for the repository.
> > > >
> > > > From my own findings /so far/ with a test repo/organization, it does
> > > > not appear possible to configure the Travis account used for a
> > > > specific repository.
> > > >
> > > > So yeah, if we go down this route we may have to pimp the Flinkbot to
> > > > trigger builds through the Travis REST API.
> > > >
> > > > On 04/07/2019 10:46, Chesnay Schepler wrote:
> > > >> I've raised a JIRA
> > > >> <https://issues.apache.org/jira/browse/INFRA-18703> with INFRA to
> > > >> inquire whether it would be possible to switch to a different Travis
> > > >> account, and if so what steps would need to be taken.
> > > >> We need a proper confirmation from INFRA since we are not in full
> > > >> control of the flink repository (for example, we cannot access the
> > > >> settings page).
> > > >>
> > > >> If this is indeed possible, Ververica is willing to sponsor a Travis
> > > >> account for the Flink project.
> > > >> This would provide us with more than enough resources.
> > > >>
> > > >> Since this makes the project more reliant on resources provided by
> > > >> external companies I would like to vote on this.
> > > >>
> > > >> Please vote on this proposal, as follows:
> > > >> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> > > >> account, provided that INFRA approves
> > > >> [ ] -1, Do not approve the migration to a Ververica-sponsored
> Travis
> > > >> account
> > > >>
> > > >> The vote will be open for at least 24h, and until we have
> > > >> confirmation from INFRA. The voting period may be shorter than the
> > > >> usual 3 days since our current setup is effectively not working.
> > > >>
> > > >> On 04/07/2019 06:51, Bowen Li wrote:
> > > >>> Re: > Are they using their own Travis CI pool, or did the switch to
> > > >>> an entirely different CI service?
> > > >>>
> > > >>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
> > > >>> currently moving away from ASF's Travis to their own in-house bare-metal
> > > >>> machines at [1] with custom CI application at [2]. They've seen
> > > >>> significant improvement w.r.t both much higher performance and
> > > >>> basically no resource waiting time, "night-and-day" difference
> > > >>> quoting Wes.
> > > >>>
> > > >>> Re: > If we can just switch to our own Travis pool, just for our
> > > >>> project, then this might be something we can do fairly quickly?
> > > >>>
> > > >>> I believe so, according to [3] and [4]
> > > >>>
> > > >>>
> > > >>> [1] https://ci.ursalabs.org/
> > > >>> [2] https://github.com/ursa-labs/ursabot
> > > >>> [3]
> > > >>>
> > >
> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
> > > >>>
> > > >>> [4]
> > > >>>

Re: [DISCUSS] Support Upsert mode for Streaming Non-window FlatAggregate

2019-07-04 Thread Jark Wu
Hi Hequn,

> If the TopN table aggregate function
> outputs three columns (rankid, time, value), either rankid or rankid+time
> could be
> used as the key. Which one to be chosen is more likely to be decided by
> the user
> according to his business.
In this case, the TopN table aggregate function should return two sets of
unique keys: one is "rankid", the other is "rankid, time".
This aligns better with the current TopN node in the blink planner and lets the
optimizer decide which key to use based on the downstream information (column
selection, the sink's primary key).
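
As a self-contained Scala illustration of "two sets of unique keys" (all
names are made up; the real planner logic is of course far more involved):

    // The function exposes every unique key of its output...
    trait KeyedOutput {
      def uniqueKeys: Seq[Set[String]]
    }
    class TopN extends KeyedOutput {
      override def uniqueKeys: Seq[Set[String]] =
        Seq(Set("rankid"), Set("rankid", "time"))
    }

    // ...and a (greatly simplified) optimizer picks the key set matching
    // the sink's primary key.
    def chooseKey(f: KeyedOutput, sinkPrimaryKey: Set[String]): Option[Set[String]] =
      f.uniqueKeys.find(_ == sinkPrimaryKey)

    // chooseKey(new TopN, Set("time", "rankid")) == Some(Set("rankid", "time"))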


Best,
Jark

On Fri, 5 Jul 2019 at 00:05, Hequn Cheng  wrote:

> Hi Kurt and Jark,
>
> Thanks a lot for your great inputs!
>
> The keys of the query may not be strongly related to the UDTAGG.
> It may also be related to the corresponding scenario that a user wants to
> achieve.
>
> For example, take TopN again. If the TopN table aggregate function
> outputs three columns (rankid, time, value), either rankid or rankid+time
> could be used as the key. Which one is chosen is more likely to be decided
> by the user according to his business.
>
> Best, Hequn
>
> On Thu, Jul 4, 2019 at 8:11 PM Jark Wu  wrote:
>
>> Hi Jincheng,
>>
>> I agree with Kurt's point. As you said, "the user must know the keys of
>> the output of UDTAGG clearly".
>> If I understand correctly, the key information is strongly related to
>> the UDTAGG implementation.
>> Users may call `flatAggregate` on a UDTAGG instance with different keys,
>> which may produce wrong results.
>> So I think it would be better to couple the key information with the
>> UDTAGG interface (i.e. "Approach 3" in your design doc).
>>
>> Regards,
>> Jark
>>
>> On Thu, 4 Jul 2019 at 18:06, Kurt Young  wrote:
>>
>>> Hi Jincheng,
>>>
>>> Thanks for the clarification. I think you just pointed out my concern
>>> yourself:
>>>
>>> > When a user uses a user-defined table aggregate function (UDTAGG), he
>>> must understand the behavior of the UDTAGG, including the return type and
>>> the characteristics of the returned data, such as the key fields.
>>>
>>> This indicates that UDTAGGs are somehow classified into different
>>> types:
>>> one without key information, one with it. So the developer of the UDTAGG
>>> should choose which type the function should be. In this case,
>>> my question would be: why don't we have explicit information about keys,
>>> such as splitting UDTAGG into keyed and non-keyed UDTAGG? Then the user
>>> and the framework would have a better understanding of
>>> the UDTAGG. The `withKeys` solution lets the user choose the key, and it
>>> seems it will work correctly only if the user chooses the *right* key
>>> this UDTAGG has.
>>>
>>> Let me know if this makes sense to you.
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Thu, Jul 4, 2019 at 4:32 PM jincheng sun 
>>> wrote:
>>>
>>> > Hi All,
>>> >
>>> > @Kurt Young: one user-defined table aggregate
>>> > function
>>> > can be used in both the with-keys and without-keys cases, and we do not
>>> > introduce any other
>>> > aggregations, just like the explanation from @Hequn.
>>> >
>>> > @Hequn Cheng  thanks for your explanation!
>>> >
>>> > One thing should be mentioned here:
>>> >
>>> > When a user uses a user-defined table aggregate function (UDTAGG), he
>>> > must understand the behavior of the UDTAGG, including the return type
>>> > and the characteristics of the returned data, such as the key fields.
>>> > So although the `withKeys` approach is not rigorous enough (which we do
>>> > not need), it is intuitive enough: considering that if `flatAggregate`
>>> > is followed by an `upsertSink`, the user must know the keys of the
>>> > output of UDTAGG clearly, otherwise the keys of `upsertSink` cannot be
>>> > defined. So I still prefer the `withKeys` solution by now.
>>> >
>>> > Looking forward to any feedback from all of you!
>>> >
>>> > Best,
>>> > Jincheng
>>> >
>>> >
>>> >
>>> > On Mon, Jul 1, 2019 at 5:35 PM, Hequn Cheng wrote:
>>> >
>>> >> Hi Kurt,
>>> >>
>>> >> Thanks for your questions. Here are my thoughts.
>>> >

Re: [DISCUSS] Support Upsert mode for Streaming Non-window FlatAggregate

2019-07-08 Thread Jark Wu
Thanks Jincheng,

I think with approach 3 there is no ambiguity in the semantics, and it can
better guarantee a deterministic result, although it is not so easy to use.
So I'm +1 to approach 3.

Best,
Jark

On Fri, 5 Jul 2019 at 18:13, jincheng sun  wrote:

> Hi,
>
> @Kurt @Jark thanks again for your comments.
>
> @Kurt
> Separating keyed and non-keyed UDTAGGs can definitely provide more
> information for the system; however, it will also add more burden for our
> users and bring some code reuse problems. BTW, approach 3 can also be used
> to separate UDTAGG into keyed or non-keyed as we can check whether the key
> list is empty. So from this point of view, we can use approach 3 to solve
> your problem.
>
> @Jark
> It's great that the TopN in Blink can decide the key automatically. But
> I'd like to point out another case where the keys cannot be decided by the
> system, i.e., they can only be decided by the user. For example, for the
> TopN, let's say Top1 for better understanding. Suppose the Top1 outputs
> three columns (rankid, value, seller_name), and the user wants to upsert
> the result either with the key rankid or with the key rankid+seller_name.
> 1. With the key rankid: In this case, the user just wants to get the
> top 1 record.
> 2. With the key rankid+seller_name: In this case, the user wants to get
> all seller_names that have ever belonged to the top 1. This cannot be
> solved by approach 3 using only one function. However, it is very easy to
> implement with the withKeys approach.
>
> Even so, I have thought more clearly about these things and found more
> interesting things that I want to share with you all. For the TopN example
> which I listed above, it may also lead to a problem in which batch and
> streaming are not unified.
>
> To make it worse, the upsert sink is not supported in batch, and we don't
> even have a clear implementation plan for how to support upsert in batch,
> so the unification problem for the `withKeys` approach is left hanging in
> doubt.
>
> So, to avoid the unification problem, I think we can also use approach
> 3. It is more rigorous, although less flexible, compared to the `withKeys`
> approach.
>
> Meanwhile, I will think more about the unification problem later. Maybe
> new ideas will come up. :)
>
> Best,
> Jincheng
>
> On Fri, Jul 5, 2019 at 10:48 AM, Jark Wu wrote:
>
>> Hi Hequn,
>>
>> > If the TopN table aggregate function
>> > outputs three columns (rankid, time, value), either rankid or
>> > rankid+time could be
>> > used as the key. Which one to be chosen is more likely to be decided by
>> > the user according to his business.
>> In this case, the TopN table aggregate function should return two sets of
>> unique keys: one is "rankid", the other is "rankid, time".
>> This aligns better with the current TopN node in the blink planner and
>> lets the optimizer decide which key to use based on the downstream
>> information (column selection, the sink's primary key).
>>
>>
>> Best,
>> Jark
>>
>> On Fri, 5 Jul 2019 at 00:05, Hequn Cheng  wrote:
>>
>>> Hi Kurt and Jark,
>>>
>>> Thanks a lot for your great inputs!
>>>
>>> The keys of the query may not be strongly related to the UDTAGG.
>>> It may also be related to the corresponding scenario that a user wants
>>> to achieve.
>>>
>>> For example, take TopN again. If the TopN table aggregate function
>>> outputs three columns (rankid, time, value), either rankid or rankid+time
>>> could be used as the key. Which one is chosen is more likely to be
>>> decided by the user according to his business.
>>>
>>> Best, Hequn
>>>
>>> On Thu, Jul 4, 2019 at 8:11 PM Jark Wu  wrote:
>>>
>>>> Hi Jincheng,
>>>>
>>>> I agree with Kurt's point. As you said, "the user must know the keys of
>>>> the output of UDTAGG clearly".
>>>> If I understand correctly, the key information is strongly related to
>>>> the UDTAGG implementation.
>>>> Users may call `flatAggregate` on a UDTAGG instance with different keys,
>>>> which may produce wrong results.
>>>> So I think it would be better to couple the key information with the
>>>> UDTAGG interface (i.e. "Approach 3" in your design doc).
>>>>
>>>> Regards,
>>>> Jark
>>>>
>>>> On Thu, 4 Jul 2019 at 18:06, Kurt Young  wrote:
>>>>
>>>>> Hi Jincheng,
>>>>

Re: [ANNOUNCE] Rong Rong becomes a Flink committer

2019-07-11 Thread Jark Wu
Congratulations Rong Rong!
Welcome on board!

On Thu, 11 Jul 2019 at 22:25, Fabian Hueske  wrote:

> Hi everyone,
>
> I'm very happy to announce that Rong Rong accepted the offer of the Flink
> PMC to become a committer of the Flink project.
>
> Rong has been contributing to Flink for many years, mainly working on SQL
> and Yarn security features. He's also frequently helping out on the
> user@f.a.o mailing lists.
>
> Congratulations Rong!
>
> Best, Fabian
> (on behalf of the Flink PMC)
>


Re: blink planner issue

2019-07-16 Thread Jark Wu
Hi Kaka,

Thanks for reporting this. We haven't covered integration tests for connectors
yet because of FLINK-13276. We will cover them after FLINK-13276 is fixed.

The problem you raised might be because we misused `LogicalType.equals`, which
checks field names as well.
I have created an issue (FLINK-13290) to track this problem.

Best,
Jark


On Tue, 16 Jul 2019 at 17:35, kaka chen  wrote:

> I am looking into this issue. The related code which throws the error is:
>
> SinkCodeGenerator::validateFieldType()
>
> ...
>   // Tuple/Case class/Row type requested
>   case tt: TupleTypeInfoBase[_] =>
>     fieldTypes.zipWithIndex foreach {
>       case (fieldTypeInfo: GenericTypeInfo[_], i) =>
>         val requestedTypeInfo = tt.getTypeAt(i)
>         if (!requestedTypeInfo.isInstanceOf[GenericTypeInfo[Object]]) {
>           throw new TableException(
>             s"Result field '${fieldNames(i)}' does not match requested type. " +
>             s"Requested: $requestedTypeInfo; Actual: $fieldTypeInfo")
>         }
>       case (fieldTypeInfo, i) =>
>         val requestedTypeInfo = tt.getTypeAt(i)
>         validateFieldType(requestedTypeInfo)
>         if (fromTypeInfoToLogicalType(fieldTypeInfo) !=
>             fromTypeInfoToLogicalType(requestedTypeInfo) &&
>             !requestedTypeInfo.isInstanceOf[GenericTypeInfo[Object]]) {
>           val fieldNames = tt.getFieldNames
>           throw new TableException(s"Result field '${fieldNames(i)}' does not" +
>             s" match requested type. Requested: $requestedTypeInfo; Actual: $fieldTypeInfo")
>         }
>     }
> ...
>
> Thanks,
> Frank
>
> On Tue, Jul 16, 2019 at 5:23 PM, kaka chen wrote:
>
> > Hi All,
> >
> >
> > We are trying to switch to blink table planner in HBase connector, which
> > found the following error:
> >
> >
> > Running org.apache.flink.addons.hbase.HBaseSinkITCase
> >
> > Formatting using clusterid: testClusterID
> >
> > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 28.042
> sec
> > <<< FAILURE! - in org.apache.flink.addons.hbase.HBaseSinkITCase
> >
> > testTableSink(org.apache.flink.addons.hbase.HBaseSinkITCase)  Time
> > elapsed: 2.431 sec  <<< ERROR!
> >
> > org.apache.flink.table.api.TableException: Result field 'family1' does
> not
> > match requested type. Requested: Row(col1: Integer); Actual: Row(EXPR$0:
> > Integer)
> >
> > at
> >
> org.apache.flink.addons.hbase.HBaseSinkITCase.testTableSink(HBaseSinkITCase.java:140)
> >
> >
> > The original flink table planner executed successfully.
> >
> >
> > Thanks,
> >
> > Kaka Chen
> >
>


Re: blink planner issue

2019-07-16 Thread Jark Wu
Hi Caizhi and Kaka,

Actually, equals-with-field-names and equals-without-field-names are both
needed in Flink SQL.
It's not correct to simply ignore the field-name comparison in RowType#equals.
We have encountered this problem before because RowTypeInfo doesn't
compare field names (see FLINK-12848).
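
A tiny, self-contained Scala illustration of the difference (deliberately not
the Flink classes; just enough structure to show why the reported
HBaseSinkITCase check failed):

    // A toy row type: field names plus field types.
    case class Field(name: String, tpe: String)
    case class ToyRowType(fields: List[Field]) {
      // equals-with-field-names: what a strict RowType#equals does
      def equalsWithNames(other: ToyRowType): Boolean = fields == other.fields
      // equals-without-field-names: only the types have to line up
      def equalsIgnoreNames(other: ToyRowType): Boolean =
        fields.map(_.tpe) == other.fields.map(_.tpe)
    }

    val requested = ToyRowType(List(Field("col1", "INT")))    // Row(col1: Integer)
    val actual    = ToyRowType(List(Field("EXPR$0", "INT")))  // Row(EXPR$0: Integer)

    assert(!requested.equalsWithNames(actual))  // name mismatch -> exception
    assert(requested.equalsIgnoreNames(actual)) // types match -> should pass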

Best,
Jark

On Tue, 16 Jul 2019 at 23:38, kaka chen  wrote:

> Hi Caizhi and Jark,
>
> I think you are correct. From a quick look at the source code, it should
> only compare field types in the equals method.
> Currently some composite logical row types compare name and
> description as well, such as RowType and StructuredType.
>
> Thanks,
> Kaka Chen
>
> On Tue, Jul 16, 2019 at 11:16 PM, Caizhi Weng wrote:
>
> > Hi Kaka and Jark,
> >
> > On a side note, `RowTypeInfo` only compares field types in its `equals`
> > method. I think our new logical row type shouldn't break this behavior.
> >
> > On Tue, Jul 16, 2019 at 10:53 PM, kaka chen wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks!
> > >
> > > Thanks,
> > > Kaka Chen
> > >
> > > On Tue, Jul 16, 2019 at 10:30 PM, Jark Wu wrote:
> > >
> > > > Hi Kaka,
> > > >
> > > > Thanks for reporting this. We haven't covered integration tests for
> > > > connectors
> > > > yet because of FLINK-13276. We will cover them after FLINK-13276 is
> > > > fixed.
> > > >
> > > > The problem you raised might be because we misused `LogicalType.equals`,
> > > > which
> > > > checks field names as well.
> > > > I have created an issue (FLINK-13290) to track this problem.
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > >
> > > > > On Tue, 16 Jul 2019 at 17:35, kaka chen
> > > > > wrote:
> > > >
> > > > > I am looking into this issue. The related code which throws the
> > > > > error is:
> > > > >
> > > > > SinkCodeGenerator::validateFieldType()
> > > > >
> > > > > ...
> > > > >   // Tuple/Case class/Row type requested
> > > > >   case tt: TupleTypeInfoBase[_] =>
> > > > >     fieldTypes.zipWithIndex foreach {
> > > > >       case (fieldTypeInfo: GenericTypeInfo[_], i) =>
> > > > >         val requestedTypeInfo = tt.getTypeAt(i)
> > > > >         if (!requestedTypeInfo.isInstanceOf[GenericTypeInfo[Object]]) {
> > > > >           throw new TableException(
> > > > >             s"Result field '${fieldNames(i)}' does not match requested type. " +
> > > > >             s"Requested: $requestedTypeInfo; Actual: $fieldTypeInfo")
> > > > >         }
> > > > >       case (fieldTypeInfo, i) =>
> > > > >         val requestedTypeInfo = tt.getTypeAt(i)
> > > > >         validateFieldType(requestedTypeInfo)
> > > > >         if (fromTypeInfoToLogicalType(fieldTypeInfo) !=
> > > > >             fromTypeInfoToLogicalType(requestedTypeInfo) &&
> > > > >             !requestedTypeInfo.isInstanceOf[GenericTypeInfo[Object]]) {
> > > > >           val fieldNames = tt.getFieldNames
> > > > >           throw new TableException(s"Result field '${fieldNames(i)}' does not" +
> > > > >             s" match requested type. Requested: $requestedTypeInfo; Actual: $fieldTypeInfo")
> > > > >         }
> > > > >     }
> > > > > ...
> > > > > Thanks,
> > > > > Frank
> > > > >
> > > > > On Tue, Jul 16, 2019 at 5:23 PM, kaka chen wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > >
> > > > > > We are trying to switch to the blink table planner in the HBase
> > > > > > connector,
> > > > > > and found the following error:
> > > > > >
> > > > > >
> > > > > > Running org.apache.flink.addons.hbase.HBaseSinkITCase
> > > > > >
> > > > > > Formatting using clusterid: testClusterID
> > > > > >
> > > > > > Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed:
> > > 28.042
> > > > > sec
> > > > > > <<< FAILURE! - in org.apache.flink.addons.hbase.HBaseSinkITCase
> > > > > >
> > > > > > testTableSink(org.apache.flink.addons.hbase.HBaseSinkITCase)
> Time
> > > > > > elapsed: 2.431 sec  <<< ERROR!
> > > > > >
> > > > > > org.apache.flink.table.api.TableException: Result field 'family1'
> > > does
> > > > > not
> > > > > > match requested type. Requested: Row(col1: Integer); Actual:
> > > > Row(EXPR$0:
> > > > > > Integer)
> > > > > >
> > > > > > at
> > > > > >
> > > > >
> > > >
> > >
> >
> org.apache.flink.addons.hbase.HBaseSinkITCase.testTableSink(HBaseSinkITCase.java:140)
> > > > > >
> > > > > >
> > > > > > The original flink table planner executed successfully.
> > > > > >
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Kaka Chen
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Flink "allow lateness" for tables

2019-07-17 Thread Jark Wu
Hi Ramya,

Are you looking for "allow lateness" for tumbling windows in Flink SQL?

If yes, Flink SQL doesn't provide such a thing currently.
However, the blink planner (the new runner of Flink SQL) will support this,
maybe in the next release (v1.10).
Actually, we already implemented it in the blink planner but didn't expose the
feature in v1.9, so that we have enough time to review the API.

For now, you can convert the Table to a DataStream and use the tumbling window
functionality on the DataStream, which supports allowed lateness.
Or you can increase the watermark delay to wait for more late records, but then
the window output delay will get larger.
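
For the DataStream route, a hedged Scala sketch (environment creation shown in
the 1.9 style; the table schema (key: String, cnt: Long), the query, and the
5-second lateness are placeholder assumptions):

    import org.apache.flink.streaming.api.TimeCharacteristic
    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.table.api.scala._

    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    val tableEnv = StreamTableEnvironment.create(env)
    // Placeholder: any Table with fields (key: String, cnt: Long).
    val table = tableEnv.sqlQuery("SELECT `key`, cnt FROM some_registered_table")

    val lateTag = OutputTag[(String, Long)]("late-data")
    val summed = tableEnv
      .toAppendStream[(String, Long)](table)
      .keyBy(_._1)
      .window(TumblingEventTimeWindows.of(Time.seconds(1)))
      .allowedLateness(Time.seconds(5))   // keep window state for late records
      .sideOutputLateData(lateTag)        // later records go to the side output
      .sum(1)

    val lateStream = summed.getSideOutput(lateTag)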

Best,
Jark

On Wed, 17 Jul 2019 at 17:39, Ramya Ramamurthy  wrote:

> Hi,
>
> I would like to know if there is some configuration which enables
> allowed lateness on tables. The documentation mentions streams
> but not tables.
> If this is not present, is there a way to collect late records on a side
> output for tables?
>
> Today, we see some late packet drops in Flink, where my tumbling window is
> 1 second and checkpointing happens every 5 minutes.
>
> Thanks.
>


Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

2019-07-18 Thread Jark Wu
Congratulations Becket! Well deserved.

Cheers,
Jark

On Thu, 18 Jul 2019 at 15:56, Paul Lam  wrote:

> Congrats Becket!
>
> Best,
> Paul Lam
>
> > On Jul 18, 2019, at 15:41, Robert Metzger wrote:
> >
> > Hi all,
> >
> > I'm excited to announce that Jiangjie (Becket) Qin just became a Flink
> > committer!
> >
> > Congratulations Becket!
> >
> > Best,
> > Robert (on behalf of the Flink PMC)
>
>


Re: When translating the Glossary, is it necessary to translate these terms into Chinese?

2019-07-18 Thread Jark Wu
Hi highfei,

Thanks for bringing up this discussion. I would suggest moving the
discussion to the Glossary translation JIRA FLINK-13037
<https://issues.apache.org/jira/browse/FLINK-13037>.


Thanks,
Jark




On Fri, 19 Jul 2019 at 09:00, Zili Chen  wrote:

> Hi,
>
> Feel free to sync back to this thread once there is a PR :-)
>
> Best,
> tison.
>
>
> On Fri, Jul 19, 2019 at 8:34 AM, highfei2011 wrote:
>
> > Hi, Zili Chen:
> > Good morning. You are right, thanks. I also noticed that the English
> > Glossary has no entries for Slot and Parallelism; I suggest adding them.
> > That would make it easier for beginners and users to learn and use Flink!
> >
> > Best regards
> >
> >
> >
> >  Original Message 
> > Subject: Re: When translating the Glossary, is it necessary to translate these terms into Chinese?
> > From: Zili Chen
> > To: user...@flink.apache.org
> > CC:
> >
> > If there is no established, citable translation, I suggest not translating
> > the proper nouns. The explanation part of the Glossary can be made more
> > detailed. Terms with fairly broad consensus, like "record" and "task", are
> > still open for discussion, but forcing a translation of terms like
> > "transformation" and "operator chain" most likely means that people who
> > already understand them would understand anyway, while people who don't
> > still won't. If we don't translate now, we can change it once established
> > translations exist; if we translate by personal preference first, it will
> > be hard to change later.
> >
> > Just my two cents.
> >
> >
> > Best,
> > tison.
> >
> >
> > On Thu, Jul 18, 2019 at 11:35 PM, highfei2011 wrote:
> >
> > > Hi all,
> > >   Good evening!
> > >   When translating the Glossary chapter, is it necessary to translate
> > > the following terms into Chinese? The list of terms is as follows:
> > >
> > >
> > >
> > > Flink Application Cluster
> > >
> > >
> > > Flink Cluster
> > >
> > >
> > > Event
> > >
> > >
> > > ExecutionGraph
> > >
> > >
> > > Function
> > >
> > >
> > > Instance
> > >
> > >
> > > Flink Job
> > >
> > >
> > > JobGraph
> > >
> > >
> > > Flink JobManager
> > >
> > >
> > > Logical Graph
> > >
> > >
> > > Managed State
> > >
> > >
> > > Flink Master
> > >
> > >
> > > Operator
> > >
> > >
> > > Operator Chain
> > >
> > >
> > > Partition
> > >
> > >
> > > Physical Graph
> > >
> > >
> > > Record
> > >
> > >
> > > Flink Session Cluster
> > >
> > >
> > > State Backend
> > >
> > >
> > > Sub-Task
> > >
> > >
> > > Task
> > >
> > >
> > > Flink TaskManager
> > >
> > >
> > > Transformation
> > >
> > >
> > >
> > >
> > > Best regards!
> >
> >
>


Re: When translating the Glossary, is it necessary to translate these terms into Chinese?

2019-07-18 Thread Jark Wu
Hi,

Just found that the Glossary translation PR has been created [1]. Let's move the
discussion there.

[1]. https://github.com/apache/flink/pull/9173

On Fri, 19 Jul 2019 at 11:22, Jark Wu  wrote:

> Hi highfei,
>
> Thanks for bringing up this discussion. I would suggest moving the
> discussion to the Glossary translation JIRA FLINK-13037
> <https://issues.apache.org/jira/browse/FLINK-13037>.
>
>
> Thanks,
> Jark
>
>
>
>
> On Fri, 19 Jul 2019 at 09:00, Zili Chen  wrote:
>
>> Hi,
>>
>> Feel free to sync back to this thread once there is a PR :-)
>>
>> Best,
>> tison.
>>
>>
>> On Fri, Jul 19, 2019 at 8:34 AM, highfei2011 wrote:
>>
>> > Hi, Zili Chen:
>> > Good morning. You are right, thanks. I also noticed that the English
>> > Glossary has no entries for Slot and Parallelism; I suggest adding them.
>> > That would make it easier for beginners and users to learn and use Flink!
>> >
>> > Best regards
>> >
>> >
>> >
>> >  Original Message 
>> > Subject: Re: When translating the Glossary, is it necessary to translate these terms into Chinese?
>> > From: Zili Chen
>> > To: user...@flink.apache.org
>> > CC:
>> >
>> > If there is no established, citable translation, I suggest not
>> > translating the proper nouns. The explanation part of the Glossary can
>> > be made more detailed. Terms with fairly broad consensus, like "record"
>> > and "task", are still open for discussion, but forcing a translation of
>> > terms like "transformation" and "operator chain" most likely means that
>> > people who already understand them would understand anyway, while people
>> > who don't still won't. If we don't translate now, we can change it once
>> > established translations exist; if we translate by personal preference
>> > first, it will be hard to change later.
>> >
>> > Just my two cents.
>> >
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > On Thu, Jul 18, 2019 at 11:35 PM, highfei2011 wrote:
>> >
>> > > Hi all,
>> > >   Good evening!
>> > >   When translating the Glossary chapter, is it necessary to translate
>> > > the following terms into Chinese? The list of terms is as follows:
>> > >
>> > >
>> > >
>> > > Flink Application Cluster
>> > >
>> > >
>> > > Flink Cluster
>> > >
>> > >
>> > > Event
>> > >
>> > >
>> > > ExecutionGraph
>> > >
>> > >
>> > > Function
>> > >
>> > >
>> > > Instance
>> > >
>> > >
>> > > Flink Job
>> > >
>> > >
>> > > JobGraph
>> > >
>> > >
>> > > Flink JobManager
>> > >
>> > >
>> > > Logical Graph
>> > >
>> > >
>> > > Managed State
>> > >
>> > >
>> > > Flink Master
>> > >
>> > >
>> > > Operator
>> > >
>> > >
>> > > Operator Chain
>> > >
>> > >
>> > > Partition
>> > >
>> > >
>> > > Physical Graph
>> > >
>> > >
>> > > Record
>> > >
>> > >
>> > > Flink Session Cluster
>> > >
>> > >
>> > > State Backend
>> > >
>> > >
>> > > Sub-Task
>> > >
>> > >
>> > > Task
>> > >
>> > >
>> > > Flink TaskManager
>> > >
>> > >
>> > > Transformation
>> > >
>> > >
>> > >
>> > >
>> > > Best regards!
>> >
>> >
>>
>


Re: [DISCUSS] A more restrictive JIRA workflow

2019-07-18 Thread Jark Wu
A quick question: what should we do if a developer creates a JIRA issue and
then creates a pull request right away, without being assigned?


Regards,
Jark

On Thu, 18 Jul 2019 at 18:50, Zili Chen  wrote:

> Checking the result, I discovered that one can
> still file a JIRA with "blocker" priority.
>
> IIRC someone in this thread once mentioned that
> "Don't allow contributors to set a blocker priority."
>
> Chesnay,
>
> Thanks for the clarification.
>
>
> Best,
> tison.
>
>
> On Thu, Jul 18, 2019 at 6:40 PM, Chesnay Schepler wrote:
>
> > We haven't wiped the set of contributors yet. Not sure if there's an
> > easy way to remove the permissions for all of them; someone from the PMC
> > may have to bite the bullet and click 600 times in a row :)
> >
> > On 18/07/2019 12:32, Zili Chen wrote:
> > > Robert,
> > >
> > > Thanks for your effort. Rejecting contributor permission request
> > > with a nice message and pointing them to the announcement sounds
> > > reasonable. Just to be clear, we now have no person with contributor
> > > role, right?
> > >
> > > Chesnay,
> > >
> > > https://flink.apache.org/contributing/contribute-code.html has been
> > > updated and mentions that "Only committers can assign a Jira ticket."
> > >
> > > I think the corresponding update has been done.
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > On Thu, Jul 18, 2019 at 6:25 PM, Chesnay Schepler wrote:
> > >
> > >> Do our contribution guidelines contain anything that should be
> updated?
> > >>
> > >> On 18/07/2019 12:24, Chesnay Schepler wrote:
> > >>> Sounds good to me.
> > >>>
> > >>> On 18/07/2019 12:07, Robert Metzger wrote:
> > >>>> Infra has finally changed the permissions. I just announced the
> > >>>> change in a
> > >>>> separate email [1].
> > >>>>
> > >>>> One thing I wanted to discuss here is, how do we want to handle all
> > the
> > >>>> "contributor permissions" requests?
> > >>>>
> > >>>> My proposal is to basically reject them with a nice message,
> pointing
> > >>>> them
> > >>>> to my announcement.
> > >>>>
> > >>>> What do you think?
> > >>>>
> > >>>>
> > >>>>
> > >>>> [1]
> > >>>>
> > >>
> >
> https://lists.apache.org/thread.html/4ed570c7110b7b55b5c3bd52bb61ff35d5bda88f47939d8e7f1844c4@%3Cdev.flink.apache.org%3E
> > >>>>
> > >>>>
> > >>>> On Thu, Jul 4, 2019 at 1:21 PM Robert Metzger 
> > >>>> wrote:
> > >>>>
> > >>>>> This is the Jira ticket I opened
> > >>>>> https://issues.apache.org/jira/browse/INFRA-18644 a long time ago
> :)
> > >>>>>
> > >>>>> On Thu, Jul 4, 2019 at 10:47 AM Chesnay Schepler <
> ches...@apache.org
> > >
> > >>>>> wrote:
> > >>>>>
> > >>>>>> @Robert what's the state here?
> > >>>>>>
> > >>>>>> On 24/06/2019 16:16, Robert Metzger wrote:
> > >>>>>>> Hey all,
> > >>>>>>>
> > >>>>>>> I would like to drive this discussion to an end soon.
> > >>>>>>> I've just merged the updated contribution guide to the Flink
> > website:
> > >>>>>>> https://flink.apache.org/contributing/contribute-code.html
> > >>>>>>>
> > >>>>>>> I will now ask Apache INFRA to change the permissions in our Jira.
> Jira.
> > >>>>>>>
> > >>>>>>> Here's the updated TODO list:
> > >>>>>>>
> > >>>>>>> 1. I update the contribution guide DONE
> > >>>>>>> 2. Update Flinkbot to close invalid PRs, and show warnings on PRs
> > >>>>>>> with
> > >>>>>>> unassigned JIRAs IN PROGRESS
> > >>>>>>> 3. We ask Infra to change the permissions of our JIRA so that: IN
> > >>>>>> PROGRESS
> > >>>>>>>  a) only committers can assign users to tickets

Re: flink-mapr-fs failed in travis

2019-07-18 Thread Jark Wu
It seems that it is introduced by this commit:
https://github.com/apache/flink/commit/5c36c650e6520d92191ce2da33f7dcae774319f6
Hi @Chesnay Schepler  , do we need to add
"-Punsafe-mapr-repo" to the ".travis.yml"?

Best,
Jark

On Fri, 19 Jul 2019 at 10:58, JingsongLee 
wrote:

> Hi everyone:
>
> flink-mapr-fs failed in travis; I retried many times, and it still failed.
> Does anyone have an idea about this?
>
> 01:32:54.755 [ERROR] Failed to execute goal on project flink-mapr-fs:
> Could not resolve dependencies for project
> org.apache.flink:flink-mapr-fs:jar:1.10-SNAPSHOT: Failed to collect
> dependencies at com.mapr.hadoop:maprfs:jar:5.2.1-mapr: Failed to read
> artifact descriptor for com.mapr.hadoop:maprfs:jar:5.2.1-mapr: Could not
> transfer artifact com.mapr.hadoop:maprfs:pom:5.2.1-mapr from/to
> mapr-releases (https://repository.mapr.com/maven/):
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find
> valid certification path to requested target -> [Help 1]
>
> https://api.travis-ci.org/v3/job/560790299/log.txt
>
> Best, Jingsong Lee


Re: flink-mapr-fs failed in travis

2019-07-19 Thread Jark Wu
Great! Thanks Chesnay for the quick fix.


On Fri, 19 Jul 2019 at 16:40, Chesnay Schepler  wrote:

> I think I found the issue; I forgot to update travis_controller.sh.
>
> On 19/07/2019 10:02, Chesnay Schepler wrote:
> > Ah, I added it to the common options in travis_mvn_watchdog.sh.
> >
> > On 19/07/2019 09:58, Chesnay Schepler wrote:
> >> I did modify the .travis.yml to activate the unsafe-mapr-repo
> >> profile; did I modify the wrong profile?...
> >>
> >>
> >> On 19/07/2019 07:57, Jark Wu wrote:
> >>> It seems that it is introduced by this commit:
> >>>
> https://github.com/apache/flink/commit/5c36c650e6520d92191ce2da33f7dcae774319f6
> >>>
> >>> Hi @Chesnay Schepler  , do we need to add
> >>> "-Punsafe-mapr-repo" to the ".travis.yml"?
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>> On Fri, 19 Jul 2019 at 10:58, JingsongLee
> >>> 
> >>> wrote:
> >>>
> >>>> Hi everyone:
> >>>>
> >>>> flink-mapr-fs failed in travis; I retried many times, and it still
> >>>> failed.
> >>>> Does anyone have an idea about this?
> >>>>
> >>>> 01:32:54.755 [ERROR] Failed to execute goal on project flink-mapr-fs:
> >>>> Could not resolve dependencies for project
> >>>> org.apache.flink:flink-mapr-fs:jar:1.10-SNAPSHOT: Failed to collect
> >>>> dependencies at com.mapr.hadoop:maprfs:jar:5.2.1-mapr: Failed to read
> >>>> artifact descriptor for com.mapr.hadoop:maprfs:jar:5.2.1-mapr:
> >>>> Could not
> >>>> transfer artifact com.mapr.hadoop:maprfs:pom:5.2.1-mapr from/to
> >>>> mapr-releases (https://repository.mapr.com/maven/):
> >>>> sun.security.validator.ValidatorException: PKIX path building failed:
> >>>> sun.security.provider.certpath.SunCertPathBuilderException: unable
> >>>> to find
> >>>> valid certification path to requested target -> [Help 1]
> >>>>
> >>>> https://api.travis-ci.org/v3/job/560790299/log.txt
> >>>>
> >>>> Best, Jingsong Lee
> >>
> >>
> >>
> >
> >
>
>


[DISCUSS] Setup a builds@flink.apache.org mailing list for travis builds

2019-07-19 Thread Jark Wu
Hi all,

As far as I know, currently, email notifications of Travis builds for the
master branch are sent to the commit author when a build was just broken or
still is broken. And there are no email notifications for CRON builds.

Recently, we have been suffering from compile errors for scala-2.12 and java-9
which are only run in CRON jobs. So I'm figuring out a way to get
notifications of CRON builds (or all builds) to quickly fix compile errors
and failed cron tests.

After reaching out to @Chesnay Schepler (thanks for
the help), I know that we are using a Slack channel to receive all
failed build notifications. However, IMO, email notification might be a
better way than a Slack channel to encourage more people to pay attention to
the builds.

So I'm here to propose setting up a builds@flink.apache.org mailing list for
receiving build notifications. I also found that Beam has such a mailing list
too [1]. After we have such a mailing list, we can integrate it with Travis
according to the Travis docs [2].
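
For reference, the Travis side of this would be roughly the following
.travis.yml snippet, sketched from the notification docs referenced in [2]
(the recipient address is the list proposed here):

    notifications:
      email:
        recipients:
          - builds@flink.apache.org
        on_success: never   # stay quiet for green builds
        on_failure: always  # mail the list on every broken build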

What do you think? Do we need a formal vote for this?

Best and thanks,
Jark

[1]: https://beam.apache.org/community/contact-us/
[2]:
https://docs.travis-ci.com/user/notifications/#configuring-email-notifications



Re: [ANNOUNCE] Zhijiang Wang has been added as a committer to the Flink project

2019-07-22 Thread Jark Wu
Congratulations Zhijiang!


On Tue, 23 Jul 2019 at 11:30, vino yang  wrote:

> Congratulations Zhijiang!
>
> On Tue, Jul 23, 2019 at 10:48 AM, Haibo Sun wrote:
>
> > Congrats, Zhijiang!
> >
> >
> > Best,
> > Haibo
> > On 2019-07-23 10:26:20, "Yun Tang" wrote:
> > >Congratulations Zhijiang, well deserved.
> > >
> > >Best
> > >
> > >From: Yingjie Cao 
> > >Sent: Tuesday, July 23, 2019 10:23
> > >To: dev@flink.apache.org 
> > >Subject: Re: [ANNOUNCE] Zhijiang Wang has been added as a committer to
> > the Flink project
> > >
> > >Congratulations Zhijiang!
> > >
> > >On Tue, Jul 23, 2019 at 10:17 AM, yangtao.yt wrote:
> > >
> > >> Congrats, Zhijiang!
> > >>
> > >> Best,
> > >> Tao Yang
> > >>
> > >> > On Jul 23, 2019, at 9:46 AM, boshu Zheng wrote:
> > >> >
> > >> > Congratulations Zhijiang
> > >> >
> > >> > Sent from my iPhone
> > >> >
> > >> >> On Jul 23, 2019, at 12:55 AM, Xuefu Z wrote:
> > >> >>
> > >> >> Congratulations, Zhijiang!
> > >> >>
> > >> >>> On Mon, Jul 22, 2019 at 7:42 AM, Bo WANG
> > >> >>> wrote:
> > >> >>>
> > >> >>> Congratulations Zhijiang!
> > >> >>>
> > >> >>>
> > >> >>> Best,
> > >> >>>
> > >> >>> Bo WANG
> > >> >>>
> > >> >>>
> > >> >>> On Mon, Jul 22, 2019 at 10:12 PM Robert Metzger <
> > rmetz...@apache.org>
> > >> >>> wrote:
> > >> >>>
> > >>  Hey all,
> > >> 
> > >>  We've added another committer to the Flink project: Zhijiang
> Wang.
> > >> 
> > >>  Congratulations Zhijiang!
> > >> 
> > >>  Best,
> > >>  Robert
> > >>  (on behalf of the Flink PMC)
> > >> 
> > >> >>>
> > >> >>
> > >> >>
> > >> >> --
> > >> >> Xuefu Zhang
> > >> >>
> > >> >> "In Honey We Trust!"
> > >>
> > >>
> >
>


Re: [ANNOUNCE] Kete Young is now part of the Flink PMC

2019-07-23 Thread Jark Wu
Congratulations Kurt! Well deserved.

Cheers,
Jark

On Tue, 23 Jul 2019 at 17:43, LakeShen  wrote:

> Congratulations Kurt!
>
> On Tue, Jul 23, 2019 at 5:37 PM, Congxian Qiu wrote:
>
> > Congratulations Kurt!
> > Best,
> > Congxian
> >
> >
> > On Tue, Jul 23, 2019 at 5:36 PM, Dian Fu wrote:
> >
> > > Congrats, Kurt!
> > >
> > > > On Jul 23, 2019, at 5:33 PM, Zili Chen wrote:
> > > >
> > > > Congratulations Kurt!
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > On Tue, Jul 23, 2019 at 5:29 PM, JingsongLee
> > > > wrote:
> > > >
> > > >> Congratulations Kurt!
> > > >>
> > > >> Best, Jingsong Lee
> > > >>
> > > >>
> > > >> --
> > > >> From: Robert Metzger
> > > >> Send Time: Tuesday, Jul 23, 2019, 17:24
> > > >> To: dev
> > > >> Subject:[ANNOUNCE] Kete Young is now part of the Flink PMC
> > > >>
> > > >> Hi all,
> > > >>
> > > >> On behalf of the Flink PMC, I'm happy to announce that Kete Young is
> > now
> > > >> part of the Apache Flink Project Management Committee (PMC).
> > > >>
> > > >> Kete has been a committer since February 2017, working a lot on
> Table
> > > API /
> > > >> SQL. He's currently co-managing the 1.9 release! Thanks a lot for
> your
> > > work
> > > >> for Flink!
> > > >>
> > > >> Congratulations & Welcome Kurt!
> > > >>
> > > >> Best,
> > > >> Robert
> > > >>
> > >
> > >
> >
>


Re: [DISCUSS] Setup a builds@flink.apache.org mailing list for travis builds

2019-07-23 Thread Jark Wu
Thank you all for your positive feedback.

We have three binding +1s, so I think we can proceed with this.

Hi @Robert Metzger, could you create a request to
INFRA for the mailing list?
I'm not sure if this needs PMC permission.

Thanks,
Jark

On Tue, 23 Jul 2019 at 16:42, jincheng sun  wrote:

> +1
>
> On Tue, Jul 23, 2019 at 4:01 PM, Robert Metzger wrote:
>
> > +1
> >
> > On Mon, Jul 22, 2019 at 10:27 AM Biao Liu  wrote:
> >
> > > +1, make sense to me.
> > > Mailing list seems to be a more "community" way.
> > >
> > > On Mon, Jul 22, 2019 at 4:06 PM, Timo Walther wrote:
> > >
> > > > +1 sounds good to inform people about instabilities or other issues
> > > >
> > > > Regards,
> > > > Timo
> > > >
> > > >
> > > > On 22.07.19 at 09:58, Haibo Sun wrote:
> > > > > +1. Sounds good. Letting the PR creators know the build results of
> > > > > the
> > > > > master branch can help to determine more quickly whether Travis
> > > > > failures of
> > > > > their PR are real failures or due to the instability of test
> > > > > cases.
> > > > > By the way, if the PR creator could abort their own Travis build, I
> > > > > think it
> > > > > could improve the efficient use of Travis resources (of course, this
> > > > > discussion does not deal with this issue).
> > > > >
> > > > >
> > > > > Best,
> > > > > Haibo
> > > > > At 2019-07-22 12:36:31, "Yun Tang"  wrote:
> > > > >> +1, sounds good; more people could be encouraged to get involved
> > > > >> in paying
> > > > >> attention to failing commits.
> > > > >>
> > > > >> Best
> > > > >> Yun Tang
> > > > >> 
> > > > >> From: Becket Qin 
> > > > >> Sent: Monday, July 22, 2019 9:44
> > > > >> To: dev 
> > > > >> Subject: Re: [DISCUSS] Setup a builds@flink.apache.org mailing
> list
> > > > for travis builds
> > > > >>
> > > > >> +1. Sounds a good idea to me.
> > > > >>
> > > > >> On Sat, Jul 20, 2019 at 7:07 PM Dian Fu 
> > > wrote:
> > > > >>
> > > > >>> Thanks Jark for the proposal, sounds reasonable to me. +1. This
> > > > >>> ML could
> > > > >>> be used for all the build notifications, including master and CRON
> > > > >>> jobs.
> > > > >>>
> > > > >>>> On Jul 20, 2019, at 2:55 PM, Xu Forward wrote:
> > > > >>>>
> > > > >>>> +1, thanks Jark for the proposal.
> > > > >>>> Best
> > > > >>>> Forward
> > > > >>>>
> > > > >>>> On Sat, Jul 20, 2019 at 12:14 PM, Jark Wu wrote:
> > > > >>>>
> > > > >>>>> Hi all,
> > > > >>>>>
> > > > >>>>> As far as I know, currently, email notifications of Travis
> > > > >>>>> builds for the
> > > > >>>>> master branch are sent to the commit author when a build was
> > > > >>>>> just broken or
> > > > >>>>> still is broken. And there are no email notifications for CRON
> > > > >>>>> builds.
> > > > >>>>>
> > > > >>>>> Recently, we have been suffering from compile errors for
> > > > >>>>> scala-2.12 and java-9
> > > > >>>>> which are only run in CRON jobs. So I'm figuring out a way to
> > > > >>>>> get
> > > > >>>>> notifications of CRON builds (or all builds) to quickly fix
> > > > >>>>> compile errors
> > > > >>>>> and failed cron tests.
> > > > >>>>>
> > > > >>>>> After reaching out to @Chesnay Schepler (thanks for
> > > > >>>>> the help), I know that we are using a Slack channel to receive
> > > > >>>>> all
> > > > >>>>> failed build notifications. However, IMO, email notification
> > > > >>>>> might be a
> > > > >>>>> better way than a Slack channel to encourage more people to pay
> > > > >>>>> attention to
> > > > >>>>> the builds.
> > > > >>>>>
> > > > >>>>> So I'm here to propose setting up a builds@flink.apache.org
> > > > >>>>> mailing list
> > > > >>>>> for
> > > > >>>>> receiving build notifications. I also found that Beam has such
> > > > >>>>> a mailing list
> > > > >>>>> too [1]. After we have such a mailing list, we can integrate it
> > > > >>>>> with Travis
> > > > >>>>> according to the Travis docs [2].
> > > > >>>>>
> > > > >>>>> What do you think? Do we need a formal vote for this?
> > > > >>>>>
> > > > >>>>> Best and thanks,
> > > > >>>>> Jark
> > > > >>>>>
> > > > >>>>> [1]: https://beam.apache.org/community/contact-us/
> > > > >>>>> [2]:
> > > > >>>>>
> > > > >>>>>
> > > > >>>
> > > >
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-email-notifications
> > > > >>>>> <
> > > > >>>>>
> > > > >>>
> > > >
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-email-notifications
> > > > >>>>> <
> > > > >>>>>
> > > > >>>
> > > >
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-email-notifications
> > > > >>>
> > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds

2019-07-24 Thread Jark Wu
Thanks Robert for helping out with that.

Best,
Jark

On Wed, 24 Jul 2019 at 19:16, Robert Metzger  wrote:

> I've requested the creation of the list, and made Jark, Chesnay and me
> moderators of it.
>
> On Wed, Jul 24, 2019 at 1:12 PM Robert Metzger 
> wrote:
>
> > @Jark: Yes, I will request the creation of a mailing list!
> >
> > On Tue, Jul 23, 2019 at 4:48 PM Hugo Louro  wrote:
> >
> >> +1
> >>
> >> > On Jul 23, 2019, at 6:15 AM, Till Rohrmann 
> >> wrote:
> >> >
> >> > Good idea Jark. +1 for the proposal.
> >> >
> >> > Cheers,
> >> > Till
> >> >
> >> >> On Tue, Jul 23, 2019 at 1:59 PM Hequn Cheng 
> >> wrote:
> >> >>
> >> >> Hi Jark,
> >> >>
> >> >> Good idea. +1!
> >> >>
> >> >>> On Tue, Jul 23, 2019 at 6:23 PM Jark Wu  wrote:
> >> >>>
> >> >>> Thank you all for your positive feedback.
> >> >>>
> >> >>> We have three binding +1s, so I think, we can proceed with this.
> >> >>>
> >> >>> Hi @Robert Metzger  , could you create a
> >> request to
> >> >>> INFRA for the mailing list?
> >> >>> I'm not sure if this needs a PMC permission.
> >> >>>
> >> >>> Thanks,
> >> >>> Jark
> >> >>>
> >> >>> On Tue, 23 Jul 2019 at 16:42, jincheng sun <
> sunjincheng...@gmail.com>
> >> >>> wrote:
> >> >>>
> >> >>>> +1
> >> >>>>
> >> >>>> Robert Metzger  于2019年7月23日周二 下午4:01写道:
> >> >>>>
> >> >>>>> +1
> >> >>>>>
> >> >>>>> On Mon, Jul 22, 2019 at 10:27 AM Biao Liu 
> >> >> wrote:
> >> >>>>>
> >> >>>>>> +1, make sense to me.
> >> >>>>>> Mailing list seems to be a more "community" way.
> >> >>>>>>
> >> >>>>>> Timo Walther  于2019年7月22日周一 下午4:06写道:
> >> >>>>>>
> >> >>>>>>> +1 sounds good to inform people about instabilities or other
> >> >> issues
> >> >>>>>>>
> >> >>>>>>> Regards,
> >> >>>>>>> Timo
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>>> Am 22.07.19 um 09:58 schrieb Haibo Sun:
> >> >>>>>>>> +1. Sounds good.Letting the PR creators know the build results
> >> >> of
> >> >>>> the
> >> >>>>>>> master branch can help to determine more quickly whether Travis
> >> >>>>> failures
> >> >>>>>> of
> >> >>>>>>> their PR are an exact failure or because of the instability of
> >> >> test
> >> >>>>> case.
> >> >>>>>>> By the way, if the PR creator can abort their own Travis build,
> I
> >> >>>> think
> >> >>>>>> it
> >> >>>>>>> can improve the efficient use of Travis resources (of course,
> >> >> this
> >> >>>>>>> discussion does not deal with this issue).
> >> >>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>> Best,
> >> >>>>>>>> Haibo
> >> >>>>>>>> At 2019-07-22 12:36:31, "Yun Tang"  wrote:
> >> >>>>>>>>> +1 sounds good, more people could be encouraged to involve in
> >> >>>> paying
> >> >>>>>>> attention to failure commit.
> >> >>>>>>>>>
> >> >>>>>>>>> Best
> >> >>>>>>>>> Yun Tang
> >> >>>>>>>>> 
> >> >>>>>>>>> From: Becket Qin 
> >> >>>>>>>>> Sent: Monday, July 22, 2019 9:44
> >> >>>>>>>>> To: dev 
> >> >>>>>>>>> Subject: Re: [DISCUSS] Setup a bui...@flink.apache.org
> >> >>

Re: [DISCUSS] Support computed column for Flink SQL

2019-07-29 Thread Jark Wu
Hi Danny,

Thanks for bringing this up. I agree with Timo: we can have a thorough
discussion once the 1.9 release is published.
The computed column support and its implementation would also be good to
discuss together with the "table source and sink concepts" thread[1].
In the meantime, I left some of my initial thoughts in the doc; a small
usage sketch of the proposed syntax is included below.

Best,
Jark

[1]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Ground-Source-and-Sink-Concepts-in-Flink-SQL-tt29126.html
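
For readers following along, a usage sketch of what the proposal would
enable (the PROC_TIME expression and the elided connector properties are
placeholders from the design doc, not a released API):

TableEnvironment tEnv = TableEnvironment.create(
    EnvironmentSettings.newInstance().inStreamingMode().build());

// computed column 'd' provides the processing-time attribute
tEnv.sqlUpdate(
    "CREATE TABLE T1 (" +
    "  a INT," +
    "  b BIGINT," +
    "  c VARCHAR," +
    "  d AS PROC_TIME" +
    ") WITH (...)");  // connector properties elided

// 'd' could then be used like any time attribute, e.g. in a window
Table counts = tEnv.sqlQuery(
    "SELECT COUNT(*) FROM T1 GROUP BY TUMBLE(d, INTERVAL '1' MINUTE)");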

On Mon, 29 Jul 2019 at 16:52, Timo Walther  wrote:

> Hi Danny,
>
> thanks for working on this issue and writing down the concept
> suggestion. We are currently still in the progress of finalizing the 1.9
> release. Having proper streaming DDL support will definitely be part of
> Flink 1.10. I will take a look at the whole DDL efforts very soon once
> the 1.9 release is out.
>
> Thanks,
> Timo
>
> Am 23.07.19 um 11:00 schrieb Danny Chan:
> > In umbrella task FLINK-10232[1] we have introduced CREATE TABLE grammar
> in our new module flink-sql-parser. And we proposed to use computed column
> to describe the time attribute of process time in the design doc FLINK SQL
> DDL[2], so user may create a table with process time attribute as following:
> >
> > create table T1(
> > a int,
> > b bigint,
> > c varchar,
> > d as PROC_TIME,
> > ) with (
> > k1 = v1,
> > k2 = v2
> > );
> >
> > The column d would be a process time attribute for table T1. There are
> also many other use cases for computed columns[3].
> >
> > It may not be a big change here, but may touch the TableSchema, which is
> a public API for user now, so i'm very appreciate for your
> suggestions(especially its relationship with the TableSchema).
> >
> > I write a simple design doc here[3].
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-10232
> > [2]
> https://docs.google.com/document/d/1OmVyuPk9ibGUC-CnPHbXvCg_fdG1TeC3lXSnqcUEYmM
> > [3]
> https://docs.google.com/document/d/110TseRtTCphxETPY7uhiHpu-dph3NEesh3mYKtJ7QOY/edit?usp=sharing
> >
> > Best,
> > Danny Chan
> >
>
>


Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds

2019-07-30 Thread Jark Wu
Hi all,

Progress updates:
1. The bui...@flink.apache.org list can be subscribed to now (thanks
@Robert); you can send an email to builds-subscr...@flink.apache.org to
subscribe.
2. We have a pull request [1] that sends notifications only for
apache/flink builds, and it works well.
3. However, all the notifications are currently rejected by the builds
mailing list (see the MODERATE mails).
    I added (and double-checked) bui...@travis-ci.org on the
subscriber/allow list, but it still doesn't work; the messages might be
classified as spam by the mailing list.
    We are still trying to figure this out and will post updates here once
we make progress.


Thanks,
Jark



[1]: https://github.com/apache/flink/pull/9230


On Thu, 25 Jul 2019 at 22:59, Robert Metzger  wrote:

> The mailing list has been created, you can now subscribe to it.
>
> On Wed, Jul 24, 2019 at 1:43 PM Jark Wu  wrote:
>
> > Thanks Robert for helping out that.
> >
> > Best,
> > Jark
> >
> > On Wed, 24 Jul 2019 at 19:16, Robert Metzger 
> wrote:
> >
> > > I've requested the creation of the list, and made Jark, Chesnay and me
> > > moderators of it.
> > >
> > > On Wed, Jul 24, 2019 at 1:12 PM Robert Metzger 
> > > wrote:
> > >
> > > > @Jark: Yes, I will request the creation of a mailing list!
> > > >
> > > > On Tue, Jul 23, 2019 at 4:48 PM Hugo Louro 
> wrote:
> > > >
> > > >> +1
> > > >>
> > > >> > On Jul 23, 2019, at 6:15 AM, Till Rohrmann 
> > > >> wrote:
> > > >> >
> > > >> > Good idea Jark. +1 for the proposal.
> > > >> >
> > > >> > Cheers,
> > > >> > Till
> > > >> >
> > > >> >> On Tue, Jul 23, 2019 at 1:59 PM Hequn Cheng <
> chenghe...@gmail.com>
> > > >> wrote:
> > > >> >>
> > > >> >> Hi Jark,
> > > >> >>
> > > >> >> Good idea. +1!
> > > >> >>
> > > >> >>> On Tue, Jul 23, 2019 at 6:23 PM Jark Wu 
> wrote:
> > > >> >>>
> > > >> >>> Thank you all for your positive feedback.
> > > >> >>>
> > > >> >>> We have three binding +1s, so I think, we can proceed with this.
> > > >> >>>
> > > >> >>> Hi @Robert Metzger  , could you create a
> > > >> request to
> > > >> >>> INFRA for the mailing list?
> > > >> >>> I'm not sure if this needs a PMC permission.
> > > >> >>>
> > > >> >>> Thanks,
> > > >> >>> Jark
> > > >> >>>
> > > >> >>> On Tue, 23 Jul 2019 at 16:42, jincheng sun <
> > > sunjincheng...@gmail.com>
> > > >> >>> wrote:
> > > >> >>>
> > > >> >>>> +1
> > > >> >>>>
> > > >> >>>> Robert Metzger  于2019年7月23日周二 下午4:01写道:
> > > >> >>>>
> > > >> >>>>> +1
> > > >> >>>>>
> > > >> >>>>> On Mon, Jul 22, 2019 at 10:27 AM Biao Liu  >
> > > >> >> wrote:
> > > >> >>>>>
> > > >> >>>>>> +1, make sense to me.
> > > >> >>>>>> Mailing list seems to be a more "community" way.
> > > >> >>>>>>
> > > >> >>>>>> Timo Walther  于2019年7月22日周一 下午4:06写道:
> > > >> >>>>>>
> > > >> >>>>>>> +1 sounds good to inform people about instabilities or other
> > > >> >> issues
> > > >> >>>>>>>
> > > >> >>>>>>> Regards,
> > > >> >>>>>>> Timo
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>>> Am 22.07.19 um 09:58 schrieb Haibo Sun:
> > > >> >>>>>>>> +1. Sounds good.Letting the PR creators know the build
> > results
> > > >> >> of
> > > >> >>>> the
> > > >> >>>>>>> master branch can help to determine more quickly whether
> > Travis
> > > >> >>>>> failures
> > > >> >

Re: NoSuchMethodError: org.apache.calcite.tools.FrameworkConfig.getTraitDefs()

2019-07-30 Thread Jark Wu
Hi LakeShen,

Thanks for trying out the Blink planner.
First question: are you using blink-1.5.1 or the flink-table-planner-blink
module shipped with Flink 1.9?
We suggest using the latter, because blink-1.5.1 is no longer maintained;
please try Flink 1.9 instead. The stack trace also hints that the two are
being mixed on one classpath: the com.alibaba.blink parser expects a
shaded Calcite class that is not what the Flink 1.9 planner provides.


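If it helps, the corresponding dependency for the 1.9 planner would look
roughly like this (Scala suffix and version depend on your setup; shown
here for Scala 2.11 and the upcoming 1.9.0 release):

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table-planner-blink_2.11</artifactId>
    <version>1.9.0</version>
</dependency>
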
Best,
Jark


On Tue, 30 Jul 2019 at 17:02, LakeShen  wrote:

> Hi all,when I use blink flink-sql-parser module,the maven dependency like
> this:
>
> <dependency>
>     <groupId>com.alibaba.blink</groupId>
>     <artifactId>flink-sql-parser</artifactId>
>     <version>1.5.1</version>
> </dependency>
>
> I also import the flink 1.9 blink-table-planner module , I
> use FlinkPlannerImpl to parse the sql to get the List. But
> when I run the program , it throws the exception like this:
>
>
>
> Exception in thread "main" java.lang.NoSuchMethodError:
> org.apache.calcite.tools.FrameworkConfig.getTraitDefs()Lorg/apache/flink/shaded/calcite/com/google/common/collect/ImmutableList;
>     at org.apache.flink.sql.parser.plan.FlinkPlannerImpl.<init>(FlinkPlannerImpl.java:93)
>     at com.youzan.bigdata.allsqldemo.utils.FlinkSqlUtil.getSqlNodeInfos(FlinkSqlUtil.java:33)
>     at com.youzan.bigdata.allsqldemo.KafkaSrcKafkaSinkSqlDemo.main(KafkaSrcKafkaSinkSqlDemo.java:56)
>
> How can I solve this problem? Thanks to your reply.
>


Re: [VOTE] Publish the PyFlink into PyPI

2019-08-01 Thread Jark Wu
+1 (non-binding)

Cheers,
Jark

On Thu, 1 Aug 2019 at 17:45, Yu Li  wrote:

> +1 (non-binding)
>
> Thanks for driving this!
>
> Best Regards,
> Yu
>
>
> On Thu, 1 Aug 2019 at 11:41, Till Rohrmann  wrote:
>
> > +1
> >
> > Cheers,
> > Till
> >
> > On Thu, Aug 1, 2019 at 10:39 AM vino yang  wrote:
> >
> > > +1 (non-binding)
> > >
> > > Jeff Zhang  于2019年8月1日周四 下午4:33写道:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Stephan Ewen  于2019年8月1日周四 下午4:29写道:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > On Thu, Aug 1, 2019 at 9:52 AM Dian Fu 
> > wrote:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > Thanks a lot for driving this.
> > > > > > +1 (non-binding).
> > > > > >
> > > > > > Regards,
> > > > > > Dian
> > > > > >
> > > > > > > 在 2019年8月1日,下午3:24,jincheng sun  写道:
> > > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > Publish the PyFlink into PyPI is very important for our user,
> > > Please
> > > > > vote
> > > > > > > on the following proposal:
> > > > > > >
> > > > > > > 1. Create PyPI Project for Apache Flink Python API, named:
> > > > > "apache-flink"
> > > > > > > 2. Release one binary with the default Scala version same with
> > > flink
> > > > > > > default config.
> > > > > > > 3. Create an account, named "pyflink" as owner(only PMC can
> > manage
> > > > it).
> > > > > > PMC
> > > > > > > can add account for the Release Manager, but Release Manager
> can
> > > not
> > > > > > delete
> > > > > > > the release.
> > > > > > >
> > > > > > > [ ] +1, Approve the proposal.
> > > > > > > [ ] -1, Disapprove the proposal, because ...
> > > > > > >
> > > > > > > The vote will be open for at least 72 hours. It is adopted by a
> > > > simple
> > > > > > > majority with a minimum of three positive votes.
> > > > > > >
> > > > > > > See discussion threads for more details [1].
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jincheng
> > > > > > >
> > > > > > > [1]
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Best Regards
> > > >
> > > > Jeff Zhang
> > > >
> > >
> >
>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

2019-08-01 Thread Jark Wu
Hi Chesnay,

Could we give Flink committers permissions on the flink-ci/flink repo?
Several times, after I pushed new commits, the old build jobs were still
pending and had not been canceled.
Until that is fixed, we could manually cancel stale jobs to save build
resources.

Best,
Jark


On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler  wrote:

> Your best bet would be to check the first commit in the PR and check the
> parent commit.
>
> To re-run things, you will have to rebase the PR on the latest master.
>
> On 10/07/2019 03:32, Kurt Young wrote:
> > Thanks for all your efforts Chesnay, it indeed improves a lot for our
> > develop experience. BTW, do you know how to find the master branch
> > information which the CI runs with?
> >
> > For example, like this one:
> > https://travis-ci.com/flink-ci/flink/jobs/214542568
> > It shows pass with the commits, which rebased on the master when the CI
> > is triggered. But it's both possible that the master branch CI runs on is
> > the
> > same or different with current master. If it's the same, I can simply
> rely
> > on the
> > passed information to push commits, but if it's not, I think i should
> find
> > another
> > way to re-trigger tests based on the newest master.
> >
> > Do you know where can I get such information?
> >
> > Best,
> > Kurt
> >
> >
> > On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler 
> wrote:
> >
> >> The kinks have been worked out; the bot is running again and pr builds
> >> are yet again no longer running on ASF resources.
> >>
> >> PRs are mirrored to: https://github.com/flink-ci/flink
> >> Bot source: https://github.com/flink-ci/ci-bot
> >>
> >> On 08/07/2019 17:14, Chesnay Schepler wrote:
> >>> I have temporarily re-enabled running PR builds on the ASF account;
> >>> migrating to the Travis subscription caused some issues in the bot
> >>> that I have to fix first.
> >>>
> >>> On 07/07/2019 23:01, Chesnay Schepler wrote:
> >>>> The vote has passed unanimously in favor of migrating to a separate
> >>>> Travis account.
> >>>>
> >>>> I will now set things up such that no PullRequest is no longer run on
> >>>> the ASF servers.
> >>>> This is a major setup in reducing our usage of ASF resources.
> >>>> For the time being we'll use free Travis plan for flink-ci (i.e. 5
> >>>> workers, which is the same the ASF gives us). Over the course of the
> >>>> next week we'll setup the Ververica subscription to increase this
> limit.
> >>>>
> >>>>  From now now, a bot will mirror all new and updated PullRequests to a
> >>>> mirror repository (https://github.com/flink-ci/flink-ci) and write an
> >>>> update into the PR once the build is complete.
> >>>> I have ran the bots for the past 3 days in parallel to our existing
> >>>> Travis and it was working without major issues.
> >>>>
> >>>> The biggest change that contributors will see is that there's no
> >>>> longer a icon next to each commit. We may revisit this in the future.
> >>>>
> >>>> I'll setup a repo with the source of the bot later.
> >>>>
> >>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
> >>>>> I've raised a JIRA
> >>>>> <https://issues.apache.org/jira/browse/INFRA-18703>with INFRA to
> >>>>> inquire whether it would be possible to switch to a different Travis
> >>>>> account, and if so what steps would need to be taken.
> >>>>> We need a proper confirmation from INFRA since we are not in full
> >>>>> control of the flink repository (for example, we cannot access the
> >>>>> settings page).
> >>>>>
> >>>>> If this is indeed possible, Ververica is willing sponsor a Travis
> >>>>> account for the Flink project.
> >>>>> This would provide us with more than enough resources than we need.
> >>>>>
> >>>>> Since this makes the project more reliant on resources provided by
> >>>>> external companies I would like to vote on this.
> >>>>>
> >>>>> Please vote on this proposal, as follows:
> >>>>> [ ] +1, Approve the migration to a Ververica-sponsored Travis
> >>>>> account, provided that INFRA approves
> >>>>> [ ]

Re: [DISCUSS] CPU flame graph for a job vertex in web UI.

2019-08-01 Thread Jark Wu
Hi David,

The demo looks great! I think it will definitely help a lot with
performance tuning.
A big +1 for this.

I cc'ed Yadong, who is one of the main contributors to the new web UI.
Maybe he can help out on the front end.

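For anyone wondering how the sampled stack traces become a flame graph:
the usual approach is to collapse each sample into a semicolon-joined
"folded stack" with a count (d3-flame-graph itself wants this converted
into a JSON tree, but the folded form is the common intermediate). A rough
sketch of the idea, not the actual endpoint code from David's PoC:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class StackFolder {

    // Collapse samples into "root;caller;...;leaf" -> number of occurrences.
    public static Map<String, Integer> fold(List<StackTraceElement[]> samples) {
        Map<String, Integer> folded = new HashMap<>();
        for (StackTraceElement[] trace : samples) {
            StringBuilder stack = new StringBuilder();
            for (int i = trace.length - 1; i >= 0; i--) { // root frame first
                if (stack.length() > 0) {
                    stack.append(';');
                }
                stack.append(trace[i].getClassName())
                        .append('.')
                        .append(trace[i].getMethodName());
            }
            folded.merge(stack.toString(), 1, Integer::sum);
        }
        return folded;
    }
}
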
Regards,
Jark

On Fri, 2 Aug 2019 at 04:26, David Morávek  wrote:

> Hi Till, thanks for the feedback! These endpoints are only called when the
> vertex is selected in the UI, so there should be any heavy RPC load. For
> back-pressure, we only sample top 3 calls of the stack (depth = 3). For the
> flame-graph, we want to sample the whole stack trace and we need different
> sampling rate (longer period, more samples). Those are the main reasons to
> split these in two "trackers", but I may be missing something.
>
> I've prepared a little demo, so others can have a better idea of what I
> have in mind.
>
> https://youtu.be/GUNDehj9z9o
>
> Please note that this is a proof of concept and I'm not frontend person, so
> it may look little clumsy :)
>
> D.
>
> On Thu, Aug 1, 2019 at 11:40 AM Till Rohrmann 
> wrote:
>
> > Hi David,
> >
> > thanks for starting this discussion. I like the idea of improving
> insights
> > into Flink's execution and I believe that a flame graph could be helpful.
> >
> > I quickly glanced over your changes and I think they go in a good
> > direction. One idea could be to share the `StackTraceSample` produced by
> > the `StackTraceSampleCoordinator` between the different
> > `StackTraceOperatorTracker` so that we don't send multiple requests for
> the
> > same operators. That way we would decrease a bit the RPC load.
> >
> > Apart from that, I think the next steps would be to find a committer who
> > could shepherd this effort and help you with merging it.
> >
> > Cheers,
> > Till
> >
> > On Wed, Jul 31, 2019 at 7:05 PM David Morávek  wrote:
> >
> > > Hello,
> > >
> > > While looking into Flink internals, I've noticed that there is already
> a
> > > mechanism for stack-trace sampling of a particular job vertex.
> > >
> > > I think it may be really useful to allow user to easily render a cpu
> > > flamegraph <http://www.brendangregg.com/flamegraphs.html> in a new UI
> > for
> > > a
> > > selected vertex (new tab next to back pressure) of a running job. Back
> > > pressure tab already provides a good idea of which vertex causes
> trouble,
> > > but it's hard to say what's actually going on.
> > >
> > > I've tried to implement a basic REST endpoint
> > > <
> > >
> >
> https://github.com/dmvk/flink/commit/716231822d2fe99004895cdd0a365560479445b9
> > > >,
> > > that prepares data for the flame graph rendering and it seems to be
> > > providing good insight.
> > >
> > > It should be straightforward to render data from the endpoint in new UI
> > > using existing <https://github.com/spiermar/d3-flame-graph> javascript
> > > libraries.
> > >
> > > WDYT? Is this worth pushing forward?
> > >
> > > D.
> > >
> >
>


Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-02 Thread Jark Wu
Hi Andrey,

I have a concern about point (3), "even class fields as e.g. optional
config options with implicit default values".

It seems to contradict point (4), "Optional should not be used for class
fields", which follows Oracle's guide.
IntelliJ IDEA also reports a warning when a class field is Optional,
because Optional is not serializable.

Do we allow Optional as a class field only when the class is not
serializable, or do we forbid it entirely?

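To illustrate the serialization concern with a minimal sketch (the class
and field names are only for illustration; the key fact is that
java.util.Optional does not implement Serializable):

import java.io.Serializable;
import java.util.Optional;

public class MyConfig implements Serializable {
    // Compiles fine, but serializing a MyConfig instance with
    // ObjectOutputStream.writeObject(...) throws
    // NotSerializableException: java.util.Optional at runtime.
    private Optional<Integer> parallelism = Optional.empty();
}
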
Thanks,
Jark

On Fri, 2 Aug 2019 at 16:30, Biao Liu  wrote:

> Hi Andrey,
>
> Thanks for working on this.
>
> +1 it's clear and acceptable for me.
>
> To Qi,
>
> IMO the most performance critical codes are "per record" code path. We
> should definitely avoid Optional there. For your concern, it's "per buffer"
> code path which seems to be acceptable with Optional.
>
> Just one more question, is there any other code paths which are also
> critical? I think we'd better note that clearly.
>
> Thanks,
> Biao /'bɪ.aʊ/
>
>
>
> On Fri, Aug 2, 2019 at 11:08 AM qi luo  wrote:
>
> > Agree that using Optional will improve code robustness. However we’re
> > hesitating to use Optional in data intensive operations.
> >
> > For example, SingleInputGate is already creating Optional for every
> > BufferOrEvent in getNextBufferOrEvent(). How much performance gain would
> we
> > get if it’s replaced by null check?
> >
> > Regards,
> > Qi
> >
> > > On Aug 1, 2019, at 11:00 PM, Andrey Zagrebin 
> > wrote:
> > >
> > > Hi all,
> > >
> > > This is the next follow up discussion about suggestions for the recent
> > > thread about code style guide in Flink [1].
> > >
> > > In general, one could argue that any variable, which is nullable, can
> be
> > > replaced by wrapping it with Optional to explicitly show that it can be
> > > null. Examples are:
> > >
> > >   - returned values to force user to check not null
> > >   - optional function arguments, e.g. with implicit default values
> > >   - even class fields as e.g. optional config options with implicit
> > >   default values
> > >
> > >
> > > At the same time, we also have @Nullable annotation to express this
> > > intention.
> > >
> > > Also, when the class Optional was introduced, Oracle posted a guideline
> > > about its usage [2]. Basically, it suggests to use it mostly in APIs
> for
> > > returned values to inform and force users to check the returned value
> > > instead of returning null and avoid NullPointerException.
> > >
> > > Wrapping with Optional also comes with the performance overhead.
> > >
> > > Following the Oracle's guide in general, the suggestion is:
> > >
> > >   - Avoid using Optional in any performance critical code
> > >   - Use Optional only to return nullable values in the API/public
> methods
> > >   unless it is performance critical then rather use @Nullable
> > >   - Passing an Optional argument to a method can be allowed if it is
> > >   within a private helper method and simplifies the code, example is in
> > [3]
> > >   - Optional should not be used for class fields
> > >
> > >
> > > Please, feel free to share you thoughts.
> > >
> > > Best,
> > > Andrey
> > >
> > > [1]
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
> > > [2]
> > >
> >
> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
> > > [3]
> > >
> >
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95
> >
> >
>


Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-02 Thread Jark Wu
Hi Zili,

Yes. I agree with using @Nullable/@Nonnull/SerializableOptional for class
fields instead of Optional.

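To make that concrete, a sketch of the pattern (the names are only
illustrative):

import java.io.Serializable;
import java.util.Optional;
import javax.annotation.Nullable;

public class SinkConfig implements Serializable {

    // plain nullable field: serializable and cheap
    @Nullable
    private String topicOverride;

    // wrap only at the API boundary, so callers must handle absence
    public Optional<String> getTopicOverride() {
        return Optional.ofNullable(topicOverride);
    }
}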


On Fri, 2 Aug 2019 at 17:00, Zili Chen  wrote:

> Hi Jark,
>
> Follow your opinion, for class field, we can make
> use of @Nullable/@Nonnull annotation or Flink's
> SerializableOptional. It would be sufficient.
>
> Best,
> tison.
>
>
> Jark Wu  于2019年8月2日周五 下午4:57写道:
>
> > Hi Andrey,
> >
> > I have some concern on point (3) "even class fields as e.g. optional
> config
> > options with implicit default values".
> >
> > Regarding to the Oracle's guide (4) "Optional should not be used for
> class
> > fields".
> > And IntelliJ IDEA also report warnings if a class field is Optional,
> > because Optional is not serializable.
> >
> >
> > Do we allow Optional as class field only if the class is not serializable
> > or forbid this totally?
> >
> > Thanks,
> > Jark
> >
> > On Fri, 2 Aug 2019 at 16:30, Biao Liu  wrote:
> >
> > > Hi Andrey,
> > >
> > > Thanks for working on this.
> > >
> > > +1 it's clear and acceptable for me.
> > >
> > > To Qi,
> > >
> > > IMO the most performance critical codes are "per record" code path. We
> > > should definitely avoid Optional there. For your concern, it's "per
> > buffer"
> > > code path which seems to be acceptable with Optional.
> > >
> > > Just one more question, is there any other code paths which are also
> > > critical? I think we'd better note that clearly.
> > >
> > > Thanks,
> > > Biao /'bɪ.aʊ/
> > >
> > >
> > >
> > > On Fri, Aug 2, 2019 at 11:08 AM qi luo  wrote:
> > >
> > > > Agree that using Optional will improve code robustness. However we’re
> > > > hesitating to use Optional in data intensive operations.
> > > >
> > > > For example, SingleInputGate is already creating Optional for every
> > > > BufferOrEvent in getNextBufferOrEvent(). How much performance gain
> > would
> > > we
> > > > get if it’s replaced by null check?
> > > >
> > > > Regards,
> > > > Qi
> > > >
> > > > > On Aug 1, 2019, at 11:00 PM, Andrey Zagrebin  >
> > > > wrote:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > This is the next follow up discussion about suggestions for the
> > recent
> > > > > thread about code style guide in Flink [1].
> > > > >
> > > > > In general, one could argue that any variable, which is nullable,
> can
> > > be
> > > > > replaced by wrapping it with Optional to explicitly show that it
> can
> > be
> > > > > null. Examples are:
> > > > >
> > > > >   - returned values to force user to check not null
> > > > >   - optional function arguments, e.g. with implicit default values
> > > > >   - even class fields as e.g. optional config options with implicit
> > > > >   default values
> > > > >
> > > > >
> > > > > At the same time, we also have @Nullable annotation to express this
> > > > > intention.
> > > > >
> > > > > Also, when the class Optional was introduced, Oracle posted a
> > guideline
> > > > > about its usage [2]. Basically, it suggests to use it mostly in
> APIs
> > > for
> > > > > returned values to inform and force users to check the returned
> value
> > > > > instead of returning null and avoid NullPointerException.
> > > > >
> > > > > Wrapping with Optional also comes with the performance overhead.
> > > > >
> > > > > Following the Oracle's guide in general, the suggestion is:
> > > > >
> > > > >   - Avoid using Optional in any performance critical code
> > > > >   - Use Optional only to return nullable values in the API/public
> > > methods
> > > > >   unless it is performance critical then rather use @Nullable
> > > > >   - Passing an Optional argument to a method can be allowed if it
> is
> > > > >   within a private helper method and simplifies the code, example
> is
> > in
> > > > [3]
> > > > >   - Optional should not be used for class fields
> > > > >
> > > > >
> > > > > Please, feel free to share you thoughts.
> > > > >
> > > > > Best,
> > > > > Andrey
> > > > >
> > > > > [1]
> > > > >
> > > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/flink-dev/201906.mbox/%3ced91df4b-7cab-4547-a430-85bc710fd...@apache.org%3E
> > > > > [2]
> > > > >
> > > >
> > >
> >
> https://www.oracle.com/technetwork/articles/java/java8-optional-2175753.html
> > > > > [3]
> > > > >
> > > >
> > >
> >
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95
> > > >
> > > >
> > >
> >
>


Re: [RESULT][VOTE] Migrate to sponsored Travis account

2019-08-02 Thread Jark Wu
Wow. That's great! Thanks Chesnay.

On Fri, 2 Aug 2019 at 17:50, Chesnay Schepler  wrote:

> I'm currently modifying the cibot to do this automatically; should be
> finished until Monday.
>
> On 02/08/2019 07:41, Jark Wu wrote:
> > Hi Chesnay,
> >
> > Can we assign Flink Committers the permission of flink-ci/flink repo?
> > Several times, when I pushed some new commits, the old build jobs are
> still
> > in pending and not canceled.
> > Before we fix that, we can manually cancel some old jobs to save build
> > resource.
> >
> > Best,
> > Jark
> >
> >
> > On Wed, 10 Jul 2019 at 16:17, Chesnay Schepler 
> wrote:
> >
> >> Your best bet would be to check the first commit in the PR and check the
> >> parent commit.
> >>
> >> To re-run things, you will have to rebase the PR on the latest master.
> >>
> >> On 10/07/2019 03:32, Kurt Young wrote:
> >>> Thanks for all your efforts Chesnay, it indeed improves a lot for our
> >>> develop experience. BTW, do you know how to find the master branch
> >>> information which the CI runs with?
> >>>
> >>> For example, like this one:
> >>> https://travis-ci.com/flink-ci/flink/jobs/214542568
> >>> It shows pass with the commits, which rebased on the master when the CI
> >>> is triggered. But it's both possible that the master branch CI runs on
> is
> >>> the
> >>> same or different with current master. If it's the same, I can simply
> >> rely
> >>> on the
> >>> passed information to push commits, but if it's not, I think i should
> >> find
> >>> another
> >>> way to re-trigger tests based on the newest master.
> >>>
> >>> Do you know where can I get such information?
> >>>
> >>> Best,
> >>> Kurt
> >>>
> >>>
> >>> On Tue, Jul 9, 2019 at 3:27 AM Chesnay Schepler 
> >> wrote:
> >>>> The kinks have been worked out; the bot is running again and pr builds
> >>>> are yet again no longer running on ASF resources.
> >>>>
> >>>> PRs are mirrored to: https://github.com/flink-ci/flink
> >>>> Bot source: https://github.com/flink-ci/ci-bot
> >>>>
> >>>> On 08/07/2019 17:14, Chesnay Schepler wrote:
> >>>>> I have temporarily re-enabled running PR builds on the ASF account;
> >>>>> migrating to the Travis subscription caused some issues in the bot
> >>>>> that I have to fix first.
> >>>>>
> >>>>> On 07/07/2019 23:01, Chesnay Schepler wrote:
> >>>>>> The vote has passed unanimously in favor of migrating to a separate
> >>>>>> Travis account.
> >>>>>>
> >>>>>> I will now set things up such that no PullRequest is no longer run
> on
> >>>>>> the ASF servers.
> >>>>>> This is a major setup in reducing our usage of ASF resources.
> >>>>>> For the time being we'll use free Travis plan for flink-ci (i.e. 5
> >>>>>> workers, which is the same the ASF gives us). Over the course of the
> >>>>>> next week we'll setup the Ververica subscription to increase this
> >> limit.
> >>>>>>   From now now, a bot will mirror all new and updated PullRequests
> to a
> >>>>>> mirror repository (https://github.com/flink-ci/flink-ci) and write
> an
> >>>>>> update into the PR once the build is complete.
> >>>>>> I have ran the bots for the past 3 days in parallel to our existing
> >>>>>> Travis and it was working without major issues.
> >>>>>>
> >>>>>> The biggest change that contributors will see is that there's no
> >>>>>> longer a icon next to each commit. We may revisit this in the
> future.
> >>>>>>
> >>>>>> I'll setup a repo with the source of the bot later.
> >>>>>>
> >>>>>> On 04/07/2019 10:46, Chesnay Schepler wrote:
> >>>>>>> I've raised a JIRA
> >>>>>>> <https://issues.apache.org/jira/browse/INFRA-18703>with INFRA to
> >>>>>>> inquire whether it would be possible to switch to a different
> Travis
> >>>>>>> account, and if so what steps would need to be taken.
> >

Re: [ANNOUNCE] Hequn becomes a Flink committer

2019-08-07 Thread Jark Wu
Congratulations Hequn! It's great to have you in the community!



On Wed, 7 Aug 2019 at 21:00, Fabian Hueske  wrote:

> Congratulations Hequn!
>
> Am Mi., 7. Aug. 2019 um 14:50 Uhr schrieb Robert Metzger <
> rmetz...@apache.org>:
>
>> Congratulations!
>>
>> On Wed, Aug 7, 2019 at 1:09 PM highfei2...@126.com 
>> wrote:
>>
>> > Congrats Hequn!
>> >
>> > Best,
>> > Jeff Yang
>> >
>> >
>> >  Original Message 
>> > Subject: Re: [ANNOUNCE] Hequn becomes a Flink committer
>> > From: Piotr Nowojski
>> > To: JingsongLee
>> > CC: Biao Liu ,Zhu Zhu ,Zili Chen ,Jeff Zhang ,Paul Lam ,jincheng sun
>> ,dev ,user
>> >
>> >
>> > Congratulations :)
>> >
>> > On 7 Aug 2019, at 12:09, JingsongLee  wrote:
>> >
>> > Congrats Hequn!
>> >
>> > Best,
>> > Jingsong Lee
>> >
>> > --
>> > From:Biao Liu 
>> > Send Time:2019年8月7日(星期三) 12:05
>> > To:Zhu Zhu 
>> > Cc:Zili Chen ; Jeff Zhang ;
>> Paul
>> > Lam ; jincheng sun ;
>> dev
>> > ; user 
>> > Subject:Re: [ANNOUNCE] Hequn becomes a Flink committer
>> >
>> > Congrats Hequn!
>> >
>> > Thanks,
>> > Biao /'bɪ.aʊ/
>> >
>> >
>> >
>> > On Wed, Aug 7, 2019 at 6:00 PM Zhu Zhu  wrote:
>> > Congratulations to Hequn!
>> >
>> > Thanks,
>> > Zhu Zhu
>> >
>> > Zili Chen  于2019年8月7日周三 下午5:16写道:
>> > Congrats Hequn!
>> >
>> > Best,
>> > tison.
>> >
>> >
>> > Jeff Zhang  于2019年8月7日周三 下午5:14写道:
>> > Congrats Hequn!
>> >
>> > Paul Lam  于2019年8月7日周三 下午5:08写道:
>> > Congrats Hequn! Well deserved!
>> >
>> > Best,
>> > Paul Lam
>> >
>> > 在 2019年8月7日,16:28,jincheng sun  写道:
>> >
>> > Hi everyone,
>> >
>> > I'm very happy to announce that Hequn accepted the offer of the Flink
>> PMC
>> > to become a committer of the Flink project.
>> >
>> > Hequn has been contributing to Flink for many years, mainly working on
>> > SQL/Table API features. He's also frequently helping out on the user
>> > mailing lists and helping check/vote the release.
>> >
>> > Congratulations Hequn!
>> >
>> > Best, Jincheng
>> > (on behalf of the Flink PMC)
>> >
>> >
>> >
>> > --
>> > Best Regards
>> >
>> > Jeff Zhang
>> >
>> >
>> >
>>
>


Re: [DISCUSS] Repository split

2019-08-08 Thread Jark Wu
Hi,

First of all, I agree with Dawid's and David's points.

I will share some experience with repository splits. We went through one
for Alibaba Blink, which I think is a worthwhile case to learn from.
We split the Blink project into "blink-connectors" and "blink", but the
development process didn't benefit much. On the contrary, the split
sometimes slowed development down.
As far as I can see, we suffered from the following issues after the split:

1. Unstable builds and tests:
Interface or behavior changes in the underlying modules (e.g. core, table)
lead to build and test failures in the connectors repo. AFAIK, the Table
API is still evolving heavily.
This makes the connectors repo more unstable and keeps us busy fixing
build and test problems **after-commit**.
First, it's not easy to locate which commit in the main repo caused the
connectors repo to fail (we have over 70 commits every day in Flink master
now, and the number is growing).
Second, when 2 or 3 build/test problems happen at the same time, they are
hard to fix because we can't make the build/tests pass in separate hotfix
pull requests.

2. Debugging difficulty:
With the modules spread across repositories, if we want to debug a Kafka
IT case, we may need to step through code in the Flink runtime, or verify
whether a runtime change fixes the Kafka case. That becomes more complex
because the code is not in one project.

IMO, this actually slows down the development process.

--

In my understanding, the issues we want to solve with the split include:
1) long build/testing time
2) unstable tests
3) increasing number of PRs

Ad. 1: I think we have several ways to reduce build/testing time. As
Dawid said, we can trigger only the corresponding CI stages within a
single repository (without running all the tests).
An easy approach might be to analyse the pom.xml files to determine which
modules depend on the changed module, as sketched below (Maven's -pl/-amd
reactor options already offer something similar). And one thing we can do
right away is skip all tests for documentation-only changes.

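A sketch of the module-selection idea (purely illustrative; no such CI
code exists yet): build a reverse dependency map from the pom.xml files,
then take the transitive closure of the changed modules:

import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public final class AffectedModules {

    // dependents: module -> modules that directly depend on it (from the poms)
    public static Set<String> compute(
            Set<String> changed, Map<String, Set<String>> dependents) {
        Set<String> affected = new HashSet<>(changed);
        Deque<String> queue = new ArrayDeque<>(changed);
        while (!queue.isEmpty()) {
            String module = queue.poll();
            for (String dependent :
                    dependents.getOrDefault(module, Collections.emptySet())) {
                if (affected.add(dependent)) { // not seen before
                    queue.add(dependent);
                }
            }
        }
        return affected; // only these modules need to be built and tested
    }
}
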
Ad. 2: I can't see how unstable connector tests would be fixed more
quickly after being moved to separate repositories. As far as I can tell,
this problem might even become more significant.

Ad. 3: I also doubt that a repository split would help here. I think it
would give the sub-repositories less exposure; bahir-flink[1] is an
example (only 3 commits in the last 2 months).

In the end, from my point of view:
  1) If we want to reduce build/testing time, we can start a new thread to
collect ideas from the community, and try some approaches to see whether
they solve most of the problems.
  2) If we do want to split the repository, we need to be cautious about
the potential development slowdown we might run into.

Regards,
Jark

[1]: https://github.com/apache/bahir-flink/graphs/commit-activity




On Fri, 9 Aug 2019 at 00:26, Till Rohrmann  wrote:

> I pretty much agree with your points Dav/wid. Some problems which we want
> to solve with a respository split are clearly caused by the existing build
> system (no incremental builds, not enough flexibility to only build a
> subset of modules). Given that a repository split would be a major
> endeavour with a lot of uncertainties, changing Flink's build system might
> actually be simpler.
>
> In the past I tried to build Flink with Gradle because it better supports
> incremental builds. Unfortunately, I never got it really off the grounds
> because of too little time. Maybe it could be an option to investigate
> other build systems like Gradle or Bazel and whether they could solve the
> pain points around build time allowing us to keep a single repository.
>
> I second Piotr's concerns that we would actually lose test coverage with
> splitting the repository. Just with the 1.9 release we found a problem in
> the CheckpointFailureManager because of failing Kafka tests. It might have
> taken us more time to figure this problem out if the test were failing in a
> separate repository.
>
> Cheers,
> Till
>
> On Thu, Aug 8, 2019 at 5:47 PM Piotr Nowojski  wrote:
>
> > Hey,
> >
> > I retract my +1 (at least temporarily, until we discuss about alternative
> > solutions).
> >
> > >>  I would like to also raise an additional issue: currently quite some
> > bugs (like release blockers [1]) are being discovered by ITCases of the
> > connectors. It means that at least initially, the main repository will
> lose
> > some test coverage.
> > >>
> > > True, but I think this is more a symptom of us not properly testing the
> > contracts that are exposed to connectors.
> >
> > Sure. In ideal world we should have properly test coverage and
> > self-contained modules. In reality, especially when it comes to weird and
> > quirky race conditions, some executions paths/races are triggered o

Re: Flink cannot recognized catalog set by registerCatalog.

2019-08-12 Thread Jark Wu
I think we need to improve the Javadoc of
tableEnv.registerTableSource/registerTableSink.
Currently, the comment says:

"Registers an external TableSink with already configured field names and
field types in this TableEnvironment's catalog."

But which catalog? The current one or the default in-memory one?
It would be better to improve the description and add a NOTE to it.

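For reference, the switching that Xuefu described looks like this in code
(catalog/database names taken from Simon's example; note that tables
registered via registerTableSource still end up in the built-in catalog,
which is exactly the confusion here):

// 'catalog' is your Catalog instance (e.g. a GenericInMemoryCatalog)
tableEnv.registerCatalog("ca1", catalog);

// make the custom catalog/database the current one ...
tableEnv.useCatalog("ca1");
tableEnv.useDatabase("db1");

// ... or keep the defaults and fully qualify the table path instead
Table result = tableEnv.sqlQuery("SELECT * FROM ca1.db1.orderstream");
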
Regards,
Jark

On Tue, 13 Aug 2019 at 10:52, Xuefu Z  wrote:

> Yes, tableEnv.registerTable(_) etc always registers in the default catalog.
> To create table in your custom catalog, you could use
> tableEnv.sqlUpdate("create table ").
>
> Thanks,
> Xuefu
>
> On Mon, Aug 12, 2019 at 6:17 PM Simon Su  wrote:
>
> > Hi Xuefu
> >
> > Thanks for you reply.
> >
> > Actually I have tried it as your advises. I have tried to call
> > tableEnv.useCatalog and useDatabase. Also I have tried to use
> > “catalogname.databasename.tableName”  in SQL. I think the root cause is
> > that when I call tableEnv.registerTableSource, it’s always use a
> “build-in”
> > Catalog and Database rather than the custom one. So if I want to use a
> > custom one, I have to write code like this:
> >
> > StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env,
> > EnvironmentSettings.newInstance()
> > .useBlinkPlanner()
> > .inStreamingMode()
> > .withBuiltInCatalogName("ca1")
> > .withBuiltInDatabaseName("db1")
> > .build());
> >
> >
> > As Dawid said, if I want to store in my custom catalog, I can call
> > catalog.createTable or using DDL.
> >
> > Thanks,
> > SImon
> >
> > On 08/13/2019 02:55,Xuefu Z 
> wrote:
> >
> > Hi Simon,
> >
> > Thanks for reporting the problem. There is some rough edges around
> catalog
> > API and table environments, and we are improving post 1.9 release.
> >
> > Nevertheless, tableEnv.registerCatalog() is just to put a new catalog in
> > Flink's CatalogManager, It doens't change the default catalog/database as
> > you expected. To switch to your newly registered catalog, you could call
> > tableEnv.useCatalog() and .useDatabase().
> >
> > As an alternative, you could fully qualify your table name with a
> > "catalog.db.table" syntax without switching current catalog/database.
> >
> > Please try those and let me know if you find new problems.
> >
> > Thanks,
> > Xuefu
> >
> >
> >
> > On Mon, Aug 12, 2019 at 12:38 AM Simon Su  wrote:
> >
> >> Hi All
> >> I want to use a custom catalog by setting the name “ca1” and create
> a
> >> database under this catalog. When I submit the
> >> SQL, and it raises the error like :
> >>
> >>
> >> Exception in thread "main"
> >> org.apache.flink.table.api.ValidationException: SQL validation failed.
> From
> >> line 1, column 98 to line 1, column 116: Object 'orderstream' not found
> >> within 'ca1.db1'
> >> at
> >>
> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:125)
> >> at
> >>
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:82)
> >> at
> >>
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlInsert(SqlToOperationConverter.java:154)
> >> at
> >>
> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:89)
> >> at
> >>
> org.apache.flink.table.planner.delegation.PlannerBase.parse(PlannerBase.scala:130)
> >> at
> >>
> org.apache.flink.table.api.internal.TableEnvironmentImpl.sqlUpdate(TableEnvironmentImpl.java:335)
> >> at sqlrunner.RowTimeTest.memoryCatalog(RowTimeTest.java:126)
> >> at sqlrunner.RowTimeTest.main(RowTimeTest.java:137)
> >> Caused by: org.apache.calcite.runtime.CalciteContextException: From line
> >> 1, column 98 to line 1, column 116: Object 'orderstream' not found
> within
> >> 'ca1.db1'
> >> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> >> at
> >>
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> >> at
> >>
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> >> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> >

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-12 Thread Jark Wu
Hi all,

I just found an issue while testing connector DDLs against the Blink
planner for RC2.
It causes DDL statements to fail whenever they contain a
timestamp/date/time type.
I have created FLINK-13699[1] and opened a pull request for it.

IMO, this could be a blocker for the 1.9 release, because
timestamp/date/time are primitive types and the bug breaks the DDL
feature.
However, I'd like to hear more thoughts from the community on whether we
should treat it as a blocker.

Thanks,
Jark


[1]: https://issues.apache.org/jira/browse/FLINK-13699



On Mon, 12 Aug 2019 at 22:46, Becket Qin  wrote:

> Thanks Gordon, will do that.
>
> On Mon, Aug 12, 2019 at 4:42 PM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Concerning FLINK-13231:
> >
> > Since this is a @PublicEvolving interface, technically it is ok to break
> > it across releases (including across bugfix releases?).
> > So, @Becket if you do merge it now, please mark the fix version as 1.9.1.
> >
> > During the voting process, in the case a new RC is created, we usually
> > check the list of changes compared to the previous RC, and correct the
> "Fix
> > Version" of the corresponding JIRAs to be the right version (in the case,
> > it would be corrected to 1.9.0 instead of 1.9.1).
> >
> > On Mon, Aug 12, 2019 at 4:25 PM Till Rohrmann 
> > wrote:
> >
> >> I agree that it would be nicer. Not sure whether we should cancel the RC
> >> for this issue given that it is open for quite some time and hasn't been
> >> addressed until very recently. Maybe we could include it on the
> shortlist
> >> of nice-to-do things which we do in case that the RC gets cancelled.
> >>
> >> Cheers,
> >> Till
> >>
> >> On Mon, Aug 12, 2019 at 4:18 PM Becket Qin 
> wrote:
> >>
> >>> Hi Till,
> >>>
> >>> Yes, I think we have already documented in that way. So technically
> >>> speaking it is fine to change it later. It is just better if we could
> >>> avoid
> >>> doing that.
> >>>
> >>> Thanks,
> >>>
> >>> Jiangjie (Becket) Qin
> >>>
> >>> On Mon, Aug 12, 2019 at 4:09 PM Till Rohrmann 
> >>> wrote:
> >>>
> >>> > Could we say that the PubSub connector is public evolving instead?
> >>> >
> >>> > Cheers,
> >>> > Till
> >>> >
> >>> > On Mon, Aug 12, 2019 at 3:18 PM Becket Qin 
> >>> wrote:
> >>> >
> >>> > > Hi all,
> >>> > >
> >>> > > FLINK-13231(palindrome!) has a minor Google PubSub connector API
> >>> change
> >>> > > regarding how to config rate limiting. The GCP PubSub connector is
> a
> >>> > newly
> >>> > > introduced connector in 1.9, so it would be nice to include this
> >>> change
> >>> > > into 1.9 rather than later to avoid a public API change. I am
> >>> thinking of
> >>> > > making this as a blocker for 1.9. Want to check what do others
> think.
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Jiangjie (Becket) Qin
> >>> > >
> >>> > > On Mon, Aug 12, 2019 at 2:04 PM Zili Chen 
> >>> wrote:
> >>> > >
> >>> > > > Hi Kurt,
> >>> > > >
> >>> > > > Thanks for your explanation. For [1] I think at least we should
> >>> change
> >>> > > > the JIRA issue field, like unset the fixed version. For [2] I can
> >>> see
> >>> > > > the change is all in test scope but wonder if such a commit still
> >>> > invalid
> >>> > > > the release candidate. IIRC previous RC VOTE threads would
> contain
> >>> a
> >>> > > > release manual/guide, I will try to look up it, too.
> >>> > > >
> >>> > > > Best,
> >>> > > > tison.
> >>> > > >
> >>> > > >
> >>> > > > Kurt Young  于2019年8月12日周一 下午5:42写道:
> >>> > > >
> >>> > > > > Hi Zili,
> >>> > > > >
> >>> > > > > Thanks for the heads up. The 2 issues you mentioned were opened
> >>> by
> >>> > me.
> >>> > > We
> >>> > > > > have
> >>> >

Re: Flink cannot recognized catalog set by registerCatalog.

2019-08-12 Thread Jark Wu
Hi Simon,

This is a temporary workaround for the 1.9 release. We will fix the
behavior in 1.10; see FLINK-13461.

Regards,
Jark

On Tue, 13 Aug 2019 at 13:57, Simon Su  wrote:

> Hi Jark
>
> Thanks for your reply.
>
> It’s weird that In this case the tableEnv provide the api called
> “registerCatalog”, but it does not work in some cases ( like my cases ).
> Do you think it’s feasible to unify this behaviors ? I think the document
> is necessary, but a unify way to use tableEnv is also important.
>
> Thanks,
> SImon
>
> On 08/13/2019 12:27,Jark Wu  wrote:
>
> I think we might need to improve the javadoc of
> tableEnv.registerTableSource/registerTableSink.
> Currently, the comment says
>
> "Registers an external TableSink with already configured field names and
> field types in this TableEnvironment's catalog."
>
> But, what catalog? The current one or default in-memory one?
> I think, it would be better to improve the description and add a NOTE on
> it.
>
> Regards,
> Jark
>
> On Tue, 13 Aug 2019 at 10:52, Xuefu Z  wrote:
>
>> Yes, tableEnv.registerTable(_) etc always registers in the default
>> catalog.
>> To create table in your custom catalog, you could use
>> tableEnv.sqlUpdate("create table ").
>>
>> Thanks,
>> Xuefu
>>
>> On Mon, Aug 12, 2019 at 6:17 PM Simon Su  wrote:
>>
>> > Hi Xuefu
>> >
>> > Thanks for you reply.
>> >
>> > Actually I have tried it as your advises. I have tried to call
>> > tableEnv.useCatalog and useDatabase. Also I have tried to use
>> > “catalogname.databasename.tableName”  in SQL. I think the root cause is
>> > that when I call tableEnv.registerTableSource, it’s always use a
>> “build-in”
>> > Catalog and Database rather than the custom one. So if I want to use a
>> > custom one, I have to write code like this:
>> >
>> > StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env,
>> > EnvironmentSettings.newInstance()
>> > .useBlinkPlanner()
>> > .inStreamingMode()
>> > .withBuiltInCatalogName("ca1")
>> > .withBuiltInDatabaseName("db1")
>> > .build());
>> >
>> >
>> > As Dawid said, if I want to store in my custom catalog, I can call
>> > catalog.createTable or using DDL.
>> >
>> > Thanks,
>> > SImon
>> >
>> > On 08/13/2019 02:55,Xuefu Z 
>> wrote:
>> >
>> > Hi Simon,
>> >
>> > Thanks for reporting the problem. There is some rough edges around
>> catalog
>> > API and table environments, and we are improving post 1.9 release.
>> >
>> > Nevertheless, tableEnv.registerCatalog() is just to put a new catalog in
>> > Flink's CatalogManager, It doens't change the default catalog/database
>> as
>> > you expected. To switch to your newly registered catalog, you could call
>> > tableEnv.useCatalog() and .useDatabase().
>> >
>> > As an alternative, you could fully qualify your table name with a
>> > "catalog.db.table" syntax without switching current catalog/database.
>> >
>> > Please try those and let me know if you find new problems.
>> >
>> > Thanks,
>> > Xuefu
>> >
>> >
>> >
>> > On Mon, Aug 12, 2019 at 12:38 AM Simon Su  wrote:
>> >
>> >> Hi All
>> >> I want to use a custom catalog by setting the name “ca1” and
>> create a
>> >> database under this catalog. When I submit the
>> >> SQL, and it raises the error like :
>> >>
>> >>
>> >> Exception in thread "main"
>> >> org.apache.flink.table.api.ValidationException: SQL validation failed.
>> From
>> >> line 1, column 98 to line 1, column 116: Object 'orderstream' not found
>> >> within 'ca1.db1'
>> >> at
>> >>
>> org.apache.flink.table.planner.calcite.FlinkPlannerImpl.validate(FlinkPlannerImpl.scala:125)
>> >> at
>> >>
>> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:82)
>> >> at
>> >>
>> org.apache.flink.table.planner.operations.SqlToOperationConverter.convertSqlInsert(SqlToOperationConverter.java:154)
>> >> at
>> >>
>> org.apache.flink.table.planner.operations.SqlToOperationConverter.convert(SqlToOperationConverter.java:89)
>> >> at
>> >>

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-13 Thread Jark Wu
Hi Till,

After thinking about it: since we can use VARCHAR as an alternative to
timestamp/time/date, I'm fine with not treating this as a blocker issue.
We can fix it in 1.9.1.

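For anyone hitting FLINK-13699 before the fix lands, a sketch of the
workaround (table and column names are only illustrative):

// declare the affected column as VARCHAR in the DDL for now
tEnv.sqlUpdate(
    "CREATE TABLE orders (" +
    "  order_id BIGINT," +
    "  order_time VARCHAR" +   // instead of TIMESTAMP, until 1.9.1
    ") WITH (...)");           // connector properties elided

// ... and cast back to the intended type inside the query
Table result = tEnv.sqlQuery(
    "SELECT order_id, CAST(order_time AS TIMESTAMP) FROM orders");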

Thanks,
Jark


On Tue, 13 Aug 2019 at 15:10, Richard Deurwaarder  wrote:

> Hello all,
>
> I noticed the PubSub example jar is not included in the examples/ dir of
> flink-dist. I've created https://issues.apache.org/jira/browse/FLINK-13700
>  + https://github.com/apache/flink/pull/9424/files to fix this.
>
> I will leave it up to you to decide if we want to add this to 1.9.0.
>
> Regards,
>
> Richard
>
> On Tue, Aug 13, 2019 at 9:04 AM Till Rohrmann 
> wrote:
>
> > Hi Jark,
> >
> > thanks for reporting this issue. Could this be a documented limitation of
> > Blink's preview version? I think we have agreed that the Blink SQL
> planner
> > will be rather a preview feature than production ready. Hence it could
> > still contain some bugs. My concern is that there might be still other
> > issues which we'll discover bit by bit and could postpone the release
> even
> > further if we say Blink bugs are blockers.
> >
> > Cheers,
> > Till
> >
> > On Tue, Aug 13, 2019 at 7:42 AM Jark Wu  wrote:
> >
> > > Hi all,
> > >
> > > I just find an issue when testing connector DDLs against blink planner
> > for
> > > rc2.
> > > This issue lead to the DDL doesn't work when containing
> > timestamp/date/time
> > > type.
> > > I have created an issue FLINK-13699[1] and a pull request for this.
> > >
> > > IMO, this can be a blocker issue of 1.9 release. Because
> > > timestamp/date/time are primitive types, and this will break the DDL
> > > feature.
> > > However, I want to hear more thoughts from the community whether we
> > should
> > > recognize it as a blocker.
> > >
> > > Thanks,
> > > Jark
> > >
> > >
> > > [1]: https://issues.apache.org/jira/browse/FLINK-13699
> > >
> > >
> > >
> > > On Mon, 12 Aug 2019 at 22:46, Becket Qin  wrote:
> > >
> > > > Thanks Gordon, will do that.
> > > >
> > > > On Mon, Aug 12, 2019 at 4:42 PM Tzu-Li (Gordon) Tai <
> > tzuli...@apache.org
> > > >
> > > > wrote:
> > > >
> > > > > Concerning FLINK-13231:
> > > > >
> > > > > Since this is a @PublicEvolving interface, technically it is ok to
> > > break
> > > > > it across releases (including across bugfix releases?).
> > > > > So, @Becket if you do merge it now, please mark the fix version as
> > > 1.9.1.
> > > > >
> > > > > During the voting process, in the case a new RC is created, we
> > usually
> > > > > check the list of changes compared to the previous RC, and correct
> > the
> > > > "Fix
> > > > > Version" of the corresponding JIRAs to be the right version (in the
> > > case,
> > > > > it would be corrected to 1.9.0 instead of 1.9.1).
> > > > >
> > > > > On Mon, Aug 12, 2019 at 4:25 PM Till Rohrmann <
> trohrm...@apache.org>
> > > > > wrote:
> > > > >
> > > > >> I agree that it would be nicer. Not sure whether we should cancel
> > the
> > > RC
> > > > >> for this issue given that it is open for quite some time and
> hasn't
> > > been
> > > > >> addressed until very recently. Maybe we could include it on the
> > > > shortlist
> > > > >> of nice-to-do things which we do in case that the RC gets
> cancelled.
> > > > >>
> > > > >> Cheers,
> > > > >> Till
> > > > >>
> > > > >> On Mon, Aug 12, 2019 at 4:18 PM Becket Qin 
> > > > wrote:
> > > > >>
> > > > >>> Hi Till,
> > > > >>>
> > > > >>> Yes, I think we have already documented in that way. So
> technically
> > > > >>> speaking it is fine to change it later. It is just better if we
> > could
> > > > >>> avoid
> > > > >>> doing that.
> > > > >>>
> > > > >>> Thanks,
> > > > >>>
> > > > >>> Jiangjie (Becket) Qin
> > > > >>>
> > > > >>> On Mon, Aug 12, 2019 at 4:09 PM Till Rohrmann <
>

Re: [VOTE] Flink Project Bylaws

2019-08-13 Thread Jark Wu
+1 (non-binding)

Best,
Jark

On Wed, 14 Aug 2019 at 09:22, Kurt Young  wrote:

> +1 (binding)
>
> Best,
> Kurt
>
>
> On Wed, Aug 14, 2019 at 1:34 AM Yun Tang  wrote:
>
> > +1 (non-binding)
> >
> > But I have a minor question about "code change" action, for those
> > "[hotfix]" github pull requests [1], the dev mailing list would not be
> > notified currently. I think we should change the description of this
> action.
> >
> >
> > [1]
> >
> https://flink.apache.org/contributing/contribute-code.html#code-contribution-process
> >
> > Best
> > Yun Tang
> > 
> > From: JingsongLee 
> > Sent: Tuesday, August 13, 2019 23:56
> > To: dev 
> > Subject: Re: [VOTE] Flink Project Bylaws
> >
> > +1 (non-binding)
> > Thanks Becket.
> > I've learned a lot from current bylaws.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Yu Li 
> > Send Time:2019年8月13日(星期二) 17:48
> > To:dev 
> > Subject:Re: [VOTE] Flink Project Bylaws
> >
> > +1 (non-binding)
> >
> > Thanks for the efforts Becket!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Tue, 13 Aug 2019 at 16:09, Xintong Song 
> wrote:
> >
> > > +1 (non-binding)
> > >
> > > Thank you~
> > >
> > > Xintong Song
> > >
> > >
> > >
> > > On Tue, Aug 13, 2019 at 1:48 PM Robert Metzger 
> > > wrote:
> > >
> > > > +1 (binding)
> > > >
> > > > On Tue, Aug 13, 2019 at 1:47 PM Becket Qin 
> > wrote:
> > > >
> > > > > Thanks everyone for voting.
> > > > >
> > > > > For those who have already voted, just want to bring this up to
> your
> > > > > attention that there is a minor clarification to the bylaws wiki
> this
> > > > > morning. The change is in bold format below:
> > > > >
> > > > > one +1 from a committer followed by a Lazy approval (not counting
> the
> > > > vote
> > > > > > of the contributor), moving to lazy majority if a -1 is received.
> > > > > >
> > > > >
> > > > >
> > > > > Note that this implies that committers can +1 their own commits and
> > > merge
> > > > > > right away. *However, the committe**rs should use their best
> > > judgement
> > > > to
> > > > > > respect the components expertise and ongoing development plan.*
> > > > >
> > > > >
> > > > > This addition does not really change anything the bylaws meant to
> > set.
> > > It
> > > > > is simply a clarification. If anyone who have casted the vote
> > objects,
> > > > > please feel free to withdraw the vote.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Jiangjie (Becket) Qin
> > > > >
> > > > >
> > > > > On Tue, Aug 13, 2019 at 1:29 PM Piotr Nowojski <
> pi...@ververica.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > > On 13 Aug 2019, at 13:22, vino yang 
> > wrote:
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > Tzu-Li (Gordon) Tai  于2019年8月13日周二
> > 下午6:32写道:
> > > > > > >
> > > > > > >> +1
> > > > > > >>
> > > > > > >> On Tue, Aug 13, 2019, 12:31 PM Hequn Cheng <
> > chenghe...@gmail.com>
> > > > > > wrote:
> > > > > > >>
> > > > > > >>> +1 (non-binding)
> > > > > > >>>
> > > > > > >>> Thanks a lot for driving this! Good job. @Becket Qin <
> > > > > > >> becket@gmail.com
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Best, Hequn
> > > > > > >>>
> > > > > > >>> On Tue, Aug 13, 2019 at 6:26 PM Stephan Ewen <
> se...@apache.org
> > >
> > > > > wrote:
> > > > > > >>>
> > > > > > >>>> +1

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-14 Thread Jark Wu
Hi Gordon,

I have verified the following things:

- build the source release with Scala 2.12 and Scala 2.11 successfully
- checked/verified signatures and hashes
- checked that all POM files point to the same version
- ran some flink-table related end-to-end tests locally, which succeeded
(except the TPC-H e2e test, which failed and is reported in FLINK-13704)
- started cluster for both Scala 2.11 and 2.12, ran examples, verified web
ui and log output, nothing unexpected
- started a cluster, ran a SQL query to temporal-join a kafka source with a
mysql jdbc table, and wrote the results to kafka again, using DDL to create
the source and sinks. Looks good.
- reviewed the release PR

As FLINK-13704 is not recognized as a blocker issue, +1 from my side
(non-binding).

On Tue, 13 Aug 2019 at 17:07, Till Rohrmann  wrote:

> Hi Richard,
>
> although I can see that it would be handy for users who have PubSub set up,
> I would rather not include examples which require an external dependency
> into the Flink distribution. I think examples should be self-contained. My
> concern is that we would bloat the distribution for many users at the
> benefit of a few. Instead, I think it would be better to make these
> examples available differently, maybe through Flink's ecosystem website or
> maybe a new examples section in Flink's documentation.
>
> Cheers,
> Till
>
> On Tue, Aug 13, 2019 at 9:43 AM Jark Wu  wrote:
>
> > Hi Till,
> >
> > After thinking about it, we can use VARCHAR as an alternative to
> > timestamp/time/date.
> > I'm fine with not recognizing it as a blocker issue.
> > We can fix it in 1.9.1.
> >
> >
> > Thanks,
> > Jark
> >
> >
> > On Tue, 13 Aug 2019 at 15:10, Richard Deurwaarder 
> wrote:
> >
> > > Hello all,
> > >
> > > I noticed the PubSub example jar is not included in the examples/ dir
> of
> > > flink-dist. I've created
> > https://issues.apache.org/jira/browse/FLINK-13700
> > >  + https://github.com/apache/flink/pull/9424/files to fix this.
> > >
> > > I will leave it up to you to decide if we want to add this to 1.9.0.
> > >
> > > Regards,
> > >
> > > Richard
> > >
> > > On Tue, Aug 13, 2019 at 9:04 AM Till Rohrmann 
> > > wrote:
> > >
> > > > Hi Jark,
> > > >
> > > > thanks for reporting this issue. Could this be a documented
> limitation
> > of
> > > > Blink's preview version? I think we have agreed that the Blink SQL
> > > planner
> > > > will be rather a preview feature than production ready. Hence it
> could
> > > > still contain some bugs. My concern is that there might be still
> other
> > > > issues which we'll discover bit by bit and could postpone the release
> > > even
> > > > further if we say Blink bugs are blockers.
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Aug 13, 2019 at 7:42 AM Jark Wu  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I just find an issue when testing connector DDLs against blink
> > planner
> > > > for
> > > > > rc2.
> > > > > This issue leads to the DDL not working when it contains
> > > > > timestamp/date/time types.
> > > > > I have created an issue FLINK-13699[1] and a pull request for this.
> > > > >
> > > > > IMO, this can be a blocker issue for the 1.9 release, because
> > > > > timestamp/date/time are primitive types and this will break the
> DDL
> > > > > feature.
> > > > > However, I want to hear more thoughts from the community whether we
> > > > should
> > > > > recognize it as a blocker.
> > > > >
> > > > > Thanks,
> > > > > Jark
> > > > >
> > > > >
> > > > > [1]: https://issues.apache.org/jira/browse/FLINK-13699
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 12 Aug 2019 at 22:46, Becket Qin 
> > wrote:
> > > > >
> > > > > > Thanks Gordon, will do that.
> > > > > >
> > > > > > On Mon, Aug 12, 2019 at 4:42 PM Tzu-Li (Gordon) Tai <
> > > > tzuli...@apache.org
> > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Concerning FLINK-13231:
> > > > > > >

Re: Unbearably slow Table API time-windowed stream join with RocksDBStateBackend

2019-08-14 Thread Jark Wu
Hi Xiao,

Thanks for reporting this.
Your approach sounds good to me. But we have many similar problems in
existing streaming SQL operator implementations.
So I think it would be great if the State API / state backend could provide
a better state structure to handle this situation.

This is similar to the poor-performance problem of RocksDBListState, and
related discussions have been raised several times [1][2].
The root cause is that RocksDBStateBackend serializes the whole list as a
single byte[], and there were some ideas proposed in that thread.

I cc'ed Yu Li who works on statebackend.
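
To make this more concrete, below is a minimal Java sketch of the
per-element layout that the fix quoted below proposes. The names
rowCount/rows/append are made up for illustration; this is the idea rather
than actual Flink code:

import org.apache.flink.api.common.state.MapState;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.types.Row;

// Instead of MapState<Long, List<Row>>, where the whole list is
// (de)serialized on every access, keep a per-timestamp counter and
// address each row individually under a composite key.
private MapState<Long, Integer> rowCount;           // timestamp -> number of rows
private MapState<Tuple2<Long, Integer>, Row> rows;  // (timestamp, index) -> row

private void append(long timestamp, Row row) throws Exception {
    Integer count = rowCount.get(timestamp);
    if (count == null) {
        count = 0;
    }
    rows.put(Tuple2.of(timestamp, count), row);  // serializes only one row
    rowCount.put(timestamp, count + 1);          // rewrites only a small counter
}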

Thanks,
Jark


[1]: https://issues.apache.org/jira/browse/FLINK-8297
[2]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Discuss-FLINK-8297-A-solution-for-FLINK-8297-Timebased-RocksDBListState-tc28259.html


On Wed, 14 Aug 2019 at 14:46, LIU Xiao  wrote:

> Example SQL:
>
> SELECT *
> FROM stream1 s1, stream2 s2
> WHERE s1.id = s2.id AND s1.rowtime = s2.rowtime
>
> And we have lots of messages in stream1 and stream2 that share the same rowtime.
>
> It runs fine when using heap as the state backend,
> but sometimes requires lots of heap memory (when upstreams are out of
> sync, etc.), and a risk of full GC exists.
>
> When we use RocksDBStateBackend to lower the heap memory usage, we found
> our program runs unbearably slow.
>
> After examining the code, we found
> org.apache.flink.table.runtime.join.TimeBoundedStreamJoin#processElement1()
> may be the cause of the problem (we are using Flink 1.6, but 1.8 should be
> the same):
> ...
> // Check if we need to cache the current row.
> if (rightOperatorTime < rightQualifiedUpperBound) {
>   // Operator time of right stream has not exceeded the upper window
> bound of the current
>   // row. Put it into the left cache, since later coming records from
> the right stream are
>   // expected to be joined with it.
>   var leftRowList = leftCache.get(timeForLeftRow)
>   if (null == leftRowList) {
> leftRowList = new util.ArrayList[JTuple2[Row, Boolean]](1)
>   }
>   leftRowList.add(JTuple2.of(leftRow, emitted))
>   leftCache.put(timeForLeftRow, leftRowList)
> ...
>
> In the above code, if there are lots of messages with the same
> timeForLeftRow, the serialization and deserialization cost will be very
> high when using RocksDBStateBackend.
>
> A simple fix I came up with:
> ...
>   // cache to store rows from the left stream
>   //private var leftCache: MapState[Long, JList[JTuple2[Row, Boolean]]] = _
>   private var leftCache: MapState[JTuple2[Long, Integer],
> JList[JTuple2[Row, Boolean]]] = _
>   private var leftCacheSize: MapState[Long, Integer] = _
> ...
> // Check if we need to cache the current row.
> if (rightOperatorTime < rightQualifiedUpperBound) {
>   // Operator time of right stream has not exceeded the upper window
> bound of the current
>   // row. Put it into the left cache, since later coming records from
> the right stream are
>   // expected to be joined with it.
>   //var leftRowList = leftCache.get(timeForLeftRow)
>   //if (null == leftRowList) {
>   //  leftRowList = new util.ArrayList[JTuple2[Row, Boolean]](1)
>   //}
>   //leftRowList.add(JTuple2.of(leftRow, emitted))
>   //leftCache.put(timeForLeftRow, leftRowList)
>   var leftRowListSize = leftCacheSize.get(timeForLeftRow)
>   if (null == leftRowListSize) {
> leftRowListSize = new Integer(0)
>   }
>   leftCache.put(JTuple2.of(timeForLeftRow, leftRowListSize),
> JTuple2.of(leftRow, emitted))
>   leftCacheSize.put(timeForLeftRow, leftRowListSize + 1)
> ...
>
> --
> LIU Xiao 
>
>


Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink committer

2019-08-14 Thread Jark Wu
Congratulations Andrey!


Cheers,
Jark

On Thu, 15 Aug 2019 at 00:57, jincheng sun  wrote:

> Congrats Andrey! Very happy to have you onboard :)
>
> Best, Jincheng
>
> Yu Li  于2019年8月15日周四 上午12:06写道:
>
> > Congratulations Andrey! Well deserved!
> >
> > Best Regards,
> > Yu
> >
> >
> > On Wed, 14 Aug 2019 at 17:55, Aleksey Pak  wrote:
> >
> > > Congratulations, Andrey!
> > >
> > > On Wed, Aug 14, 2019 at 4:53 PM Markos Sfikas 
> > > wrote:
> > >
> > > > Congrats Andrey!
> > > >
> > > > On Wed, 14 Aug 2019 at 16:47, Becket Qin 
> wrote:
> > > >
> > > > > Congratulations, Andrey!
> > > > >
> > > > > On Wed, Aug 14, 2019 at 4:35 PM Thomas Weise 
> wrote:
> > > > >
> > > > > > Congrats!
> > > > > >
> > > > > >
> > > > > > On Wed, Aug 14, 2019, 7:12 AM Robert Metzger <
> rmetz...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > Congratulations! Very happy to have you onboard :)
> > > > > > >
> > > > > > > On Wed, Aug 14, 2019 at 4:06 PM Kostas Kloudas <
> > kklou...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > Congratulations Andrey!
> > > > > > > > Well deserved!
> > > > > > > >
> > > > > > > > Kostas
> > > > > > > >
> > > > > > > > On Wed, Aug 14, 2019 at 4:04 PM Yun Tang 
> > > wrote:
> > > > > > > > >
> > > > > > > > > Congratulations Andrey.
> > > > > > > > >
> > > > > > > > > Best
> > > > > > > > > Yun Tang
> > > > > > > > > 
> > > > > > > > > From: Xintong Song 
> > > > > > > > > Sent: Wednesday, August 14, 2019 21:40
> > > > > > > > > To: Oytun Tez 
> > > > > > > > > Cc: Zili Chen ; Till Rohrmann <
> > > > > > > > trohrm...@apache.org>; dev ; user <
> > > > > > > > u...@flink.apache.org>
> > > > > > > > > Subject: Re: [ANNOUNCE] Andrey Zagrebin becomes a Flink
> > > committer
> > > > > > > > >
> > > > > > > > > Congratulations Andery~!
> > > > > > > > >
> > > > > > > > > Thank you~
> > > > > > > > >
> > > > > > > > > Xintong Song
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Aug 14, 2019 at 3:31 PM Oytun Tez <
> > oy...@motaword.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Congratulations Andrey!
> > > > > > > > >
> > > > > > > > > I am glad the Flink committer team is growing at such a
> pace!
> > > > > > > > >
> > > > > > > > > ---
> > > > > > > > > Oytun Tez
> > > > > > > > >
> > > > > > > > > M O T A W O R D
> > > > > > > > > The World's Fastest Human Translation Platform.
> > > > > > > > > oy...@motaword.com — www.motaword.com
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Aug 14, 2019 at 9:29 AM Zili Chen <
> > > wander4...@gmail.com>
> > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > Congratulations Andrey!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > tison.
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Till Rohrmann  于2019年8月14日周三
> 下午9:26写道:
> > > > > > > > >
> > > > > > > > > Hi everyone,
> > > > > > > > >
> > > > > > > > > I'm very happy to announce that Andrey Zagrebin accepted
> the
> > > > offer
> > > > > of
> > > > > > > > the Flink PMC to become a committer of the Flink project.
> > > > > > > > >
> > > > > > > > > Andrey has been an active community member for more than 15
> > > > months.
> > > > > > He
> > > > > > > > has helped shaping numerous features such as State TTL,
> > FRocksDB
> > > > > > release,
> > > > > > > > Shuffle service abstraction, FLIP-1, result partition
> > management
> > > > and
> > > > > > > > various fixes/improvements. He's also frequently helping out
> on
> > > the
> > > > > > > > user@f.a.o mailing lists.
> > > > > > > > >
> > > > > > > > > Congratulations Andrey!
> > > > > > > > >
> > > > > > > > > Best, Till
> > > > > > > > > (on behalf of the Flink PMC)
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Markos Sfikas
> > > > +49 (0) 15759630002
> > > >
> > >
> >
>


Re: [DISCUSS] FLIP-51: Rework of the Expression Design

2019-08-14 Thread Jark Wu
Thanks Jingsong for starting the discussion.

The general design of the FLIP looks good to me. +1 for the FLIP. It's time
to get rid of the old Expression!

Regarding the function behavior, shall we also include the new functions
from the blink planner (e.g. LISTAGG, REGEXP, TO_DATE, etc.)?


Best,
Jark





On Wed, 14 Aug 2019 at 23:34, Timo Walther  wrote:

> Hi Jingsong,
>
> thanks for writing down this FLIP. Big +1 from my side to finally get
> rid of PlannerExpressions and have consistent and well-defined behavior
> for Table API and SQL updated to FLIP-37.
>
> We might need to discuss some of the behavior of particular functions
> but this should not affect the actual FLIP-51.
>
> Regards,
> Timo
>
>
> Am 13.08.19 um 12:55 schrieb JingsongLee:
> > Hi everyone,
> >
> > We would like to start a discussion thread on "FLIP-51: Rework of the
> > Expression Design"(Design doc: [1], FLIP: [2]), where we describe how
> >   to improve the new java Expressions to work with type inference and
> >   convert expression to the calcite RexNode. This is a follow-up plan
> > for FLIP-32[3] and FLIP-37[4]. This FLIP is mostly based on FLIP-37.
> >
> > This FLIP addresses several shortcomings of current:
> > - New Expressions still use PlannerExpressions to type inference and
> >   to RexNode. Flnk-planner and blink-planner have a lot of repetitive
> code
> >   and logic.
> > - Let TableApi and Cacite definitions consistent.
> > - Reduce the complexity of Function development.
> > - Powerful Function for user.
> >
> > Key changes can be summarized as follows:
> > - Improve the interface of FunctionDefinition.
> > - Introduce type inference for built-in functions.
> > - Introduce ExpressionConverter to convert Expression to calcite
> >   RexNode.
> > - Remove repetitive code and logic in planners.
> >
> > I also listed type inference and behavior of all built-in functions [5],
> to
> > verify that the interface is satisfied. After introduce type inference to
> > table-common module, planners should have a unified function behavior.
> > And this gives the community also the chance to quickly discuss types
> >   and behavior of functions a last time before they are declared stable.
> >
> > Looking forward to your feedbacks. Thank you.
> >
> > [1]
> https://docs.google.com/document/d/1yFDyquMo_-VZ59vyhaMshpPtg7p87b9IYdAtMXv5XmM/edit?usp=sharing
> > [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
> > [3]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-32%3A+Restructure+flink-table+for+future+contributions
> > [4]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-37%3A+Rework+of+the+Table+API+Type+System
> > [5]
> https://docs.google.com/document/d/1fyVmdGgbO1XmIyQ1BaoG_h5BcNcF3q9UJ1Bj1euO240/edit?usp=sharing
> >
> > Best,
> > Jingsong Lee
>
>
>


Re: flink 1.9 DDL nested json derived

2019-08-15 Thread Jark Wu
Hi Shengnan,

Yes, Flink 1.9 supports deriving the format schema for nested JSON. You
should declare the ROW type with the nested schema explicitly. I tested a
similar DDL against 1.9.0 RC2 and it worked well.

CREATE TABLE kafka_json_source (
rowtime VARCHAR,
user_name VARCHAR,
event ROW<message_type VARCHAR, message VARCHAR>
) WITH (
'connector.type' = 'kafka',
'connector.version' = 'universal',
'connector.topic' = 'test-json',
'connector.startup-mode' = 'earliest-offset',
'connector.properties.0.key' = 'zookeeper.connect',
'connector.properties.0.value' = 'localhost:2181',
'connector.properties.1.key' = 'bootstrap.servers',
'connector.properties.1.value' = 'localhost:9092',
'update-mode' = 'append',
'format.type' = 'json',
    'format.derive-schema' = 'true'
);

The kafka message is

{"rowtime": "2018-03-12T08:00:00Z", "user_name": "Alice", "event": {
"message_type": "WARNING", "message": "This is a warning."}}


Thanks,
Jark


On Thu, 15 Aug 2019 at 14:12, Shengnan YU  wrote:

>
> Hi guys
> I am trying the DDL feature in branch 1.9-release.  I am stuck in
> creating a table from kafka with nested json format. Is it possible to
> specify a "Row" type of columns to derive the nested json schema?
>
> String sql = "create table kafka_stream(\n" +
> "  a varchar, \n" +
> "  b varchar,\n" +
> "  c int,\n" +
> "  inner_json row\n" +
> ") with (\n" +
> "  'connector.type' ='kafka',\n" +
> "  'connector.version' = '0.11',\n" +
> "  'update-mode' = 'append', \n" +
> "  'connector.topic' = 'test',\n" +
> "  'connector.properties.0.key' = 'bootstrap.servers',\n" +
> "  'connector.properties.0.value' = 'localhost:9092',\n" +
> "  'format.type' = 'json', \n" +
> "  'format.derive-schema' = 'true'\n" +
> ")\n";
>
>  Thank you very much!
>


Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-15 Thread Jark Wu
Hi Bowen,

Thanks for reporting this.
However, I don't think this is an issue. IMO, it is by design.
The `tEnv.listUserDefinedFunctions()` in the Table API and `show functions;`
in the SQL CLI are intended to return only the registered UDFs, not the
built-in functions.
This is also the behavior in previous versions.

Best,
Jark

On Fri, 16 Aug 2019 at 06:52, Bowen Li  wrote:

> -1 for RC2.
>
> I found a bug https://issues.apache.org/jira/browse/FLINK-13741, and I
> think it's a blocker.  The bug means that currently users who call
> `tEnv.listUserDefinedFunctions()` in the Table API or `show functions;` via
> SQL are not able to see Flink's built-in functions.
>
> I'm preparing a fix right now.
>
> Bowen
>
>
> On Thu, Aug 15, 2019 at 8:55 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > Thanks for all the test efforts, verifications and votes so far.
> >
> > So far, things are looking good, but we still require one more PMC
> binding
> > vote for this RC to be the official release, so I would like to extend
> the
> > vote time for 1 more day, until *Aug. 16th 17:00 CET*.
> >
> > In the meantime, the release notes for 1.9.0 had only just been finalized
> > [1], and could use a few more eyes before closing the vote.
> > Any help with checking if anything else should be mentioned there
> regarding
> > breaking changes / known shortcomings would be appreciated.
> >
> > Cheers,
> > Gordon
> >
> > [1] https://github.com/apache/flink/pull/9438
> >
> > On Thu, Aug 15, 2019 at 3:58 PM Kurt Young  wrote:
> >
> > > Great, then I have no other comments on legal check.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Thu, Aug 15, 2019 at 9:56 PM Chesnay Schepler 
> > > wrote:
> > >
> > > > The licensing items aren't a problem; we don't care about Flink
> modules
> > > > in NOTICE files, and we don't have to update the source-release
> > > > licensing since we don't have a pre-built version of the WebUI in the
> > > > source.
> > > >
> > > > On 15/08/2019 15:22, Kurt Young wrote:
> > > > > After going through the licenses, I found 2 suspicions but not sure
> > if
> > > > they
> > > > > are
> > > > > valid or not.
> > > > >
> > > > > 1. flink-state-processing-api is packaged in to flink-dist jar, but
> > not
> > > > > included in
> > > > > NOTICE-binary file (the one under the root directory) like other
> > > modules.
> > > > > 2. flink-runtime-web distributed some JavaScript dependencies
> through
> > > > source
> > > > > codes, the licenses and NOTICE file were only updated inside the
> > module
> > > > of
> > > > > flink-runtime-web, but not the NOTICE file and licenses directory
> > which
> > > > > under
> > > > > the  root directory.
> > > > >
> > > > > Another minor issue I just found is:
> > > > > FLINK-13558 tries to include table examples to flink-dist, but I
> > cannot
> > > > > find it in
> > > > > the binary distribution of RC2.
> > > > >
> > > > > Best,
> > > > > Kurt
> > > > >
> > > > >
> > > > > On Thu, Aug 15, 2019 at 6:19 PM Kurt Young 
> wrote:
> > > > >
> > > > >> Hi Gordon & Timo,
> > > > >>
> > > > >> Thanks for the feedback, and I agree with it. I will document this
> > in
> > > > the
> > > > >> release notes.
> > > > >>
> > > > >> Best,
> > > > >> Kurt
> > > > >>
> > > > >>
> > > > >> On Thu, Aug 15, 2019 at 6:14 PM Tzu-Li (Gordon) Tai <
> > > > tzuli...@apache.org>
> > > > >> wrote:
> > > > >>
> > > > >>> Hi Kurt,
> > > > >>>
> > > > >>> With the same argument as before, given that it is mentioned in
> the
> > > > >>> release
> > > > >>> announcement that it is a preview feature, I would not block this
> > > > release
> > > > >>> because of it.
> > > > >>> Nevertheless, it would be important to mention this explicitly in
> > the
> > > > >>> release notes [1].
> > > > >>>

Re: [DISCUSS] Reducing build times

2019-08-15 Thread Jark Wu
Thanks Chesnay for starting this discussion.

+1 for #1, it might be the easiest way to get a significant speedup.
If the only reason is isolation, I think we can fix the static fields
or global state used in Flink where possible.

+1 for #2, and thanks Aleksey for the prototype. I think it's a good
approach which doesn't introduce too much things to maintain.

+1 for #3(run CRON or e2e tests on demand).
We have this requirement when reviewing some pull requests, because we
aren't sure whether they will break some specific e2e test.
Currently, we have to run them locally by building the whole project, or
enable CRON jobs for the pushed branch in the contributor's own Travis.

Besides that, I think caching distributions as proposed in FLINK-11464 [1]
is also a good way to save a lot of download time.

Best,
Jark

[1]: https://issues.apache.org/jira/browse/FLINK-11464

On Thu, 15 Aug 2019 at 21:47, Aleksey Pak  wrote:

> Hi all!
>
> Thanks for starting this discussion.
>
> I'd like to also add my 2 cents:
>
> +1 for #2, differential build scripts.
> I've worked on the approach. And with it, I think it's possible to reduce
> total build time with relatively low effort, without enforcing any new
> build tool and low maintenance cost.
>
> You can check a proposed change (for the old CI setup, when Flink PRs were
> running in Apache common CI pool) here:
> https://github.com/apache/flink/pull/9065
> In the proposed change, the dependency check is not heavily hardcoded and
> just uses maven's results for dependency graph analysis.
>
> > This approach is conceptually quite straight-forward, but has limits
> since it has to be pessimistic; > i.e. a change in flink-core _must_ result
> in testing all modules.
>
> Agree, in Flink case, there are some core modules that would trigger whole
> tests run with such approach. For developers who modify such components,
> the build time would be the longest. But this approach should really help
> for developers who touch more-or-less independent modules.
>
> Even for core modules, it's possible to create "abstraction" barriers by
> changing dependency graph. For example, it can look like: flink-core-api
> <-- flink-core, flink-core-api <-- flink-connectors.
> In that case, only change in flink-core-api would trigger whole tests run.
>
> +1 for #3, separating PR CI runs to different stages.
> Imo, it may require more change to current CI setup, compared to #2 and
> better it should not be silly. Best, if it integrates with the Flink bot
> and triggers some follow up build steps only when some prerequisites are
> done.
>
> +1 for #4, to move some tests into cron runs.
> But imo, this does not scale well, it applies only to a small subset of
> tests.
>
> +1 for #6, to use other CI service(s).
> More specifically, GitHub gives build actions for free that can be used to
> offload some build steps/PR checks. It can help to move out some PR checks
> from the main CI build (for example: documentation builds, license checks,
> code formatting checks).
>
> Regards,
> Aleksey
>
> On Thu, Aug 15, 2019 at 11:08 AM Till Rohrmann 
> wrote:
>
> > Thanks for starting this discussion Chesnay. I think it has become
> obvious
> > to the Flink community that with the existing build setup we cannot
> really
> > deliver fast build times which are essential for fast iteration cycles
> and
> > high developer productivity. The reasons for this situation are manifold
> > but it is definitely affected by Flink's project growth, not always
> optimal
> > tests and the inflexibility that everything needs to be built. Hence, I
> > consider the reduction of build times crucial for the project's health
> and
> > future growth.
> >
> > Without necessarily voicing a strong preference for any of the presented
> > suggestions, I wanted to comment on each of them:
> >
> > 1. This sounds promising. Could the reason why we don't reuse JVMs date
> > back to the time when we still had a lot of static fields in Flink which
> > made it hard to reuse JVMs and the potentially mutated global state?
> >
> > 2. Building hand-crafted solutions around a build system in order to
> > compensate for its limitations which other build systems support out of
> the
> > box sounds like the not invented here syndrome to me. Reinventing the
> wheel
> > has historically proven to be usually not the best solution and it often
> > comes with a high maintenance price tag. Moreover, it would add just
> > another layer of complexity around our existing build system. I think the
> > current state where we have the maven setup in pom files and for Travis
> > multiple

Re: [VOTE] FLIP-51: Rework of the Expression Design

2019-08-16 Thread Jark Wu
+1 from my side.

Thanks Jingsong for driving this.

Best,
Jark

On Thu, 15 Aug 2019 at 22:09, Timo Walther  wrote:

> +1 for this.
>
> Thanks,
> Timo
>
> Am 15.08.19 um 15:57 schrieb JingsongLee:
> > Hi Flink devs,
> >
> > I would like to start the voting for FLIP-51 Rework of the Expression
> >   Design.
> >
> > FLIP wiki:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
> >
> > Discussion thread:
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-51-Rework-of-the-Expression-Design-td31653.html
> >
> > Google Doc:
> >
> https://docs.google.com/document/d/1yFDyquMo_-VZ59vyhaMshpPtg7p87b9IYdAtMXv5XmM/edit?usp=sharing
> >
> > Thanks,
> >
> > Best,
> > Jingsong Lee
>
>
>


Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-08-16 Thread Jark Wu
Thanks for starting this design Timo and Dawid,

Improving ConfigOption has been hovering in my mind for a long time.
We have seen the benefits when developing the blink configurations and
connector properties in the 1.9 release.
Thanks for bringing it up and making such a detailed design.
I will leave my thoughts and comments in the document.
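
Just to illustrate the direction I like: typed, validated, self-documenting
options such as durations and memory sizes. The method names below only
sketch the proposal and are not a finalized API:

ConfigOption<Duration> clientTimeout =
        ConfigOptions.key("client.timeout")
                .durationType()
                .defaultValue(Duration.ofSeconds(60))
                .withDescription("Timeout for client operations.");

ConfigOption<MemorySize> networkMemory =
        ConfigOptions.key("taskmanager.network.memory")
                .memoryType()
                .defaultValue(MemorySize.parse("64 mb"))
                .withDescription("Memory reserved for the network stack.");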

Cheers,
Jark


On Fri, 16 Aug 2019 at 22:30, Zili Chen  wrote:

> Hi Timo,
>
> It looks interesting. Thanks for preparing this FLIP!
>
> Client API enhancement benefit from this evolution which
> hopefully provides a better view of configuration of Flink.
> In client API enhancement, we likely make the deployment
> of cluster and submission of job totally defined by configuration.
>
> Will take a look at the document in days.
>
> Best,
> tison.
>
>
> Timo Walther  于2019年8月16日周五 下午10:12写道:
>
> > Hi everyone,
> >
> > Dawid and I are working on making parts of ExecutionConfig and
> > TableConfig configurable via config options. This is necessary to make
> > all properties also available in SQL. Additionally, with the new SQL DDL
> > based on properties as well as more connectors and formats coming up,
> > unified configuration becomes more important.
> >
> > We need more features around string-based configuration in the future,
> > which is why Dawid and I would like to propose FLIP-54 for evolving the
> > ConfigOption and Configuration classes:
> >
> >
> >
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit
> >
> > In summary it adds:
> > - documented types and validation
> > - more common types such as memory size, duration, list
> > - simple non-nested object types
> >
> > Looking forward to your feedback,
> > Timo
> >
> >
>


Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-19 Thread Jark Wu
Hi Gordon,

I agree that we should pick the minimal set of changes to shorten the
release testing time.
However, I would like to include FLINK-13699 in RC3. FLINK-13699 is a
critical DDL issue, and it is a small change to flink-table (it won't
affect runtime features or stability).
I will do some tests around SQL and the Blink planner if RC3 includes this
fix.

But if the community is against including it, I'm also fine with having it
in the next minor release.

Thanks,
Jark

On Mon, 19 Aug 2019 at 16:16, Stephan Ewen  wrote:

> +1 for Gordon's approach.
>
> If we do that, we can probably skip re-testing everything and mainly need
> to verify the release artifacts (signatures, build from source, etc.).
>
> If we open the RC up for changes, I fear a lot of small issues will rush in
> and destabilize the candidate again, meaning we have to do another larger
> testing effort.
>
>
>
> On Mon, Aug 19, 2019 at 9:48 AM Becket Qin  wrote:
>
> > Hi Gordon,
> >
> > I remember we mentioned earlier that if there is an additional RC, we can
> > piggyback the GCP PubSub API change (
> > https://issues.apache.org/jira/browse/FLINK-13231). It is a small patch
> to
> > avoid future API change. So should be able to merge it very shortly.
> Would
> > it be possible to include that into RC3 as well?
> >
> > Thanks,
> >
> > Jiangjie (Becket) Qin
> >
> > On Mon, Aug 19, 2019 at 9:43 AM Tzu-Li (Gordon) Tai  >
> > wrote:
> >
> > > Hi,
> > >
> > > https://issues.apache.org/jira/browse/FLINK-13752 turns out to be an
> > > actual
> > > blocker, so we would have to close this RC now in favor of a new one.
> > >
> > > Since we are already quite past the planned release time for 1.9.0, I
> > would
> > > like to limit the new changes included in RC3 to only the following:
> > > - https://issues.apache.org/jira/browse/FLINK-13752
> > > - Fix license and notice file issues that Kurt had found with
> > > flink-runtime-web and flink-state-processing-api
> > >
> > > This means that I will not be creating RC3 with the release-1.9 branch
> as
> > > is, but essentially only cherry-picking the above mentioned changes on
> > top
> > > of RC2.
> > > The minimal set of changes on top of RC2 should allow us to carry most
> if
> > > not all of the already existing votes without another round of
> extensive
> > > testing, and allow us to have a shortened voting time.
> > >
> > > I understand that there are other issues mentioned in this thread that
> > are
> > > already spotted and merged to release-1.9, especially for the Blink
> > planner
> > > and DDL, but I suggest not to include them in RC3.
> > > I think it would be better to collect all the remaining issues for
> those
> > > over a period of time, and include them as 1.9.1 which can ideally also
> > > happen a few weeks soon after 1.9.0.
> > >
> > > What do you think? If there are not objections, I would proceed with
> this
> > > plan and push out a new RC by the end of today (Aug. 19th CET).
> > >
> > > Regards,
> > > Gordon
> > >
> > > On Mon, Aug 19, 2019 at 4:09 AM Zili Chen 
> wrote:
> > >
> > > > We should investigate the performance regression but regardless the
> > > > regression I vote +1
> > > >
> > > > Have verified following things
> > > >
> > > > - Jobs running on YARN x (Session & Per Job) with high-availability
> > > > enabled.
> > > > - Simulate JM and TM failures.
> > > > - Simulate temporary network partition.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > Stephan Ewen  于2019年8月18日周日 下午10:12写道:
> > > >
> > > > > For reference, this is the JIRA issue about the regression in
> > question:
> > > > >
> > > > > https://issues.apache.org/jira/browse/FLINK-13752
> > > > >
> > > > >
> > > > > On Fri, Aug 16, 2019 at 10:57 AM Guowei Ma 
> > > wrote:
> > > > >
> > > > > > Hi, till
> > > > > > I can send the job to you offline.
> > > > > > It is just a datastream job and does not use
> > > > > TwoInputSelectableStreamTask.
> > > > > > A->B
> > > > > >  \
> > > > > >C
> > > > > >  /
> > > >

Re: [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-20 Thread Jark Wu
+1 (non-binding)

- build the source release with Scala 2.12 and Scala 2.11 successfully
- checked/verified signatures and hashes
- checked that all POM files point to the same version
- started a cluster, ran a SQL query to temporal-join a kafka source with a
mysql jdbc table, and wrote the results to kafka again.
  Used DDL (with timestamp type) to create the source and sinks. Looks
good. No errors in the logs.
- started a cluster, ran a SQL query to read from a kafka source, apply a
group aggregation, and write into a mysql jdbc table.
  Used DDL (with timestamp type) to create the source and sink. Looks good
too. No errors in the logs.

Cheers,
Jark

On Wed, 21 Aug 2019 at 04:20, Stephan Ewen  wrote:

> +1 (binding)
>
>  - Downloaded the binary release tarball
>  - started a standalone cluster with four nodes
>  - ran some examples through the Web UI
>  - checked the logs
>  - created a project from the Java quickstarts maven archetype
>  - ran a multi-stage DataSet job in batch mode
>  - killed as TaskManager and verified correct restart behavior, including
> failover region backtracking
>
>
> I found a few issues, and a common theme here is confusing error reporting
> and logging.
>
> (1) When testing batch failover and killing a TaskManager, the job reports
> as the failure cause "org.apache.flink.util.FlinkException: The assigned
> slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
> I think that is a pretty bad error message, as a user I don't know what
> that means. Some internal book keeping thing?
> You need to know a lot about Flink to understand that this means
> "TaskManager failure".
> https://issues.apache.org/jira/browse/FLINK-13805
> I would not block the release on this, but think this should get pretty
> urgent attention.
>
> (2) The Metric Fetcher floods the log with error messages when a
> TaskManager is lost.
>  There are many exceptions being logged by the Metrics Fetcher due to
> not reaching the TM any more.
>  This pollutes the log and drowns out the original exception and the
> meaningful logs from the scheduler/execution graph.
>  https://issues.apache.org/jira/browse/FLINK-13806
>  Again, I would not block the release on this, but think this should
> get pretty urgent attention.
>
> (3) If you put "web.submit.enable: false" into the configuration, the web
> UI will still display the "SubmitJob" page, but errors will
> continuously pop up, stating "Unable to load requested file /jars."
> https://issues.apache.org/jira/browse/FLINK-13799
>
> (4) REST endpoint logs ERROR level messages when selecting the
> "Checkpoints" tab for batch jobs. That does not seem correct.
>  https://issues.apache.org/jira/browse/FLINK-13795
>
> Best,
> Stephan
>
>
>
>
> On Tue, Aug 20, 2019 at 11:32 AM Tzu-Li (Gordon) Tai 
> wrote:
>
> > +1
> >
> > Legal checks:
> > - verified signatures and hashes
> > - New bundled Javascript dependencies for flink-runtime-web are correctly
> > reflected under licenses-binary and NOTICE file.
> > - locally built from source (Scala 2.12, without Hadoop)
> > - No missing artifacts in staging repo
> > - No binaries in source release
> >
> > Functional checks:
> > - Quickstart working (both in IDE + job submission)
> > - Simple State Processor API program that performs offline key schema
> > migration (RocksDB backend). Generated savepoint is valid to restore
> from.
> > - All E2E tests pass locally
> > - Didn’t notice any issues with the new WebUI
> >
> > Cheers,
> > Gordon
> >
> > On Tue, Aug 20, 2019 at 3:53 AM Zili Chen  wrote:
> >
> > > +1 (non-binding)
> > >
> > > - build from source: OK(8u212)
> > > - check local setup tutorial works as expected
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > Yu Li  于2019年8月20日周二 上午8:24写道:
> > >
> > > > +1 (non-binding)
> > > >
> > > > - checked release notes: OK
> > > > - checked sums and signatures: OK
> > > > - repository appears to contain all expected artifacts
> > > > - source release
> > > >  - contains no binaries: OK
> > > >  - contains no 1.9-SNAPSHOT references: OK
> > > >  - build from source: OK (8u102)
> > > > - binary release
> > > >  - no examples appear to be missing
> > > >  - started a cluster; WebUI reachable, example ran successfully
> > > > - checked README.md file and found nothing unexpected
> > > >
> > > > Best R

Re: CiBot Update

2019-08-22 Thread Jark Wu
Great work! Thanks Chesnay!



On Thu, 22 Aug 2019 at 15:42, Xintong Song  wrote:

> The re-triggering travis feature is so convenient. Thanks Chesnay~!
>
> Thank you~
>
> Xintong Song
>
>
>
> On Thu, Aug 22, 2019 at 9:26 AM Stephan Ewen  wrote:
>
> > Nice, thanks!
> >
> > On Thu, Aug 22, 2019 at 3:59 AM Zili Chen  wrote:
> >
> > > Thanks for your announcement. Nice work!
> > >
> > > Best,
> > > tison.
> > >
> > >
> > > vino yang  于2019年8月22日周四 上午8:14写道:
> > >
> > > > +1 for "@flinkbot run travis", it is very convenient.
> > > >
> > > > Chesnay Schepler  于2019年8月21日周三 下午9:12写道:
> > > >
> > > > > Hi everyone,
> > > > >
> > > > > this is an update on recent changes to the CI bot.
> > > > >
> > > > >
> > > > > The bot now cancels builds if a new commit was added to a PR, and
> > > > > cancels all builds if the PR was closed.
> > > > > (This was implemented a while ago; I'm just mentioning it again for
> > > > > discoverability)
> > > > >
> > > > >
> > > > > Additionally, starting today you can now re-trigger a Travis run by
> > > > > writing a comment "@flinkbot run travis"; this means you no longer
> > have
> > > > > to commit an empty commit or do other shenanigans to get another
> > build
> > > > > running.
> > > > > Note that this will /not/ work if the PR was re-opened, until at
> > least
> > > 1
> > > > > new build was triggered by a push.
> > > > >
> > > >
> > >
> >
>


Re: [ANNOUNCE] Apache Flink 1.9.0 released

2019-08-22 Thread Jark Wu
Congratulations!

Thanks Gordon and Kurt for being the release managers, and thanks a lot to
all the contributors.


Cheers,
Jark

On Thu, 22 Aug 2019 at 20:06, Oytun Tez  wrote:

> Congratulations team; thanks for the update, Gordon.
>
> ---
> Oytun Tez
>
> *M O T A W O R D*
> The World's Fastest Human Translation Platform.
> oy...@motaword.com — www.motaword.com
>
>
> On Thu, Aug 22, 2019 at 8:03 AM Tzu-Li (Gordon) Tai 
> wrote:
>
>> The Apache Flink community is very happy to announce the release of
>> Apache Flink 1.9.0, which is the latest major release.
>>
>> Apache Flink® is an open-source stream processing framework for
>> distributed, high-performing, always-available, and accurate data streaming
>> applications.
>>
>> The release is available for download at:
>> https://flink.apache.org/downloads.html
>>
>> Please check out the release blog post for an overview of the
>> improvements for this new major release:
>> https://flink.apache.org/news/2019/08/22/release-1.9.0.html
>>
>> The full release notes are available in Jira:
>>
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>>
>> We would like to thank all contributors of the Apache Flink community who
>> made this release possible!
>>
>> Cheers,
>> Gordon
>>
>


Re: [DISCUSS] Use Java's Duration instead of Flink's Time

2019-08-25 Thread Jark Wu
+1 to use Java's Duration instead of Flink's Time.

Regarding Duration parsing, we have already proposed in FLIP-54 [1] to use
`org.apache.flink.util.TimeUtils` for the parsing.

Best,
Jark

[1]:
https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#heading=h.egdwkc93dn1k
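
For illustration, the intended usage would look roughly like this (a sketch
assuming the parseDuration utility described in FLIP-54):

import java.time.Duration;
import org.apache.flink.util.TimeUtils;

// Parse a config value in Flink's "60 s" style into a java.time.Duration,
// replacing scala.concurrent.duration.Duration for this purpose.
Duration timeout = TimeUtils.parseDuration("60 s");
long millis = timeout.toMillis();  // 60000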

On Sat, 24 Aug 2019 at 18:24, Zhu Zhu  wrote:

> +1 since Java Duration is more common and powerful than Flink Time.
>
> For whether to drop scala Duration for parsing duration OptionConfig, I
> think it's another question and should be discussed in another thread.
>
> Thanks,
> Zhu Zhu
>
> Becket Qin  于2019年8月24日周六 下午4:16写道:
>
> > +1, makes sense. BTW, we probably need a FLIP as this is a public API
> > change.
> >
> > On Sat, Aug 24, 2019 at 8:11 AM SHI Xiaogang 
> > wrote:
> >
> > > +1 to replace Flink's time with Java's Duration.
> > >
> > > Besides, i also suggest to use Java's Instant for "point-in-time".
> > > It can take care of time units when we calculate Duration between
> > different
> > > instants.
> > >
> > > Regards,
> > > Xiaogang
> > >
> > > Zili Chen  于2019年8月24日周六 上午10:45写道:
> > >
> > > > Hi vino,
> > > >
> > > > I agree that it introduces extra complexity to replace
> Duration(Scala)
> > > > with Duration(Java) *in Scala code*. We could separate the usage for
> > each
> > > > language and use a bridge when necessary.
> > > >
> > > > As a matter of fact, Scala concurrent APIs(including Duration) are
> used
> > > > more than necessary at least in flink-runtime. Also we even try to
> make
> > > > flink-runtime scala free.
> > > >
> > > > Best,
> > > > tison.
> > > >
> > > >
> > > > vino yang  于2019年8月24日周六 上午10:05写道:
> > > >
> > > > > +1 to replace the Time class provided by Flink with Java's
> Duration:
> > > > >
> > > > >
> > > > >- Java's Duration has better representation than the Flink's
> Time
> > > > class;
> > > > >- As a built-in Java class, Duration class has a clear advantage
> > > over
> > > > >Java's Time class when interacting with other Java APIs and
> > > > third-party
> > > > >libraries;
> > > > >
> > > > >
> > > > > But I have reservations about replacing the Duration and
> FineDuration
> > > > > classes in scala with the Duration class in Java. Java and Scala
> have
> > > > > different types of systems. Currently, Duration (scala) and
> > > FineDuration
> > > > > (scala) work well.  In addition, this work brings additional
> > complexity
> > > > and
> > > > > cost compared to the gains obtained.
> > > > >
> > > > > Best,
> > > > > Vino
> > > > >
> > > > > Zili Chen  于2019年8月23日周五 下午11:14写道:
> > > > >
> > > > > > Hi Stephan,
> > > > > >
> > > > > > I like the idea unify usage of time/duration api. We actually
> > > > > > use at least five different classes for this purposes(see below).
> > > > > >
> > > > > > One thing I'd like to pick up is that duration configuration
> > > > > > in Flink is almost in pattern as "60 s" that fits in the pattern
> > > > > > parsed by scala.concurrent.duration.Duration. AFAIK Duration
> > > > > > in Java 8 doesn't support this pattern. However, we can solve
> > > > > > it by introduce a DurationUtils.
> > > > > >
> > > > > > Also to clarify, we now have (correct me if any other)
> > > > > >
> > > > > > java.time.Duration
> > > > > > scala.concurrent.duration.Duration
> > > > > > scala.concurrent.duration.FiniteDuration
> > > > > > org.apache.flink.api.common.time.Time
> > > > > > org.apache.flink.streaming.api.windowing.time.Time
> > > > > >
> > > > > > in use. If we'd prefer java.time.Duration, it is worth to
> consider
> > > > > > whether we unify all of them into Java's Duration, i.e., Java's
> > > > > > Duration is the first class time/duration api, while others
> should
> > > > > > be converted into or out from it.
> > > > > >
> > > > > > Best,
> > > > > > tison.
> > > > > >
> > > > > >
> > > > > > Stephan Ewen  于2019年8月23日周五 下午10:45写道:
> > > > > >
> > > > > > > Hi all!
> > > > > > >
> > > > > > > Many parts of the code use Flink's "Time" class. The Time
> really
> > > is a
> > > > > > "time
> > > > > > > interval" or a "Duration".
> > > > > > >
> > > > > > > Since Java 8, there is a Java class "Duration" that is nice and
> > > > > flexible
> > > > > > to
> > > > > > > use.
> > > > > > > I would suggest we start using Java Duration instead and drop
> > Time
> > > as
> > > > > > much
> > > > > > > as possible in the runtime from now on.
> > > > > > >
> > > > > > > Maybe even drop that class from the API in Flink 2.0.
> > > > > > >
> > > > > > > Best,
> > > > > > > Stephan
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds

2019-08-25 Thread Jark Wu
Hi all,

Sorry it took so long to get back. I have some good news.

After some investigation and development, and with Chesnay's help, we have
finally integrated Travis build notifications with the
bui...@flink.apache.org mailing list while retaining the beautiful
formatting!
Currently, only failing builds and failure->success transitions are
notified; only builds (including CRON) on apache/flink branches are
notified, and pull request builds are not.

The builds mailing list is also available in Flink website community page
[1]

I would encourage devs to subscribe to the builds mailing list and help
the community pay more attention to the build status, especially the CRON
builds.

Feel free to leave your suggestions and feedback here!



# The implementation detail:

I implemented a flink-notification-bot [2] to receive the Travis webhook
[3] payload, generate an HTML email, and send it to
bui...@flink.apache.org.
The flink-notification-bot is deployed on my own VM in DigitalOcean. You
can refer the github page [2] of the project to learn more details about
the implementation and deployment.
Btw, I'm glad to contribute the project to https://github.com/flink-ci or
https://github.com/flinkbot if the community accepts.

With the flink-notification-bot, we can easily integrate it with other CI
service or our own CI, and we can also integrate it with some other
applications (e.g. DingTalk).
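
For the curious, the filtering rule described above boils down to something
like the following Java sketch (all names are made up; this is not the
actual bot code):

// Decide whether a Travis webhook payload should be mailed to builds@.
final class BuildNotificationFilter {

    static final class TravisPayload {
        String repositorySlug;  // e.g. "apache/flink"
        boolean pullRequest;    // true for pull request builds
        String state;           // "passed", "failed", "errored", "canceled"
        String previousState;   // state of the previous build on the same branch
    }

    /** Notify on failures and on failure -> success transitions of branch builds. */
    static boolean shouldNotify(TravisPayload p) {
        if (!"apache/flink".equals(p.repositorySlug) || p.pullRequest) {
            return false;  // only branch builds of apache/flink are reported
        }
        boolean failed = "failed".equals(p.state) || "errored".equals(p.state);
        boolean recovered = "passed".equals(p.state)
                && ("failed".equals(p.previousState)
                        || "errored".equals(p.previousState));
        return failed || recovered;
    }
}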

# Rejected Alternative:

Option #1: Sending email notifications via "Travis Email Notification" [4].
Reasons:
 - If email notification is configured, Travis CI only sends emails to the
addresses specified there, rather than to the committer and author.
 - We would lose the beautiful email formatting when Travis sends emails to
the builds ML.
 - The return-path of emails from Travis CI is not constant, which makes it
difficult for the mailing list to accept them.

Cheers,
Jark

[1]: https://flink.apache.org/community.html#mailing-lists
[2]: https://github.com/wuchong/flink-notification-bot
[3]:
https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
[4]:
https://docs.travis-ci.com/user/notifications/#configuring-email-notifications




On Tue, 30 Jul 2019 at 18:35, Jark Wu  wrote:

> Hi all,
>
> Progress updates:
> 1. the bui...@flink.apache.org can be subscribed now (thanks @Robert),
> you can send an email to builds-subscr...@flink.apache.org to subscribe.
> 2. We have a pull request [1] to send only apache/flink builds
> notifications and it works well.
> 3. However, all the notifications are rejected by the builds mailing list
> (the MODERATE mails).
> I added & checked bui...@travis-ci.org to the subscriber/allow list,
> but still doesn't work. It might be recognized as spam by the mailing list.
> We are still trying to figure it out, and will update here if we have
> some progress.
>
>
> Thanks,
> Jark
>
>
>
> [1]: https://github.com/apache/flink/pull/9230
>
>
> On Thu, 25 Jul 2019 at 22:59, Robert Metzger  wrote:
>
>> The mailing list has been created, you can now subscribe to it.
>>
>> On Wed, Jul 24, 2019 at 1:43 PM Jark Wu  wrote:
>>
>> > Thanks Robert for helping out that.
>> >
>> > Best,
>> > Jark
>> >
>> > On Wed, 24 Jul 2019 at 19:16, Robert Metzger 
>> wrote:
>> >
>> > > I've requested the creation of the list, and made Jark, Chesnay and me
>> > > moderators of it.
>> > >
>> > > On Wed, Jul 24, 2019 at 1:12 PM Robert Metzger 
>> > > wrote:
>> > >
>> > > > @Jark: Yes, I will request the creation of a mailing list!
>> > > >
>> > > > On Tue, Jul 23, 2019 at 4:48 PM Hugo Louro 
>> wrote:
>> > > >
>> > > >> +1
>> > > >>
>> > > >> > On Jul 23, 2019, at 6:15 AM, Till Rohrmann > >
>> > > >> wrote:
>> > > >> >
>> > > >> > Good idea Jark. +1 for the proposal.
>> > > >> >
>> > > >> > Cheers,
>> > > >> > Till
>> > > >> >
>> > > >> >> On Tue, Jul 23, 2019 at 1:59 PM Hequn Cheng <
>> chenghe...@gmail.com>
>> > > >> wrote:
>> > > >> >>
>> > > >> >> Hi Jark,
>> > > >> >>
>> > > >> >> Good idea. +1!
>> > > >> >>
>> > > >> >>> On Tue, Jul 23, 2019 at 6:23 PM Jark Wu 
>> wrote:
>> > > >> >>>
>> > > >> >>> Thank you all

Re: [CODE-STYLE] Builder pattern

2019-08-26 Thread Jark Wu
Hi Gyula,

Thanks for bringing this up. I think it would be nice if we had a common
approach to creating builders.
Currently, we have a lot of builders, but each with a different taste.

 > 1. Creating the builder objects:
I prefer option a) too. It would be easier for users to get the builder
instance.

> 2. Setting properties on the builder:
I don't have a preference here. But I think there is another option that
might be more concise, i.e. "something()" without a `with` or `set` prefix.
For example:

CsvTableSource source = CsvTableSource.builder()
.path("/path/to/your/file.csv")
.field("myfield", Types.STRING)
.field("myfield2", Types.INT)
.build();

This pattern is heavily used in flink-table, e.g. `TableSchema`,
`TypeInference`, `BuiltInFunctionDefinition`.

> 3. Implementing the builder object:
I prefer b), the mutable approach, which is simpler on the implementation side.


Besides that, I think maybe we can add some other aspects:

4. Constructor of the main class.
 a) private constructor
 b) public constructor

5. setXXX methods of the main class
 a) setXXX methods are not allowed
 b) setXXX methods are allowed.

I prefer option a) for both, because I think one of the reasons to have a
builder is that we don't want the constructor to be public.
A public constructor makes it hard to maintain and evolve compatibly when
adding new parameters; FlinkKafkaProducer is a good example.
For set methods, I think in most cases we want users to set the fields
eagerly (through the builder), and `setXXX` methods on the main class
duplicate the methods on the builder. We should avoid that.
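
Putting these preferences together, a minimal Java sketch (the CsvSource
class is made up purely for illustration):

public final class CsvSource {
    private final String path;
    private final int bufferSize;

    // 4a) private constructor: instances can only be created via the builder
    private CsvSource(Builder builder) {
        this.path = builder.path;
        this.bufferSize = builder.bufferSize;
    }

    // 1a) static method on the built class to create the builder
    public static Builder builder() {
        return new Builder();
    }

    public static final class Builder {
        private String path;
        private int bufferSize = 4096;

        // 2) concise property methods without a with/set prefix
        public Builder path(String path) {
            this.path = path;
            return this;  // 3b) mutable builder returning this
        }

        public Builder bufferSize(int bufferSize) {
            this.bufferSize = bufferSize;
            return this;
        }

        public CsvSource build() {
            return new CsvSource(this);
        }
    }
}

Usage would then read: CsvSource.builder().path("/data/in.csv").build().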


Regards,
Jark


On Mon, 26 Aug 2019 at 20:18, Gyula Fóra  wrote:

> Hi All!
>
> I would like to start a code-style related discussion regarding how we
> implement the builder pattern in the Flink project.
>
> It would be the best to have a common approach, there are some aspects of
> the pattern that come to my mind please feel free to add anything I missed:
>
> 1. Creating the builder objects:
>
> a) Always create using static method in "built" class:
>Here we should have naming guidelines: .builder(..) or
> .xyzBuilder(...)
> b) Always use builder class constructor
> c) Mix: Again we should have some guidelines when to use which
>
> I personally prefer option a) to always have a static method to create the
> builder with static method names that end in builder.
>
> 2. Setting properties on the builder:
>
>  a) withSomething(...)
>  b) setSomething(...)
>  c) other
>
> I don't really have a preference but either a or b for consistency.
>
>
> 3. Implementing the builder object:
>
>  a) Immutable -> Creates a new builder object after setting a property
>  b) Mutable -> Returns (this) after setting the property
>
> I personally prefer the mutable version as it keeps the builder
> implementation much simpler and it seems to be a very common way of doing
> it.
>
> What do you all think?
>
> Regards,
> Gyula
>


Re: [DISCUSS] Enhance Support for Multicast Communication Pattern

2019-08-27 Thread Jark Wu
Hi all,

Thanks Yun for bringing up this topic. I missed this discussion because of
the "multicast" title.
After reading the design, if I understand correctly, it is proposing a
custom event mechanism, i.e. broadcasting custom events.
That topic is orthogonal to multicasting, so I would suggest starting a new
thread to discuss it.

Regarding to broadcasting custom event:

I would +1 the motivation, because we also encountered similar requirements
when improving Table API & SQL before.

For example, the mini-batch mechanism in the blink planner emits a special
mini-batch event to the data stream to indicate the start of a new
mini-batch.
The downstream aggregation operator buffers the data records until it
receives the mini-batch event, and then processes the buffer at once. This
reduces a lot of state accesses.
However, we don't have a proper custom event mechanism currently, so we
leverage the watermark as the mini-batch event (which is a bit of a hack in
my opinion).

Another case is joining a huge dimension table which is stored/produced in
Hive daily. We can scan the Hive table and shuffle it to the JOIN operators
by the join key to join with the main stream.
Note that the dimension table changes every day and we want to join the
latest version of the Hive table, so we need to re-scan and re-shuffle
the Hive table once a new daily partition is produced.
However, we need some special events to distinguish the boundaries of
different versions of the dimension table. The events will be used to
notify downstream operators (mainly the JOIN operator)
to know "ok, I will receive a new version of the dimension table" and "ok,
I received all the data of this version".

From my understanding, in order to support this feature, we might need to
(a rough sketch follows the list):
 1) expose a collectEvent(CustomEvent) or broadcastEvent(CustomEvent) API
to users,
 2) support registering the serialization and deserialization of the custom
event,
 3) expose a processEvent(int channel, CustomEvent) API to StreamOperator.
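
To make the three points above more concrete, here is a rough Java sketch
of what such an API could look like (all names are hypothetical; nothing of
this exists in Flink today):

import java.io.Serializable;

/** A user-defined event that can travel with the data stream. */
public interface CustomEvent extends Serializable {}

/** Hypothetical emitting side: what broadcastEvent(...) could look like. */
public interface EventEmittingOutput<T> {
    void collect(T record);
    void broadcastEvent(CustomEvent event);  // delivered to all downstream channels
}

/** Hypothetical receiving side on the StreamOperator. */
public interface EventAwareOperator {
    // 'channel' identifies the input channel on which the event arrived,
    // so operators can align events from all upstream channels if needed.
    void processEvent(int channel, CustomEvent event) throws Exception;
}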


Regards,
Jark


On Tue, 27 Aug 2019 at 15:18, Piotr Nowojski  wrote:

> Hi,
>
> Before starting a work on the design doc, I would suggest to find someone
> to shepherd this project. Otherwise this effort might drown among other
> parallel things. I could take care of that from the runtime perspective,
> however most of the changes are about the API and changes, which are
> outside of my area of expertise.
>
> Regarding the multicast, before we start working on that, I would also
> prefer to see a motivation design doc, how that feature would be used for
> example for cross or theta joins in the Table API, since very similar
> questions would apply to that as well.
>
> Piotrek
>
> > On 27 Aug 2019, at 08:10, SHI Xiaogang  wrote:
> >
> > Hi Yun Gao,
> >
> > Thanks a lot for your clarification.
> >
> > Now that the notification of broadcast events requires alignment whose
> > implementation, in my opinion, will affect the correctness of synchronous
> > iterations, I prefer to postpone the discussion until you have completed
> > the design of the new iteration library, or at least the progress
> tracking
> > part. Otherwise, the discussion for broadcasting events may become an
> empty
> > talk if it does not fit in with the final design.
> >
> > What do you think?
> >
> > Regards,
> > Xiaogang
> >
> > Yun Gao  于2019年8月27日周二 上午11:33写道:
> >
> >> Hi Xiaogang,
> >>
> >>  Very thanks for also considering the iteration case! :) These
> points
> >> are really important for iteration. As a whole, we are implementing a
> new
> >> iteration library on top of Stream API. As a library, most of its
> >> implementation does not need to touch Runtime layer, but it really has
> some
> >> new requirements on the API, like the one for being able to broadcast
> the
> >> progressive events. To be more detail, these events indeed carry the
> >> sender's index and the downstream operators need to do alignment the
> events
> >> from all the upstream operators. It works very similar to watermark,
> thus
> >> these events do not need to be contained in checkpoints.
> >>
> >> Some other points are also under implementation. However, since some
> part
> >> of the design is still under discussion internally, we may not be able
> to
> >> start a new discussion on iteration immediately. Besides, we should also
> >> need to fix the problems that may have new requirements on the Runtime,
> >> like broadcasting events, to have a complete design. Therefore, I think
> we
> >> may still first have the broadcasting problem settled in this thread?
>

Re: [VOTE] FLIP-54: Evolve ConfigOption and Configuration

2019-08-27 Thread Jark Wu
+1 to the FLIP.


Regards,
Jark

> 在 2019年8月27日,19:28,Timo Walther  写道:
> 
> Hi everyone,
> 
> thanks for the great feedback we have received for the draft of FLIP-54. The 
> discussion seems to have reached an agreement. Of course this doesn't mean 
> that we can't propose further improvements on ConfigOption's and Flink 
> configuration in general in the future. It is just one step towards having a 
> better unified configuration for the project.
> 
> Please vote for the following design document:
> 
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#
> 
> I will convert it to a Wiki page afterwards.
> 
> Thanks,
> Timo
> 



Re: [VOTE] FLIP-54: Evolve ConfigOption and Configuration

2019-08-28 Thread Jark Wu
Hi Timo,

The new changes looks good to me.

+1 to the FLIP.


Cheers,
Jark

On Wed, 28 Aug 2019 at 16:02, Timo Walther  wrote:

> Hi everyone,
>
> after some last minute changes yesterday, I would like to start a new
> vote on FLIP-54. The discussion seems to have reached an agreement. Of
> course this doesn't mean that we can't propose further improvements on
> ConfigOption's and Flink configuration in general in the future. It is
> just one step towards having a better unified configuration for the
> project.
>
> Please vote for the following design document:
>
>
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#
>
> The discussion can be found at:
>
>
> https://lists.apache.org/thread.html/a56c6b52e5f828d4a737602b031e36b5dd6eaa97557306696a8063a9@%3Cdev.flink.apache.org%3E
>
> This voting will be open for at least 72 hours. I'll try to close it on
> 2019-09-02 8:00 UTC, unless there is an objection or not enough votes.
>
> I will convert it to a Wiki page afterwards.
>
> Thanks,
>
> Timo
>
>


Re: [DISCUSS] Setup a bui...@flink.apache.org mailing list for travis builds

2019-08-28 Thread Jark Wu
Thank you for the suggestion Kurt. I just updated the notification.

Best,
Jark

On Thu, 29 Aug 2019 at 13:56, Kurt Young  wrote:

> one suggestion: we could also filter all notifications about *Cancelled*
> builds.
>
> Best,
> Kurt
>
>
> On Tue, Aug 27, 2019 at 10:53 AM jincheng sun 
> wrote:
>
> > Great Job Jark :)
> > Best, Jincheng
> >
> > Kurt Young  于2019年8月26日周一 下午6:38写道:
> >
> > > Thanks for the updates, Jark! I have subscribed the ML and everything
> > > looks good now.
> > >
> > > Best,
> > > Kurt
> > >
> > >
> > > On Mon, Aug 26, 2019 at 11:17 AM Jark Wu  wrote:
> > >
> > > > Hi all,
> > > >
> > > > Sorry it take so long to get back. I have some good news.
> > > >
> > > > After some investigation and development and the help from Chesnay,
> we
> > > > finally integrated Travis build notification with
> > > bui...@flink.apache.org
> > > > mailing list with remaining the beautiful formatting!
> > > > Currently, only the failure and failure->success builds will be
> > notified,
> > > > only builds (include CRON) on apache/flink branches will be notified,
> > the
> > > > pull request builds will not be notified.
> > > >
> > > > The builds mailing list is also available in Flink website community
> > page
> > > > [1]
> > > >
> > > > I would encourage devs to subscribe the builds mailing list, and help
> > the
> > > > community to pay more attention to the build status, especially the
> > CRON
> > > > builds.
> > > >
> > > > Feel free to leave your suggestions and feedbacks here!
> > > >
> > > > 
> > > >
> > > > # The implementation detail:
> > > >
> > > > I implemented a flink-notification-bot[2] to receive Travis
> webhook[3]
> > > > payload and generate an HTML email and send the email to
> > > > bui...@flink.apache.org.
> > > > The flink-notification-bot is deployed on my own VM in DigitalOcean.
> > You
> > > > can refer the github page [2] of the project to learn more details
> > about
> > > > the implementation and deployment.
> > > > Btw, I'm glad to contribute the project to
> https://github.com/flink-ci
> > > or
> > > > https://github.com/flinkbot if the community accepts.
> > > >
> > > > With the flink-notification-bot, we can easily integrate it with
> other
> > CI
> > > > service or our own CI, and we can also integrate it with some other
> > > > applications (e.g. DingTalk).
> > > >
> > > > # Rejected Alternative:
> > > >
> > > > Option#1: Sending email notifications via "Travis Email
> > Notification"[4].
> > > > Reasons:
> > > >  - If the emailing notification is set, Travis CI only sends an
> emails
> > to
> > > > the addresses specified there, rather than to the committer and
> author.
> > > >  - We will lose the beautiful email formatting when Travis send Email
> > to
> > > > builds ML.
> > > >  - The return-path of emails from Travis CI is not constant, which
> > makes
> > > it
> > > > difficult for mailing list to accept it.
> > > >
> > > > Cheers,
> > > > Jark
> > > >
> > > > [1]: https://flink.apache.org/community.html#mailing-lists
> > > > [2]: https://github.com/wuchong/flink-notification-bot
> > > > [3]:
> > > >
> > > >
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-webhook-notifications
> > > > [4]:
> > > >
> > > >
> > >
> >
> https://docs.travis-ci.com/user/notifications/#configuring-email-notifications
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, 30 Jul 2019 at 18:35, Jark Wu  wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Progress updates:
> > > > > 1. the bui...@flink.apache.org can be subscribed now (thanks
> > @Robert),
> > > > > you can send an email to builds-subscr...@flink.apache.org to
> > > subscribe.
> > > > > 2. We have a pull request [1] to send only apache/flink builds

Re: [VOTE] FLIP-58: Flink Python User-Defined Function for Table API

2019-08-29 Thread Jark Wu
+1

Thanks for the great work!

On Fri, 30 Aug 2019 at 10:04, Xingbo Huang  wrote:

> Hi Dian,
>
> +1,
> Thanks a lot for driving this.
>
> Best,
> Xingbo
> > 在 2019年8月30日,上午9:39,Wei Zhong  写道:
> >
> > Hi Dian,
> >
> > +1 non-binding
> > Thanks for driving this!
> >
> > Best, Wei
> >
> >> 在 2019年8月29日,09:25,Hequn Cheng  写道:
> >>
> >> Hi Dian,
> >>
> >> +1
> >> Thanks a lot for driving this.
> >>
> >> Best, Hequn
> >>
> >> On Wed, Aug 28, 2019 at 2:01 PM jincheng sun 
> >> wrote:
> >>
> >>> Hi Dian,
> >>>
> >>> +1, Thanks for your great job!
> >>>
> >>> Best,
> >>> Jincheng
> >>>
> >>> Dian Fu  于2019年8月28日周三 上午11:04写道:
> >>>
>  Hi all,
> 
>  I'd like to start a voting thread for FLIP-58 [1] since that we have
>  reached an agreement on the design in the discussion thread [2],
> 
>  This vote will be open for at least 72 hours. Unless there is an
>  objection, I will try to close it by Sept 2, 2019 00:00 UTC if we have
>  received sufficient votes.
> 
>  PS: This doesn't mean that we cannot further improve the design. We
> can
>  still discuss the implementation details case by case in the JIRA as
> long
>  as it doesn't affect the overall design.
> 
>  [1]
> 
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API
>  <
> 
> >>>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Function+for+Table+API
> >
>  [2]
> 
> >>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
>  <
> 
> >>>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> >
> 
>  Thanks,
>  Dian
> >>>
> >
>
>


Re: [DISCUSS] Releasing Flink 1.8.2

2019-08-30 Thread Jark Wu
Thanks Jincheng for bringing this up.

+1 to the 1.8.2 release, because it already contains a couple of important
fixes and it has been a long time since 1.8.1 came out.
I'm willing to help the community as much as possible. I'm wondering if I
could be the release manager for 1.8.2, or work on it together with you @Jincheng?

Best,
Jark

On Fri, 30 Aug 2019 at 18:58, Hequn Cheng  wrote:

> Hi Jincheng,
>
> +1 for a 1.8.2 release.
> Thanks a lot for raising the discussion. It would be nice to have these
> critical fixes.
>
> Best, Hequn
>
>
> On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels  wrote:
>
> > Hi Jincheng,
> >
> > +1 I would be for a 1.8.2 release such that we can fix the problems with
> > the nested closure cleaner which currently block 1.8.1 users with Beam:
> > https://issues.apache.org/jira/browse/FLINK-13367
> >
> > Thanks,
> > Max
> >
> > On 30.08.19 11:25, jincheng sun wrote:
> > > Hi Flink devs,
> > >
> > > It has been nearly 2 months since the 1.8.1 released. So, what do you
> > think
> > > about releasing Flink 1.8.2 soon?
> > >
> > > We already have some blocker and critical fixes in the release-1.8
> > branch:
> > >
> > > [Blocker]
> > > - FLINK-13159 java.lang.ClassNotFoundException when restore job
> > > - FLINK-10368 'Kerberized YARN on Docker test' unstable
> > > - FLINK-12578 Use secure URLs for Maven repositories
> > >
> > > [Critical]
> > > - FLINK-12736 ResourceManager may release TM with allocated slots
> > > - FLINK-12889 Job keeps in FAILING state
> > > - FLINK-13484 ConnectedComponents end-to-end test instable with
> > > NoResourceAvailableException
> > > - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt to sleep
> > > with negative time
> > > - FLINK-13806 Metric Fetcher floods the JM log with errors when TM is
> > lost
> > >
> > > Furthermore, I think the following one blocker issue should be merged
> > > before 1.8.2 release.
> > >
> > > - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
> > >
> > > It would also be great if we can have the fix of Elasticsearch6.x
> > connector
> > > threads leaking (FLINK-13689) in 1.8.2 release which is identified as
> > major.
> > >
> > > Please let me know what you think?
> > >
> > > Cheers,
> > > Jincheng
> > >
> >
>


Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-01 Thread Jark Wu
Thanks Jincheng, I will look into the release guidelines.

Hi @Thomas Weise, should we mark FLINK-13586 as a
blocker? And how long do you think this issue will take?

I summarized the current status of issues we need to track:

[Blocker]:
[FLINK-13897] OSS FS NOTICE file is placed in wrong directory (@Chesnay was
working on it, PR was reviewed)
[Major]:
[FLINK-13586] Method ClosureCleaner.clean broke backward compatibility
between 1.8.0 and 1.8.1 (need PR)
[FLINK-13689] Rest High Level Client for Elasticsearch6.x connector leaks
threads if no connection could be established (reviewed by @Gordon, PR needs
to be updated)

We also have a new issue, FLINK-13925, targeting 1.8.2 and marked as major.
I'm not sure whether we should wait for it in 1.8.2; could @Aljoscha Krettek
help to check it?
[FLINK-13925] ClassLoader in BlobLibraryCacheManager is not using context
class loader

Thank you all for fixing and reviewing.

The issues of this release can be tracked here:
https://issues.apache.org/jira/projects/FLINK/versions/12345670

Best,
Jark


On Mon, 2 Sep 2019 at 09:19, Thomas Weise  wrote:

> +1 for the 1.8.2 release
>
> I marked https://issues.apache.org/jira/browse/FLINK-13586 for this
> release. It would be good to compensate for the backward incompatible
> change to ClosureCleaner that was introduced in 1.8.1, which affects
> downstream dependencies.
>
> Thanks,
> Thomas
>
>
> On Sun, Sep 1, 2019 at 5:10 PM jincheng sun 
> wrote:
>
> > Hi Jark,
> >
> > Glad to hear that you want to be the Release Manager of flink 1.8.2.
> > I believe that you will be a great RM, and I am very willing to help you
> > with the final release in the final stages. :)
> >
> > The release of Apache Flink involves a number of tasks. For details, you
> > can consult the documentation [1]. If you have any questions, please let
> me
> > know and let us work together.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1
> >
> > Cheers,
> > Jincheng
> >
> > Till Rohrmann  于2019年8月31日周六 上午12:59写道:
> >
> > > +1 for a 1.8.2 bug fix release. Thanks for kicking this discussion off
> > > Jincheng.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Fri, Aug 30, 2019 at 6:45 PM Jark Wu  wrote:
> > >
> > > > Thanks Jincheng for bringing this up.
> > > >
> > > > +1 to the 1.8.2 release, because it already contains a couple of
> > > important
> > > > fixes and it has been a long time since 1.8.1 came out.
> > > > I'm willing to help the community as much as possible. I'm wondering
> > if I
> > > > can be the release manager of 1.8.2 or work with you together
> > @Jincheng?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Fri, 30 Aug 2019 at 18:58, Hequn Cheng 
> > wrote:
> > > >
> > > > > Hi Jincheng,
> > > > >
> > > > > +1 for a 1.8.2 release.
> > > > > Thanks a lot for raising the discussion. It would be nice to have
> > these
> > > > > critical fixes.
> > > > >
> > > > > Best, Hequn
> > > > >
> > > > >
> > > > > On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels  >
> > > > wrote:
> > > > >
> > > > > > Hi Jincheng,
> > > > > >
> > > > > > +1 I would be for a 1.8.2 release such that we can fix the
> problems
> > > > with
> > > > > > the nested closure cleaner which currently block 1.8.1 users with
> > > Beam:
> > > > > > https://issues.apache.org/jira/browse/FLINK-13367
> > > > > >
> > > > > > Thanks,
> > > > > > Max
> > > > > >
> > > > > > On 30.08.19 11:25, jincheng sun wrote:
> > > > > > > Hi Flink devs,
> > > > > > >
> > > > > > > It has been nearly 2 months since the 1.8.1 released. So, what
> do
> > > you
> > > > > > think
> > > > > > > about releasing Flink 1.8.2 soon?
> > > > > > >
> > > > > > > We already have some blocker and critical fixes in the
> > release-1.8
> > > > > > branch:
> > > > > > >
> > > > > > > [Blocker]
> > > > > > > - FLINK-13159 java.lang.ClassNotFoundException when restore job

Re: [DISCUSS] FLIP-60: Restructure the Table API & SQL documentation

2019-09-02 Thread Jark Wu
Big +1 to the idea of restructuring the docs. We have received a lot of
complaints from users about the Table & SQL docs.

In general, I think the new structure is very nice.

Regarding moving "User-defined Extensions" to the corresponding broader
topics, I would prefer to keep the current "User-defined Extensions",
because it is a more advanced topic than "Connect to external systems" and
"Builtin Functions", and we can mention the common points (e.g. the pom
dependency) in the overview of the Extensions section.
Besides that, I would like to keep Built-in Functions as a top-level section
to give it more exposure, and we may further split the page.

I have some other suggestions:

1) Having subpages under "Built-in Functions". For example:

Built-in Functions
 - Mathematical Functions
 - Bit Functions
 - Date and Time Functions
 - Conditional Functions
 - String Functions
 - Aggregate Functions
 - ...

Currently, all the functions are squeezed into one page, which makes the
page bloated.
Meanwhile, I think it would be great to enrich the built-in functions with
argument explanations and clearer examples, like MySQL[1] and other
database docs do.

2) +1 to the "Architecture & Internals" chapter.
We already have a pull request[2] to add a "Streaming Aggregation Performance
Tuning" page, which talks about performance tuning tips around streaming
aggregation and the internals.
Maybe we can put it under the internals chapter or a "Performance Tuning"
chapter.

3) How about restructuring the SQL chapter a bit like this?

SQL
 - Overview
 - Data Manipulation Statements (all operations available in SQL)
 - Data Definition Statements (DDL syntaxes)
 - Pattern Matching

This renames "Full Reference" to "Data Manipulation Statements", which
aligns better with "Data Definition Statements".


Regards,
Jark

[1]:
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html#function_adddate
[2]: https://github.com/apache/flink/pull/9525





On Mon, 2 Sep 2019 at 17:29, Kurt Young  wrote:

> +1 to the general idea and thanks for driving this. I think the new
> structure is
> more clear than the old one, and i have some suggestions:
>
> 1. How about adding a "Architecture & Internals" chapter? This can help
> developers
> or users who want to contribute more to have a better understanding about
> Table.
> Essentially with blink planner, we merged a lots of codes and features but
> lack of
> proper user and design documents.
>
> 2. Add a dedicated "Hive Integration" chapter. We spend lots of effort on
> integrating
> hive, and hive integration is happened in different areas, like catalog,
> function and
> maybe ddl in the future. I think a dedicated chapter can make users who are
> interested
> in this topic easier to find the information they need.
>
> 3. Add a chapter about how to manage, monitor or tune the Table & SQL jobs,
> and
> might adding something like how to migrate old version jobs to new version
> in the future.
>
> Best,
> Kurt
>
>
> On Mon, Sep 2, 2019 at 4:17 PM vino yang  wrote:
>
> > Agree with Dawid's suggestion about function.
> >
> > Having a Functions section to unify the built-in function and UDF would
> be
> > better.
> >
> > Dawid Wysakowicz  于2019年8月30日周五 下午7:43写道:
> >
> > > +1 to the idea of restructuring the docs.
> > >
> > > My only suggestion to consider is how about moving the
> > > User-Defined-Extensions subpages to corresponding broader topics?
> > >
> > > Sources & Sinks >> Connect to external systems
> > >
> > > Catalogs >> Connect to external systems
> > >
> > > and then have a Functions sections with subsections:
> > >
> > > functions
> > >
> > > |- built in functions
> > >
> > > |- user defined functions
> > >
> > >
> > > Best,
> > >
> > > Dawid
> > >
> > > On 30/08/2019 10:59, Timo Walther wrote:
> > > > Hi everyone,
> > > >
> > > > the Table API & SQL documentation was already in a very good shape in
> > > > Flink 1.8. However, in the past it was mostly presented as an
> addition
> > > > to DataStream API. As the Table and SQL world is growing quickly,
> > > > stabilizes in its concepts, and is considered as another top-level
> API
> > > > and closed ecosystem, it is time to restructure the docs a little bit
> > > > to represent the vision of FLIP-32.
> > > >
> > > > Current state:
> > > > https://ci.apache.org/projects/flink/flink-docs-master/dev/table/
> > > >
> > > > We would like to propose the following FLIP-60 for a new structure:
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685
> > > >
> > > >
> > > > Looking forward to feedback.
> > > >
> > > > Thanks,
> > > >
> > > > Timo
> > > >
> > > >
> > > >
> > >
> > >
> >
>


Re: Flink SQL - Support Computed Columns in DDL?

2019-09-03 Thread Jark Wu
Hi Qi,

Computed columns are not fully supported in 1.9. We will start a design
discussion on the dev mailing list soon. Please stay tuned!

Btw, could you share with us the use case in which you want to use
computed columns?

Best,
Jark

On Tue, 3 Sep 2019 at 19:25, Danny Chan  wrote:

> Yeah, we are planning to implement this feature in release-1.10, wait for
> our good news !
>
> Best,
> Danny Chan
> 在 2019年9月3日 +0800 PM6:19,Qi Luo ,写道:
> > Hi folks,
> >
> > Computed columns in Flink SQL DDL is currently disabled in both old
> planner
> > and Blink planner (throws "Computed columns for DDL is not supported
> yet!"
> > exception in SqlToOperationConverter).
> >
> > I searched through the JIRA but found no relevant issues. Do we have any
> > plans to support this nice feature?
> >
> > Thanks,
> > Qi
>


Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-03 Thread Jark Wu
Thanks Kostas for the quick fix.

However, I noticed that FLINK-13940 still targets 1.8.2 as a blocker. If I
understand correctly, FLINK-13940 is aiming for a nicer and better solution
in the future.
So should we update the fixVersion of FLINK-13940?

Best,
Jark

On Tue, 3 Sep 2019 at 21:33, Kostas Kloudas  wrote:

> Thanks for waiting!
>
> A fix for FLINK-13940 has been merged on 1.8, 1.9 and the master under
> FLINK-13941.
>
> Cheers,
> Kostas
>
> On Tue, Sep 3, 2019 at 11:25 AM jincheng sun 
> wrote:
> >
> > +1 FLINK-13940 <https://issues.apache.org/jira/browse/FLINK-13940> is a
> > blocker, due to loss data is very important bug, And great thanks for
> > helping fix it  Kostas!
> >
> > Best, Jincheng
> >
> > Kostas Kloudas  于2019年9月2日周一 下午7:20写道:
> >
> > > Hi all,
> > >
> > > I think this should be also considered a blocker
> > > https://issues.apache.org/jira/browse/FLINK-13940.
> > > It is not a regression but it can result to data loss.
> > >
> > > I think I can have a quick fix by tomorrow.
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Mon, Sep 2, 2019 at 12:01 PM jincheng sun  >
> > > wrote:
> > > >
> > > > Thanks for all of your feedback!
> > > >
> > > > Hi Jark, Glad to see that you are doing what RM should doing.
> > > >
> > > > Only one tips here is before the RC1 all the blocker should be
> fixed, but
> > > > othrers is nice to have. So you can decide when to prepare RC1 after
> the
> > > > blokcer is resolved.
> > > >
> > > > Feel free to tell me if you have any questions.
> > > >
> > > > Best,Jincheng
> > > >
> > > > Aljoscha Krettek  于2019年9月2日周一 下午5:03写道:
> > > >
> > > > > I cut a PR for FLINK-13586:
> https://github.com/apache/flink/pull/9595
> > > <
> > > > > https://github.com/apache/flink/pull/9595>
> > > > >
> > > > > > On 2. Sep 2019, at 05:03, Yu Li  wrote:
> > > > > >
> > > > > > +1 for a 1.8.2 release, thanks for bringing this up Jincheng!
> > > > > >
> > > > > > Best Regards,
> > > > > > Yu
> > > > > >
> > > > > >
> > > > > > On Mon, 2 Sep 2019 at 09:19, Thomas Weise 
> wrote:
> > > > > >
> > > > > >> +1 for the 1.8.2 release
> > > > > >>
> > > > > >> I marked https://issues.apache.org/jira/browse/FLINK-13586 for
> this
> > > > > >> release. It would be good to compensate for the backward
> > > incompatible
> > > > > >> change to ClosureCleaner that was introduced in 1.8.1, which
> affects
> > > > > >> downstream dependencies.
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Thomas
> > > > > >>
> > > > > >>
> > > > > >> On Sun, Sep 1, 2019 at 5:10 PM jincheng sun <
> > > sunjincheng...@gmail.com>
> > > > > >> wrote:
> > > > > >>
> > > > > >>> Hi Jark,
> > > > > >>>
> > > > > >>> Glad to hear that you want to be the Release Manager of flink
> > > 1.8.2.
> > > > > >>> I believe that you will be a great RM, and I am very willing to
> > > help
> > > > > you
> > > > > >>> with the final release in the final stages. :)
> > > > > >>>
> > > > > >>> The release of Apache Flink involves a number of tasks. For
> > > details,
> > > > > you
> > > > > >>> can consult the documentation [1]. If you have any questions,
> > > please
> > > > > let
> > > > > >> me
> > > > > >>> know and let us work together.
> > > > > >>>
> > > > > >>>
> > > > > >>>
> > > > > >>
> > > > >
> > >
> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1
> > > > > >>>
> > > > > >>> Cheers,
> > > > > >>> Jincheng
> > > > > >>>

Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-03 Thread Jark Wu
Hi all,

I am very happy to say that all the blockers and critical issues for
release 1.8.2 have been resolved!

Great thanks to everyone who contributed to the release.

I hope to create the first RC on Sep 05, at 10:00 UTC+8.
If you find any other blocker issues for 1.8.2, please let me know before
then so that they can be accounted for in the 1.8.2 release.

Before cutting RC1, I think there is still a chance to merge the
ClosureCleaner.clean fix (FLINK-13586), because the review and the Travis
build have both passed.

Cheers,
Jark

On Wed, 4 Sep 2019 at 00:45, Kostas Kloudas  wrote:

> Yes, I will do that Jark!
>
> Kostas
>
> On Tue, Sep 3, 2019 at 4:19 PM Jark Wu  wrote:
> >
> > Thanks Kostas for the quick fixing.
> >
> > However, I find that FLINK-13940 still target to 1.8.2 as a blocker. If I
> > understand correctly, FLINK-13940 is aiming for a nicer and better
> solution
> > in the future.
> > So should we update the fixVersion of FLINK-13940?
> >
> > Best,
> > Jark
> >
> > On Tue, 3 Sep 2019 at 21:33, Kostas Kloudas  wrote:
> >
> > > Thanks for waiting!
> > >
> > > A fix for FLINK-13940 has been merged on 1.8, 1.9 and the master under
> > > FLINK-13941.
> > >
> > > Cheers,
> > > Kostas
> > >
> > > On Tue, Sep 3, 2019 at 11:25 AM jincheng sun  >
> > > wrote:
> > > >
> > > > +1 FLINK-13940 <https://issues.apache.org/jira/browse/FLINK-13940>
> is a
> > > > blocker, due to loss data is very important bug, And great thanks for
> > > > helping fix it  Kostas!
> > > >
> > > > Best, Jincheng
> > > >
> > > > Kostas Kloudas  于2019年9月2日周一 下午7:20写道:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I think this should be also considered a blocker
> > > > > https://issues.apache.org/jira/browse/FLINK-13940.
> > > > > It is not a regression but it can result to data loss.
> > > > >
> > > > > I think I can have a quick fix by tomorrow.
> > > > >
> > > > > Cheers,
> > > > > Kostas
> > > > >
> > > > > On Mon, Sep 2, 2019 at 12:01 PM jincheng sun <
> sunjincheng...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > Thanks for all of your feedback!
> > > > > >
> > > > > > Hi Jark, Glad to see that you are doing what RM should doing.
> > > > > >
> > > > > > Only one tips here is before the RC1 all the blocker should be
> > > fixed, but
> > > > > > othrers is nice to have. So you can decide when to prepare RC1
> after
> > > the
> > > > > > blokcer is resolved.
> > > > > >
> > > > > > Feel free to tell me if you have any questions.
> > > > > >
> > > > > > Best,Jincheng
> > > > > >
> > > > > > Aljoscha Krettek  于2019年9月2日周一 下午5:03写道:
> > > > > >
> > > > > > > I cut a PR for FLINK-13586:
> > > https://github.com/apache/flink/pull/9595
> > > > > <
> > > > > > > https://github.com/apache/flink/pull/9595>
> > > > > > >
> > > > > > > > On 2. Sep 2019, at 05:03, Yu Li  wrote:
> > > > > > > >
> > > > > > > > +1 for a 1.8.2 release, thanks for bringing this up Jincheng!
> > > > > > > >
> > > > > > > > Best Regards,
> > > > > > > > Yu
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, 2 Sep 2019 at 09:19, Thomas Weise 
> > > wrote:
> > > > > > > >
> > > > > > > >> +1 for the 1.8.2 release
> > > > > > > >>
> > > > > > > >> I marked https://issues.apache.org/jira/browse/FLINK-13586
> for
> > > this
> > > > > > > >> release. It would be good to compensate for the backward
> > > > > incompatible
> > > > > > > >> change to ClosureCleaner that was introduced in 1.8.1, which
> > > affects
> > > > > > > >> downstream dependencies.
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >> Thomas
> > > > > > > >>
> > >

Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-03 Thread Jark Wu
Thanks for the work Jincheng!

I have moved the remaining major issues to 1.8.3, except FLINK-13586.

Hi @Aljoscha Krettek, is it possible to merge
FLINK-13586 today?

Best,
Jark

On Wed, 4 Sep 2019 at 10:47, jincheng sun  wrote:

> Thanks for the udpate Jark!
>
> I have add the new version 1.8.3 in JIRA, could you please remark the
> JIRAs(Such as FLINK-13689) which we do not want merge into the 1.8.2
> release :)
>
>  You are right, I think FLINK-13586 is better to be contained in 1.8.2
> release!
>
> Thanks,
> Jincheng
>
>
> Jark Wu  于2019年9月4日周三 上午10:15写道:
>
> > Hi all,
> >
> > I am very happy to say that all the blockers and critical issues for
> > release 1.8.2 have been resolved!
> >
> > Great thanks to everyone who contribute to the release.
> >
> > I hope to create the first RC on Sep 05, at 10:00 UTC+8.
> > If you find some other blocker issues for 1.8.2, please let me know
> before
> > that to account for it for the 1.8.2 release.
> >
> > Before cutting the RC1, I think it has chance to merge the
> > ClosureCleaner.clean fix (FLINK-13586), because the review and travis are
> > both passed.
> >
> > Cheers,
> > Jark
> >
> > On Wed, 4 Sep 2019 at 00:45, Kostas Kloudas  wrote:
> >
> > > Yes, I will do that Jark!
> > >
> > > Kostas
> > >
> > > On Tue, Sep 3, 2019 at 4:19 PM Jark Wu  wrote:
> > > >
> > > > Thanks Kostas for the quick fixing.
> > > >
> > > > However, I find that FLINK-13940 still target to 1.8.2 as a blocker.
> > If I
> > > > understand correctly, FLINK-13940 is aiming for a nicer and better
> > > solution
> > > > in the future.
> > > > So should we update the fixVersion of FLINK-13940?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 3 Sep 2019 at 21:33, Kostas Kloudas 
> > wrote:
> > > >
> > > > > Thanks for waiting!
> > > > >
> > > > > A fix for FLINK-13940 has been merged on 1.8, 1.9 and the master
> > under
> > > > > FLINK-13941.
> > > > >
> > > > > Cheers,
> > > > > Kostas
> > > > >
> > > > > On Tue, Sep 3, 2019 at 11:25 AM jincheng sun <
> > sunjincheng...@gmail.com
> > > >
> > > > > wrote:
> > > > > >
> > > > > > +1 FLINK-13940 <
> https://issues.apache.org/jira/browse/FLINK-13940>
> > > is a
> > > > > > blocker, due to loss data is very important bug, And great thanks
> > for
> > > > > > helping fix it  Kostas!
> > > > > >
> > > > > > Best, Jincheng
> > > > > >
> > > > > > Kostas Kloudas  于2019年9月2日周一 下午7:20写道:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I think this should be also considered a blocker
> > > > > > > https://issues.apache.org/jira/browse/FLINK-13940.
> > > > > > > It is not a regression but it can result to data loss.
> > > > > > >
> > > > > > > I think I can have a quick fix by tomorrow.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kostas
> > > > > > >
> > > > > > > On Mon, Sep 2, 2019 at 12:01 PM jincheng sun <
> > > sunjincheng...@gmail.com
> > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks for all of your feedback!
> > > > > > > >
> > > > > > > > Hi Jark, Glad to see that you are doing what RM should doing.
> > > > > > > >
> > > > > > > > Only one tips here is before the RC1 all the blocker should
> be
> > > > > fixed, but
> > > > > > > > othrers is nice to have. So you can decide when to prepare
> RC1
> > > after
> > > > > the
> > > > > > > > blokcer is resolved.
> > > > > > > >
> > > > > > > > Feel free to tell me if you have any questions.
> > > > > > > >
> > > > > > > > Best,Jincheng
> > > > > > > >
> > > > > > > > Aljoscha Krettek  于2019年9月2日周一
> 下午5:03写道:
> > > > >

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-04 Thread Jark Wu
Hi all,

Regarding #1, temp functions <> built-in functions and naming:
I'm fine with temporary functions preceding built-in functions and being
able to override them (we already support overriding built-in
functions in 1.9).
If we don't allow the same name as a built-in function, I'm afraid we will
have compatibility issues in the future.
Say users register a user-defined function named "explode" in 1.9, and we
add a built-in "explode" function in 1.10.
Then the users' jobs which call the "explode" function registered in 1.9
will all fail in 1.10 because of the naming conflict. Letting temporary
functions shadow built-ins avoids exactly this.
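
To make that precedence concrete, here is a rough sketch of the ambiguous
(1-part) name resolution being discussed (the method and field names are
made up, not an actual Flink API):

    // Hypothetical resolution order for a 1-part function name.
    Optional<FunctionDefinition> resolveFunction(String name) {
        Optional<FunctionDefinition> temp = lookupTemporaryFunction(name);
        if (temp.isPresent()) {
            return temp; // temporary functions shadow built-in functions
        }
        Optional<FunctionDefinition> builtin = lookupBuiltinFunction(name);
        if (builtin.isPresent()) {
            return builtin;
        }
        // finally, fall back to the catalog function in the current
        // catalog and database
        return lookupCatalogFunction(currentCatalog, currentDatabase, name);
    }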

Regarding #2 "External" built-in functions.
I think if we store external built-in functions in catalog, then
"hive1::sqrt" is a good way to go.
However, I would prefer to support a discovery mechanism (e.g. SPI) for
built-in functions as Timo suggested above.
This gives us the flexibility to add Hive or MySQL or Geo or whatever
function set as built-in functions in an easy way.
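
As an illustration only, a minimal sketch of what such an SPI-based
discovery could look like (all names are hypothetical; java.util imports
omitted for brevity):

    // Hypothetical SPI: each function set (Hive, MySQL, Geo, ...) ships an
    // implementation that is discovered via java.util.ServiceLoader.
    public interface BuiltinFunctionModule {
        String moduleName(); // e.g. "hive", "geo"
        Set<String> listFunctions();
        Optional<FunctionDefinition> getFunction(String name);
    }

    // Discovery at startup: pick up every module found on the classpath.
    List<BuiltinFunctionModule> modules = new ArrayList<>();
    ServiceLoader.load(BuiltinFunctionModule.class).forEach(modules::add);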

Best,
Jark

On Wed, 4 Sep 2019 at 17:47, Xuefu Z  wrote:

> Hi David,
>
> Thank you for sharing your findings. It seems to me that there is no SQL
> standard regarding temporary functions. There are few systems that support
> it. Here are what I have found:
>
> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
> 2. Spark: basically follows Hive (
>
> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
> )
> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of overwriting
> behavior. (
> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>
> Because of lack of standard, it's perfectly fine for Flink to define
> whatever it sees appropriate. Thus, your proposal (no overwriting and must
> have DB as holder) is one option. The advantage is simplicity, The downside
> is the deviation from Hive, which is popular and de facto standard in big
> data world.
>
> However, I don't think we have to follow Hive. More importantly, we need a
> consensus. I have no objection if your proposal is generally agreed upon.
>
> Thanks,
> Xuefu
>
> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz 
> wrote:
>
> > Hi all,
> >
> > Just an opinion on the built-in <> temporary functions resolution and
> > NAMING issue. I think we should not allow overriding the built-in
> > functions, as this may pose serious issues and to be honest is rather
> > not feasible and would require major rework. What happens if a user
> > wants to override CAST? Calls to that function are generated at
> > different layers of the stack that unfortunately does not always go
> > through the Catalog API (at least yet). Moreover from what I've checked
> > no other systems allow overriding the built-in functions. All the
> > systems I've checked so far register temporary functions in a
> > database/schema (either special database for temporary functions, or
> > just current database). What I would suggest is to always register
> > temporary functions with a 3 part identifier. The same way as tables,
> > views etc. This effectively means you cannot override built-in
> > functions. With such approach it is natural that the temporary functions
> > end up a step lower in the resolution order:
> >
> > 1. built-in functions (1 part, maybe 2? - this is still under discussion)
> >
> > 2. temporary functions (always 3 part path)
> >
> > 3. catalog functions (always 3 part path)
> >
> > Let me know what do you think.
> >
> > Best,
> >
> > Dawid
> >
> > On 04/09/2019 06:13, Bowen Li wrote:
> > > Hi,
> > >
> > > I agree with Xuefu that the main controversial points are mainly the
> two
> > > places. My thoughts on them:
> > >
> > > 1) Determinism of referencing Hive built-in functions. We can either
> > remove
> > > Hive built-in functions from ambiguous function resolution and require
> > > users to use special syntax for their qualified names, or add a config
> > flag
> > > to catalog constructor/yaml for turning on and off Hive built-in
> > functions
> > > with the flag set to 'false' by default and proper doc added to help
> > users
> > > make their decisions.
> > >
> > > 2) Flink temp functions v.s. Flink built-in functions in ambiguous
> > function
> > > resolution order. We believe Flink temp functions should precede Flink
> > > built-in functions, and I have presented my reasons. Just in case if we
> > > cannot reach an agreement, I propose forbid users registering temp
> > 

Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay- and FailureRateRestartStrategy to 1s

2019-09-04 Thread Jark Wu
+1

Best,
Jark

> 在 2019年9月4日,19:43,David Morávek  写道:
> 
> +1
> 
> On Wed, Sep 4, 2019 at 1:38 PM Till Rohrmann  wrote:
> 
>> +1 (binding)
>> 
>> On Wed, Sep 4, 2019 at 12:43 PM Chesnay Schepler 
>> wrote:
>> 
>>> +1 (binding)
>>> 
>>> On 04/09/2019 11:18, JingsongLee wrote:
>>>> +1 (non-binding)
>>>> 
>>>> default 0 is really not user production friendly.
>>>> 
>>>> Best,
>>>> Jingsong Lee
>>>> 
>>>> 
>>>> --
>>>> From:Zhu Zhu 
>>>> Send Time:2019年9月4日(星期三) 17:13
>>>> To:dev 
>>>> Subject:Re: [VOTE] FLIP-62: Set default restart delay for FixedDelay-
>>> and FailureRateRestartStrategy to 1s
>>>> 
>>>> +1 (non-binding)
>>>> 
>>>> Thanks,
>>>> Zhu Zhu
>>>> 
>>>> Till Rohrmann  于2019年9月4日周三 下午5:06写道:
>>>> 
>>>>> Hi everyone,
>>>>> 
>>>>> I would like to start the voting process for FLIP-62 [1], which
>>>>> is discussed and reached consensus in this thread [2].
>>>>> 
>>>>> Since the change is rather small I'd like to shorten the voting period
>>> to
>>>>> 48 hours. Hence, I'll try to close it September 6th, 11:00 am CET,
>>> unless
>>>>> there is an objection or not enough votes.
>>>>> 
>>>>> [1]
>>>>> 
>>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-62%3A+Set+default+restart+delay+for+FixedDelay-+and+FailureRateRestartStrategy+to+1s
>>>>> [2]
>>>>> 
>>>>> 
>>> 
>> https://lists.apache.org/thread.html/9602b342602a0181fcb618581f3b12e692ed2fad98c59fd6c1caeabd@%3Cdev.flink.apache.org%3E
>>>>> 
>>>>> Cheers,
>>>>> Till
>>>>> 
>>> 
>>> 
>> 



Re: [DISCUSS] Support JSON functions in Flink SQL

2019-09-04 Thread Jark Wu
Hi Forward,

Thanks for bringing up this discussion and preparing the nice design.
I think it would be nice to have the JSON functions in the next release.
We have received several requests for this feature.

I can help to shepherd this JSON functions effort and will leave comments
in the design doc in the coming days.

Hi Danny,

The newly introduced JSON functions come from SQL:2016, not from MySQL,
so no JSON type is needed. According to SQL:2016, JSON data can be
represented as a "character string", which is also
the current implementation in Calcite[1].
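
For example, since the functions operate on plain character strings, usage
through the Table API could look roughly like the following (a sketch only:
the "events" table and "payload" column are made-up names, and it assumes
the Calcite SQL:2016 functions are exposed as-is):

    // 'payload' is an ordinary VARCHAR column that happens to hold JSON
    // text, so no dedicated JSON type is required.
    Table result = tEnv.sqlQuery(
        "SELECT JSON_VALUE(payload, '$.user.name') AS user_name, "
            + "JSON_EXISTS(payload, '$.user.address') AS has_address "
            + "FROM events");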

Best,
Jark


[1]: https://calcite.apache.org/docs/reference.html#json-functions


On Wed, 4 Sep 2019 at 21:22, Xu Forward  wrote:

> hi Danny Chan ,Thank you very much for your reply, your help can help me
> further improve this discussion.
> Best
> forward
>
> Danny Chan  于2019年9月4日周三 下午8:50写道:
>
> > Thanks Xu Forward for bring up this topic, I think the JSON functions are
> > very useful especially for those MySQL users.
> >
> > I saw that you have done some work within the Apache Calcite, that’s a
> > good start, but this is one concern from me, Flink doesn’t support JSON
> > type internal, so how to represent a JSON object in Flink maybe a key
> point
> > we need to resolve. In Calcite, we use ANY type to represent as the JSON,
> > but I don’t think it is the right way to go, maybe we can have a
> discussion
> > here.
> >
> > Best,
> > Danny Chan
> > 在 2019年9月4日 +0800 PM8:34,Xu Forward ,写道:
> > > Hi everybody,
> > >
> > > I'd like to kick off a discussion on Support JSON functions in Flink
> SQL.
> > >
> > > The entire plan is divided into two steps:
> > > 1. Implement Support SQL 2016-2017 JSON functions in Flink SQL[1].
> > > 2. Implement non-Support SQL 2016-2017 JSON functions in Flink SQL,
> such
> > as
> > > JSON_TYPE in Mysql, JSON_LENGTH, etc. Very useful JSON functions.
> > >
> > > Would love to hear your thoughts.
> > >
> > > [1]
> > >
> >
> https://docs.google.com/document/d/1JfaFYIFOAY8P2pFhOYNCQ9RTzwF4l85_bnTvImOLKMk/edit#heading=h.76mb88ca6yjp
> > >
> > > Best,
> > > ForwardXu
> >
>


[DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-05 Thread Jark Wu
Hi everyone,

I would like to start a discussion about how to support time attributes in
SQL DDL.
In Flink 1.9, we already introduced a basic SQL DDL to create a table.
However, it doesn't support defining time attributes. As a result, users
can't
apply window operations on tables created by DDL, which is a bad
experience.

In FLIP-66, we propose a watermark syntax to define the rowtime attribute
and propose to use computed column syntax to define the proctime attribute.
But computed columns are another big topic and deserve a separate
FLIP.
If we reach a consensus on the computed column approach, we will start the
computed column FLIP soon.

FLIP-66:
https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
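
To give a feel for the direction, here is a sketch of the kind of DDL the
FLIP discusses (the exact syntax is precisely what is being decided, so
treat this as illustrative only; the table name and connector properties
are made up and abbreviated):

    tEnv.sqlUpdate(
        "CREATE TABLE orders ("
            + "  user_id BIGINT,"
            + "  amount DOUBLE,"
            + "  order_time TIMESTAMP(3),"
            // proctime attribute defined via a computed column
            + "  proc AS PROCTIME(),"
            // rowtime attribute defined via a watermark declaration
            + "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND"
            + ") WITH ("
            + "  'connector.type' = 'kafka',"
            + "  'connector.topic' = 'orders'"
            + ")");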

Thanks for any feedback!

Best,
Jark


Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-05 Thread Jark Wu
I think before we have such an interface, we could let subtask 0 do the
global finalization work.
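
For example, a minimal sketch of that work-around in a sink that implements
CheckpointListener (the metastore-commit helper is hypothetical):

    @Override
    public void notifyCheckpointComplete(long checkpointId) throws Exception {
        // Only subtask 0 performs the global finalization, so the metastore
        // is not hit by every parallel sink instance.
        if (getRuntimeContext().getIndexOfThisSubtask() == 0) {
            commitToMetastore(checkpointId); // hypothetical helper
        }
    }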


Best,
Jark


> 在 2019年9月6日,13:39,shimin yang  写道:
> 
> Hi Jingsong,
> 
> Big fan of this idea. We faced the same problem and resolved by adding a
> distributed lock. It would be nice to have this feature in JobMaster, which
> can replace the lock.
> 
> Best,
> Shimin
> 
> JingsongLee  于2019年9月6日周五 下午12:20写道:
> 
>> Hi devs:
>> 
>> I try to implement streaming file sink for table[1] like StreamingFileSink.
>> If the underlying is a HiveFormat, or a format that updates visibility
>> through a metaStore, I have to update the metaStore in the
>> notifyCheckpointComplete, but this operation occurs on the task side,
>> which will lead to distributed access to the metaStore, which will
>> lead to bottleneck.
>> 
>> So I'm curious if we can support notifyOnMaster for
>> notifyCheckpointComplete like FinalizeOnMaster.
>> 
>> What do you think?
>> 
>> [1]
>> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
>> 
>> Best,
>> Jingsong Lee



Is Flink documentation deployment script broken ?

2019-09-06 Thread Jark Wu
Hi all,

I merged several documentation pull requests[1][2][3] days ago.
AFAIK, the documentation deployment is scheduled to run every day.
However, the changes are still not available on the Flink docs website[4]
as of now.
The same goes for Till's PR[5], which was merged 3 days ago.


Best,
Jark

[1]: https://github.com/apache/flink/pull/9545
[2]: https://github.com/apache/flink/pull/9511
[3]: https://github.com/apache/flink/pull/9525
[4]: https://ci.apache.org/projects/flink/flink-docs-master/
[5]: https://github.com/apache/flink/pull/9571


Re: Is Flink documentation deployment script broken ?

2019-09-06 Thread Jark Wu
Thanks Chesnay for reporting this. 


> 在 2019年9月6日,17:47,Chesnay Schepler  写道:
> 
> The scripts are fine, but the buildbot slave is currently down.
> 
> I've already opened a ticket with INFRA: 
> https://issues.apache.org/jira/browse/INFRA-18986
> 
> On 06/09/2019 11:44, Jark Wu wrote:
>> Hi all,
>> 
>> I merged several documentation pull requests[1][2][3] days ago.
>> AFAIK, the documentation deployment is scheduled every day.
>> However, I didn't see the changes are available in the Flink doc website[4]
>> until now.
>> The same to Till's PR[5] merged 3 days ago.
>> 
>> 
>> Best,
>> Jark
>> 
>> [1]: https://github.com/apache/flink/pull/9545
>> [2]: https://github.com/apache/flink/pull/9511
>> [3]: https://github.com/apache/flink/pull/9525
>> [4]: https://ci.apache.org/projects/flink/flink-docs-master/
>> [5]: https://github.com/apache/flink/pull/9571
>> 
> 



[VOTE] Release 1.8.2, release candidate #1

2019-09-06 Thread Jark Wu
 Hi everyone,

Please review and vote on the release candidate #1 for the version 1.8.2,
as follows:
[ ] +1, Approve the release
[ ] -1, Do not approve the release (please provide specific comments)


The complete staging area is available for your review, which includes:
* JIRA release notes [1],
* the official Apache source release and binary convenience releases to be
deployed to dist.apache.org [2], which are signed with the key with
fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
* all artifacts to be deployed to the Maven Central Repository [4],
* source code tag "release-1.8.2-rc1" [5],
* website pull request listing the new release and adding announcement blog
post [6].

The vote will be open for at least 72 hours.
Please cast your votes before *Sep. 11th 2019, 13:00 UTC*.

It is adopted by majority approval, with at least 3 PMC affirmative votes.

Thanks,
Jark

[1]
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345670
[2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.2-rc1/
[3] https://dist.apache.org/repos/dist/release/flink/KEYS
[4] https://repository.apache.org/content/repositories/orgapacheflink-1262
[5]
https://github.com/apache/flink/commit/6322618bb0f1b7942d86cb1b2b7bc55290d9e330
[6] https://github.com/apache/flink-web/pull/262


Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-06 Thread Jark Wu
Hi all,

Thanks all of you for fixing issues for the 1.8.2 release!
The VOTE mail thread for the first RC of 1.8.2 has been brought up.
I would appreciate it if you could help check the release and vote on RC1.

Thanks,
Jark

On Wed, 4 Sep 2019 at 16:57, Aljoscha Krettek  wrote:

> Hi,
>
> I’m just running the last tests on FLINK-13586 on Travis and them I’m
> merging.
>
> Best,
> Aljoscha
>
> On 4. Sep 2019, at 07:37, Jark Wu  wrote:
>
> Thanks for the work Jincheng!
>
> I have moved remaining major issues to 1.8.3 except FLINK-13586.
>
> Hi @Aljoscha Krettek  , is that possible to merge
> FLINK-13586 today?
>
> Best,
> Jark
>
> On Wed, 4 Sep 2019 at 10:47, jincheng sun 
> wrote:
>
>> Thanks for the udpate Jark!
>>
>> I have add the new version 1.8.3 in JIRA, could you please remark the
>> JIRAs(Such as FLINK-13689) which we do not want merge into the 1.8.2
>> release :)
>>
>>  You are right, I think FLINK-13586 is better to be contained in 1.8.2
>> release!
>>
>> Thanks,
>> Jincheng
>>
>>
>> Jark Wu  于2019年9月4日周三 上午10:15写道:
>>
>> > Hi all,
>> >
>> > I am very happy to say that all the blockers and critical issues for
>> > release 1.8.2 have been resolved!
>> >
>> > Great thanks to everyone who contribute to the release.
>> >
>> > I hope to create the first RC on Sep 05, at 10:00 UTC+8.
>> > If you find some other blocker issues for 1.8.2, please let me know
>> before
>> > that to account for it for the 1.8.2 release.
>> >
>> > Before cutting the RC1, I think it has chance to merge the
>> > ClosureCleaner.clean fix (FLINK-13586), because the review and travis
>> are
>> > both passed.
>> >
>> > Cheers,
>> > Jark
>> >
>> > On Wed, 4 Sep 2019 at 00:45, Kostas Kloudas  wrote:
>> >
>> > > Yes, I will do that Jark!
>> > >
>> > > Kostas
>> > >
>> > > On Tue, Sep 3, 2019 at 4:19 PM Jark Wu  wrote:
>> > > >
>> > > > Thanks Kostas for the quick fixing.
>> > > >
>> > > > However, I find that FLINK-13940 still target to 1.8.2 as a blocker.
>> > If I
>> > > > understand correctly, FLINK-13940 is aiming for a nicer and better
>> > > solution
>> > > > in the future.
>> > > > So should we update the fixVersion of FLINK-13940?
>> > > >
>> > > > Best,
>> > > > Jark
>> > > >
>> > > > On Tue, 3 Sep 2019 at 21:33, Kostas Kloudas 
>> > wrote:
>> > > >
>> > > > > Thanks for waiting!
>> > > > >
>> > > > > A fix for FLINK-13940 has been merged on 1.8, 1.9 and the master
>> > under
>> > > > > FLINK-13941.
>> > > > >
>> > > > > Cheers,
>> > > > > Kostas
>> > > > >
>> > > > > On Tue, Sep 3, 2019 at 11:25 AM jincheng sun <
>> > sunjincheng...@gmail.com
>> > > >
>> > > > > wrote:
>> > > > > >
>> > > > > > +1 FLINK-13940 <
>> https://issues.apache.org/jira/browse/FLINK-13940>
>> > > is a
>> > > > > > blocker, due to loss data is very important bug, And great
>> thanks
>> > for
>> > > > > > helping fix it  Kostas!
>> > > > > >
>> > > > > > Best, Jincheng
>> > > > > >
>> > > > > > Kostas Kloudas  于2019年9月2日周一 下午7:20写道:
>> > > > > >
>> > > > > > > Hi all,
>> > > > > > >
>> > > > > > > I think this should be also considered a blocker
>> > > > > > > https://issues.apache.org/jira/browse/FLINK-13940.
>> > > > > > > It is not a regression but it can result to data loss.
>> > > > > > >
>> > > > > > > I think I can have a quick fix by tomorrow.
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Kostas
>> > > > > > >
>> > > > > > > On Mon, Sep 2, 2019 at 12:01 PM jincheng sun <
>> > > sunjincheng...@gmail.com
>> > > > > >
>> > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > Thanks for all

Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Jark Wu
Congratulations Klou!


> 在 2019年9月7日,00:21,zhijiang  写道:
> 
> Congratulations Klou!
> 
> Best,
> Zhijiang
> --
> From:Zhu Zhu 
> Send Time:2019年9月6日(星期五) 17:19
> To:dev 
> Subject:Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC
> 
> Congratulations Kostas!
> 
> Thanks,
> Zhu Zhu
> 
> Yu Li  于2019年9月6日周五 下午10:49写道:
> 
>> Congratulations Klou!
>> 
>> Best Regards,
>> Yu
>> 
>> 
>> On Fri, 6 Sep 2019 at 22:43, Forward Xu  wrote:
>> 
>>> Congratulations Kloudas!
>>> 
>>> 
>>> Best,
>>> 
>>> Forward
>>> 
>>> Dawid Wysakowicz  于2019年9月6日周五 下午10:36写道:
>>> 
 Congratulations Klou!
 
 Best,
 
 Dawid
 
 On 06/09/2019 14:55, Fabian Hueske wrote:
> Hi everyone,
> 
> I'm very happy to announce that Kostas Kloudas is joining the Flink
>>> PMC.
> Kostas is contributing to Flink for many years and puts lots of
>> effort
>>> in
> helping our users and growing the Flink community.
> 
> Please join me in congratulating Kostas!
> 
> Cheers,
> Fabian
> 
 
 
>>> 
>> 
> 



Re: [DISCUSS] Features for Apache Flink 1.10

2019-09-06 Thread Jark Wu
Thanks Gary for kicking off the discussion for the 1.10 release.

+1 for Gary and Yu as release managers. Thank you for your effort.

Best,
Jark


> 在 2019年9月7日,00:52,zhijiang  写道:
> 
> Hi Gary,
> 
> Thanks for kicking off the features for next release 1.10.  I am very 
> supportive of you and Yu Li to be the relaese managers.
> 
> Just mention another two improvements which want to be covered in FLINK-1.10 
> and I already confirmed with Piotr to reach an agreement before.
> 
> 1. Data serialize and copy only once for broadcast partition [1]: It would 
> improve the throughput performance greatly in broadcast mode and was actually 
> proposed in Flink-1.8. Most of works already done before and only left the 
> last critical jira/PR. It will not take much efforts to make it ready.
> 
> 2. Let Netty use Flink's buffers directly in credit-based mode [2] : It could 
> avoid memory copy from netty stack to flink managed network buffer. The 
> obvious benefit is decreasing the direct memory overhead greatly in 
> large-scale jobs. I also heard of some user cases encounter direct OOM caused 
> by netty memory overhead. Actually this improvment was proposed by nico in 
> FLINK-1.7 and always no time to focus then. Yun Gao already submitted a PR 
> half an year ago but have not been reviewed yet. I could help review the 
> deign and PR codes to make it ready. 
> 
> And you could make these two items as lowest priority if possible.
> 
> [1] https://issues.apache.org/jira/browse/FLINK-10745
> [2] https://issues.apache.org/jira/browse/FLINK-10742
> 
> Best,
> Zhijiang
> --
> From:Gary Yao 
> Send Time:2019年9月6日(星期五) 17:06
> To:dev 
> Cc:carp84 
> Subject:[DISCUSS] Features for Apache Flink 1.10
> 
> Hi community,
> 
> Since Apache Flink 1.9.0 has been released more than 2 weeks ago, I want to
> start kicking off the discussion about what we want to achieve for the 1.10
> release.
> 
> Based on discussions with various people as well as observations from
> mailing
> list threads, Yu Li and I have compiled a list of features that we deem
> important to be included in the next release. Note that the features
> presented
> here are not meant to be exhaustive. As always, I am sure that there will be
> other contributions that will make it into the next release. This email
> thread
> is merely to kick off a discussion, and to give users and contributors an
> understanding where the focus of the next release lies. If there is anything
> we have missed that somebody is working on, please reply to this thread.
> 
> 
> ** Proposed features and focus
> 
> Following the contribution of Blink to Apache Flink, the community released
> a
> preview of the Blink SQL Query Processor, which offers better SQL coverage
> and
> improved performance for batch queries, in Flink 1.9.0. However, the
> integration of the Blink query processor is not fully completed yet as there
> are still pending tasks, such as implementing full TPC-DS support. With the
> next Flink release, we aim at finishing the Blink integration.
> 
> Furthermore, there are several ongoing work threads addressing long-standing
> issues reported by users, such as improving checkpointing under
> backpressure,
> and limiting RocksDBs native memory usage, which can be especially
> problematic
> in containerized Flink deployments.
> 
> Notable features surrounding Flink’s ecosystem that are planned for the next
> release include active Kubernetes support (i.e., enabling Flink’s
> ResourceManager to launch new pods), improved Hive integration, Java 11
> support, and new algorithms for the Flink ML library.
> 
> Below I have included the list of features that we compiled ordered by
> priority – some of which already have ongoing mailing list threads, JIRAs,
> or
> FLIPs.
> 
> - Improving Flink’s build system & CI [1] [2]
> - Support Java 11 [3]
> - Table API improvements
>- Configuration Evolution [4] [5]
>- Finish type system: Expression Re-design [6] and UDF refactor
>- Streaming DDL: Time attribute (watermark) and Changelog support
>- Full SQL partition support for both batch & streaming [7]
>- New Java Expression DSL [8]
>- SQL CLI with DDL and DML support
> - Hive compatibility completion (DDL/UDF) to support full Hive integration
>- Partition/Function/View support
> - Remaining Blink planner/runtime merge
>- Support all TPC-DS queries [9]
> - Finer grained resource management
>- Unified TaskExecutor Memory Configuration [10]
>- Fine Grained Operator Resource Management [11]
>- Dynamic Slots Allocation [12]
> - Finish scheduler re-architecture [13]
>  

Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-09 Thread Jark Wu
Hi all,

Thanks all for the large amount of feedback received in the doc so far.
I see general agreement on using computed columns to support the proctime
attribute and to extract timestamps.
So we will prepare a computed column FLIP and share it on the dev ML soon.

Feel free to leave more comments!

Best,
Jark



On Fri, 6 Sep 2019 at 13:50, Dian Fu  wrote:

> Hi Jark,
>
> Thanks for bringing up this discussion and the detailed design doc. This
> is definitely a critical feature for streaming SQL jobs. I have left a few
> comments in the design doc.
>
> Thanks,
> Dian
>
> > 在 2019年9月6日,上午11:48,Forward Xu  写道:
> >
> > Thanks Jark for this topic, This will be very useful.
> >
> >
> > Best,
> >
> > ForwardXu
> >
> > Danny Chan  于2019年9月6日周五 上午11:26写道:
> >
> >> Thanks Jark for bring up this topic, this is definitely an import
> feature
> >> for the SQL, especially the DDL users.
> >>
> >> I would spend some time to review this design doc, really thanks.
> >>
> >> Best,
> >> Danny Chan
> >> 在 2019年9月6日 +0800 AM11:19,Jark Wu ,写道:
> >>> Hi everyone,
> >>>
> >>> I would like to start discussion about how to support time attribute in
> >> SQL
> >>> DDL.
> >>> In Flink 1.9, we already introduced a basic SQL DDL to create a table.
> >>> However, it doesn't support to define time attributes. This makes users
> >>> can't
> >>> apply window operations on the tables created by DDL which is a bad
> >>> experience.
> >>>
> >>> In FLIP-66, we propose a syntax for watermark to define rowtime
> attribute
> >>> and propose to use computed column syntax to define proctime attribute.
> >>> But computed column is another big topic and should deserve a separate
> >>> FLIP.
> >>> If we have a consensus on the computed column approach, we will start
> >>> computed column FLIP soon.
> >>>
> >>> FLIP-66:
> >>>
> >>
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
> >>>
> >>> Thanks for any feedback!
> >>>
> >>> Best,
> >>> Jark
> >>
>
>


Re: [ANNOUNCE] Zili Chen becomes a Flink committer

2019-09-11 Thread Jark Wu
Congratulations Zili!

Best,
Jark

On Wed, 11 Sep 2019 at 23:06,  wrote:

> Congratulations, Zili.
>
>
>
> Best,
>
> Xingcan
>
>
>
> *From:* SHI Xiaogang 
> *Sent:* Wednesday, September 11, 2019 7:43 AM
> *To:* Guowei Ma 
> *Cc:* Fabian Hueske ; Biao Liu ;
> Oytun Tez ; bupt_ljy ; dev <
> dev@flink.apache.org>; user ; Till Rohrmann <
> trohrm...@apache.org>
> *Subject:* Re: [ANNOUNCE] Zili Chen becomes a Flink committer
>
>
>
> Congratulations!
>
>
>
> Regards,
>
> Xiaogang
>
>
>
> Guowei Ma  于2019年9月11日周三 下午7:07写道:
>
> Congratulations Zili !
>
>
> Best,
>
> Guowei
>
>
>
>
>
> Fabian Hueske  于2019年9月11日周三 下午7:02写道:
>
> Congrats Zili Chen :-)
>
>
>
> Cheers, Fabian
>
>
>
> Am Mi., 11. Sept. 2019 um 12:48 Uhr schrieb Biao Liu :
>
> Congrats Zili!
>
>
>
> Thanks,
>
> Biao /'bɪ.aʊ/
>
>
>
>
>
>
>
> On Wed, 11 Sep 2019 at 18:43, Oytun Tez  wrote:
>
> Congratulations!
>
>
>
> ---
>
> Oytun Tez
>
>
>
> *M O T A W O R D*
>
> *The World's Fastest Human Translation Platform.*
>
> oy...@motaword.com — www.motaword.com
>
>
>
>
>
> On Wed, Sep 11, 2019 at 6:36 AM bupt_ljy  wrote:
>
> Congratulations!
>
>
>
> Best,
>
> Jiayi Liao
>
>
>
>  Original Message
>
> *Sender:* Till Rohrmann
>
> *Recipient:* dev; user
>
> *Date:* Wednesday, Sep 11, 2019 17:22
>
> *Subject:* [ANNOUNCE] Zili Chen becomes a Flink committer
>
>
>
> Hi everyone,
>
>
>
> I'm very happy to announce that Zili Chen (some of you might also know
> him as Tison Kun) accepted the offer of the Flink PMC to become a committer
> of the Flink project.
>
>
>
> Zili Chen has been an active community member for almost 16 months now.
> He helped pushing the Flip-6 effort over the finish line, ported a lot of
> legacy code tests, removed a good part of the legacy code, contributed
> numerous fixes, is involved in the Flink's client API refactoring, drives
> the refactoring of Flink's HighAvailabilityServices and much more. Zili
> Chen also helped the community by PR reviews, reporting Flink issues,
> answering user mails and being very active on the dev mailing list.
>
>
>
> Congratulations Zili Chen!
>
>
>
> Best, Till
>
> (on behalf of the Flink PMC)
>
>


Re: [VOTE] Release 1.8.2, release candidate #1

2019-09-11 Thread Jark Wu
+1 (non-binding)

- checked/verified signatures and hashes
- built from source, without Hadoop and using Scala 2.12
- checked that all POM files point to the same version
- started a cluster; WebUI is accessible, submitted example jobs, no
suspicious log output
- manually verified the diff of the POM files between 1.8.1 and 1.8.2 to
check dependencies; looks good

Best,
Jark

On Wed, 11 Sep 2019 at 00:12, Till Rohrmann  wrote:

> +1 (binding)
>
> - verified checksums and signatures
> - no binary files in source release
> - built Flink from source release with Scala 2.12, running all tests
> - Verified that no new dependencies have been added
> - Executed simple example jobs locally (worked)
>
> Cheers,
> Till
>
> On Tue, Sep 10, 2019 at 8:00 AM Kurt Young  wrote:
>
> > +1 (binding)
> >
> > - build from source and passed all tests locally
> > - checked the difference between 1.8.1 and 1.8.2, no legal risk found
> > - went through all commits checked in between 1.8.1 and 1.8.2, make
> > sure all the issues set the proper "fixVersion" property
> >
> > Best,
> > Kurt
> >
> >
> > On Mon, Sep 9, 2019 at 8:45 PM Dian Fu  wrote:
> >
> > > +1 (non-binding)
> > >
> > > - built from source successfully (mvn clean install -DskipTests)
> > > - checked gpg signature and hashes of the source release and binary
> > > release packages
> > > - All artifacts have been deployed to the maven central repository
> > > - no new dependencies were added since 1.8.1
> > > - run a couple of tests in IDE success
> > >
> > > Regards,
> > > Dian
> > >
> > > > On Sep 9, 2019, at 2:28 PM, jincheng sun  wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > - checked signatures [SUCCESS]
> > > > - built from source without tests [SUCCESS]
> > > > - ran some tests in IDE [SUCCESS]
> > > > - start local cluster and submit word count example [SUCCESS]
> > > > - announcement PR for website looks good! (I have left a few
> comments)
> > > >
> > > > Best,
> > > > Jincheng
> > > >
> > > > On Fri, Sep 6, 2019 at 8:47 PM Jark Wu  wrote:
> > > >
> > > >> Hi everyone,
> > > >>
> > > >> Please review and vote on the release candidate #1 for the version
> > > 1.8.2,
> > > >> as follows:
> > > >> [ ] +1, Approve the release
> > > >> [ ] -1, Do not approve the release (please provide specific
> comments)
> > > >>
> > > >>
> > > >> The complete staging area is available for your review, which
> > includes:
> > > >> * JIRA release notes [1],
> > > >> * the official Apache source release and binary convenience releases
> > to
> > > be
> > > >> deployed to dist.apache.org [2], which are signed with the key with
> > > >> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
> > > >> * all artifacts to be deployed to the Maven Central Repository [4],
> > > >> * source code tag "release-1.8.2-rc1" [5],
> > > >> * website pull request listing the new release and adding
> announcement
> > > blog
> > > >> post [6].
> > > >>
> > > >> The vote will be open for at least 72 hours.
> > > >> Please cast your votes before *Sep. 11th 2019, 13:00 UTC*.
> > > >>
> > > >> It is adopted by majority approval, with at least 3 PMC affirmative
> > > votes.
> > > >>
> > > >> Thanks,
> > > >> Jark
> > > >>
> > > >> [1]
> > > >>
> > > >>
> > >
> >
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345670
> > > >> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.2-rc1/
> > > >> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> > > >> [4]
> > > https://repository.apache.org/content/repositories/orgapacheflink-1262
> > > >> [5]
> > > >>
> > > >>
> > >
> >
> https://github.com/apache/flink/commit/6322618bb0f1b7942d86cb1b2b7bc55290d9e330
> > > >> [6] https://github.com/apache/flink-web/pull/262
> > > >>
> > >
> > >
> >
>


[RESULT] [VOTE] Release 1.8.2, release candidate #1

2019-09-11 Thread Jark Wu
I'm happy to announce that we have unanimously approved this release.

There are 5 approving votes, 3 of which are binding:
* jincheng (binding)
* Kurt (binding)
* Till (binding)
* Dian
* Jark

There are no disapproving votes.

Thanks everyone!


Re: Blink Planner HBaseUpsertTableSink Exception

2019-09-12 Thread Jark Wu
Hi Lake,

This is not a problem of HBaseUpsertTableSink.
This is because the query loses the primary key (e.g. concat(key1, key2)
currently loses the primary key information [key1, key2]), while the insert
validation checks that the upsert query has a primary key. That's why the
exception is thrown.
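To make the difference concrete, a minimal sketch with hypothetical table
and column names, assuming a Flink 1.9 program using the Blink planner,
where tEnv is a StreamTableEnvironment and the upsert sink is registered:

    // The grouping key survives the projection, so the planner can derive
    // the unique key [user_id] for the upsert sink -- this works:
    tEnv.sqlUpdate(
        "INSERT INTO sink_table SELECT user_id, COUNT(*) "
            + "FROM source_table GROUP BY user_id");

    // concat(key1, key2) hides the grouping keys, so the unique key
    // information [key1, key2] is lost and the exception above is thrown:
    tEnv.sqlUpdate(
        "INSERT INTO sink_table SELECT concat(key1, key2), COUNT(*) "
            + "FROM source_table GROUP BY key1, key2");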

IMO, in order to fix this problem, we need to enrich the primary key
inference to support all kinds of built-in functions/operators.
But this is a large piece of work, which means it may not happen in 1.9.1.

Regards,
Jark

On Thu, 12 Sep 2019 at 14:39, LakeShen  wrote:

> Hi community, when I create the HBase sink table in my Flink DDL SQL,
> just like this:
>
>
>
>
>
> *create table sink_hbase_table(rowkey VARCHAR,cf
>   row(kdt_it_count  bigint )) with (xx);*
>
> and I run my Flink task, it throws an exception like this:
> *UpsertStreamTableSink requires that Table has a full primary keys if it is
> updated.*
> at
>
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:115)
> at
>
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlanInternal(StreamExecSink.scala:50)
> at
>
> org.apache.flink.table.planner.plan.nodes.exec.ExecNode$class.translateToPlan(ExecNode.scala:54)
> at
>
> org.apache.flink.table.planner.plan.nodes.physical.stream.StreamExecSink.translateToPlan(StreamExecSink.scala:50)
> at
>
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:61)
> at
>
> org.apache.flink.table.planner.delegation.StreamPlanner$$anonfun$translateToPlan$1.apply(StreamPlanner.scala:60)
> at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at
>
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
> at scala.collection.Iterator$class.foreach(Iterator.scala:891)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>
> I looked at the Flink source code and found that in HBaseUpsertTableSink,
> the setKeyFields method has an empty body. In the StreamExecSink class, I
> saw this comment:
> *//TODO UpsertStreamTableSink setKeyFields interface should be
> Array[Array[String]]*
> but the UpsertStreamTableSink setKeyFields interface is currently
> Array[String], which seems to conflict with the above comment.
> Can we use HBaseUpsertTableSink in our Flink task? Thanks for your reply.
>


[ANNOUNCE] Apache Flink 1.8.2 released

2019-09-13 Thread Jark Wu
Hi,

The Apache Flink community is very happy to announce the release of Apache
Flink 1.8.2, which is the second bugfix release for the Apache Flink 1.8
series.

Apache Flink® is an open-source stream processing framework for
distributed, high-performing, always-available, and accurate data streaming
applications.

The release is available for download at:
https://flink.apache.org/downloads.html

Please check out the release blog post for an overview of the improvements
for this bugfix release:
https://flink.apache.org/news/2019/09/11/release-1.8.2.html

The full release notes are available in Jira:
https://issues.apache.org/jira/projects/FLINK/versions/12345670

We would like to thank all contributors of the Apache Flink community who
made this release possible!
Great thanks to @Jincheng for the kind help during this release.

Regards,
Jark


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Jark Wu
Hi,

+1 to striving for consensus on the remaining topics. We are close to a
conclusion, and it would waste a lot of time if we resumed the topic later.

+1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to 
override a catalog function. 

I'm not sure about “system.system.fun”: it introduces a nonexistent catalog
& database, and we would still need special treatment for the dedicated
system.system catalog & database.

Best,
Jark


> On Sep 18, 2019, at 06:54, Timo Walther  wrote:
> 
> Hi everyone,
> 
> @Xuefu: I would like to avoid adding too many things incrementally. Users 
> should be able to override all catalog objects consistently according to 
> FLIP-64 (Support for Temporary Objects in Table module). If functions are 
> treated completely different, we need more code and special cases. From an 
> implementation perspective, this topic only affects the lookup logic which is 
> rather low implementation effort which is why I would like to clarify the 
> remaining items. As you said, we have a slight consenus on overriding 
> built-in functions; we should also strive for reaching consensus on the 
> remaining topics.
> 
> @Dawid: I like your idea as it ensures registering catalog objects consistent 
> and the overriding of built-in functions more explicit.
> 
> Thanks,
> Timo
> 
> 
> On 17.09.19 11:59, kai wang wrote:
>> Hi everyone,
>> I think this FLIP is very meaningful. It supports functions that can be
>> shared by different catalogs and dbs, reducing the duplication of functions.
>>
>> Based on Flink's SQL parser module, our group has implemented a CREATE
>> FUNCTION feature that stores the parsed function metadata and schema in
>> MySQL; we also customized the catalog and the sql-client to support loading
>> custom schemas and functions. However, the functions are currently global
>> and not subdivided by catalog and db.
>>
>> In addition, I would very much like to participate in the development of
>> this FLIP; I have been following the community, but found it rather
>> difficult to join.
>> Thank you.
>> 
>> Xuefu Z  于2019年9月17日周二 上午11:19写道:
>> 
>>> Thanks to Timo and Dawid for sharing your thoughts.
>>> 
>>> It seems to me that there is a general consensus on having temp functions
>>> that have no namespaces and overwrite built-in functions. (As a side note
>>> for comparison, the current user-defined functions are all temporary and
>>> have no namespaces.)
>>> 
>>> Nevertheless, I can also see the merit of having namespaced temp functions
>>> that can overwrite functions defined in a specific cat/db. However,  this
>>> idea appears orthogonal to the former and can be added incrementally.
>>> 
>>> How about we first implement non-namespaced temp functions now and leave
>>> the door open for namespaced ones in later releases, as the requirement
>>> might become clearer? This also helps shorten the debate and allows us
>>> to make some progress in this direction.
>>> 
>>> As to Dawid's idea of having a dedicated cat/db to host the temp
>>> functions that don't have namespaces, my only concern is the special
>>> treatment for a cat/db, which makes the code less clean, as evident in
>>> the treatment of the built-in catalog currently.
>>> 
>>> Thanks,
>>> Xuefu
>>> 
>>> On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
>>> wysakowicz.da...@gmail.com>
>>> wrote:
>>> 
>>>> Hi,
>>>> Another idea to consider on top of Timo's suggestion. How about we have a
>>>> special namespace (catalog + database) for built-in objects? This catalog
>>>> would be invisible for users as Xuefu was suggesting.
>>>> 
>>>> Then users could still override built-in functions if they fully
>>>> qualify the object with the built-in namespace, but by default the
>>>> common logic of the current db & cat would be used.
>>>> 
>>>> CREATE TEMPORARY FUNCTION func ...
>>>> registers temporary function in current cat & dB
>>>> 
>>>> CREATE TEMPORARY FUNCTION cat.db.func ...
>>>> registers temporary function in cat db
>>>> 
>>>> CREATE TEMPORARY FUNCTION system.system.func ...
>>>> Overrides built-in function with temporary function
>>>> 
>>>> The built-in/system namespace would not be writable for permanent
>>> objects.
>>>> WDYT?
>>>> 
>>>> This way I think we can have benefits of both solutions.

Re: Retention policy | Memory management.

2019-09-18 Thread Jark Wu
Hi,

Job1 is a simple ETL job and doesn't consume much state (only Kafka offsets),
so it should work well.
Job2 is an unbounded join, which stores the data of both input streams in the
Join operator's state.
As the input stream is unbounded, at 100GB per day as you described, the job
will eventually OOM if you are using the memory state backend (which is the
default one).

Here are my answers:
>  1. How long does the data reside in my table once I read it? I consume
  100GB per day, should have been a retention policy right? If so, where do I
  configure and how?

The data is stored in state. You can specify the retention policy by setting
the "execution: min-idle-state-retention" and "execution:
max-idle-state-retention" keys [1] in the environment file if you are using
the SQL CLI.
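For Table API programs, a rough equivalent is to set the idle state
retention on the TableConfig. A minimal sketch, assuming Flink 1.9 APIs
(the 12h/24h thresholds are examples only):

    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.table.api.java.StreamTableEnvironment;

    StreamExecutionEnvironment env =
        StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);
    // State for a key that has been idle for at least 12h becomes eligible
    // for cleanup and is guaranteed to be removed after 24h of idleness.
    tEnv.getConfig().setIdleStateRetentionTime(Time.hours(12), Time.hours(24));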

>  2. Are retention policies specific to tables?

No. It affects all the stateful non-window operations (e.g. GroupBy, Join).

>   3. I have a data set updates once a day. How about using UPSERT mode?
  If so, how could I delete the existing data set to load the new?

Flink SQL doesn't support loading a periodically-changing data set yet. You
may be able to achieve this by implementing a custom source and operators in
the DataStream API.
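If you try the DataStream route, a very rough sketch of a source that
re-reads a daily-updated file (the path is hypothetical; this only re-emits
the full set and does not handle deleting the previous version):

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

    public class DailyReloadSource extends RichSourceFunction<String> {
        private volatile boolean running = true;

        @Override
        public void run(SourceContext<String> ctx) throws Exception {
            while (running) {
                // Emit the whole data set, then sleep until the next day.
                for (String line :
                        Files.readAllLines(Paths.get("/data/daily-set.csv"))) {
                    ctx.collect(line);
                }
                Thread.sleep(24 * 60 * 60 * 1000L);
            }
        }

        @Override
        public void cancel() {
            running = false;
        }
    }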

Best,
Jark


[1]: 
https://ci.apache.org/projects/flink/flink-docs-master/dev/table/sqlClient.html#environment-files


> On Sep 13, 2019, at 15:43, srikanth flink  wrote:
> 
> Hi there,
> 
> I came across Flink and FlinkSQL and using FlinkSQL for stream processing.
> Flink runs as 3 node cluster with embedded Zookeeper, given heap 80GB on
> each. I came across few issues and would like to get some clarification.
> 
>   - Job1: Using Flink(java) to read and flatten my JSON and write to Kafka
>   topic.
> 
> 
>   - Job2: Environment file configured to read from 2 different Kafka
>   topics. I join both tables and it works. The query runs for a
>   while (say an hour) and then fails with the *error* below.
> 
> Questions:
> 
>   1. How long does the data reside in my table once I read it? I consume
>   100GB per day, should have been a retention policy right? If so, where do I
>   configure and how?
>   2. Are retention policies specific to tables?
>   3. I have a data set updates once a day. How about using UPSERT mode?
>   If so, how could I delete the existing data set to load the new?
> 
> 
> *Query*: SELECT s.* from sourceKafka AS s INNER JOIN badIp AS b ON
> s.`source.ip`=b.ip;
> *Error*: org.apache.flink.util.FlinkException: The assigned slot
> e57d1c0556b4a197eb44d7d9e83e1a47_6 was removed. at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.removeSlot(SlotManagerImpl.java:958)
> at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.removeSlots(SlotManagerImpl.java:928)
> at
> org.apache.flink.runtime.resourcemanager.slotmanager.SlotManagerImpl.internalUnregisterTaskManager(SlotManagerImpl.java:1149)
> 
> *Environment File*:
> #==
> # Tables
> #==
> 
> # Define tables here such as sources, sinks, views, or temporal tables.
> tables:  # empty list
> # A typical table source definition looks like:
>  - name: sourceKafka
>type: source-table
>update-mode: append
>connector:
>  type: kafka
>  version: "universal" # required: valid connector versions are
>  #   "0.8", "0.9", "0.10", "0.11", and "universal"
>  topic: recon-data-flatten  # required: topic name from which
> the table is read
> 
>  properties: # optional: connector specific properties
>- key: zookeeper.connect
>  value: 1.2.4.1:2181
>- key: bootstrap.servers
>  value: 1.2.4.1:9092
>- key: group.id
>  value: reconDataGroup
>format:
>  type: json
>  fail-on-missing-field: false
>  json-schema: >
>{
>  type: 'object',
>  properties: {
>'source.ip': {
>  type: 'string'
>},
>'source.port': {
>  type: 'string'
>},
>'destination.ip': {
>  type: 'string'
>},
>'destination.port': {
>  type: 'string'
>}
>  }
>}
>  derive-schema: false
> 
>schema:
>  - name: 'source.ip'
>type: VARCHAR
>  - name: 'source.port'
>type: VARCHAR

Re: [DISCUSS] FLIP-66: Support time attribute in SQL DDL

2019-09-18 Thread Jark Wu
Hi everyone,

Thanks all for joining the discussion in the doc[1].
It seems that the discussion is converged and there is a consensus on the
current FLIP document.
If there is no objection, I would like to convert it into cwiki FLIP page
and start voting process.

For more details, please refer to the design doc (it is slightly changed
since the initial proposal).
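As a pointer for those who haven't read the doc, a sketch of the DDL shapes
the FLIP proposes (the exact keywords were still under discussion and did
not parse in Flink 1.9 yet; the table name and connector options are made
up, and tEnv is an existing table environment):

    tEnv.sqlUpdate(
        "CREATE TABLE orders ("
            + "  order_id BIGINT,"
            + "  order_time TIMESTAMP(3),"
            // proctime attribute via a computed column (separate FLIP):
            + "  proc AS PROCTIME(),"
            // rowtime attribute via the proposed WATERMARK syntax:
            + "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND"
            + ") WITH ("
            + "  'connector' = 'kafka',"
            + "  'topic' = 'orders'"
            + ")");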

Thanks,
Jark

[1]:
https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit?ts=5d8258cd

On Mon, 16 Sep 2019 at 16:12, Kurt Young  wrote:

> After some review and discussion in the google document, I think it's time
> to
> convert this design to a cwiki flip page and start voting process.
>
> Best,
> Kurt
>
>
> On Mon, Sep 9, 2019 at 7:46 PM Jark Wu  wrote:
>
> > Hi all,
> >
> > Thanks all for so much feedbacks received in the doc so far.
> > I saw a general agreement on using computed column to support proctime
> > attribute and extract timestamps.
> > So we will prepare a computed column FLIP and share in the dev ML soon.
> >
> > Feel free to leave more comments!
> >
> > Best,
> > Jark
> >
> >
> >
> > On Fri, 6 Sep 2019 at 13:50, Dian Fu  wrote:
> >
> > > Hi Jark,
> > >
> > > Thanks for bringing up this discussion and the detailed design doc.
> This
> > > is definitely a critical feature for streaming SQL jobs. I have left a
> > few
> > > comments in the design doc.
> > >
> > > Thanks,
> > > Dian
> > >
> > > > On Sep 6, 2019, at 11:48 AM, Forward Xu  wrote:
> > > >
> > > > Thanks Jark for this topic, This will be very useful.
> > > >
> > > >
> > > > Best,
> > > >
> > > > ForwardXu
> > > >
> > > On Fri, Sep 6, 2019 at 11:26 AM Danny Chan  wrote:
> > > >
> > > >> Thanks Jark for bring up this topic, this is definitely an import
> > > feature
> > > >> for the SQL, especially the DDL users.
> > > >>
> > > >> I would spend some time to review this design doc, really thanks.
> > > >>
> > > >> Best,
> > > >> Danny Chan
> > > >> On Sep 6, 2019, 11:19 AM +0800, Jark Wu  wrote:
> > > >>> Hi everyone,
> > > >>>
> > > >>> I would like to start discussion about how to support time
> attribute
> > in
> > > >> SQL
> > > >>> DDL.
> > > >>> In Flink 1.9, we already introduced a basic SQL DDL to create a
> > table.
> > > >>> However, it doesn't support to define time attributes. This makes
> > users
> > > >>> can't
> > > >>> apply window operations on the tables created by DDL which is a bad
> > > >>> experience.
> > > >>>
> > > >>> In FLIP-66, we propose a syntax for watermark to define rowtime
> > > attribute
> > > >>> and propose to use computed column syntax to define proctime
> > attribute.
> > > >>> But computed column is another big topic and should deserve a
> > separate
> > > >>> FLIP.
> > > >>> If we have a consensus on the computed column approach, we will
> start
> > > >>> computed column FLIP soon.
> > > >>>
> > > >>> FLIP-66:
> > > >>>
> > > >>
> > >
> >
> https://docs.google.com/document/d/1-SecocBqzUh7zY6HBYcfMlG_0z-JAcuZkCvsmN3LrOw/edit#
> > > >>>
> > > >>> Thanks for any feedback!
> > > >>>
> > > >>> Best,
> > > >>> Jark
> > > >>
> > >
> > >
> >
>


[VOTE] FLIP-66: Support Time Attribute in SQL DDL

2019-09-18 Thread Jark Wu
Hi all,

I would like to start the vote for FLIP-66 [1], which is discussed and
reached a consensus in the discussion thread[2].

The vote will be open for at least 72 hours. I'll try to close it after
Sep. 24 08:00 UTC, unless there is an objection or not enough votes.

Thanks,
Jark

[1]:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL
<https://cwiki.apache.org/confluence/display/FLINK/FLIP-66%3A+Support+time+attribute+in+SQL+DDL>
[2]:
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-66-Support-time-attribute-in-SQL-DDL-tt32766.html


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Jark Wu
I agree with Xuefu that handling functions inconsistently with all the other
objects is not a big problem.

Regarding option #3, the special "system.system" namespace may confuse
users.
Users need to know the set of built-in function names to know when to use
the "system.system" namespace.
What will happen if a user registers a non-built-in function name under the
"system.system" namespace?
Besides, I think it doesn't solve the "explode" problem I mentioned at the
beginning of this thread.
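For context, a sketch of the "explode" case (the DDL follows the syntax
proposed in this thread and was not implemented at the time; class, table,
and column names are hypothetical, and tEnv is an existing table
environment):

    // Register a user implementation under the same name as the built-in
    // function, with no namespace (option #1 semantics):
    tEnv.sqlUpdate(
        "CREATE TEMPORARY FUNCTION explode AS 'com.acme.udf.MyExplode'");
    // Unqualified calls now resolve to the temporary function first,
    // shadowing the built-in one for this session:
    Table result = tEnv.sqlQuery(
        "SELECT p.id, t.tag FROM products AS p, "
            + "LATERAL TABLE(explode(p.tags)) AS t(tag)");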

So here is my vote:

+1 for #1
0 for #2
-1 for #3

Best,
Jark


On Thu, 19 Sep 2019 at 08:38, Xuefu Z  wrote:

> @Dawid, Re: we also don't need additional referencing the special catalog
> anywhere.
>
> True. But once we allow such reference, then user can do so in any possible
> place where a function name is expected, for which we have to handle.
> That's a big difference, I think.
>
> Thanks,
> Xuefu
>
> On Wed, Sep 18, 2019 at 5:25 PM Dawid Wysakowicz <
> wysakowicz.da...@gmail.com>
> wrote:
>
> > @Bowen I am not suggesting introducing additional catalog. I think we
> need
> > to get rid of the current built-in catalog.
> >
> > @Xuefu in option #3 we also don't need additional referencing the special
> > catalog anywhere else besides in the CREATE statement. The resolution
> > behaviour is exactly the same in both options.
> >
> > On Thu, 19 Sep 2019, 08:17 Xuefu Z,  wrote:
> >
> > > Hi Dawid,
> > >
> > > "GLOBAL" is a temporary keyword that was given to the approach. It can
> be
> > > changed to something else for better.
> > >
> > > The difference between this and the #3 approach is that we only need
> the
> > > keyword for this create DDL. For other places (such as function
> > > referencing), no keyword or special namespace is needed.
> > >
> > > Thanks,
> > > Xuefu
> > >
> > > On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> > > wysakowicz.da...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > > I think it makes sense to start voting at this point.
> > > >
> > > > Option 1: Only 1-part identifiers
> > > > PROS:
> > > > - allows shadowing built-in functions
> > > > CONS:
> > > > - inconsistent with all the other objects, both permanent & temporary
> > > > - does not allow shadowing catalog functions
> > > >
> > > > Option 2: Special keyword for built-in function
> > > > I think this is quite similar to the special catalog/db. The thing I
> am
> > > > strongly against in this proposal is the GLOBAL keyword. This keyword
> > > has a
> > > > meaning in rdbms systems and means a function that is present for a
> > > > lifetime of a session in which it was created, but available in all
> > other
> > > > sessions. Therefore I really don't want to use this keyword in a
> > > different
> > > > context.
> > > >
> > > > Option 3: Special catalog/db
> > > >
> > > > PROS:
> > > > - allows shadowing built-in functions
> > > > - allows shadowing catalog functions
> > > > - consistent with other objects
> > > > CONS:
> > > > - we introduce a special namespace for built-in functions
> > > >
> > > > I don't see a problem with introducing the special namespace. In the
> > end
> > > it
> > > > is very similar to the keyword approach. In this case the catalog/db
> > > > combination would be the "keyword"
> > > >
> > > > Therefore my votes:
> > > > Option 1: -0
> > > > Option 2: -1 (I might change to +0 if we can come up with a better
> > > keyword)
> > > > Option 3: +1
> > > >
> > > > Best,
> > > > Dawid
> > > >
> > > >
> > > > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> > > >
> > > > > Hi Aljoscha,
> > > > >
> > > > > Thanks for the summary and these are great questions to be
> answered.
> > > The
> > > > > answer to your first question is clear: there is a general
> agreement
> > to
> > > > > override built-in functions with temp functions.
> > > > >
> > > > > However, your second and third questions are sort of related, as a
> > > > function
> > > > > reference can be either just function name (

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-19 Thread Jark Wu
Thanks Dawid for the design doc. 

In general, I’m +1 to the FLIP.


+1 to the single-string-and-parse way to express object paths.

+1 to deprecating registerTableSink & registerTableSource.
But I would suggest providing an easy way to register a custom source/sink
before we drop them (this is another story).
Currently, it's not easy to implement a custom connector descriptor.
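To illustrate the direction, a sketch contrasting the deprecated call with
DDL-based registration (the temporary-table DDL is what FLIP-64 proposes
and did not exist yet; MyCustomTableSource and the connector options are
hypothetical, and tEnv is an existing table environment):

    // Deprecated style under discussion: registers a TableSource object
    // that cannot be expressed as catalog metadata.
    tEnv.registerTableSource("my_source", new MyCustomTableSource());

    // Proposed style: a temporary table defined purely by properties, held
    // in memory as a CatalogTable just like persistent catalog objects.
    tEnv.sqlUpdate(
        "CREATE TEMPORARY TABLE my_source (id BIGINT, name STRING) "
            + "WITH ("
            + "  'connector.type' = 'filesystem',"
            + "  'connector.path' = '/tmp/input.csv',"
            + "  'format.type' = 'csv')");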

Best,
Jark


> On Sep 19, 2019, at 11:37, Dawid Wysakowicz  wrote:
> 
> Hi JingsongLee,
> From my understanding they can. Underneath they will be CatalogTables. The
> difference is the lifetime of the tables. Plus, some of the user-facing
> interfaces cannot be persisted, e.g. DataStream. Therefore we must have
> separate methods for that. In the end, the temporary tables are held in
> memory as CatalogTables.
> Best,
> Dawid
> 
> On Thu, 19 Sep 2019, 10:08 JingsongLee, 
> wrote:
> 
>> Hi dawid:
>> Can temporary tables achieve the same capabilities as catalog tables?
>> For example statistics: CatalogTableStatistics, CatalogColumnStatistics,
>> PartitionStatistics;
>> and partition support: we have added some catalog-equivalent interfaces
>> on TableSource/TableSink: getPartitions, getPartitionFieldNames.
>> Maybe it's not a good idea to add these interfaces to
>> TableSource/TableSink. What do you think?
>> 
>> Best,
>> Jingsong Lee
>> 
>> 
>> --
>> From:Kurt Young 
>> Send Time:2019年9月18日(星期三) 17:54
>> To:dev 
>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>> module
>> 
>> Hi all,
>> 
>> Sorry to join this party late. Big +1 to this flip, especially for the
>> dropping
>> "registerTableSink & registerTableSource" part. These are indeed legacy
>> and we should try to unify them through CatalogTable after we introduce
>> the concept of Catalog.
>> 
>> From my understanding, what we register should all be metadata;
>> TableSource/TableSink should only be the ones responsible for doing the
>> real work, i.e. reading and writing data according to the schema and
>> other information like computed columns, partitions, etc.
>> 
>> Best,
>> Kurt
>> 
>> 
>> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee > .invalid>
>> wrote:
>> 
>>> After some development and thinking, I have a general understanding.
>>> +1 to "registering a source/sink does not fit into the SQL world".
>>> I am OK with having a deprecated registerTemporarySource/Sink to stay
>>> compatible with the old ways.
>>> 
>>> Best,
>>> Jingsong Lee
>>> 
>>> 
>>> --
>>> From:Timo Walther 
>>> Send Time: Tuesday, Sep 17, 2019 08:00
>>> To:dev 
>>> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
>>> module
>>> 
>>> Hi Dawid,
>>> 
>>> thanks for the design document. It fixes big concept gaps due to
>>> historical reasons with proper support for serializability and catalog
>>> support in mind.
>>> 
>>> I would not mind a registerTemporarySource/Sink, but the problem that I
>>> see is that many people think that this is the recommended way of
>>> registering a table source/sink which is not true. We should guide users
>>> to either use connect() or DDL API which can be validated and stored in
>>> catalog.
>>> 
>>> Also from a concept perspective, registering a source/sink does not fit
>>> into the SQL world. SQL does not know about source/sinks but only about
>>> tables. If the responsibility of a TableSource/TableSink is just a pure
>>> physical data consumer/producer that is not connected to the actual
>>> logical table schema, we would need a possibility of defining time
>>> attributes and interpreting/converting a changelog. This should be done
>>> by the framework with information from the DDL/connect() and not be
>>> defined in every table source.
>>> 
>>> Regards,
>>> Timo
>>> 
>>> 
>>> On 09.09.19 14:16, JingsongLee wrote:
>>>> Hi dawid:
>>>> 
>>>> It is difficult to describe specific examples.
>>>> Sometimes users will generate some Java converters through some
>>>>  Java code, or generate some Java classes through third-party
>>>>  libraries. Of course, these can best be done through properties.
>>>> But this requires additional work from users. My suggestion is to

Re: [DISCUSS] FLIP-316: Introduce SQL Driver

2023-06-01 Thread Jark Wu
Hi Paul,

Thanks for starting this discussion. I like the proposal! This is a
frequently requested feature!

I agree with Shengkai that ExecNodeGraph as the submission object is a
better idea than SQL file. To be more specific, it should be JsonPlanGraph
or CompiledPlan which is the serializable representation. CompiledPlan is a
clear separation between compiling/optimization/validation and execution.
This keeps the validation and metadata access on the SQL Gateway side, which
allows the SQL Gateway to leverage metadata caching and UDF JAR caching for
better compiling performance.
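To make the split concrete, a minimal sketch using the CompiledPlan API
available since Flink 1.15 (table names and the plan file path are made up):

    import org.apache.flink.table.api.CompiledPlan;
    import org.apache.flink.table.api.EnvironmentSettings;
    import org.apache.flink.table.api.PlanReference;
    import org.apache.flink.table.api.TableEnvironment;

    TableEnvironment tEnv =
        TableEnvironment.create(EnvironmentSettings.inStreamingMode());
    // Gateway side: validate, optimize, and persist the serializable plan.
    CompiledPlan plan = tEnv.compilePlanSql(
        "INSERT INTO sink_table SELECT * FROM source_table");
    plan.writeToFile("/tmp/job-plan.json");
    // Cluster side: restore the plan and execute it without re-planning.
    tEnv.loadPlan(PlanReference.fromFile("/tmp/job-plan.json")).execute();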

If we decide to submit ExecNodeGraph instead of SQL file, is it still
necessary to support SQL Driver? Regarding non-interactive SQL jobs, users
can use the Table API program for application mode. SQL Driver needs to
serialize SessionState, which is very challenging but not covered in detail
in the FLIP.

Regarding "K8S doesn't support shipping multiple jars", is that true? Is it
possible to support it?

Best,
Jark



On Thu, 1 Jun 2023 at 16:58, Paul Lam  wrote:

> Hi Weihua,
>
> You’re right. Distributing the SQLs to the TMs is one of the challenging
> parts of this FLIP.
>
> Web submission is not enabled in application mode currently as you said,
> but it could be changed if we have good reasons.
>
> What do you think about introducing a distributed storage for SQL Gateway?
>
> We could make use of Flink file systems [1] to distribute the SQL
> Gateway-generated resources, which should solve the problem at its root.
>
> Users could specify Flink-supported file systems to ship files. It’s only
> required when using SQL Gateway with K8s application mode.
>
> [1]
> https://nightlies.apache.org/flink/flink-docs-stable/docs/deployment/filesystems/overview/
>
> Best,
> Paul Lam
>
> > On Jun 1, 2023, at 13:55, Weihua Hu  wrote:
> >
> > Thanks Paul for your reply.
> >
> > SQLDriver looks good to me.
> >
> > 2. Do you mean passing the SQL string as a configuration or a program
> > argument?
> >
> >
> > I brought this up because we were unable to pass the SQL file to Flink
> > using Kubernetes mode.
> > For DataStream/Python users, they need to prepare their images for the
> jars
> > and dependencies.
> > But for SQL users, they can use a common image to run different SQL
> queries
> > if there are no other udf requirements.
> > It would be great if the SQL query and image were not bound.
> >
> > Using strings is a way to decouple these, but just as you mentioned, it's
> > not easy to pass complex SQL.
> >
> >> use web submission
> > AFAIK, we can not use web submission in the Application mode. Please
> > correct me if I'm wrong.
> >
> >
> > Best,
> > Weihua
> >
> >
> > On Wed, May 31, 2023 at 9:37 PM Paul Lam  wrote:
> >
> >> Hi Biao,
> >>
> >> Thanks for your comments!
> >>
> >>> 1. Scope: is this FLIP only targeted for non-interactive Flink SQL jobs
> >> in
> >>> Application mode? More specifically, if we use SQL client/gateway to
> >>> execute some interactive SQLs like a SELECT query, can we ask flink to
> >> use
> >>> Application mode to execute those queries after this FLIP?
> >>
> >> Thanks for pointing it out. I think only DMLs would be executed via SQL
> >> Driver.
> >> I'll add the scope to the FLIP.
> >>
> >>> 2. Deployment: I believe in YARN mode, the implementation is trivial as
> >> we
> >>> can ship files via YARN's tool easily but for K8s, things can be more
> >>> complicated as Shengkai said.
> >>
> >>
> >> Your input is very informative. I’m thinking about using web submission,
> >> but it requires exposing the JobManager port which could also be a
> problem
> >> on K8s.
> >>
> >> Another approach is to explicitly require a distributed storage to ship
> >> files,
> >> but we may need a new deployment executor for that.
> >>
> >> What do you think of these two approaches?
> >>
> >>> 3. Serialization of SessionState: in SessionState, there are some
> >>> unserializable fields
> >>> like org.apache.flink.table.resource.ResourceManager#userClassLoader.
> It
> >>> may be worthwhile to add more details about the serialization part.
> >>
> >> I agree. That’s a missing part. But if we use ExecNodeGraph as Shengkai
> >> mentioned, do we eliminate the need for serialization of SessionState?
> >>
> >> Best,
> >> Paul Lam
> >>
>

Re: [DISCUSS] FLIP-315: Support Operator Fusion Codegen for Flink SQL

2023-06-01 Thread Jark Wu
Hi Ron,

Thanks a lot for the great proposal. The FLIP looks good to me in general.
It doesn't look like easy work, but the performance sounds promising, so I
think it's worth doing.

Besides, if there were a complete benchmark chart covering all TPC-DS
queries, the effect of this FLIP would be more intuitive.

Best,
Jark



On Wed, 31 May 2023 at 14:27, liu ron  wrote:

> Hi, Jinsong
>
> Thanks for your valuable suggestions.
>
> Best,
> Ron
>
> On Tue, May 30, 2023 at 13:22 Jingsong Li  wrote:
>
> > Thanks Ron for your information.
> >
> > I suggest that it can be written in the Motivation of FLIP.
> >
> > Best,
> > Jingsong
> >
> > On Tue, May 30, 2023 at 9:57 AM liu ron  wrote:
> > >
> > > Hi, Jingsong
> > >
> > > Thanks for your review. We have tested it on TPC-DS, and got a 12% gain
> > > overall when supporting only the Calc, HashJoin, and HashAgg operators.
> > > In some queries we even get more than a 30% gain, so it looks like an
> > > effective approach.
> > >
> > > Best,
> > > Ron
> > >
> > > On Mon, May 29, 2023 at 14:33 Jingsong Li  wrote:
> > >
> > > > Thanks Ron for the proposal.
> > > >
> > > > Do you have some benchmark results for the performance improvement? I
> > > > am more concerned about the improvement on Flink than the data in
> > > > other papers.
> > > >
> > > > Best,
> > > > Jingsong
> > > >
> > > > On Mon, May 29, 2023 at 2:16 PM liu ron  wrote:
> > > > >
> > > > > Hi, dev
> > > > >
> > > > > I'd like to start a discussion about FLIP-315: Support Operator
> > Fusion
> > > > > Codegen for Flink SQL[1]
> > > > >
> > > > > As main memory grows, query performance is more and more determined
> > > > > by the raw CPU costs of query processing itself. This is because
> > > > > query processing techniques based on interpreted execution show poor
> > > > > performance on modern CPUs, due to lack of locality and frequent
> > > > > instruction mis-prediction. Therefore, the industry is also researching how to
> > > > improve
> > > > > engine performance by increasing operator execution efficiency. In
> > > > > addition, during the process of optimizing Flink's performance for
> > TPC-DS
> > > > > queries, we found that a significant amount of CPU time was spent
> on
> > > > > virtual function calls, framework collector calls, and invalid
> > > > > calculations, which can be optimized to improve the overall engine
> > > > > performance. After some investigation, we found Operator Fusion
> > Codegen
> > > > > which is proposed by Thomas Neumann in the paper[2] can address
> these
> > > > > problems. I have finished a PoC[3] to verify its feasibility and
> > > > validity.
> > > > >
> > > > > Looking forward to your feedback.
> > > > >
> > > > > [1]:
> > > > >
> > > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-315+Support+Operator+Fusion+Codegen+for+Flink+SQL
> > > > > [2]: http://www.vldb.org/pvldb/vol4/p539-neumann.pdf
> > > > > [3]: https://github.com/lsyldliu/flink/tree/OFCG
> > > > >
> > > > > Best,
> > > > > Ron
> > > >
> >
>


Re: [DISCUSS] FLIP 295: Support persistence of Catalog configuration and asynchronous registration

2023-06-01 Thread Jark Wu
Hi Feng,

This is a useful FLIP. Thanks for starting this discussion.
The current design looks pretty good to me. I just have some minor comments.

1. How to register the CatalogStore for the Table API? I think the
CatalogStore should be immutable once the TableEnv is created; otherwise,
there might be data inconsistencies when the CatalogStore is changed (see
the sketch after these comments).

2. Why does the CatalogStoreFactory interface only have a default method,
not an interface method?

3. Please mention the alternative API in Javadoc for the deprecated
`registerCatalog`.

4. In the "Compatibility" section, would be better to mention the changed
behavior of CREATE CATALOG statement if FileCatalogStore (or other
persisted catalog store) is used.
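Regarding comment 1, a hypothetical sketch of what an immutable registration
could look like (the withCatalogStore builder method and the FileCatalogStore
constructor are illustrative only; the FLIP has not fixed the API shape):

    // The store is chosen once when the environment is built and cannot be
    // swapped afterwards, avoiding the inconsistency mentioned above.
    CatalogStore store = new FileCatalogStore("/path/to/catalog-store");
    EnvironmentSettings settings = EnvironmentSettings.newInstance()
        .withCatalogStore(store)
        .build();
    TableEnvironment tEnv = TableEnvironment.create(settings);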


Best,
Jark

On Thu, 1 Jun 2023 at 11:26, Feng Jin  wrote:

> Hi, thanks all for reviewing the FLIP.
>
> @Ron
>
> >  Regarding the CatalogStoreFactory#createCatalogStore method, do we need
> to provide a default implementation?
>
> Yes, we will provide a default InMemoryCatalogStoreFactory to create an
> InMemoryCatalogStore.
>
> >  If we get a Catalog from CatalogStore, after initializing it, whether we
> put it in Map catalogs again?
>
> Yes, in the current design, catalogs are stored as snapshots, and once
> initialized, the Catalog will be placed in the Map
> catalogs.
> Subsequently, the Map catalogs will be the primary source
> for obtaining the corresponding Catalog.
>
> >   how about renaming them to `catalog.store.type` and
> `catalog.store.path`?
>
> I think it is okay. Adding "sql" at the beginning may seem a bit strange. I
> will update the FLIP.
>
>
>
> @Shammon
>
> Thank you for the review. I have made the necessary corrections.
> Regarding the modifications made to the Public Interface, I have also
> included the relevant changes to the `TableEnvironment`.
>
>
> Best,
> Feng
>
>
> On Wed, May 31, 2023 at 5:02 PM Shammon FY  wrote:
>
> > Hi feng,
> >
> > Thanks for updating, I have some minor comments
> >
> > 1. The modification of `CatalogManager` should not be in `Public
> > Interfaces`, it is not a public interface.
> >
> > 2. `@PublicEvolving` should be added for `CatalogStore` and
> > `CatalogStoreFactory`
> >
> > 3. The code `Optional optionalDescriptor =
> > catalogStore.get(catalogName);` in the `CatalogManager` should be
> > `Optional optionalDescriptor =
> > catalogStore.get(catalogName);`
> >
> > Best,
> > Shammon FY
> >
> >
> > On Wed, May 31, 2023 at 2:24 PM liu ron  wrote:
> >
> > > Hi, Feng
> > >
> > > Thanks for driving this FLIP, this proposal is very useful for catalog
> > > management.
> > > I have some small questions:
> > >
> > > 1. Regarding the CatalogStoreFactory#createCatalogStore method, do we
> > need
> > > to provide a default implementation?
> > > 2. If we get Catalog from CatalogStore, after initializing it, whether
> we
> > > put it to Map catalogs again?
> > > 3. Regarding the options `sql.catalog.store.type` and
> > > `sql.catalog.store.file.path`, how about renaming them to
> > > `catalog.store.type` and `catalog.store.path`?
> > >
> > > Best,
> > > Ron
> > >
> > > On Mon, May 29, 2023 at 21:19 Feng Jin  wrote:
> > >
> > > > Hi yuxia
> > > >
> > > >  > But from the code in Proposed Changes, once we register the
> Catalog,
> > > we
> > > > initialize it and open it. right?
> > > >
> > > > Yes, in order to avoid inconsistent semantics of the original CREATE
> > > > CATALOG DDL, Catalog will be directly initialized in registerCatalog
> so
> > > > that parameter validation can be performed.
> > > >
> > > > In the current design, lazy initialization is mainly reflected in
> > > > getCatalog. If CatalogStore has already saved some catalog
> > > configurations,
> > > > only initialization is required in getCatalog.
> > > >
> > > >
> > > > Best,
> > > > Feng
> > > >
> > > > On Mon, May 29, 2023 at 8:27 PM yuxia 
> > > wrote:
> > > >
> > > > > Hi, Feng.
> > > > > I'm trying to understand the meaning of *lazy initialization*. If
> > > > > I'm wrong, please correct me.
> > > > >
> > > > > IIUC, lazy initialization means only you need to access the
> catalog,
> > > then
> > > > > you initialize it. But from the code in Proposed C

Re: [DISCUSS] Hive dialect shouldn't fall back to Flink's default dialect

2023-06-01 Thread Jark Wu
+1, I think this can make the grammar more clear.
Please remember to add a release note once the issue is finished.
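For users who relied on the fallback, the explicit switch is small; a
minimal sketch using the existing TableConfig API (the catalog name and
options are examples only, and tEnv is an existing table environment):

    import org.apache.flink.table.api.SqlDialect;

    // Run Flink-specific statements under the default dialect...
    tEnv.getConfig().setSqlDialect(SqlDialect.DEFAULT);
    tEnv.executeSql("CREATE CATALOG my_hive WITH ('type' = 'hive')");
    // ...then switch back to pure HiveQL for Hive statements.
    tEnv.getConfig().setSqlDialect(SqlDialect.HIVE);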

Best,
Jark

On Thu, 1 Jun 2023 at 11:28, yuxia  wrote:

> Hi, Jingsong. It's hard to provide an option, given that we also want to
> decouple Hive from the Flink planner.
> If we kept this fallback behavior, we would still depend on `ParserImpl`
> provided by flink-table-planner in HiveParser.
> But to minimize the impact on users and be more user-friendly, the error
> message will remind users that they can use `SET table.sql-dialect =
> default;` to switch to Flink's default dialect when HiveParser fails to
> parse the SQL.
>
> Best regards,
> Yuxia
>
> Best regards,
> Yuxia
>
> - Original Message -
> From: "Jingsong Li" 
> To: "Rui Li" 
> Cc: "dev" , "yuxia" ,
> "User" 
> Sent: Tuesday, May 30, 2023 3:21:56 PM
> Subject: Re: [DISCUSS] Hive dialect shouldn't fall back to Flink's default
> dialect
>
> +1, the fallback looks weird now, it is outdated.
>
> But, it is good to provide an option. I don't know if there are some
> users who depend on this fallback.
>
> Best,
> Jingsong
>
> On Tue, May 30, 2023 at 1:47 PM Rui Li  wrote:
> >
> > +1, the fallback was just intended as a temporary workaround to run
> catalog/module related statements with hive dialect.
> >
> > On Mon, May 29, 2023 at 3:59 PM Benchao Li  wrote:
> >>
> >> Big +1 on this, thanks yuxia for driving this!
> >>
> >> On Mon, May 29, 2023 at 14:55 yuxia  wrote:
> >>
> >> > Hi, community.
> >> >
> >> > I want to start a discussion about making the Hive dialect not fall
> >> > back to Flink's default dialect.
> >> >
> >> > Currently, when the HiveParser fails to parse the SQL in Hive dialect,
> >> > it'll fall back to Flink's default parser [1] to handle Flink-specific
> >> > statements like "CREATE CATALOG xx with (xx);".
> >> >
> >> > As I've been involved with the Hive dialect and have recently
> >> > communicated with community users who use it, I'm thinking of throwing
> >> > an exception directly instead of falling back to Flink's default
> >> > dialect when the SQL can't be parsed in Hive dialect.
> >> >
> >> > Here're some reasons:
> >> >
> >> > First of all, it hides some errors with the Hive dialect. For example,
> >> > we found we couldn't use the Hive dialect with the Flink SQL client any
> >> > more during the release validation phase [2]. We eventually found that
> >> > a modification in the Flink SQL client caused it, but our test case
> >> > couldn't catch it earlier: although HiveParser failed to parse the
> >> > statement, it fell back to the default parser and the test case passed
> >> > successfully.
> >> >
> >> > Second, conceptually, the Hive dialect should have nothing to do with
> >> > Flink's default dialect. They are two totally different dialects. If
> >> > we do need a dialect mixing the Hive dialect and the default dialect,
> >> > maybe we should propose a new hybrid dialect and announce the hybrid
> >> > behavior to users.
> >> > Also, the fallback behavior has confused some users; I have been asked
> >> > about it by community users. Throwing an exception directly when the
> >> > SQL statement can't be parsed in Hive dialect will be more intuitive.
> >> >
> >> > Last but not least, it's important to decouple Hive from the Flink
> >> > planner [3] before we can externalize the Hive connector [4]. If we
> >> > still fall back to Flink's default dialect, we will need to depend on
> >> > `ParserImpl` in the Flink planner, which will block us from removing
> >> > the provided dependency of the Hive dialect as well as externalizing
> >> > the Hive connector.
> >> >
> >> > Although we never announced the fallback behavior, some users may
> >> > implicitly depend on it in their SQL jobs. So, I hereby open the
> >> > discussion about abandoning the fallback behavior to make the Hive
> >> > dialect clear and isolated.
> >> > Please remember it won't break Hive syntax, but Flink-specific syntax
> >> > may fail afterwards. For the failed SQL, you can use `SET
> >> > table.sql-dialect=default;` to switch to the Flink dialect.
> >> > If there are some Flink-specific statements that we find should be
> >> > included in the Hive dialect for ease of use, I think we can still add
> >> > them as special cases to the Hive dialect.
> >> >
> >> > Looking forward to your feedback. I'd love to hear the community's
> >> > feedback to decide the next steps.
> >> >
> >> > [1]:
> >> >
> https://github.com/apache/flink/blob/678370b18e1b6c4a23e5ce08f8efd05675a0cc17/flink-connectors/flink-connector-hive/src/main/java/org/apache/flink/table/planner/delegation/hive/HiveParser.java#L348
> >> > [2]:https://issues.apache.org/jira/browse/FLINK-26681
> >> > [3]:https://issues.apache.org/jira/browse/FLINK-31413
> >> > [4]:https://issues.apache.org/jira/browse/FLINK-30064
> >> >
> >> >
> >> >
> >> > Best regards,
> >> > Yuxia
> >> >
> >>
> >>
> >> --
> >>
> >> Best,
> >> Benchao Li
> >
> >
> >
> > --
> > Best regards!
> > Rui Li
>


[DISCUSS] Update Flink Roadmap

2023-06-01 Thread Jark Wu
Hi all,

Martijn and I would like to initiate a discussion on the Flink roadmap,
which should cover the project's long-term roadmap and the regular update
mechanism.

Xintong has already started a discussion about Flink 2.0 planning. One of
the points raised in that discussion is that we should have a high-level
discussion of the roadmap to present where the project is heading (which
doesn't necessarily need to block the Flink 2.0 planning). Moreover, the
roadmap on the Flink website [1] hasn't been updated for half a year, and
the last update was for the feature radar for the 1.15 release. It has been
2 years since the community discussed Flink's overall roadmap.

I would like to raise two topics for discussion:

1. The new roadmap. This should be an updated version of the current
roadmap[1].
2. A mechanism to regularly discuss and update the roadmap.

To make the first topic discussion more efficient, Martijn and I volunteer
to summarize the ongoing big things of different components and present a
roadmap draft to the community in the next few weeks. This should be a good
starting point for a more detailed discussion.

Regarding the regular update mechanism, there was a proposal in a thread
[2] three years ago to make the release manager responsible for updating
the roadmap. However, it appears that this was not documented as a release
management task [3], and the roadmap update wasn't performed for releases
1.16 and 1.17.

In my opinion, making release managers responsible for keeping the roadmap
up to date is a good idea. Specifically, release managers of release X can
kick off the roadmap update at the beginning of release X, which can be a
joint task with collecting a feature list [4]. Additionally, release
managers of release X-1 can help verify and remove the accomplished items
from the roadmap and update the feature radar.

What do you think? Do you have other ideas?

Best,
Jark & Martijn

[1]: https://flink.apache.org/roadmap.html
[2]: https://lists.apache.org/thread/o0l3cg6yphxwrww0k7215jgtw3yfoybv
[3]:
https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
[4]: https://cwiki.apache.org/confluence/display/FLINK/1.18+Release


Re: [DISCUSS] Update Flink Roadmap

2023-06-01 Thread Jark Wu
Hi Jing,

This thread is for discussing the roadmap for versions 1.18, 2.0, and even
more.
One of the outcomes of this discussion will be an updated version of the
current roadmap.
Let's work together on refining the roadmap in this thread.

Best,
Jark

On Thu, 1 Jun 2023 at 23:25, Jing Ge  wrote:

> Hi Jark,
>
> Thanks for driving it! For point 2, since we are developing 1.18 now,
> does it make sense to update the roadmap this time while we are releasing
> 1.18? This discussion thread will be focusing on the Flink 2.0 roadmap, as
> you mentioned previously. WDYT?
>
> Best regards,
> Jing
>
> On Thu, Jun 1, 2023 at 3:31 PM Jark Wu  wrote:
>
> > Hi all,
> >
> > Martijn and I would like to initiate a discussion on the Flink roadmap,
> > which should cover the project's long-term roadmap and the regular update
> > mechanism.
> >
> > Xintong has already started a discussion about Flink 2.0 planning. One of
> > the points raised in that discussion is that we should have a high-level
> > discussion of the roadmap to present where the project is heading (which
> > doesn't necessarily need to block the Flink 2.0 planning). Moreover, the
> > roadmap on the Flink website [1] hasn't been updated for half a year, and
> > the last update was for the feature radar for the 1.15 release. It has
> been
> > 2 years since the community discussed Flink's overall roadmap.
> >
> > I would like to raise two topics for discussion:
> >
> > 1. The new roadmap. This should be an updated version of the current
> > roadmap[1].
> > 2. A mechanism to regularly discuss and update the roadmap.
> >
> > To make the first topic discussion more efficient, Martijn and I
> volunteer
> > to summarize the ongoing big things of different components and present a
> > roadmap draft to the community in the next few weeks. This should be a
> good
> > starting point for a more detailed discussion.
> >
> > Regarding the regular update mechanism, there was a proposal in a thread
> > [2] three years ago to make the release manager responsible for updating
> > the roadmap. However, it appears that this was not documented as a
> release
> > management task [3], and the roadmap update wasn't performed for releases
> > 1.16 and 1.17.
> >
> > In my opinion, making release managers responsible for keeping the
> roadmap
> > up to date is a good idea. Specifically, release managers of release X
> can
> > kick off the roadmap update at the beginning of release X, which can be a
> > joint task with collecting a feature list [4]. Additionally, release
> > managers of release X-1 can help verify and remove the accomplished items
> > from the roadmap and update the feature radar.
> >
> > What do you think? Do you have other ideas?
> >
> > Best,
> > Jark & Martijn
> >
> > [1]: https://flink.apache.org/roadmap.html
> > [2]: https://lists.apache.org/thread/o0l3cg6yphxwrww0k7215jgtw3yfoybv
> > [3]:
> >
> https://cwiki.apache.org/confluence/display/FLINK/Flink+Release+Management
> > [4]: https://cwiki.apache.org/confluence/display/FLINK/1.18+Release
> >
>

