Re: [VOTE] Release 1.8.1, release candidate #1

2019-07-01 Thread Aljoscha Krettek
+1 (binding)

 - I checked the diff in the POM files since 1.8.0 and they look good, i.e. no 
new dependencies that could lead to licensing problems

> On 1. Jul 2019, at 10:02, Tzu-Li (Gordon) Tai  wrote:
> 
> +1 (binding)
> 
> - checked signatures and hashes
> - built from source without skipping tests (without Hadoop, Scala 2.12)
> - no new dependencies were added since 1.8.0
> - ran the end-to-end tests locally once, and in a loop specifically for
> Kafka tests (to cover FLINK-11987)
> - announcement PR for website looks good!
> 
> Cheers,
> Gordon
> 
> On Mon, Jul 1, 2019 at 9:02 AM jincheng sun 
> wrote:
> 
>> +1 (binding)
>> 
>> With the following checks:
>> 
>> - checked gpg signatures by `gpg --verify 181.asc flink-1.8.1-src.tgz`
>> [success]
>> - checked the hashes by `shasum -a 512 flink-1.8.1-src.tgz` [success]
>> - built from source by `mvn clean package -DskipTests` [success]
>> downloaded the `flink-core-1.8.1.jar` from `repository.apache.org`
>> [success]
>> ran the WordCount example locally [success]
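
A consolidated sketch of the checks above (file names assume the 1.8.1 RC1 artifacts staged under dist.apache.org, as linked in the vote mail; adjust as needed):

    # download source release, signature and checksum from the staging area
    wget https://dist.apache.org/repos/dist/dev/flink/flink-1.8.1-rc1/flink-1.8.1-src.tgz
    wget https://dist.apache.org/repos/dist/dev/flink/flink-1.8.1-rc1/flink-1.8.1-src.tgz.asc
    wget https://dist.apache.org/repos/dist/dev/flink/flink-1.8.1-rc1/flink-1.8.1-src.tgz.sha512
    wget https://dist.apache.org/repos/dist/release/flink/KEYS

    # check the GPG signature
    gpg --import KEYS
    gpg --verify flink-1.8.1-src.tgz.asc flink-1.8.1-src.tgz

    # check the SHA-512 hash
    shasum -a 512 -c flink-1.8.1-src.tgz.sha512

    # build from source and run the bundled WordCount example on a local cluster
    tar xzf flink-1.8.1-src.tgz && cd flink-1.8.1
    mvn clean package -DskipTests
    cd build-target
    ./bin/start-cluster.sh
    ./bin/flink run examples/streaming/WordCount.jar
    ./bin/stop-cluster.sh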
>> 
>> Cheers,
>> Jincheng
>> 
>> Jark Wu wrote on Sun, Jun 30, 2019 at 9:21 PM:
>> 
>>> +1 (non-binding)
>>> 
>>> - built from source successfully
>>> - checked signatures and hashes
>>> - run a couple of end-to-end tests locally with success
>>> - started a cluster both for scala-2.11 and scala-2.12, ran examples,
>> WebUI
>>> is accessible, no suspicious log output
>>> - reviewed the release PR and left comments
>>> 
>>> Cheers,
>>> Jark
>>> 
>>> On Thu, 27 Jun 2019 at 22:40, Hequn Cheng  wrote:
>>> 
 Hi Jincheng,
 
 Thanks a lot for the release which contains so many fixes!
 
 I have done the following checks:
 
 Local Tests
  - Built from source archive successfully.
  - Signatures and hash are correct.
  - All artifacts have been deployed to the maven central repository.
  - Run WordCount(batch&streaming) on Local cluster successfully.
 
 Cluster Tests
 Cluster environment: 7 nodes, jm 1024m, tm 4096m
 Testing Jobs: WordCount(batch&streaming), DataStreamAllroundTestProgram
  - Read and write hdfs file successfully.
  - Run jobs on YARN(with or without session) successfully
  - Job failover and recovery successfully
 
 PR review
 - Left a minor comment. But I think it is not a blocker, we can just
>>> update
 the PR directly.
 
 To sum up, I have not spotted any blockers, so +1(non-binding) from my
 side.
 
 Best, Hequn
 
 On Tue, Jun 25, 2019 at 4:52 PM jincheng sun >> 
 wrote:
 
> Hi everyone,
> 
> Please review and vote on the release candidate 1 for Flink 1.8.1, as
> follows:
> 
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> The complete staging area is available for your review, which
>> includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases
>> to
 be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint 8FEA1EE9D0048C0CCC70B7573211B0703B79EA0E [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.8.1-rc1" [5],
> * website pull request listing the new release [6]
> 
> The vote will be open for at least 72 hours. It is adopted by
>> majority
> approval, with at least 3 PMC affirmative votes.
> 
> Cheers,
> Jincheng
> 
> [1]
> 
> 
 
>>> 
>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345164
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.1-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
 https://repository.apache.org/content/repositories/orgapacheflink-1229
> [5]
> 
> 
 
>>> 
>> https://github.com/apache/flink/commit/11ab983ed20068dac93efe7f234ffab9abc2926e
> [6] https://github.com/apache/flink-web/pull/221
> 
 
>>> 
>> 



Re: [VOTE] Migrate to sponsored Travis account

2019-07-04 Thread Aljoscha Krettek
+1

Aljoscha

> On 4. Jul 2019, at 11:09, Stephan Ewen  wrote:
> 
> +1 to move to a private Travis account.
> 
> I can confirm that Ververica will sponsor a Travis CI plan that is
> equivalent or a bit higher than the previous ASF quota (10 concurrent build
> queues)
> 
> Best,
> Stephan
> 
> On Thu, Jul 4, 2019 at 10:46 AM Chesnay Schepler  wrote:
> 
>> I've raised a JIRA
>> with INFRA to inquire
>> whether it would be possible to switch to a different Travis account,
>> and if so what steps would need to be taken.
>> We need a proper confirmation from INFRA since we are not in full
>> control of the flink repository (for example, we cannot access the
>> settings page).
>> 
>> If this is indeed possible, Ververica is willing to sponsor a Travis
>> account for the Flink project.
>> This would provide us with more than enough resources.
>> 
>> Since this makes the project more reliant on resources provided by
>> external companies I would like to vote on this.
>> 
>> Please vote on this proposal, as follows:
>> [ ] +1, Approve the migration to a Ververica-sponsored Travis account,
>> provided that INFRA approves
>> [ ] -1, Do not approve the migration to a Ververica-sponsored Travis
>> account
>> 
>> The vote will be open for at least 24h, and until we have confirmation
>> from INFRA. The voting period may be shorter than the usual 3 days since
>> our current setup is effectively not working.
>> 
>> On 04/07/2019 06:51, Bowen Li wrote:
>>> Re: > Are they using their own Travis CI pool, or did they switch to an
>>> entirely different CI service?
>>> 
>>> I reached out to Wes and Krisztián from Apache Arrow PMC. They are
>>> currently moving away from ASF's Travis to their own in-house metal
>>> machines at [1] with custom CI application at [2]. They've seen
>>> significant improvement w.r.t both much higher performance and
>>> basically no resource waiting time, "night-and-day" difference quoting
>>> Wes.
>>> 
>>> Re: > If we can just switch to our own Travis pool, just for our
>>> project, then this might be something we can do fairly quickly?
>>> 
>>> I believe so, according to [3] and [4]
>>> 
>>> 
>>> [1] https://ci.ursalabs.org/ 
>>> [2] https://github.com/ursa-labs/ursabot
>>> [3]
>>> https://docs.travis-ci.com/user/migrate/open-source-repository-migration
>>> [4] https://docs.travis-ci.com/user/migrate/open-source-on-travis-ci-com
>>> 
>>> 
>>> 
>>> On Wed, Jul 3, 2019 at 12:01 AM Chesnay Schepler >> > wrote:
>>> 
>>>Are they using their own Travis CI pool, or did they switch to an
>>>entirely different CI service?
>>> 
>>>If we can just switch to our own Travis pool, just for our
>>>project, then
>>>this might be something we can do fairly quickly?
>>> 
>>>On 03/07/2019 05:55, Bowen Li wrote:
 I responded in the INFRA ticket [1] that I believe they are
>>>using a wrong
 metric against Flink and the total build time is a completely
>>>different
 thing than guaranteed build capacity.
 
 My response:
 
 "As mentioned above, since I started to pay attention to Flink's
>>>build
 queue a few tens of days ago, I'm in Seattle and I saw no build
>>>was kicking
 off in PST daytime in weekdays for Flink. Our teammates in China
>>>and Europe
 have also reported similar observations. So we need to figure out
>>>where the
 large total build time came from - if 1) your number and 2) our
 observations from three locations that cover pretty much a full
>>>day, are
 all true, I **guess** one reason can be that - highly likely the
>>>extra
 build time came from weekends when other Apache projects may be
>>>idle and
 Flink just drains hard its congested queue.
 
 Please be aware that we're not complaining about the lack of
>>>resources
 in general, I'm complaining about the lack of **stable, dedicated**
 resources. An example for the latter one is, currently even if
>>>no build is
 in Flink's queue and I submit a request to be the queue head in PST
 morning, my build won't even start in 6-8+h. That is an absurd
>>>amount of
 waiting time.
 
 That is to say, if ASF INFRA decides to adopt a quota system and
>>>grants
 Flink five DEDICATED servers that runs all the time only for
>>>Flink, that'll
 be PERFECT and can totally solve our problem now.
 

Re: [VOTE] How to Deal with Split/Select in DataStream API

2019-07-08 Thread Aljoscha Krettek
I think this would benefit from a FLIP, that neatly sums up the options, and 
which then gives us also a point where we can vote and ratify a decision.

As a gut feeling, I most like Option 3). Initially I would have preferred 
option 1) (because of a sense of API purity), but by now I think it’s good that 
users have this simpler option.

Aljoscha 

> On 8. Jul 2019, at 06:39, Xingcan Cui  wrote:
> 
> Hi all,
> 
> Thanks for your participation.
> 
> In this thread, we got one +1 for option 1 and option 3, respectively. In the 
> original thread[1], we got two +1 for option 1, one +1 for option 2, and five 
> +1 and one -1 for option 3.
> 
> To summarize,
> 
> Option 1 (port side output to flatMap and deprecate split/select): three +1
> Option 2 (introduce a new split/select and deprecate existing one): one +1
> Option 3 ("correct" the existing split/select): six +1 and one -1
> 
> It seems that most people involved are in favor of "correcting" the existing 
> split/select. However, this will definitely break the API compatibility, in a 
> subtle way.
> 
> IMO, the real behavior of consecutive split/select's has never been 
> thoroughly clarified. Even in the community, it is hard to say that we have come to
> a consensus on its real semantics[2-4]. Though the initial design is not 
> ambiguous, there's no doubt that its concept has drifted. 
> 
> As the split/select is quite an ancient API, I cc'ed this to more members. It 
> would be great if you could share your opinions on this.
> 
> Thanks,
> Xingcan
> 
> [1] 
> https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
>  
> 
> [2] https://issues.apache.org/jira/browse/FLINK-1772 
> 
> [3] https://issues.apache.org/jira/browse/FLINK-5031 
> 
> [4] https://issues.apache.org/jira/browse/FLINK-11084 
> 
> 
> 
>> On Jul 5, 2019, at 12:04 AM, 杨力 wrote:
>> 
>> I prefer the 1) approach. I used to carry fields, which are needed only for
>> splitting, in the outputs of flatMap functions. Replacing it with outputTags 
>> would simplify data structures.
>> 
>> Xingcan Cui wrote on Fri, Jul 5, 2019 at 2:20 AM:
>> Hi folks,
>> 
>> Two weeks ago, I started a thread [1] discussing whether we should discard 
>> the split/select methods (which have been marked as deprecation since v1.7) 
>> in DataStream API. 
>> 
>> The fact is, these methods will cause "unexpected" results when used
>> consecutively (e.g., ds.split(a).select(b).split(c).select(d)) or
>> multiple times on the same target (e.g., ds.split(a).select(b),
>> ds.split(c).select(d)). The reason is that following the initial design, the 
>> new split/select logic will always override the existing one on the same 
>> target operator, rather than append to it. Some users may not be aware of 
>> that, but if you do, a current solution would be to use the more powerful 
>> side output feature [2].
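
A minimal sketch of the side-output alternative mentioned above, assuming an existing DataStream<String> named ds (all names here are illustrative):

    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.functions.ProcessFunction;
    import org.apache.flink.util.Collector;
    import org.apache.flink.util.OutputTag;

    // route non-parseable records to a side output instead of split/select
    final OutputTag<String> errorTag = new OutputTag<String>("errors") {};

    SingleOutputStreamOperator<Integer> parsed = ds
        .process(new ProcessFunction<String, Integer>() {
            @Override
            public void processElement(String value, Context ctx, Collector<Integer> out) {
                try {
                    out.collect(Integer.parseInt(value)); // main output
                } catch (NumberFormatException e) {
                    ctx.output(errorTag, value);          // side output
                }
            }
        });

    DataStream<String> errors = parsed.getSideOutput(errorTag);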
>> 
>> FLINK-11084  added some 
>> restrictions to the existing split/select logic and suggests replacing it
>> with side output in the future. However, considering that the side output is 
>> currently only available in the process function layer and the split/select 
>> could have been widely used in many real-world applications, we'd like to 
>> start a vote and listen to the community on how to deal with them.
>> 
>> In the discussion thread [1], we proposed three solutions as follows. All of 
>> them are feasible but have different impacts on the public API.
>> 
>> 1) Port the side output feature to DataStream API's flatMap and replace 
>> split/select with it.
>> 
>> 2) Introduce a dedicated function in DataStream API (with the "correct" 
>> behavior but a different name) that can be used to replace the existing 
>> split/select.
>> 
>> 3) Keep split/select but change the behavior/semantic to be "correct".
>> 
>> Note that this is just a vote for gathering information, so feel free to 
>> participate and share your opinions.
>> 
>> The voting time will end on July 7th 17:00 EDT.
>> 
>> Thanks,
>> Xingcan
>> 
>> [1] 
>> https://lists.apache.org/thread.html/f94ea5c97f96c705527dcc809b0e2b69e87a4c5d400cb7c61859e1f4@%3Cdev.flink.apache.org%3E
>>  
>> [2] 
>> h

Re: [DISCUSS] META-FLIP: Sticking (or not) to a strict FLIP voting process

2019-07-09 Thread Aljoscha Krettek
scope for a release rather than just a vague
>>>> list
>>>>> of
>>>>>>>   features that we want to have.
>>>>>>>   - the whole community is on the same page what a certain
>> feature
>>>>> means
>>>>>>>   - the scope does not change drastically during the development
>>>>> period
>>>>>>> 
>>>>>>> As for what should and what should not deserve a FLIP, I actually
>>>> quite
>>>>>>> like the definition in the FLIPs page[1]. I think it does make
>>> sense
>>>> to
>>>>>>> have a FLIP, and as a result a voting process, for any *public*
>> or
>>>>> major
>>>>>>> change. I agree with Gordon. Even if the change is trivial it
>> might
>>>>>> affect
>>>>>>> external systems/users and it is also a commitment from the
>>>> community.
>>>>>>> Therefore I think they deserve a vote.
>>>>>>> 
>>>>>>> Lastly, I think Jark raised a valid point. We should have a clear
>>>>>>> understanding what binding votes in this case mean. I think it
>>> makes
>>>>>> sense
>>>>>>> to consider PMC's and committers' votes as binding for FLIPs
>>> voting.
>>>>>>> Otherwise we would lose the aspect of committing to help with
>>> getting
>>>>> the
>>>>>>> FLIP into the codebase.
>>>>>>> 
>>>>>>> To sum up I would opt for enforcing the lazy majority. I would
>>>> suggest
>>>>> to
>>>>>>> consider constructing a release plan with a list of accepted
>> FLIPs.
>>>>>>> 
>>>>>>> Best,
>>>>>>> 
>>>>>>> Dawid
>>>>>>> 
>>>>>>> 
>>>>>>> [1]
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals#FlinkImprovementProposals-Whatisconsidereda%22majorchange%22thatneedsaFLIP
>>>>>>> ?
>>>>>>> On 27/06/2019 04:15, Jark Wu wrote:
>>>>>>> 
>>>>>>> +1 for sticking to the lazy majority voting.
>>>>>>> 
>>>>>>> A question from my side, the 3+1 votes are binding votes which
>> only
>>>>>> active
>>>>>>> (i.e. non-emeritus) committers and PMC members have?
>>>>>>> 
>>>>>>> 
>>>>>>> Best,
>>>>>>> Jark
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, 26 Jun 2019 at 19:07, Tzu-Li (Gordon) Tai <
>>>> tzuli...@apache.org
>>>>>> 
>>>>>> 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> 
>>>>>>> +1 to enforcing lazy majority voting for future FLIPs, starting
>>> from
>>>>>> FLIPs
>>>>>>> that are still currently under discussion (by the time we've
>> agreed
>>>> on
>>>>>> the
>>>>>>> FLIP voting process).
>>>>>>> 
>>>>>>> My two cents concerning "what should and shouldn't be a FLIP":
>>>>>>> 
>>>>>>> I can understand Chesnay's argument about how some FLIPs, while
>>>> meeting
>>>>>> the
>>>>>>> criteria defined by the FLIP guidelines, feel to not be
>>> sufficiently
>>>>>> large
>>>>>>> to justify a FLIP.
>>>>>>> As a matter of fact, the FLIP guidelines explicitly mention that
>>>>> "Exposed
>>>>>>> Monitoring Information" is considered public interface; I guess
>>> that
>>>>> was
>>>>>>> why this FLIP came around in the first place.
>>>>>>> I was also hesitant in whether or not the recent FLIP about keyed
>>>> state
>>>>>>> snapshot binary format unification (FLIP-41) deserves to be a
>> FLIP,
>>>>> since
>>>>>>> the complexity of the change is rather small.
>>>>>>> 

Re: [DISCUSS] Flink project bylaws

2019-07-11 Thread Aljoscha Krettek
Big +1

How different is this from the Kafka bylaws? I’m asking because I quite like 
them and wouldn’t mind essentially adopting the Kafka bylaws. I mean, it’s open 
source, and we don’t have to try to re-invent the wheel here.

I think it’s worthwhile to discuss the “committer +1” requirement. We don’t 
usually have that now but I would actually be in favour of requiring it, 
although it might make stuff more complicated.

Aljoscha

> On 11. Jul 2019, at 15:31, Till Rohrmann  wrote:
> 
> Thanks a lot for creating this draft Becket.
> 
> I think without the notion of emeritus (or active vs. inactive), it won't
> be possible to have a 2/3 majority vote because we already have too many
> inactive PMCs/committers.
> 
> For the case of a committer being the author and getting a +1 from a
> non-committer: I think a committer should know when to ask another
> committer for feedback or not. Hence, I would not enforce that we strictly
> need a +1 from a committer if the author is a committer but of course
> encourage it if capacities exist.
> 
> Cheers,
> Till
> 
> On Thu, Jul 11, 2019 at 3:08 PM Chesnay Schepler  wrote:
> 
>> The emeritus stuff seems like unnecessary noise.
>> 
>> There's a bunch of subtle changes in the draft compared to existing
>> "conventions"; we should find a way to highlight these and discuss them
>> one by one.
>> 
>> On 11/07/2019 14:29, Robert Metzger wrote:
>>> Thank you Becket for kicking off this discussion and creating a draft in
>>> the Wiki.
>>> 
>>> I left some comments in the wiki.
>>> 
>>> In my understanding this means, that a committer always needs a review
>> and
 +1 from another committer. As far as I know this is currently not always
 the case (often committer authors, contributor reviews & +1s).
>>> 
>>> I would agree to add such a bylaw, if we had cases in the past where code
>>> was not sufficiently reviewed AND we believe that we have enough capacity
>>> to ensure a separate committer's approval.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Thu, Jul 11, 2019 at 9:49 AM Konstantin Knauf <
>> konstan...@ververica.com>
>>> wrote:
>>> 
 Hi all,
 
 thanks a lot for driving this, Becket. I have two remarks regarding the
 "Actions" section:
 
 * In addition to a simple "Code Change" we could also add a row for
>> "Code
 Change requiring a FLIP" with a reference to the FLIP process page. A
>> FLIP
 will have/does have different rules for approvals, etc.
 * For "Code Change" the draft currently requires "one +1 from a
>> committer
 who has not authored the patch followed by a Lazy approval (not counting
 the vote of the contributor), moving to lazy majority if a -1 is
>> received".
 In my understanding this means, that a committer always needs a review
>> and
 +1 from another committer. As far as I know this is currently not always
 the case (often committer authors, contributor reviews & +1s).
 
 I think it is worth thinking about how we can make it easy to follow the
 bylaws e.g. by having more Flink-specific Jira workflows and ticket
>> types +
 corresponding permissions. While this is certainly "Step 2", I believe,
>> we
 really need to make it as easy & transparent as possible, otherwise they
 will be unintentionally broken.
 
 Cheers and thanks,
 
 Konstantin
 
 
 
 On Thu, Jul 11, 2019 at 9:10 AM Becket Qin 
>> wrote:
 
> Hi all,
> 
> As it was raised in the FLIP process discussion thread [1], currently
 Flink
> does not have official bylaws to govern the operation of the project.
 Such
> bylaws are critical for the community to coordinate and contribute
> together. It is also the basis of other processes such as FLIP.
> 
> I have drafted a Flink bylaws page and would like to start a discussion
> thread on this.
> 
 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> The bylaws will affect everyone in the community. It'll be great to
>> hear
> your thoughts.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> [1]
> 
> 
 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-META-FLIP-Sticking-or-not-to-a-strict-FLIP-voting-process-td29978.html#none
 
 --
 
 Konstantin Knauf | Solutions Architect
 
 +49 160 91394525
 
 
 Planned Absences: 10.08.2019 - 31.08.2019, 05.09. - 06.09.2010
 
 
 --
 
 Ververica GmbH | Invalidenstrasse 115, 10115 Berlin, Germany
 
 --
 
 Ververica GmbH
 Registered at Amtsgericht Charlottenburg: HRB 158244 B
 Managing Directors: Dr. Kostas Tzoumas, Dr. Stephan Ewen
 
>> 
>> 



Re: [DISCUSS] Flink project bylaws

2019-07-12 Thread Aljoscha Krettek
rwise they
>>>>> will be unintentionally broken.
>>>> 
>>>> & Re: Till
>>>> 
>>>>> For the case of a committer being the author and getting a +1 from a
>>>>> non-committer: I think a committer should know when to ask another
>>>>> committer for feedback or not. Hence, I would not enforce that we
>>>> strictly
>>>>> need a +1 from a committer if the author is a committer but of course
>>>>> encourage it if capacities exist.
>>>> 
>>>> I am with Robert and Aljoscha on this.
>>>> 
>>>> I completely understand the concern here. TBH, in Kafka occasionally
>>>> trivial patches from committers are still merged without following the
>>>> cross-review requirement, but it is rare. That said, I still think an
>>>> additional committer's review makes sense due to the following reasons:
>>>> 1. The bottom line here is that we need to make sure every patch is
>>>> reviewed with a high quality. This is a little difficult to guarantee if
>>>> the review comes from a contributor for many reasons. In some cases, a
>>>> contributor may not have enough knowledge about the project to make a good
>>>> judgement. Also sometimes the contributors are more eagerly to get a
>>>> particular issue fixed, so they are willing to lower the review bar.
>>>> 2. One byproduct of such cross review among committers, which I personally
>>>> feel useful, is that it helps gradually form consistent design principles
>>>> and code style. This is because the committers will know how the other
>>>> committers are writing code and learn from each other. So they tend to
>>>> reach some tacit understanding on how things should be done in general.
>>>> 
>>>> Another way to think about this is to consider the following two scenarios:
>>>> 1. Reviewing a committer's patch takes a lot of iterations. Then the patch
>>>> needs to be reviewed even if it takes time because there are things
>>>> actually needs to be clarified / changed.
>>>> 2. Reviewing a committer's patch is very smooth and quick, so the patch is
>>>> merged soon. Then reviewing such a patch does not take much time.
>>>> 
>>>> Letting another committer review the patch from a committer falls either in
>>>> case 1 or case 2. The best option here is to review the patch because
>>>> If it is case 1, the patch actually needs to be reviewed.
>>>> If it is case 2, the review should not take much time anyways.
>>>> 
>>>> In contrast, we risk encountering case 1 if we skip the cross-review.
>>>> 
>>>> 
>>>> Re: Robert
>>>> 
>>>> I replied to your comments in the wiki and made the following modifications
>>>> to resolve some of your comments:
>>>> 1. Added a release manager role section.
>>>> 2. changed the name of "lazy consensus" to "consensus" to align with
>>>> general definition of Apache glossary and other projects.
>>>> 3. review board  -> pull request
>>>> 
>>>> -
>>>> Re: Chesnay
>>>> 
>>>> The emeritus stuff seems like unnecessary noise.
>>>> As Till mentioned, this is to make sure 2/3 majority is still feasible in
>>>> practice.
>>>> 
>>>> There's a bunch of subtle changes in the draft compared to existing
>>>>> "conventions"; we should find a way to highlight these and discuss them
>>>>> one by one.
>>>> That is a good suggestion. I am not familiar enough with the current Flink
>>>> convention. Will you help on this? I saw you commented on some part in the
>>>> wiki. Are those complete?
>>>> 
>>>> --
>>>> Re: Aljoscha
>>>> 
>>>> How different is this from the Kafka bylaws? I’m asking because I quite
>>>>> like them and wouldn’t mind essentially adopting the Kafka bylaws. I
>>>> mean,
>>>>> it’s open source, and we don’t have to try to re-invent the wheel here.
>>>> Ha, you got me on this. The first version of the draft was almost identical
>>>> to Kafka. But Robert has already caught a few inconsistent places. So it
>>>> might still be worth going through it to make sure we truly agree on them.
>>>> Otherwise we may e

Re: [FLIP-47] Savepoints vs Checkpoints

2019-07-12 Thread Aljoscha Krettek
Hi,

Sorry for the quite late response!

I initially understood FLIP-45 [0] more as a “allow user to 
stop-with-checkpoint”, that’s why I didn’t think too much about the other 
things it mentions like semantics of savepoints and checkpoints. I thought that 
the “stop-with-checkpoint” would work very well together with the ideas 
proposed in FLIP-47 [2]. Before FLIP-47 we were somewhat adamant about the 
distinction between savepoints and checkpoints, especially that the former 
should be user controlled and the latter should be system controlled. That 
wouldn’t work well for “stop-with-checkpoints” because we would further weaken 
that separation (that is already weakened by externalised retained 
checkpoints). If we said (as FLIP-47 proposes) that there are only “snapshots” 
and they have certain properties like formats or whether they are user 
controlled, “stop-with-checkpoint” would fit neatly into that model, i.e. the 
snapshot created from “stop-with-checkpoint” (which should just be called 
“stop”) would be user controlled and in the format that is configured anyways 
for the backend (which could be incremental). That last thing is important for 
users, because this is the whole motivation for “stop-with-checkpoint”, i.e. 
users want a quicker way of doing a clean stop that is not as heavy as 
“stop-with-savepoint”.
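
For reference, the user-facing commands this distinction maps to (a sketch of the 1.8/1.9-era CLI; exact flags may differ between versions):

    # take a savepoint while the job keeps running
    ./bin/flink savepoint <jobId> [targetDirectory]

    # stop the job after taking a savepoint ("stop-with-savepoint", Flink 1.9+)
    ./bin/flink stop -p [targetDirectory] <jobId>

    # older variant: cancel the job with a savepoint
    ./bin/flink cancel -s [targetDirectory] <jobId>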

I think we somehow have to converge on something that we all like about how we 
want to treat savepoints and checkpoints going forward. I think it’s important 
to look at it from the users perspective. Right now I see two possible things 
that users want:

 - faster way of stopping than “stop-with-savepoint”
 - potentially less confusion between what exactly is a checkpoint, savepoint, 
externalised retained checkpoint, etc.

For me, the first is more important than the second, but the second is also 
important to tackle.

What do you think? I have to read a bit more Yu’s email and FLIP-45 again and 
think about them, then I can have a more educated opinion.

Aljoscha

[0] https://cwiki.apache.org/confluence/display/FLINK/FLIP-45%3A+Reinforce+Job+Stop+Semantic
[1] https://issues.apache.org/jira/browse/FLINK-12619
[2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-47%3A+Checkpoints+vs.+Savepoints

> On 10. Jul 2019, at 11:05, Yu Li  wrote:
> 
> Hi all,
> 
> Please allow me to throw some points in combination of FLIP-45 [1] for
> discussing, and please don't be confused if some of them are inconsistent
> or even opposite to current proposals in FLIP-47 (with me as a co-author),
> because as Kostas pointed out, the discussion is still in progress and
> hasn't reached to a consensus, but we all agreed to move it forward to
> public to collect more feedbacks.
> 
> FLIP-45 and FLIP-47 all touches the checkpoint and savepoint concept clean
> up but in two different ways, and below are my understanding about their
> variance and pros/cons:
> 
> * FLIP-45 proposes to map the concepts of Flink checkpoint and savepoint to
> database checkpoint and backup, furthermore the periodic system-triggered
> checkpoint to fuzzy [2] checkpoint and the stop-with-checkpoint to sharp
> [3] checkpoint. And mentions whether we should introduce a Flink concept
> relative to database snapshot, which IMHO we could use FLIP-47 as a good
> start for discussion.
> 
>   - Pros
>  - No change from user perspective, both conceptually and physically,
>  thus no additional education cost. (Semantic correction are mainly for
>  developer to understand)
>  - Concept mapping to a mature system (database) could help to make it
>  clear, as well as facilitating implement and explain db-like functions in
>  future, such as FLIP-43 [4] and streaming ledger [5]
>   - Cons
>  - Less beneficial for developers with no database experience (need to
>  learn database concepts to understand Flink's)
>  - One may argue that Flink is Flink (stream processing engine), not
>  database
> 
> 
> * FLIP-47 proposes to unify the concepts of Flink checkpoint and savepoint
> to snapshot, with a unified command.
> 
>   - Pros
>  - Pure Flink concepts, no additional cost to learn/compare concepts
>  in other systems
>  - Unified semantic from developer perspective
>   - Cons
>  - Detectable change from user perspective, need to re-map the
>  existing checkpoint/savepoint use cases to new commands
> - Currently: checkpoint for failover, savepoint for
> 
> upgrade/state-migration/switch-backend/import-export/blue-red-deployment
> - Future: every use case to newly introduced command, for example
> (the command format is just pseudo):
>- Command format:
>- take snapshot [mode] [format]
> -

Re: subscribe flink dev mail list

2019-07-18 Thread Aljoscha Krettek
Hi,

To subscribe to the dev mailing list you have to send an email to 
dev-subscr...@flink.apache.org

Aljoscha

> On 18. Jul 2019, at 10:58, venn  wrote:
> 
> I'am glad to subscribe the dev mail list
> 



Re: [DISCUSS] Support temporary tables in SQL API

2019-07-23 Thread Aljoscha Krettek
I would be fine with option 3) but I think option 2) is the more implicit 
solution that has less surprising behaviour.

Aljoscha

> On 22. Jul 2019, at 23:59, Xuefu Zhang  wrote:
> 
> Thanks to Dawid for initiating the discussion. Overall, I agree with Timo
> that for 1.9 we should have some quick and simple solution, leaving time
> for more thorough discussions for 1.10.
> 
> In particular, I'm not fully on board with solution #1. For one thing, it seems
> to propose storing all temporary objects in a memory map in CatalogManager,
> and the memory map duplicates the functionality of the in-memory catalog,
> which also stores temporary objects. For another, as pointed out by the
> Google doc, different DBs may handle temporary tables differently, and
> accordingly it may make more sense to let each catalog handle its
> temporary objects.
> 
> Therefore, postponing the fix buys us time to flesh out all the details.
> 
> Thanks,
> Xuefu
> 
> On Mon, Jul 22, 2019 at 7:19 AM Timo Walther  wrote:
> 
>> Thanks for summarizing our offline discussion Dawid! Even though I would
>> prefer solution 1 instead of releasing half-baked features, I also
>> understand that the Table API should not further block the next release.
>> Therefore, I would be fine with solution 3 but introduce the new
>> user-facing `createTemporaryTable` methods as synonyms of the existing
>> ones already. This allows us to deprecate the methods with undefined
>> behavior as early as possible.
>> 
>> Thanks,
>> Timo
>> 
>> 
>> Am 22.07.19 um 16:13 schrieb Dawid Wysakowicz:
>>> Hi all,
>>> 
>>> When working on FLINK-13279[1] we realized we could benefit from
>>> better support for temporary objects in the Catalog API/Table API.
>>> Unfortunately, we are already long past the feature freeze, which is why I
>>> wanted to get some opinions from the community on how we should proceed
>>> with this topic. I tried to prepare a summary of the current state and 3
>>> different suggested approaches that we could take. Please see the
>>> attached document[2]
>>> 
>>> I will appreciate your thoughts!
>>> 
>>> 
>>> [1] https://issues.apache.org/jira/browse/FLINK-13279
>>> 
>>> [2]
>>> 
>> https://docs.google.com/document/d/1RxLj4tDB9GXVjF5qrkM38SKUPkvJt_BSefGYTQ-cVX4/edit?usp=sharing
>>> 
>>> 
>> 
>> 



Re: [DISCUSS] Support temporary tables in SQL API

2019-07-24 Thread Aljoscha Krettek
Isn’t https://issues.apache.org/jira/browse/FLINK-13279 already a sign that there
are surprises for users if we go with option #3?

Aljoscha

> On 24. Jul 2019, at 00:33, Xuefu Z  wrote:
> 
> I favored #3 if that wasn't obvious.
> 
> The usability issue with #2 makes Hive too hard to use. #3 keeps the old
> behavior for existing users who don't have Hive and thus there is only one,
> in-memory catalog. If a user does register Hive, he/she understands that
> there are multiple catalogs and that fully qualified table name is
> necessary. Thus, #3 has no impact (and no surprises) for existing users,
> and the new requirement of fully qualified names applies only to users of the
> new feature (multiple catalogs), which seems very natural.
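
A small illustration of the naming difference under option #3 (catalog, database and table names are made up):

    -- with only the default in-memory catalog, simple names keep working:
    SELECT * FROM orders;

    -- once an external catalog such as Hive is registered, its tables are
    -- addressed with fully qualified names:
    SELECT * FROM myhive.mydb.orders;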
> 
> Thanks,
> Xuefu
> 
> On Tue, Jul 23, 2019 at 5:47 AM Dawid Wysakowicz
> wrote:
> 
>> I think we all agree so far that we should implement one of the short term
>> solutions for 1.9 release (#2 or #3) and continue the discussion on option
>> #1 for the next release. Personally I prefer option #2, because it is
>> closest to the current behavior and as Kurt said it is the most intuitive
>> one, but I am also fine with option #3
>> 
>> To sum up the opinions so far:
>> 
>> *option #2: 3 votes(Kurt, Aljoscha, me)*
>> 
>> *option #3: 2 votes(Timo, Jingsong)*
>> 
>> I wasn't sure which option out of the two Xuefu prefers.
>> 
>> I would like to conclude the discussion by the end of tomorrow, so that we
>> can prepare a proper fix as soon as possible. Therefore I would suggest to
>> proceed with the option that gets the most votes until tomorrow (*July
>> 24th 12:00 CET*), unless there are some hard objections.
>> 
>> 
>> Comment on option #1 concerns:
>> 
>> I agree with Jingsong on that. I think there are some benefits of the
>> approach, as it makes Flink in control of the temporary tables.
>> 
>> 1. We have a unified behavior across all catalogs. Also for the catalogs
>> that do not support temporary tables natively.
>> 
>> 2. As Flink is in control of the temporary tables it makes it easier to
>> control their lifecycle.
>> 
>> Best,
>> 
>> Dawid
>> On 23/07/2019 11:40, JingsongLee wrote:
>> 
>> And I think we should recommend that users use the catalog API to
>> createTable and createFunction (I guess most scenarios do
>> not use temporary objects); in this way, option #3 works well.
>> 
>> Best, JingsongLee
>> 
>> 
>> --
>> From: JingsongLee <lzljs3620...@aliyun.com.INVALID>
>> Send Time: Tue, Jul 23, 2019, 17:35
>> To: dev <dev@flink.apache.org>
>> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>> 
>> Thanks Dawid and other people.
>> +1 for using option #3 for 1.9.0 and go with option #1
>> in 1.10.0.
>> 
>> Regarding Xuefu's concern, I don't know how necessary it is for each catalog
>> to
>> deal with tmpView. I think Catalog is different from DB; we can have a single
>> concept for tmpView, which makes it easier for users to understand.
>> 
>> Regarding option #2, it is hard to use if we make users write fully qualified
>> names for the Hive catalog. Would this experience be too bad?
>> 
>> Best, Jingsong Lee
>> 
>> 
>> --
>> From: Kurt Young <ykt...@gmail.com>
>> Send Time: Tue, Jul 23, 2019, 17:03
>> To: dev <dev@flink.apache.org>
>> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>> 
>> Thanks Dawid for driving this discussion.
>> Personally, I would +1 for using option #2 for 1.9.0 and go with option #1
>> in 1.10.0.
>> 
>> Regarding Xuefu's concern about option #1, I think we could also try to
>> reuse the in-memory catalog
>> for the builtin temporary table storage.
>> 
>> Regarding option #2 and option #3, from the user's perspective, IIUC option
>> #2 allows users to have
>> simple names to reference temporary tables but requires fully qualified
>> names for external catalogs.
>> But option #3 provides the opposite behavior: users can use simple names for
>> external tables after they
>> changed current catalog an

Re: [DISCUSS] Support temporary tables in SQL API

2019-07-25 Thread Aljoscha Krettek
Thanks for pushing the discussion, Dawid! I’m also fine with option #3.

Aljoscha

> On 24. Jul 2019, at 12:04, Dawid Wysakowicz  wrote:
> 
> Hi all,
> 
> Thank you Xuefu for clarifying your opinion. Now we have 3 votes for both of 
> the options. To conclude this discussion I am willing to change my vote to 
> option 3 as I had only a slight preference towards option 2.
> 
> Therefore the final results of the poll are as follows:
> 
> option #2: 2 votes(Kurt, Aljoscha)
> 
> option #3: 4 votes(Timo, Jingsong, Xuefu, me)
> 
> I will prepare appropriate PRs according to the decision (unless somebody 
> objects). We will revisit the long-term solution in a separate thread as part 
> of the 1.10 release after 1.9 is released.
> 
> Thank you all for your opinions!
> 
> Best,
> 
> Dawid
> 
> On 24/07/2019 09:35, Aljoscha Krettek wrote:
>> Isn’t https://issues.apache.org/jira/browse/FLINK-13279 already a sign that
>> there are surprises for users if we go with option #3?
>> 
>> Aljoscha
>> 
>>> On 24. Jul 2019, at 00:33, Xuefu Z wrote:
>>> 
>>> I favored #3 if that wasn't obvious.
>>> 
>>> Usability issue with #2 makes Hive  too hard to use. #3 keeps the old
>>> behavior for existing users who don't have Hive and thus there is only one,
>>> in-memory catalog. If a user does register Hive, he/she understands that
>>> there are multiple catalogs and that fully qualified table name is
>>> necessary. Thus, #3 has no impact (and no surprises) for existing users,
>>> and new requirement of fully qualified names is for only for users of the
>>> new feature (multiple catalogs), which seems very natural.
>>> 
>>> Thanks,
>>> Xuefu
>>> 
>>> On Tue, Jul 23, 2019 at 5:47 AM Dawid Wysakowicz
>>> wrote:
>>> 
>>>> I think we all agree so far that we should implement one of the short term
>>>> solutions for 1.9 release (#2 or #3) and continue the discussion on option
>>>> #1 for the next release. Personally I prefer option #2, because it is
>>>> closest to the current behavior and as Kurt said it is the most intuitive
>>>> one, but I am also fine with option #3
>>>> 
>>>> To sum up the opinions so far:
>>>> 
>>>> *option #2: 3 votes(Kurt, Aljoscha, me)*
>>>> 
>>>> *option #3: 2 votes(Timo, Jingsong)*
>>>> 
>>>> I wasn't sure which option out of the two Xuefu prefers.
>>>> 
>>>> I would like to conclude the discussion by the end of tomorrow, so that we
>>>> can prepare a proper fix as soon as possible. Therefore I would suggest to
>>>> proceed with the option that gets the most votes until tomorrow (*July
>>>> 24th 12:00 CET*), unless there are some hard objections.
>>>> 
>>>> 
>>>> Comment on option #1 concerns:
>>>> 
>>>> I agree with Jingsong on that. I think there are some benefits of the
>>>> approach, as it makes Flink in control of the temporary tables.
>>>> 
>>>> 1. We have a unified behavior across all catalogs. Also for the catalogs
>>>> that do not support temporary tables natively.
>>>> 
>>>> 2. As Flink is in control of the temporary tables it makes it easier to
>>>> control their lifecycle.
>>>> 
>>>> Best,
>>>> 
>>>> Dawid
>>>> On 23/07/2019 11:40, JingsongLee wrote:
>>>> 
>>>> And I think we should recommend user to use catalog api to
>>>> createTable and createFunction,(I guess most scenarios do
>>>> not use temporary objects) in this way, it is good to option #3
>>>> 
>>>> Best, JingsongLee
>>>> 
>>>> 
>>>> --
>>>> From: JingsongLee <lzljs3620...@aliyun.com.INVALID>

Re: flink-mapr-fs failed in travis

2019-07-29 Thread Aljoscha Krettek
I’m seeing the same issue when building this locally. I’ll start a DISCUSS 
thread to see about just removing the flink-mapr-fs module.

Aljoscha

> On 24. Jul 2019, at 05:00, JingsongLee  
> wrote:
> 
> Sorry, 
> "It looks like it's been compiled all the time." 
> should be: "It looks like it can not be compiled all the time in local."
> 
> Best, Jingsong Lee
> 
> 
> --
> From:JingsongLee 
> Send Time: Wed, Jul 24, 2019, 10:54
> To:Jark Wu ; dev ; chesnay 
> 
> Subject:Re: flink-mapr-fs failed in travis
> 
> Hi @chesnay :
> Thanks for the fix on Travis. Do you have any idea why the build failed
> locally?
> It looks like it's been compiled all the time.
> (It should have nothing to do with your previous changes)
> 
> Caused by: org.eclipse.aether.transfer.ArtifactTransferException: Could not 
> transfer artifact com.mapr.hadoop:maprfs:pom:5.2.1-mapr from/to mapr-releases 
> (https://repository.mapr.com/maven/): 
> sun.security.validator.ValidatorException: PKIX path building failed: 
> sun.security.provider.certpath.SunCertPathBuilderException: unable to find 
> valid certification path to requested target
> [1] 
> https://stackoverflow.com/questions/57106180/unable-to-build-flink-from-sources-due-to-mapr-artifacts-problems
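
A possible local workaround sketch while the module is still part of the build, using the unsafe-mapr-repo profile mentioned elsewhere in this thread (whether it resolves the certificate problem depends on the local environment and proxy setup):

    # build with the profile that was added for the MapR repository issue
    mvn clean install -DskipTests -Punsafe-mapr-repo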
> 
> Best, Jingsong Lee
> 
> 
> --
> From:Jark Wu 
> Send Time: Fri, Jul 19, 2019, 18:47
> To:dev 
> Cc:JingsongLee 
> Subject:Re: flink-mapr-fs failed in travis
> 
> Great! Thanks Chesnay for the quick fixing.
> 
> 
> On Fri, 19 Jul 2019 at 16:40, Chesnay Schepler  wrote:
> I think I found the issue; I forgot to update travis_controller.sh .
> 
> On 19/07/2019 10:02, Chesnay Schepler wrote:
>> Ah, I added it to the common options in the travis_mvn_watchdog.sh.
>> 
>> On 19/07/2019 09:58, Chesnay Schepler wrote:
>>> I did modify the .travis.yml to activate the unsafe-mapr-repo
>>> profile; did I modify the wrong profile?...
>>> 
>>> 
>>> On 19/07/2019 07:57, Jark Wu wrote:
 It seems that it is introduced by this commit:
 https://github.com/apache/flink/commit/5c36c650e6520d92191ce2da33f7dcae774319f6
  
 
 Hi @Chesnay Schepler  , do we need to add
 "-Punsafe-mapr-repo" to the ".travis.yml"?
 
 Best,
 Jark
 
 On Fri, 19 Jul 2019 at 10:58, JingsongLee 
 
 wrote:
 
> Hi everyone:
> 
> flink-mapr-fs failed in travis, and I retried many times, and also 
> failed.
> Anyone has idea about this?
> 
> 01:32:54.755 [ERROR] Failed to execute goal on project flink-mapr-fs:
> Could not resolve dependencies for project
> org.apache.flink:flink-mapr-fs:jar:1.10-SNAPSHOT: Failed to collect
> dependencies at com.mapr.hadoop:maprfs:jar:5.2.1-mapr: Failed to read
> artifact descriptor for com.mapr.hadoop:maprfs:jar:5.2.1-mapr: 
> Could not
> transfer artifact com.mapr.hadoop:maprfs:pom:5.2.1-mapr from/to
> mapr-releases (https://repository.mapr.com/maven/):
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable 
> to find
> valid certification path to requested target -> [Help 1]
> 
> https://api.travis-ci.org/v3/job/560790299/log.txt
> 
> Best, Jingsong Lee
>>> 
>>> 
>>> 
>> 
>> 
> 
> 



[DISCUSS] Removing the flink-mapr-fs module

2019-07-29 Thread Aljoscha Krettek
Hi,

Because of recent problems in the dependencies of that module [1] I would 
suggest that we remove it. If people are using it, they can use the one from 
Flink 1.8.

What do you think about it? It would a) solve the dependency problem and b) 
make our build a tiny smidgen more lightweight.

Aljoscha

[1] 
https://lists.apache.org/thread.html/16c16db8f4b94dc47e638f059fd53be936d5da423376a8c1092eaad1@%3Cdev.flink.apache.org%3E

Re: [DISCUSS] Removing the flink-mapr-fs module

2019-07-29 Thread Aljoscha Krettek
If we remove it, that would mean it’s not supported in Flink 1.9.0, yes. Or we 
only remove it in Flink 1.10.0.

Aljoscha

> On 29. Jul 2019, at 11:35, Biao Liu  wrote:
> 
> Hi Aljoscha,
> 
> Does it mean the MapRFileSystem is no longer supported since 1.9.0?
> 
> On Mon, Jul 29, 2019 at 5:19 PM Ufuk Celebi  wrote:
> 
>> +1
>> 
>> 
>> On Mon, Jul 29, 2019 at 11:06 AM Jeff Zhang  wrote:
>> 
>>> +1 to remove it.
>>> 
>>>> Aljoscha Krettek wrote on Mon, Jul 29, 2019 at 5:01 PM:
>>> 
>>>> Hi,
>>>> 
>>>> Because of recent problems in the dependencies of that module [1] I
>> would
>>>> suggest that we remove it. If people are using it, they can use the one
>>>> from Flink 1.8.
>>>> 
>>>> What do you think about it? It would a) solve the dependency problem
>> and
>>>> b) make our build a tiny smidgen more lightweight.
>>>> 
>>>> Aljoscha
>>>> 
>>>> [1]
>>>> 
>>> 
>> https://lists.apache.org/thread.html/16c16db8f4b94dc47e638f059fd53be936d5da423376a8c1092eaad1@%3Cdev.flink.apache.org%3E
>>> 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>>> 
>> 



Re: [DISCUSS] Removing the flink-mapr-fs module

2019-07-29 Thread Aljoscha Krettek
Just FYI, the MapRFileSystem does have some additional code on our 
(Hadoop)FileSystem class, so it might not be straightforward to use MapR with 
our vanilla HadoopFileSystem.

(Still saying that we should remove it, though.)

Aljoscha

> On 29. Jul 2019, at 15:16, Simon Su  wrote:
> 
> +1 to remove it.
> 
> 
> Thanks,
> SImon
> 
> 
> On 07/29/2019 21:00,Till Rohrmann wrote:
> +1 to remove it.
> 
> On Mon, Jul 29, 2019 at 1:27 PM Stephan Ewen  wrote:
> 
> +1 to remove it
> 
> One should still be able to use MapR in the same way as any other vendor
> Hadoop distribution.
> 
> On Mon, Jul 29, 2019 at 12:22 PM JingsongLee
>  wrote:
> 
> +1 for removing it. We've never run `mvn clean test` successfully in China with
> mapr-fs...
> Best, Jingsong Lee
> 
> 
> --
> From:Biao Liu 
> Send Time: Mon, Jul 29, 2019, 12:05
> To:dev 
> Subject:Re: [DISCUSS] Removing the flink-mapr-fs module
> 
> +1 for removing it.
> 
> Actually I encountered this issue several times. I thought it might be
> blocked by firewall of China :(
> 
> BTW, I think it should be included in release notes.
> 
> 
> On Mon, Jul 29, 2019 at 5:37 PM Aljoscha Krettek 
> wrote:
> 
> If we remove it, that would mean it’s not supported in Flink 1.9.0,
> yes.
> Or we only remove it in Flink 1.10.0.
> 
> Aljoscha
> 
> On 29. Jul 2019, at 11:35, Biao Liu  wrote:
> 
> Hi Aljoscha,
> 
> Does it mean the MapRFileSystem is no longer supported since 1.9.0?
> 
> On Mon, Jul 29, 2019 at 5:19 PM Ufuk Celebi  wrote:
> 
> +1
> 
> 
> On Mon, Jul 29, 2019 at 11:06 AM Jeff Zhang 
> wrote:
> 
> +1 to remove it.
> 
> Aljoscha Krettek wrote on Mon, Jul 29, 2019 at 5:01 PM:
> 
> Hi,
> 
> Because of recent problems in the dependencies of that module [1]
> I
> would
> suggest that we remove it. If people are using it, they can use
> the
> one
> from Flink 1.8.
> 
> What do you think about it? It would a) solve the dependency
> problem
> and
> b) make our build a tiny smidgen more lightweight.
> 
> Aljoscha
> 
> [1]
> 
> 
> 
> 
> 
> https://lists.apache.org/thread.html/16c16db8f4b94dc47e638f059fd53be936d5da423376a8c1092eaad1@%3Cdev.flink.apache.org%3E
> 
> 
> 
> --
> Best Regards
> 
> Jeff Zhang
> 
> 
> 
> 
> 
> 



Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-05 Thread Aljoscha Krettek
Hi,

I’m also in favour of using Optional only for method return values. I could be 
persuaded to allow them for parameters of internal methods but I’m sceptical 
about that.

Aljoscha

> On 2. Aug 2019, at 15:35, Yu Li  wrote:
> 
> TL; DR: I second Timo that we should use Optional only as method return
> type for non-performance critical code.
> 
> From the example given on our AvroFactory [1] I also noticed that Jetbrains
> marks the OptionalUsedAsFieldOrParameterType inspection as a warning. It's
> relatively easy to understand why it's not suggested to use (java.util)
> Optional as a field since it's not serializable. What made me feel curious
> is that why we shouldn't use it as a parameter type, so I did some
> investigation and here is what I found:
> 
> There's a JB blog talking about java8 top tips [2] where we could find the
> advice around Optional; there I found another blog about the
> pragmatic approach of using Optional [3]. Reading further we can see the
> reason why we shouldn't use Optional as a parameter type; please allow me to
> quote here:
> 
> It is often the case that domain objects hang about in memory for a fair
> while, as processing in the application occurs, making each optional
> instance rather long-lived (tied to the lifetime of the domain object). By
> contrast, the Optional instance returned from the getter is likely to be
> very short-lived. The caller will call the getter, interpret the result,
> and then move on. If you know anything about garbage collection you'll know
> that the JVM handles these short-lived objects well. In addition, there is
> more potential for hotspot to remove the costs of the Optional instance
> when it is short lived. While it is easy to claim this is "premature
> optimization", as engineers it is our responsibility to know the limits and
> capabilities of the system we work with and to choose carefully the point
> where it should be stressed.
> 
> And there's another JB blog about code smell on Null [4], which I'd also
> suggest to read(smile).
> 
> [1]
> https://github.com/apache/flink/blob/master/flink-formats/flink-avro/src/main/java/org/apache/flink/formats/avro/typeutils/AvroFactory.java#L95
> [2] https://blog.jetbrains.com/idea/2016/07/java-8-top-tips/
> [3] https://blog.joda.org/2015/08/java-se-8-optional-pragmatic-approach.html
> [4] https://blog.jetbrains.com/idea/2017/08/code-smells-null/
> 
> Best Regards,
> Yu
> 
> 
> On Fri, 2 Aug 2019 at 14:54, JingsongLee 
> wrote:
> 
>> Hi,
>> First, Optional is just a wrapper, just like a boxed value. So as long as
>> it's
>> not a field-level operation, I think it is fine performance-wise.
>> I think Guava's Optional has a good summary of the uses [1]:
>>> As a method return type, as an alternative to returning null to indicate
>> that no value was available
>>> To distinguish between "unknown" (for example, not present in a map)
>> and "known to have no value"
>>> To wrap nullable references for storage in a collection that does not
>> support
>> The latter two points seem reasonable, but they come up in few scenarios.
>> 
>> [1]
>> https://github.com/google/guava/blob/master/guava/src/com/google/common/base/Optional.java
>> 
>> Best,
>> Jingsong Lee
>> 
>> 
>> --
>> From:Timo Walther 
>> Send Time: Fri, Aug 2, 2019, 14:12
>> To:dev 
>> Subject:Re: [DISCUSS][CODE STYLE] Usage of Java Optional
>> 
>> Hi everyone,
>> 
>> I would vote for using Optional only as method return type for
>> non-performance critical code. Nothing more. No fields, no method
>> parameters. Method parameters can be overloaded and internally a class
>> can work with nulls and @Nullable. Optional is meant for API method
>> return types and I think we should not abuse it and spam the code with
>> `@SuppressWarnings("OptionalUsedAsFieldOrParameterType")`.
>> 
>> Regards,
>> 
>> Timo
>> 
>> 
>> 
>> Am 02.08.19 um 11:08 schrieb Biao Liu:
>>> Hi Jark & Zili,
>>> 
>>> I thought it means "Optional should not be used for class fields".
>> However
>>> now I get a bit confused about the edited version.
>>> 
>>> Anyway +1 to "Optional should not be used for class fields"
>>> 
>>> Thanks,
>>> Biao /'bɪ.aʊ/
>>> 
>>> 
>>> 
>>> On Fri, Aug 2, 2019 at 5:00 PM Zili Chen  wrote:
>>> 
 Hi Jark,
 
 Follow your opinion, for class field, we can make
 use of @Nullable/@Nonnull annotation or Flink's
 SerializableOptional. It would be sufficient.
 
 Best,
 tison.
 
 
 Jark Wu wrote on Fri, Aug 2, 2019 at 4:57 PM:
 
> Hi Andrey,
> 
> I have some concern on point (3) "even class fields as e.g. optional
 config
> options with implicit default values".
> 
> Regarding to the Oracle's guide (4) "Optional should not be used for
 class
> fields".
> And IntelliJ IDEA also report warnings if a class field is Optional,
> because Optional is not serializable.
> 
> 
> Do we allow Optional as class field only if the class

Re: [VOTE] Publish the PyFlink into PyPI

2019-08-05 Thread Aljoscha Krettek
+1 (binding)

> On 1. Aug 2019, at 17:23, Hequn Cheng  wrote:
> 
> +1 (non-binding)
> 
> Thanks a lot for driving this! @jincheng sun 
> 
> Best, Hequn
> 
> On Thu, Aug 1, 2019 at 11:00 PM Biao Liu  wrote:
> 
>> Thanks Jincheng for working on this.
>> 
>> +1 (non-binding)
>> 
>> Thanks,
>> Biao /'bɪ.aʊ/
>> 
>> 
>> 
>> On Thu, Aug 1, 2019 at 8:55 PM Jark Wu  wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> Cheers,
>>> Jark
>>> 
>>> On Thu, 1 Aug 2019 at 17:45, Yu Li  wrote:
>>> 
 +1 (non-binding)
 
 Thanks for driving this!
 
 Best Regards,
 Yu
 
 
 On Thu, 1 Aug 2019 at 11:41, Till Rohrmann 
>> wrote:
 
> +1
> 
> Cheers,
> Till
> 
> On Thu, Aug 1, 2019 at 10:39 AM vino yang 
>>> wrote:
> 
>> +1 (non-binding)
>> 
>> Jeff Zhang wrote on Thu, Aug 1, 2019 at 4:33 PM:
>> 
>>> +1 (non-binding)
>>> 
>>> Stephan Ewen wrote on Thu, Aug 1, 2019 at 4:29 PM:
>>> 
 +1 (binding)
 
 On Thu, Aug 1, 2019 at 9:52 AM Dian Fu 
> wrote:
 
> Hi Jincheng,
> 
> Thanks a lot for driving this.
> +1 (non-binding).
> 
> Regards,
> Dian
> 
>> On Aug 1, 2019, at 3:24 PM, jincheng sun
>>> wrote:
>> 
>> Hi all,
>> 
>> Publish the PyFlink into PyPI is very important for our
>> user,
>> Please
 vote
>> on the following proposal:
>> 
>> 1. Create PyPI Project for Apache Flink Python API, named:
 "apache-flink"
>> 2. Release one binary with the default Scala version same
>>> with
>> flink
>> default config.
>> 3. Create an account, named "pyflink" as owner(only PMC can
> manage
>>> it).
> PMC
>> can add account for the Release Manager, but Release
>> Manager
 can
>> not
> delete
>> the release.
>> 
>> [ ] +1, Approve the proposal.
>> [ ] -1, Disapprove the proposal, because ...
>> 
>> The vote will be open for at least 72 hours. It is adopted
>>> by a
>>> simple
>> majority with a minimum of three positive votes.
>> 
>> See discussion threads for more details [1].
>> 
>> Thanks,
>> Jincheng
>> 
>> [1]
>> 
> 
 
>>> 
>> 
> 
 
>>> 
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Publish-the-PyFlink-into-PyPI-td30095.html
> 
> 
 
>>> 
>>> 
>>> --
>>> Best Regards
>>> 
>>> Jeff Zhang
>>> 
>> 
> 
 
>>> 
>> 



Re: [VOTE] Flink Project Bylaws

2019-08-12 Thread Aljoscha Krettek
+1

> On 11. Aug 2019, at 10:07, Becket Qin  wrote:
> 
> Hi all,
> 
> I would like to start a voting thread on the project bylaws of Flink. It
> aims to help the community coordinate more smoothly. Please see the bylaws
> wiki page below for details.
> 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
> 
> The discussion thread is following:
> 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-project-bylaws-td30409.html
> 
> The vote will be open for at least 6 days. PMC members' votes are
> considered as binding. The vote requires 2/3 majority of the binding +1s to
> pass.
> 
> Thanks,
> 
> Jiangjie (Becket) Qin



Re: [DISCUSS] Drop stale class Program

2019-08-14 Thread Aljoscha Krettek
Hi,

I would be in favour of removing Program (and the code paths that support it) 
for Flink 1.10. Most users of Flink don’t actually know it exists and it is 
only making our code more complicated. Going forward with the new Client API 
discussions will be a lot easier without it as well.

Best,
Aljoscha

> On 14. Aug 2019, at 11:08, Kostas Kloudas  wrote:
> 
> Hi all,
> 
> It is nice to have this discussion.
> 
> I am totally up for removing the unused Program interface, as this will
> simplify a lot of other code paths in the ClusterClient and elsewhere.
> 
> Also about the easier integration of Flink with other frameworks, there
> is another discussion in the mailing list with exactly this topic:
> [DISCUSS] Flink client api enhancement for downstream project
> 
> Cheers,
> Kostas
> 
> 
> On Tue, Jul 30, 2019 at 1:38 PM Zili Chen  wrote:
> 
>> Hi,
>> 
>> After a one-week survey in the user list[1], nobody except Flavio and Jeff
>> participated in the thread.
>> 
>> Flavio shared his experience with a revised Program like interface.
>> This could be regarded as downstream integration and in client api
>> enhancements document we propose rich interface for this integration.
>> Anyway, the Flink scope Program is less functional than it should be.
>> 
>> With no objection I'd like to push on this thread. We need a committer
>> to participate in this thread and shepherd the removal/deprecation of Program, a
>> @PublicEvolving interface. Does anybody have spare time? Or is there anything I can do
>> to make progress?
>> 
>> Best,
>> tison.
>> 
>> [1]
>> 
>> https://lists.apache.org/thread.html/37445e43729cf7eaeb0aa09133d3980b62f891c5ee69d2c3c3e76ab5@%3Cuser.flink.apache.org%3E
>> 
>> 
>> On Mon, Jul 22, 2019 at 8:38 PM Zili Chen  wrote:
>> 
>>> Hi,
>>> 
>>> I created a thread for a survey in the user list [1]. Please participate
>>> if interested.
>>> 
>>> Best,
>>> tison.
>>> 
>>> [1]
>>> 
>> https://lists.apache.org/thread.html/37445e43729cf7eaeb0aa09133d3980b62f891c5ee69d2c3c3e76ab5@%3Cuser.flink.apache.org%3E
>>> 
>>> 
>>> On Fri, Jul 19, 2019 at 5:18 PM Flavio Pompermaier  wrote:
>>> 
 +1 to directly remove the Program class (I think nobody uses it and it's
 not supported at all by the REST services and UI).
 Moreover, it requires a lot of transitive dependencies while it should be a
 very simple thing.
 +1 to add this discussion to "Flink client api enhancement"
 
 On Fri, Jul 19, 2019 at 11:14 AM Biao Liu  wrote:
 
> To Flavio, good point for the integration suggestion.
> 
> I think it should be considered in the "Flink client api enhancement"
> discussion. But the outdated API should be deprecated somehow.
> 
> On Fri, Jul 19, 2019 at 4:21 PM Flavio Pompermaier  wrote:
> 
>> In my experience a basic "official" (but optional) program description
>> would be very useful indeed (in order to ease the integration with other
>> frameworks).
>> 
>> Of course it should be extended and integrated with the REST services and
>> the Web UI (when defined) in order to be useful.
>> It eases showing the user what a job does and which parameters it
>> requires (optional or mandatory), together with a proper help description.
>> Indeed, when we write a Flink job we implement the following interface:
>> 
>> public interface FlinkJob {
>>   String getDescription();
>>   List<ClusterJobParameter> getParameters();
>>   boolean isStreamingOrBatch();
>> }
>> 
>> public class ClusterJobParameter {
>> 
>>   private String paramName;
>>   private String paramType = "string";
>>   private String paramDesc;
>>   private String paramDefaultValue;
>>   private Set<String> choices;
>>   private boolean mandatory;
>> }
>> 
>> This really helps to launch a Flink job from a frontend (if the REST
>> services return back those infos).
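For illustration, a minimal sketch of a job describing itself through the interface proposed above; FlinkJob and ClusterJobParameter are the proposed types from this message (not an existing Flink API), and the concrete values are made up:

import java.util.Collections;
import java.util.List;

// Illustrative only: implements the FlinkJob interface sketched above.
public class WordCountJob implements FlinkJob {

    @Override
    public String getDescription() {
        return "Counts words in a text file and prints the result.";
    }

    @Override
    public List<ClusterJobParameter> getParameters() {
        // A real job would describe its parameters here (e.g. an "--input"
        // path); returning an empty list keeps the sketch self-contained.
        return Collections.emptyList();
    }

    @Override
    public boolean isStreamingOrBatch() {
        return false; // batch in this example
    }
}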
>> 
>> Best,
>> Flavio
>> 
>> On Fri, Jul 19, 2019 at 9:57 AM Biao Liu 
>> wrote:
>> 
>>> Hi Zili,
>>> 
>>> Thank you for bringing up this discussion.
>>> 
>>> My gut feeling is +1 for dropping it.
>>> Usually it costs some time to deprecate a public (actually it's
>>> `PublicEvolving`) API. Ideally it should be marked as `Deprecated` first.
>>> Then it might be abandoned in some later version.
>>> 
>>> I'm not sure how big the burden is to make it compatible with the enhanced
>>> client API. If it's a critical blocker, I support dropping it radically in
>>> 1.10. Of course a survey is necessary, and the result of the survey is
>>> acceptable.
>>> 
>>> 
>>> 
>>> On Fri, Jul 19, 2019 at 1:44 PM Zili Chen  wrote:
>>> 
 Hi devs,
 
 Participating in the thread "Flink client api enhancement" [1], I just
 noticed that inside the submission code path of Flink we always have a
 branch dealing with the case that the main class of the user program is a
 subclass of o.a.f.api.common.Program

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-14 Thread Aljoscha Krettek
+1

I did some testing on a Google Cloud Dataproc cluster (it gives you a managed 
YARN and Google Cloud Storage (GCS)):
  - tried both YARN session mode and YARN per-job mode, also using bin/flink 
list/cancel/etc. against a YARN session cluster
  - ran examples that write to GCS, both with the native Hadoop FileSystem and 
a custom “plugin” FileSystem
  - ran stateful streaming jobs that use GCS as a checkpoint backend
  - tried running SQL programs on YARN using the SQL Cli: this worked for YARN 
session mode but not for YARN per-job mode. Looking at the code I don’t think 
per-job mode would work from seeing how it is implemented. But I think it’s an 
OK restriction to have for now
  - in all the testing I had fine-grained recovery (region failover) enabled 
but I didn’t simulate any failures

> On 14. Aug 2019, at 15:20, Kurt Young  wrote:
> 
> Hi,
> 
> Thanks for preparing this release candidate. I have verified the following:
> 
> - verified the checksums and GPG files match the corresponding release files
> - verified that the source archives do not contain any binaries
> - built the source release with Scala 2.11 successfully.
> - ran `mvn verify` locally, met 2 issues [FLINK-13687] and [FLINK-13688],
> but both are not release blockers. Other than that, all tests passed.
> - ran all e2e tests which don't need to download external packages (it's very
> unstable in China and almost impossible to download them), all passed.
> - started local cluster, ran some examples. Met a small website display
> issue
> [FLINK-13591], which is also not a release blocker.
> 
> Although we have pushed some fixes around the Blink planner and Hive
> integration after RC2, considering these are both preview features, I lean
> towards being OK to release without these fixes.
> 
> +1 from my side. (binding)
> 
> Best,
> Kurt
> 
> 
> On Wed, Aug 14, 2019 at 5:13 PM Jark Wu  wrote:
> 
>> Hi Gordon,
>> 
>> I have verified the following things:
>> 
>> - build the source release with Scala 2.12 and Scala 2.11 successfully
>> - checked/verified signatures and hashes
>> - checked that all POM files point to the same version
>> - ran some flink table related end-to-end tests locally and succeeded
>> (except TPC-H e2e failed which is reported in FLINK-13704)
>> - started cluster for both Scala 2.11 and 2.12, ran examples, verified web
>> ui and log output, nothing unexpected
>> - started a cluster, ran a SQL query to temporal-join a Kafka source with a
>> MySQL JDBC table, and wrote the results to Kafka again, using DDL to create
>> the source and sinks. Looks good.
>> - reviewed the release PR
>> 
>> As FLINK-13704 is not recognized as a blocker issue, +1 from my side
>> (non-binding).
>> 
>> On Tue, 13 Aug 2019 at 17:07, Till Rohrmann  wrote:
>> 
>>> Hi Richard,
>>> 
>>> although I can see that it would be handy for users who have PubSub set
>> up,
>>> I would rather not include examples which require an external dependency
>>> into the Flink distribution. I think examples should be self-contained.
>> My
>>> concern is that we would bloat the distribution for many users at the
>>> benefit of a few. Instead, I think it would be better to make these
>>> examples available differently, maybe through Flink's ecosystem website
>> or
>>> maybe a new examples section in Flink's documentation.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Tue, Aug 13, 2019 at 9:43 AM Jark Wu  wrote:
>>> 
 Hi Till,
 
 After thinking about it, we can use VARCHAR as an alternative to
 timestamp/time/date.
 I'm fine with not recognizing it as a blocker issue.
 We can fix it in 1.9.1.
 
 
 Thanks,
 Jark
 
 
 On Tue, 13 Aug 2019 at 15:10, Richard Deurwaarder 
>>> wrote:
 
> Hello all,
> 
> I noticed the PubSub example jar is not included in the examples/ dir
>>> of
> flink-dist. I've created
 https://issues.apache.org/jira/browse/FLINK-13700
> + https://github.com/apache/flink/pull/9424/files to fix this.
> 
> I will leave it up to you to decide if we want to add this to 1.9.0.
> 
> Regards,
> 
> Richard
> 
> On Tue, Aug 13, 2019 at 9:04 AM Till Rohrmann 
> wrote:
> 
>> Hi Jark,
>> 
>> thanks for reporting this issue. Could this be a documented
>>> limitation
 of
>> Blink's preview version? I think we have agreed that the Blink SQL
> planner
>> will be rather a preview feature than production ready. Hence it
>>> could
>> still contain some bugs. My concern is that there might still be other
>> issues which we'll discover bit by bit and which could postpone the release
>> even further if we say Blink bugs are blockers.
>> 
>> Cheers,
>> Till
>> 
>> On Tue, Aug 13, 2019 at 7:42 AM Jark Wu  wrote:
>> 
>>> Hi all,
>>> 
>>> I just found an issue when testing connector DDLs against the Blink
>>> planner for RC2.
>>> This issue leads to the DDL not working when con

Re: [DISCUSS] FLIP-52: Remove legacy Program interface.

2019-08-14 Thread Aljoscha Krettek
+1 (for the same reasons I posted on the other thread)

> On 14. Aug 2019, at 15:03, Zili Chen  wrote:
> 
> +1
> 
> It could be regarded as part of the Flink client API refactoring.
> Removing stale code paths makes the refactoring easier to reason about.
> 
> There is one thing worth attention: in this thread [1], Thomas suggests
> an interface with a method that returns a JobGraph, based on the fact
> that the REST API and per-job mode actually extract the JobGraph from
> the user program and submit it, instead of running the user program with
> submission happening inside the program as in the session scenario.
> 
> Such an interface would be like
> 
> interface Program {
>   JobGraph getJobGraph(String[] args);
> }
> 
> Anyway, the discussion above could be continued in that thread.
> The current Program is a legacy class that is much less useful than it
> should be.
> 
> Best,
> tison.
> 
> [1]
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/REST-API-JarRunHandler-More-flexibility-for-launching-jobs-td31026.html#a31168
> 
> 
> On Wed, Aug 14, 2019 at 7:50 PM Stephan Ewen  wrote:
> 
>> +1
>> 
>> the "main" method is the overwhelming default. getting rid of "two ways to
>> do things" is a good idea.
>> 
>> On Wed, Aug 14, 2019 at 1:42 PM Kostas Kloudas  wrote:
>> 
>>> Hi all,
>>> 
>>> As discussed in [1], the Program interface seems to be outdated and
>>> there seems to be no objection to removing it.
>>> 
>>> Given that this interface is PublicEvolving, its removal should pass
>>> through a FLIP and
>>> this discussion and the associated FLIP are exactly for that purpose.
>>> 
>>> Please let me know what you think and if it is ok to proceed with its
>>> removal.
>>> 
>>> Cheers,
>>> Kostas
>>> 
>>> link to FLIP-52 :
>>> 
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=125308637
>>> 
>>> [1]
>>> 
>> https://lists.apache.org/x/thread.html/7ffc9936a384b891dbcf0a481d26c6d13b2125607c200577780d1e18@%3Cdev.flink.apache.org%3E
>>> 
>> 



Re: [VOTE] FLIP-51: Rework of the Expression Design

2019-08-16 Thread Aljoscha Krettek
+1

This seems to be a good refactoring/cleanup step to me!

> On 16. Aug 2019, at 10:59, Dawid Wysakowicz  wrote:
> 
> +1 from my side
> 
> Best,
> 
> Dawid
> 
> On 16/08/2019 10:31, Jark Wu wrote:
>> +1 from my side.
>> 
>> Thanks Jingsong for driving this.
>> 
>> Best,
>> Jark
>> 
>> On Thu, 15 Aug 2019 at 22:09, Timo Walther  wrote:
>> 
>>> +1 for this.
>>> 
>>> Thanks,
>>> Timo
>>> 
>>> Am 15.08.19 um 15:57 schrieb JingsongLee:
 Hi Flink devs,
 
 I would like to start the voting for FLIP-51 Rework of the Expression
  Design.
 
 FLIP wiki:
 
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
 Discussion thread:
 
>>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-51-Rework-of-the-Expression-Design-td31653.html
 Google Doc:
 
>>> https://docs.google.com/document/d/1yFDyquMo_-VZ59vyhaMshpPtg7p87b9IYdAtMXv5XmM/edit?usp=sharing
 Thanks,
 
 Best,
 Jingsong Lee
>>> 
>>> 
> 



Re: [DISCUSS] Reducing build times

2019-08-16 Thread Aljoscha Krettek
Speaking of flink-shaded, do we have any idea what the impact of shading is on 
the build time? We could get rid of shading completely in the Flink main 
repository by moving everything that we shade to flink-shaded.

Aljoscha

> On 16. Aug 2019, at 14:58, Bowen Li  wrote:
> 
> +1 to Till's points on #2 and #5, especially the potential non-disruptive,
> gradual migration approach if we decide to go that route.
> 
> To add on, I want to point out that we can actually start with the
> flink-shaded project [1], which is a perfect candidate for a PoC. It's of much
> smaller size, totally isolated from and not interfering with the flink project
> [2], and it actually covers most of our practical feature requirements for
> a build tool - all making it an ideal experimental field.
> 
> [1] https://github.com/apache/flink-shaded
> [2] https://github.com/apache/flink
> 
> 
> On Fri, Aug 16, 2019 at 4:52 AM Till Rohrmann  wrote:
> 
>> For the sake of keeping the discussion focused and not cluttering the
>> discussion thread I would suggest to split the detailed reporting for
>> reusing JVMs to a separate thread and cross linking it from here.
>> 
>> Cheers,
>> Till
>> 
>> On Fri, Aug 16, 2019 at 1:36 PM Chesnay Schepler 
>> wrote:
>> 
>>> Update:
>>> 
>>> TL;DR: table-planner is a good candidate for enabling fork reuse right
>>> away, while flink-tests has the potential for huge savings, but we have
>>> to figure out some issues first.
>>> 
>>> 
>>> Build link: https://travis-ci.org/zentol/flink/builds/572659220
>>> 
>>> 4/8 profiles failed.
>>> 
>>> No speedup in libraries, python, blink_planner, 7 minutes saved in
>>> libraries (table-planner).
>>> 
>>> The kafka and connectors profiles both fail in kafka tests due to
>>> producer leaks, and no speed up could be confirmed so far:
>>> 
>>> java.lang.AssertionError: Detected producer leak. Thread name:
>>> kafka-producer-network-thread | producer-239
>>>at org.junit.Assert.fail(Assert.java:88)
>>>at
>>> 
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011ITCase.checkProducerLeak(FlinkKafkaProducer011ITCase.java:677)
>>>at
>>> 
>> org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer011ITCase.testFlinkKafkaProducer011FailBeforeNotify(FlinkKafkaProducer011ITCase.java:210)
>>> 
>>> 
>>> The tests profile failed due to various errors in migration tests:
>>> 
>>> junit.framework.AssertionFailedError: Did not see the expected
>> accumulator
>>> results within time limit.
>>>at
>>> 
>> org.apache.flink.test.migration.TypeSerializerSnapshotMigrationITCase.testSavepoint(TypeSerializerSnapshotMigrationITCase.java:141)
>>> 
>>> *However*, a normal tests run takes 40 minutes, while this one above
>>> failed after 19 minutes and is only missing the migration tests (which
>>> currently need 6-7 minutes). So we could save somewhere between 15 to 20
>>> minutes here.
>>> 
>>> 
>>> Finally, the misc profiles fails in YARN:
>>> 
>>> java.lang.AssertionError
>>>at org.apache.flink.yarn.YARNITCase.setup(YARNITCase.java:64)
>>> 
>>> No significant speedup could be observed in other modules; for
>>> flink-yarn-tests we can maybe get a minute or 2 out of it.
>>> 
>>> On 16/08/2019 10:43, Chesnay Schepler wrote:
 There appears to be a general agreement that 1) should be looked into;
 I've set up a branch with fork reuse being enabled for all tests; will
 report back the results.
 
 On 15/08/2019 09:38, Chesnay Schepler wrote:
> Hello everyone,
> 
> improving our build times is a hot topic at the moment so let's
> discuss the different ways how they could be reduced.
> 
> 
>   Current state:
> 
> First up, let's look at some numbers:
> 
> 1 full build currently consumes 5h of build time total ("total
> time"), and in the ideal case takes about 1h20m ("run time") to
> complete from start to finish. The run time may fluctuate of course
> depending on the current Travis load. This applies both to builds on
> the Apache and flink-ci Travis.
> 
> At the time of writing, the current queue time for PR jobs (reminder:
> running on flink-ci) is about 30 minutes (which basically means that
> we are processing builds at the rate that they come in), however we
> are in an admittedly quiet period right now.
> 2 weeks ago the queue times on flink-ci peaked at around 5-6h as
> everyone was scrambling to get their changes merged in time for the
> feature freeze.
> 
> (Note: Recently optimizations were added to ci-bot where pending
> builds are canceled if a new commit was pushed to the PR or the PR
> was closed, which should prove especially useful during the rush
> hours we see before feature-freezes.)
> 
> 
>   Past approaches
> 
> Over the years we have done rather few things to improve this
> situation (hence our current predicament).
> 
> Beyond the sporadic speedup of some te

Re: [DISCUSS] Flink client api enhancement for downstream project

2019-08-16 Thread Aljoscha Krettek
>>>>>> Actually, it sets up an exec env that hijacks the
>>>>>> #execute/executePlan
>>>>>>>>> method, initializes the job graph and abort execution. And
>> then
>>>>>>>>> control flow back to CliFrontend, it deploys the cluster(or
>>>>> retrieve
>>>>>>>>> the client) and submits the job graph. This is quite a
>> specific
>>>>>>> internal
>>>>>>>>> process inside Flink and none of consistency to anything.
>>>>>>>>> 
>>>>>>>>> 2) Deployment of job cluster couples job graph creation and
>>>> cluster
>>>>>>>>> deployment. Abstractly, from user job to a concrete
>> submission,
>>>> it
>>>>>>>> requires
>>>>>>>>> 
>>>>>>>>> create JobGraph --\
>>>>>>>>> 
>>>>>>>>> create ClusterClient -->  submit JobGraph
>>>>>>>>> 
>>>>>>>>> such a dependency. ClusterClient was created by deploying or
>>>>>>> retrieving.
>>>>>>>>> JobGraph submission requires a compiled JobGraph and valid
>>>>>>> ClusterClient,
>>>>>>>>> but the creation of ClusterClient is abstractly independent
>> of
>>>> that
>>>>>> of
>>>>>>>>> JobGraph. However, in job cluster mode, we deploy job cluster
>>>> with
>>>>> a
>>>>>>> job
>>>>>>>>> graph, which means we use another process:
>>>>>>>>> 
>>>>>>>>> create JobGraph --> deploy cluster with the JobGraph
>>>>>>>>> 
>>>>>>>>> Here is another inconsistency and downstream projects/client
>>> apis
>>>>> are
>>>>>>>>> forced to handle different cases with rare supports from
>> Flink.
>>>>>>>>> 
>>>>>>>>> Since we likely reached a consensus on
>>>>>>>>> 
>>>>>>>>> 1. all configs gathered by Flink configuration and passed
>>>>>>>>> 2. execution environment knows all configs and handles
>>>>> execution(both
>>>>>>>>> deployment and submission)
>>>>>>>>> 
>>>>>>>>> to the issues above I propose eliminating inconsistencies by
>>>>>> following
>>>>>>>>> approach:
>>>>>>>>> 
>>>>>>>>> 1) CliFrontend should exactly be a front end, at least for
>>> "run"
>>>>>>> command.
>>>>>>>>> That means it just gathered and passed all config from
>> command
>>>> line
>>>>>> to
>>>>>>>>> the main method of user program. Execution environment knows
>>> all
>>>>> the
>>>>>>> info
>>>>>>>>> and with an addition to utils for ClusterClient, we
>> gracefully
>>>> get
>>>>> a
>>>>>>>>> ClusterClient by deploying or retrieving. In this way, we
>> don't
>>>>> need
>>>>>> to
>>>>>>>>> hijack #execute/executePlan methods and can remove various
>>>> hacking
>>>>>>>>> subclasses of exec env, as well as #run methods in
>>>>> ClusterClient(for
>>>>>> an
>>>>>>>>> interface-ized ClusterClient). Now the control flow flows
>> from
>>>>>>>> CliFrontend
>>>>>>>>> to the main method and never returns.
>>>>>>>>> 
>>>>>>>>> 2) Job cluster means a cluster for the specific job. From
>>> another
>>>>>>>>> perspective, it is an ephemeral session. We may decouple the
>>>>>> deployment
>>>>>>>>> with a compiled job graph, but start a session with idle
>>> timeout
>>>>>>>>> and submit the job following.
>>>>>>>>> 
>>>>>>>>> These topics, before we go into more details on design or
>>>>>>> implementation,

Re: [DISCUSS] Reducing build times

2019-08-19 Thread Aljoscha Krettek
I did a quick test: a normal "mvn clean install -DskipTests -Drat.skip=true 
-Dmaven.javadoc.skip=true -Punsafe-mapr-repo” on my machine takes about 14 
minutes. After removing all mentions of maven-shade-plugin the build time goes 
down to roughly 11.5 minutes. (Obviously the resulting Flink won’t work, 
because some expected stuff is not packaged and most of the end-to-end tests 
use the shade plugin to package the jars for testing.)

Aljoscha

> On 18. Aug 2019, at 19:52, Robert Metzger  wrote:
> 
> Hi all,
> 
> I wanted to understand the impact of the hardware we are using for running
> our tests. Each travis worker has 2 virtual cores, and 7.5 gb memory [1].
> They are using Google Cloud Compute Engine *n1-standard-2* instances.
> Running a full "mvn clean verify" takes *03:32 h* on such a machine type.
> 
> Running the same workload on a 32 virtual cores, 64 gb machine, takes *1:21
> h*.
> 
> What is interesting are the per-module build time differences.
> Modules which are parallelizing tests well greatly benefit from the
> additional cores:
> "flink-tests" 36:51 min vs 4:33 min
> "flink-runtime" 23:41 min vs 3:47 min
> "flink-table-planner" 15:54 min vs 3:13 min
> 
> On the other hand, we have modules which are not parallel at all:
> "flink-connector-kafka": 16:32 min vs 15:19 min
> "flink-connector-kafka-0.11": 9:52 min vs 7:46 min
> Also, the checkstyle plugin is not scaling at all.
> 
> Chesnay reported some significant speedups by reusing forks.
> I don't know how much effort it would be to make the Kafka tests
> parallelizable. In total, they currently use 30 minutes on the big machine
> (while 31 CPUs are idling :) )
> 
> Let me know what you think about these results. If the community is
> generally interested in further investigating into that direction, I could
> look into software to orchestrate this, as well as sponsors for such an
> infrastructure.
> 
> [1] https://docs.travis-ci.com/user/reference/overview/
> 
> 
> On Fri, Aug 16, 2019 at 3:27 PM Chesnay Schepler  wrote:
> 
>> @Aljoscha Shading takes a few minutes for a full build; you can see this
>> quite easily by looking at the compile step in the misc profile
>> <https://api.travis-ci.org/v3/job/572560060/log.txt>; all modules that
>> longer than a fraction of a section are usually caused by shading lots
>> of classes. Note that I cannot tell you how much of this is spent on
>> relocations, and how much on writing the jar.
>> 
>> Personally, I'd very much like us to move all shading to flink-shaded;
>> this would finally allows us to use newer maven versions without needing
>> cumbersome workarounds for flink-dist. However, this isn't a trivial
>> affair in some cases; IIRC calcite could be difficult to handle.
>> 
>> On another note, this would also simplify switching the main repo to
>> another build system, since you would no longer had to deal with
>> relocations, just packaging + merging NOTICE files.
>> 
>> @BowenLi I disagree, flink-shaded does not include any tests,  API
>> compatibility checks, checkstyle, layered shading (e.g., flink-runtime
>> and flink-dist, where both relocate dependencies and one is bundled by
>> the other), and, most importantly, CI (and really, without CI being
>> covered in a PoC there's nothing to discuss).
>> 
>> On 16/08/2019 15:13, Aljoscha Krettek wrote:
>>> Speaking of flink-shaded, do we have any idea what the impact of shading
>> is on the build time? We could get rid of shading completely in the Flink
>> main repository by moving everything that we shade to flink-shaded.
>>> 
>>> Aljoscha
>>> 
>>>> On 16. Aug 2019, at 14:58, Bowen Li  wrote:
>>>> 
>>>> +1 to Till's points on #2 and #5, especially the potential
>> non-disruptive,
>>>> gradual migration approach if we decide to go that route.
>>>> 
>>>> To add on, I want to point out that we can actually start with the
>>>> flink-shaded project [1], which is a perfect candidate for a PoC. It's of
>>>> much smaller size, totally isolated from and not interfering with the
>>>> flink project [2], and it actually covers most of our practical feature
>>>> requirements for a build tool - all making it an ideal experimental field.
>>>> 
>>>> [1] https://github.com/apache/flink-shaded
>>>> [2] https://github.com/apache/flink
>>>> 
>>>> 
>>>> On Fri, Aug 16, 2019 at 4:52 AM Till Rohrmann 
>> wrote:
>>>> 

Re: [VOTE] Apache Flink 1.9.0, release candidate #3

2019-08-21 Thread Aljoscha Krettek
+1

I checked the last RC on a GCE cluster and was satisfied with the testing. The 
cherry-picked commits didn’t change anything related, so I’m forwarding my vote 
from there.

Aljoscha

> On 21. Aug 2019, at 13:34, Chesnay Schepler  wrote:
> 
> +1 (binding)
> 
> On 21/08/2019 08:09, Bowen Li wrote:
>> +1 non-binding
>> 
>> - built from source with default profile
>> - manually ran SQL and Table API tests for Flink's metadata integration
>> with Hive Metastore in local cluster
>> - manually ran SQL tests for batch capability with Blink planner and Hive
>> integration (source/sink/udf) in local cluster
>> - file formats include: csv, orc, parquet
>> 
>> 
>> On Tue, Aug 20, 2019 at 10:23 PM Gary Yao  wrote:
>> 
>>> +1 (non-binding)
>>> 
>>> Reran Jepsen tests 10 times.
>>> 
>>> On Wed, Aug 21, 2019 at 5:35 AM vino yang  wrote:
>>> 
 +1 (non-binding)
 
 - checkout source code and build successfully
 - started a local cluster and ran some example jobs successfully
 - verified signatures and hashes
 - checked release notes and post
 
 Best,
 Vino
 
 On Wed, Aug 21, 2019 at 4:20 AM Stephan Ewen  wrote:
 
> +1 (binding)
> 
>  - Downloaded the binary release tarball
>  - started a standalone cluster with four nodes
>  - ran some examples through the Web UI
>  - checked the logs
>  - created a project from the Java quickstarts maven archetype
>  - ran a multi-stage DataSet job in batch mode
>  - killed a TaskManager and verified correct restart behavior,
>>> including
> failover region backtracking
> 
> 
> I found a few issues, and a common theme here is confusing error
 reporting
> and logging.
> 
> (1) When testing batch failover and killing a TaskManager, the job
 reports
> as the failure cause "org.apache.flink.util.FlinkException: The
>>> assigned
> slot 6d0e469d55a2630871f43ad0f89c786c_0 was removed."
> I think that is a pretty bad error message, as a user I don't know
 what
> that means. Some internal book keeping thing?
> You need to know a lot about Flink to understand that this means
> "TaskManager failure".
> https://issues.apache.org/jira/browse/FLINK-13805
> I would not block the release on this, but think this should get
 pretty
> urgent attention.
> 
> (2) The Metric Fetcher floods the log with error messages when a
> TaskManager is lost.
>  There are many exceptions being logged by the Metrics Fetcher due
>>> to
> not reaching the TM any more.
>  This pollutes the log and drowns out the original exception and
>>> the
> meaningful logs from the scheduler/execution graph.
>  https://issues.apache.org/jira/browse/FLINK-13806
>  Again, I would not block the release on this, but think this
>>> should
> get pretty urgent attention.
> 
> (3) If you put "web.submit.enable: false" into the configuration, the
>>> web
> UI will still display the "SubmitJob" page, but errors will
> continuously pop up, stating "Unable to load requested file /jars."
> https://issues.apache.org/jira/browse/FLINK-13799
> 
> (4) REST endpoint logs ERROR level messages when selecting the
> "Checkpoints" tab for batch jobs. That does not seem correct.
>  https://issues.apache.org/jira/browse/FLINK-13795
> 
> Best,
> Stephan
> 
> 
> 
> 
> On Tue, Aug 20, 2019 at 11:32 AM Tzu-Li (Gordon) Tai <
 tzuli...@apache.org>
> wrote:
> 
>> +1
>> 
>> Legal checks:
>> - verified signatures and hashes
>> - New bundled Javascript dependencies for flink-runtime-web are
 correctly
>> reflected under licenses-binary and NOTICE file.
>> - locally built from source (Scala 2.12, without Hadoop)
>> - No missing artifacts in staging repo
>> - No binaries in source release
>> 
>> Functional checks:
>> - Quickstart working (both in IDE + job submission)
>> - Simple State Processor API program that performs offline key schema
>> migration (RocksDB backend). Generated savepoint is valid to restore
> from.
>> - All E2E tests pass locally
>> - Didn’t notice any issues with the new WebUI
>> 
>> Cheers,
>> Gordon
>> 
>> On Tue, Aug 20, 2019 at 3:53 AM Zili Chen 
 wrote:
>>> +1 (non-binding)
>>> 
>>> - build from source: OK(8u212)
>>> - check local setup tutorial works as expected
>>> 
>>> Best,
>>> tison.
>>> 
>>> 
>>> On Tue, Aug 20, 2019 at 8:24 AM Yu Li  wrote:
>>> 
 +1 (non-binding)
 
 - checked release notes: OK
 - checked sums and signatures: OK
 - repository appears to contain all expected artifacts
 - source release
  - contains no binaries: OK
  - contains no 1.9-SNAPSHOT references: OK
  - build from source: OK (8u102)
 - binary 

Re: [VOTE] FLIP-52: Remove legacy Program interface.

2019-08-21 Thread Aljoscha Krettek
+1

> On 21. Aug 2019, at 13:30, Chesnay Schepler  wrote:
> 
> +1
> 
> On 21/08/2019 13:23, Timo Walther wrote:
>> +1
>> 
>> Am 21.08.19 um 13:21 schrieb Stephan Ewen:
>>> +1
>>> 
>>> On Wed, Aug 21, 2019 at 1:07 PM Kostas Kloudas  wrote:
>>> 
 Hi all,
 
 Following the FLIP process, this is a voting thread dedicated to the
 FLIP-52.
 As shown from the corresponding discussion thread [1], we seem to agree
 that
 the Program interface can be removed, so let's make it also official
 with a vote.
 
 Cheers,
 Kostas
 
 
 [1]
 https://lists.apache.org/thread.html/0dbd0a4adf9ad00d6ad869dffc8820f6ce4c1969e1ea4aafb1dd0aa4@%3Cdev.flink.apache.org%3E
  
 
>> 
>> 
> 



Re: [VOTE] Release flink-shaded 8.0, release candidate #1

2019-08-28 Thread Aljoscha Krettek
+1 (binding)

 - I verified the signature and checksum
 - I eyeballed the list of resolved issues
 - I checked the Maven Central artifacts

Aljoscha

> On 23. Aug 2019, at 21:05, Chesnay Schepler  wrote:
> 
> Hi everyone,
> Please review and vote on the release candidate #1 for the version 8.0, as 
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
> 
> 
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org [2], 
> which are signed with the key with fingerprint 11d464BA [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-8.0-rc1" [5],
> * website pull request listing the new release [6].
> 
> The vote will be open for at least 72 hours. It is adopted by majority 
> approval, with at least 3 PMC affirmative votes.
> 
> Thanks,
> Chesnay
> 
> [1] 
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345488
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-8.0-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1237
> [5] https://github.com/apache/flink-shaded/tree/release-8.0-rc1
> [6] https://github.com/apache/flink-web/pull/255
> 



Re: [DISCUSS] FLIP-55: Introduction of a Table API Java Expression DSL

2019-08-29 Thread Aljoscha Krettek
Overall, this is a very nice development that should also simplify the code 
base once we deprecate the expression parser!

Regarding method names, I agree with Seth that values/literals should use 
something like “lit()”. I also think that for column references we could use 
“col()” to make it clear that it is a column reference. What do you think?
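For illustration, a rough sketch of what an expression could look like with the names being discussed (lit() for literals, $() or col() for column references); these factory methods are part of the FLIP-55 proposal and not a released API, and the table and column names are made up:

// Sketch only: assumes an existing Table named "orders" and static imports of
// the proposed $(), col() and lit() factory methods.
Table adjusted = orders
    .filter($("amount").isGreater(lit(100)))
    .select($("user"), $("amount").plus(lit(10)).as("adjusted"));

// The same expressions written with col() instead of $():
Table adjusted2 = orders
    .filter(col("amount").isGreater(lit(100)))
    .select(col("user"), col("amount").plus(lit(10)).as("adjusted"));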

Aljoscha

> On 28. Aug 2019, at 15:59, Seth Wiesman  wrote:
> 
> I would prefer ‘lit()’ over  ‘val()’ since val is a keyword in Scala. 
> Assuming the intention is to make the dsl ergonomic for Scala developers.
> 
> Seth 
> 
>> On Aug 28, 2019, at 7:58 AM, Timo Walther  wrote:
>> 
>> Hi David,
>> 
>> thanks for your feedback. I was also skeptical about 1 char method names, I 
>> restored the `val()` method for now. If you read literature such as 
>> Wikipedia [1]: "literal is a notation for representing a fixed value in 
>> source code. Almost all programming languages have notations for atomic 
>> values". So they are also talking about "values".
>> 
>> Alternatively we could use `lit(12)` or `l(12)` but I'm not convinced that 
>> this is better.
>> 
>> Regards,
>> Timo
>> 
>> [1] https://en.wikipedia.org/wiki/Literal_(computer_programming)
>> 
>>> On 27.08.19 22:10, David Anderson wrote:
>>> TImo,
>>> 
>>> While it's not exactly pretty, I don't mind the $("field") construct.
>>> It's not particularly surprising. The v() method troubles me more; it
>>> looks mysterious. I think we would do better to have something more
>>> explicit. val() isn't much better -- val("foo") could be interpreted
>>> to mean the value of the "foo" column, or a literal string.
>>> 
>>> David
>>> 
 On Tue, Aug 27, 2019 at 5:45 PM Timo Walther  wrote:
 Hi David,
 
 thanks for your feedback. With the current design, the DSL would be free
 of any ambiguity but it is definitely more verbose esp. around defining
 values.
 
 I would be happy about further suggestions that make the DSL more
 readable. I'm also not sure if we go for `$()` and `v()` instead of more
 readable `ref()` and `val()`. This could maybe make it look less
 "alien", what do you think?
 
 Some people mentioned to overload certain methods for accepting values
 or column names. E.g. `$("field").isEqual("str")` but then string values
 could be confused with column names.
 
 Thanks,
 Timo
 
> On 27.08.19 17:34, David Anderson wrote:
> In general I'm in favor of anything that is going to make the Table
> API easier to learn and more predictable in its behavior. This
> proposal kind of falls in the middle. As someone who has spent hours
> in the crevices between the various flavors of the current
> implementations, I certainly view keeping the various APIs and DSLs
> more in sync, and making them less buggy, as highly desirable.
> 
> On the other hand, some of the details in the proposal do make the
> resulting user code less pretty and less approachable than the current
> Java DSL. In a training context it will be easy to teach, but I wonder
> if we can find a way to make it look less alien at first glance.
> 
> David
> 
>> On Wed, Aug 21, 2019 at 1:33 PM Timo Walther  wrote:
>> Hi everyone,
>> 
>> some of you might remember the discussion I started end of March [1]
>> about introducing a new Java DSL for Table API that is not embedded in a
>> string.
>> 
>> In particular, it solves the following issues:
>> 
>> - No possibility of deprecating functions
>> 
>> - Missing documentation for users
>> 
>> - Missing auto-completion for users
>> 
>> - Need to port the ExpressionParser from Scala to Java
>> 
>> - Scala symbols are deprecated! A Java DSL can also enable the Scala DSL
>> one.
>> 
>> Due to shift of priorities, we could not work on it in Flink 1.9 but the
>> feedback at that time was positive and we should aim for 1.10 to
>> simplify the API with this change.
>> 
>> We propose the following FLIP-55:
>> 
>> https://docs.google.com/document/d/1CfaaD3j8APJDKwzIT4YsX7QD2huKTB4xlA3vnMUFJmA/edit?usp=sharing
>> 
>> 
>> Thanks for any feedback,
>> 
>> Timo
>> 
>> [1]
>> https://lists.apache.org/thread.html/e6f31d7fa53890b91be0991c2da64556a91ef0fc9ab3ffa889dacc23@%3Cdev.flink.apache.org%3E
>> 
>> 



Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader

2019-09-02 Thread Aljoscha Krettek
Hi,

I actually don’t know whether that change would be ok. FlinkUserCodeClassLoader 
has taken FlinkUserCodeClassLoader.class.getClassLoader() as the parent 
ClassLoader before my change. See: 
https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java
 

I have the feeling that this might be on purpose because we want to have the 
ClassLoader of the Flink Framework components be the parent ClassLoader, but I 
could be wrong. Maybe Stephan would be most appropriate for answering this.
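For context, a minimal sketch of the two parent choices being discussed; the class name and jar path are purely illustrative and no actual Flink classes are used:

import java.net.URL;
import java.net.URLClassLoader;

public class ContextClassLoaderExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical jar that only the user-defined class loader knows about.
        URL udfJar = new URL("file:///path/to/udfs.jar");

        ClassLoader appLoader = ContextClassLoaderExample.class.getClassLoader();
        URLClassLoader udfLoader = new URLClassLoader(new URL[]{udfJar}, appLoader);

        // The job is run from a thread whose context class loader is udfLoader...
        Thread.currentThread().setContextClassLoader(udfLoader);

        // ...but if user-code class loading takes the defining class loader as
        // parent (the current behaviour discussed here), udfLoader is bypassed:
        ClassLoader currentParent = ContextClassLoaderExample.class.getClassLoader();

        // Jan's suggestion is to take the context class loader instead, so the
        // user-defined loader stays in the parent chain:
        ClassLoader proposedParent = Thread.currentThread().getContextClassLoader();

        System.out.println(currentParent == proposedParent); // false in this setup
    }
}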

Best,
Aljoscha

> On 30. Aug 2019, at 16:28, Till Rohrmann  wrote:
> 
> Hi Jan,
> 
> this looks to me like a bug for which you could create a JIRA and PR to fix 
> it. Just to make sure, I've pulled in Aljoscha who is the author of this 
> change to check with him whether we are forgetting something.
> 
> Cheers,
> Till
> 
> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský  > wrote:
> Hi,
> 
> I have come across an issue with classloading in Flink's MiniCluster. 
> The issue is that when I run a local Flink job from a thread that has a 
> non-default context classloader (for whatever reason), this classloader 
> is not taken into account when classloading user-defined functions. This 
> is due to [1]. Is this behavior intentional, or can I file a JIRA and 
> use Thread.currentThread.getContextClassLoader() there? I have validated 
> that it fixes issues I'm facing.
> 
> Jan
> 
> [1] 
> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280
>  
> 
> 



Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-02 Thread Aljoscha Krettek
I cut a PR for FLINK-13586: https://github.com/apache/flink/pull/9595 


> On 2. Sep 2019, at 05:03, Yu Li  wrote:
> 
> +1 for a 1.8.2 release, thanks for bringing this up Jincheng!
> 
> Best Regards,
> Yu
> 
> 
> On Mon, 2 Sep 2019 at 09:19, Thomas Weise  wrote:
> 
>> +1 for the 1.8.2 release
>> 
>> I marked https://issues.apache.org/jira/browse/FLINK-13586 for this
>> release. It would be good to compensate for the backward incompatible
>> change to ClosureCleaner that was introduced in 1.8.1, which affects
>> downstream dependencies.
>> 
>> Thanks,
>> Thomas
>> 
>> 
>> On Sun, Sep 1, 2019 at 5:10 PM jincheng sun 
>> wrote:
>> 
>>> Hi Jark,
>>> 
>>> Glad to hear that you want to be the Release Manager of flink 1.8.2.
>>> I believe that you will be a great RM, and I am very willing to help you
>>> with the final release in the final stages. :)
>>> 
>>> The release of Apache Flink involves a number of tasks. For details, you
>>> can consult the documentation [1]. If you have any questions, please let
>> me
>>> know and let us work together.
>>> 
>>> 
>>> 
>> https://cwiki.apache.org/confluence/display/FLINK/Creating+a+Flink+Release#CreatingaFlinkRelease-Checklisttoproceedtothenextstep.1
>>> 
>>> Cheers,
>>> Jincheng
>>> 
>>> On Sat, Aug 31, 2019 at 12:59 AM Till Rohrmann  wrote:
>>> 
 +1 for a 1.8.2 bug fix release. Thanks for kicking this discussion off
 Jincheng.
 
 Cheers,
 Till
 
 On Fri, Aug 30, 2019 at 6:45 PM Jark Wu  wrote:
 
> Thanks Jincheng for bringing this up.
> 
> +1 to the 1.8.2 release, because it already contains a couple of
 important
> fixes and it has been a long time since 1.8.1 came out.
> I'm willing to help the community as much as possible. I'm wondering
>>> if I
> can be the release manager of 1.8.2 or work with you together
>>> @Jincheng?
> 
> Best,
> Jark
> 
> On Fri, 30 Aug 2019 at 18:58, Hequn Cheng 
>>> wrote:
> 
>> Hi Jincheng,
>> 
>> +1 for a 1.8.2 release.
>> Thanks a lot for raising the discussion. It would be nice to have
>>> these
>> critical fixes.
>> 
>> Best, Hequn
>> 
>> 
>> On Fri, Aug 30, 2019 at 6:31 PM Maximilian Michels >> 
> wrote:
>> 
>>> Hi Jincheng,
>>> 
>>> +1 I would be for a 1.8.2 release such that we can fix the
>> problems
> with
>>> the nested closure cleaner which currently block 1.8.1 users with
 Beam:
>>> https://issues.apache.org/jira/browse/FLINK-13367
>>> 
>>> Thanks,
>>> Max
>>> 
>>> On 30.08.19 11:25, jincheng sun wrote:
 Hi Flink devs,
 
 It has been nearly 2 months since 1.8.1 was released. So, what do you
 think about releasing Flink 1.8.2 soon?
 
 We already have some blocker and critical fixes in the release-1.8 branch:
 
 [Blocker]
 - FLINK-13159 java.lang.ClassNotFoundException when restore job
 - FLINK-10368 'Kerberized YARN on Docker test' unstable
 - FLINK-12578 Use secure URLs for Maven repositories
 
 [Critical]
 - FLINK-12736 ResourceManager may release TM with allocated slots
 - FLINK-12889 Job keeps in FAILING state
 - FLINK-13484 ConnectedComponents end-to-end test instable with
   NoResourceAvailableException
 - FLINK-13508 CommonTestUtils#waitUntilCondition() may attempt to sleep
   with negative time
 - FLINK-13806 Metric Fetcher floods the JM log with errors when TM is lost
 
 Furthermore, I think the following blocker issue should be merged
 before the 1.8.2 release:
 
 - FLINK-13897: OSS FS NOTICE file is placed in wrong directory
 
 It would also be great if we can have the fix for the Elasticsearch 6.x
 connector thread leak (FLINK-13689) in the 1.8.2 release, which is
 identified as major.
 
 Please let me know what you think?
 
 Cheers,
 Jincheng
 
>>> 
>> 
> 
 
>>> 
>> 



Re: ClassLoader created by BlobLibraryCacheManager is not using context classloader

2019-09-02 Thread Aljoscha Krettek
I’m not saying we can’t change that code to use the context class loader. I’m 
just not sure whether this might break other things.

Best,
Aljoscha

> On 2. Sep 2019, at 11:24, Jan Lukavský  wrote:
> 
> Essentially, the class loader of Flink should be present in the parent hierarchy 
> of the context class loader. If FlinkUserCodeClassLoader doesn't use the context 
> class loader, then it is actually impossible to use a hierarchy like this:
> 
>  system class loader -> application class loader -> user-defined class loader 
> (defines some UDFs to be used in flink program)
> 
> Flink now uses the application class loader, and so the UDFs fail to 
> deserialize on local flink, because the user-defined class loader is 
> bypassed. Moreover, there is no way to add additional classpath elements for 
> LocalEnvironment (as opposed to RemoteEnvironment). I'm able to hack this by 
> calling addURL method on the application class loader (which is terribly 
> hackish), but that works only on JDK <= 8. No sensible workaround is 
> available for JDK >= 9.
> 
> Alternative solution would be to enable adding jars to class loader when 
> using LocalEnvironment, but that looks a little odd.
> 
> Jan
> 
> On 9/2/19 11:02 AM, Aljoscha Krettek wrote:
>> Hi,
>> 
>> I actually don’t know whether that change would be ok. 
>> FlinkUserCodeClassLoader has taken 
>> FlinkUserCodeClassLoader.class.getClassLoader() as the parent ClassLoader 
>> before my change. See: 
>> https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java
>>  
>> <https://github.com/apache/flink/blob/release-1.2/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/FlinkUserCodeClassLoader.java>.
>> 
>> I have the feeling that this might be on purpose because we want to have the 
>> ClassLoader of the Flink Framework components be the parent ClassLoader, but 
>> I could be wrong. Maybe Stephan would be most appropriate for answering this.
>> 
>> Best,
>> Aljoscha
>> 
>>> On 30. Aug 2019, at 16:28, Till Rohrmann  wrote:
>>> 
>>> Hi Jan,
>>> 
>>> this looks to me like a bug for which you could create a JIRA and PR to fix 
>>> it. Just to make sure, I've pulled in Aljoscha who is the author of this 
>>> change to check with him whether we are forgetting something.
>>> 
>>> Cheers,
>>> Till
>>> 
>>> On Fri, Aug 30, 2019 at 3:44 PM Jan Lukavský >> <mailto:je...@seznam.cz>> wrote:
>>> Hi,
>>> 
>>> I have come across an issue with classloading in Flink's MiniCluster.
>>> The issue is that when I run a local Flink job from a thread that has a
>>> non-default context classloader (for whatever reason), this classloader
>>> is not taken into account when classloading user-defined functions. This
>>> is due to [1]. Is this behavior intentional, or can I file a JIRA and
>>> use Thread.currentThread.getContextClassLoader() there? I have validated
>>> that it fixes issues I'm facing.
>>> 
>>> Jan
>>> 
>>> [1]
>>> https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280
>>>  
>>> <https://github.com/apache/flink/blob/ce557839d762b5f1ec92aa1885fd3d2ae33d0d0b/flink-runtime/src/main/java/org/apache/flink/runtime/execution/librarycache/BlobLibraryCacheManager.java#L280>
>>> 
>> 



Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-09-02 Thread Aljoscha Krettek
Hi,

Regarding the factory and duplicate() and whatnot, wouldn’t it work to have a 
factory like this:

/**
 * Allows to read and write an instance from and to {@link Configuration}. A 
configurable instance
 * operates in its own key space in {@link Configuration} and will be 
(de)prefixed by the framework. It cannot access keys from other options. A 
factory must have a default constructor.
 *
 */
public interface ConfigurableFactory<T> {

/**
 * Creates an instance from the given configuration.
 */
T fromConfiguration(ConfigurationReader configuration);
}

with Configurable being:

/**
 * Allows to read and write an instance from and to {@link Configuration}. A 
configurable instance
 * operates in its own key space in {@link Configuration} and will be 
(de)prefixed by the framework. It cannot access keys from other options. A 
factory must have a default constructor.
 *
 */
public interface Configurable {

/**
 * Writes this instance to the given configuration.
 */
void writeToConfiguration(ConfigurationWriter configuration); // method name TBD
}

This would make the Configurable immutable and we wouldn’t need a duplicate() 
method.
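For illustration, a hypothetical config object implementing these two interfaces; ConfigurationReader/ConfigurationWriter stand in for the accessors proposed in FLIP-54 (only the methods used here are assumed), and HostAndPort with its keys is invented for the example:

// Stand-ins for the FLIP-54 accessors; only these methods are assumed.
interface ConfigurationReader {
    String getString(String key, String defaultValue);
    int getInteger(String key, int defaultValue);
}

interface ConfigurationWriter {
    void setString(String key, String value);
    void setInteger(String key, int value);
}

// An immutable config object; no duplicate() method is needed because reading
// always goes through the factory and writing never mutates the instance.
final class HostAndPort implements Configurable {

    private final String host;
    private final int port;

    HostAndPort(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public void writeToConfiguration(ConfigurationWriter configuration) {
        // keys are relative to the option's own (framework-prefixed) key space
        configuration.setString("host", host);
        configuration.setInteger("port", port);
    }

    // Factory with a default constructor, as required above.
    public static final class Factory implements ConfigurableFactory<HostAndPort> {
        @Override
        public HostAndPort fromConfiguration(ConfigurationReader configuration) {
            return new HostAndPort(
                configuration.getString("host", "localhost"),
                configuration.getInteger("port", 8081));
        }
    }
}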

Best,
Aljoscha

> On 2. Sep 2019, at 14:40, Becket Qin  wrote:
> 
> Hi Timo and Dawid,
> 
> Thanks for the patient reply. I agree that both option a) and option b) can
> solve the mutability problem.
> 
> For option a), is it a little intrusive to add a duplicate() method for a
> Configurable? It would be great if we don't put this burden on users if
> possible.
> 
> For option b), I actually feel it is slightly better than option a) from
> API perspective as getFactory() seems a more understandable method of a
> Configurable compared with duplicate(). And users do not need to implement
> much more logic.
> 
> I am curious what is the downside of keeping the Configuration simple to
> only have primitive types, and always create the Configurable using a util
> method? If Configurables are rare, do we need to put the instantiation /
> bookkeeping logic in Configuration?
> 
> @Becket for the toConfiguration this is required for shipping the
>> Configuration to TaskManager, so that we do not have to use java
>> serializability.
> 
> Do you mean a Configurable may be created and configured directly without
> reading settings from a Configuration instance? I thought a Configurable
> will always be created via a ConfigurableFactory by extracting required
> configs from a Configuration. Am I missing something?
> 
> Thanks,
> 
> Jiangjie (Becket) Qin
> 
> On Mon, Sep 2, 2019 at 4:47 PM Dawid Wysakowicz 
> wrote:
> 
>> Hi Timo, Becket
>> 
>> From the options that Timo suggested for improving the mutability
>> situation I would prefer option a) as this is the more explicit option
>> and simpler option. Just as a remark, I think in general Configurable
>> types for options will be rather very rare for some special use-cases,
>> as the majority of options are rather simpler parameters of primitive
>> types (+duration, memory)
>> 
>> @Becket for the toConfiguration this is required for shipping the
>> Configuration to TaskManager, so that we do not have to use java
>> serializability.
>> 
>> Best,
>> 
>> Dawid
>> 
>> 
>> On 02/09/2019 10:05, Timo Walther wrote:
>>> Hi Becket,
>>> 
>>> Re 1 & 3: "values in configurations should actually be immutable"
>>> 
>>> I would also prefer immutability but most of our configuration is
>>> mutable due to serialization/deserialization. Also maps and list could
>>> be mutable in theory. It is difficult to really enforce that for
>>> nested structures. I see two options:
>>> 
>>> a) For the original design: How about we force implementers to add a
>>> duplicate() method in a Configurable object? This would mean that the
>>> object is still mutable but by duplicating the object both during
>>> reading and writing we would avoid the problem you described.
>>> 
>>> b) For the current design: We still use the factory approach but let a
>>> Configurable object implement a getFactory() method such that we know
>>> how to serialize the object. With the help of a factory we can also
>>> duplicate the object easily during reading and writing and ensure
>>> immutability.
>>> 
>>> I would personally go for approach a) to not over-engineer this topic.
>>> But I'm open for option b).
>>> 
>>> Regards,
>>> Timo
>>> 
>>> 
>>> On 31.08.19 04:09, Becket Qin wrote:
 Hi Timo,
 
 Thanks for the reply. I am still a little concerned over the
 mutability of
 the Configurable which could be the value in Configuration.
 
 Re: 1
 
> But in general, people should not use any internal fields.
> Configurable objects are meant for simple little helper POJOs, not
> complex arbitrary nested data structures.
 This seems difficult to enforce... Ideally the values in configurations
 should actually be immutable. The value can only be changed by
 explicitly
 calling setters in Configuration. Otherwi

Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-09-03 Thread Aljoscha Krettek
Hi,

I think it’s important to keep in mind the original goals of this FLIP and not 
let the scope grow indefinitely. As I recall it, the goals are:

 - Extend the ConfigOption system enough to allow the Table API to configure 
options that are right now only available on CheckpointingOptions, 
ExecutionConfig, and StreamExecutionEnvironment. We also want to do this 
without manually having to “forward” all the available configuration options by 
introducing equivalent setters in the Table API

 - Do the above while keeping in mind that eventually we want to allow users to 
configure everything from either the flink-conf.yaml, via command line 
parameters, or via a Configuration.

I think the FLIP achieves this, with the added side goals of making validation 
a part of ConfigOptions, making them type safe, and making the validation 
constraints documentable (via automatic doc generation.) All this without 
breaking backwards compatibility, if I’m not mistaken.

I think we should first agree what the basic goals are so that we can quickly 
converge to consensus on this FLIP because it blocks other people/work. Among 
other things FLIP-59 depends on this. What are other opinions that people have? 
I know Becket at least has some thoughts about immutability and loading objects 
via the configuration but maybe they could be put into a follow-up FLIP if they 
are needed.

Also, I had one thought on the interaction of this FLIP-54 and FLIP-59 when it 
comes to naming. I think eventually it makes sense to have a common interface 
for things that are configurable from a Configuration (FLIP-59 introduces the 
first batch of this). It seems natural to call this interface Configurable. 
That’s a problem for this FLIP-54 because we also introduce a Configurable. 
Maybe the thing that we introduce here should be called ConfigObject or 
ConfigStruct to highlight that it has a more narrow focus and is really only a 
POJO for holding a bunch of config options that have to go together. What do 
you think?

Best,
Aljoscha

> On 3. Sep 2019, at 14:08, Timo Walther  wrote:
> 
> Hi Danny,
> 
> yes, this FLIP covers all the building blocks we need also for unification of 
> the DDL properties.
> 
> Regards,
> Timo
> 
> 
> On 03.09.19 13:45, Danny Chan wrote:
>>> with the new SQL DDL
>> based on properties as well as more connectors and formats coming up,
>> unified configuration becomes more important
>> 
>> I can't agree more. Do you think we can unify the config options key format 
>> here for all the DDL properties?
>> 
>> Best,
>> Danny Chan
>> On Aug 16, 2019 at 10:12 PM (+0800), dev@flink.apache.org wrote:
>>> with the new SQL DDL
>>> based on properties as well as more connectors and formats coming up,
>>> unified configuration becomes more important
> 
> 



Re: [DISCUSS] Releasing Flink 1.8.2

2019-09-04 Thread Aljoscha Krettek
Hi,

I’m just running the last tests on FLINK-13586 on Travis and then I’m merging.

Best,
Aljoscha 

> On 4. Sep 2019, at 07:37, Jark Wu  wrote:
> 
> Thanks for the work Jincheng! 
> 
> I have moved remaining major issues to 1.8.3 except FLINK-13586. 
> 
> Hi @Aljoscha Krettek, is it possible to merge 
> FLINK-13586 today? 
> 
> Best,
> Jark
> 
> On Wed, 4 Sep 2019 at 10:47, jincheng sun  <mailto:sunjincheng...@gmail.com>> wrote:
> Thanks for the update Jark!
> 
> I have added the new version 1.8.3 in JIRA; could you please re-mark the
> JIRAs (such as FLINK-13689) which we do not want to merge into the 1.8.2
> release :)
> 
> You are right, I think FLINK-13586 is better to be contained in the 1.8.2
> release!
> 
> Thanks,
> Jincheng
> 
> 
> On Wed, Sep 4, 2019 at 10:15 AM Jark Wu  wrote:
> 
> > Hi all,
> >
> > I am very happy to say that all the blockers and critical issues for
> > release 1.8.2 have been resolved!
> >
> > Great thanks to everyone who contributed to the release.
> >
> > I hope to create the first RC on Sep 05, at 10:00 UTC+8.
> > If you find some other blocker issues for 1.8.2, please let me know before
> > that to account for it for the 1.8.2 release.
> >
> > Before cutting RC1, I think there is a chance to merge the
> > ClosureCleaner.clean fix (FLINK-13586), because the review and Travis have
> > both passed.
> >
> > Cheers,
> > Jark
> >
> > On Wed, 4 Sep 2019 at 00:45, Kostas Kloudas  > <mailto:kklou...@gmail.com>> wrote:
> >
> > > Yes, I will do that Jark!
> > >
> > > Kostas
> > >
> > > On Tue, Sep 3, 2019 at 4:19 PM Jark Wu  > > <mailto:imj...@gmail.com>> wrote:
> > > >
> > > > Thanks Kostas for the quick fixing.
> > > >
> > > > However, I find that FLINK-13940 still targets 1.8.2 as a blocker.
> > > > If I understand correctly, FLINK-13940 is aiming for a nicer and better
> > > > solution in the future.
> > > > So should we update the fixVersion of FLINK-13940?
> > > >
> > > > Best,
> > > > Jark
> > > >
> > > > On Tue, 3 Sep 2019 at 21:33, Kostas Kloudas  > > > <mailto:kklou...@gmail.com>>
> > wrote:
> > > >
> > > > > Thanks for waiting!
> > > > >
> > > > > A fix for FLINK-13940 has been merged on 1.8, 1.9 and the master
> > under
> > > > > FLINK-13941.
> > > > >
> > > > > Cheers,
> > > > > Kostas
> > > > >
> > > > > On Tue, Sep 3, 2019 at 11:25 AM jincheng sun <
> > sunjincheng...@gmail.com <mailto:sunjincheng...@gmail.com>
> > > >
> > > > > wrote:
> > > > > >
> > > > > > +1, FLINK-13940 <https://issues.apache.org/jira/browse/FLINK-13940>
> > > > > > is a blocker, since data loss is a very important bug. And great
> > > > > > thanks for helping fix it, Kostas!
> > > > > >
> > > > > > Best, Jincheng
> > > > > >
> > > > > > Kostas Kloudas mailto:kklou...@gmail.com>> 
> > > > > > 于2019年9月2日周一 下午7:20写道:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I think this should be also considered a blocker
> > > > > > > https://issues.apache.org/jira/browse/FLINK-13940.
> > > > > > > It is not a regression but it can result to data loss.
> > > > > > >
> > > > > > > I think I can have a quick fix by tomorrow.
> > > > > > >
> > > > > > > Cheers,
> > > > > > > Kostas
> > > > > > >
> > > > > > > On Mon, Sep 2, 2019 at 12:01 PM jincheng sun <
> > > sunjincheng...@gmail.com <mailto:sunjincheng...@gmail.com>
> > > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Thanks for all of your feedback!
> > > > > > > >
> > > > > > > > Hi Jark, Glad to see that you are doing what RM should 

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-04 Thread Aljoscha Krettek
>>>>>>>>>> 
>>>>>>>>>> Thanks! It works.
>>>>>>>>>> 
>>>>>>>>>> Thanks,
>>>>>>>>>> Dian
>>>>>>>>>> 
>>>>>>>>>>> 在 2019年8月27日,上午10:55,jincheng sun  写道:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Dian, can you check if you have edit access? :)
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> Dian Fu  于2019年8月26日周一 上午10:52写道:
>>>>>>>>>>> 
>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>> 
>>>>>>>>>>>> Appreciated for the kind tips and offering of help. Definitely
>>> need
>>>>>>>> it!
>>>>>>>>>>>> Could you grant me write permission for confluence? My Id: Dian
>>> Fu
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>> Dian
>>>>>>>>>>>> 
>>>>>>>>>>>>> 在 2019年8月26日,上午9:53,jincheng sun  写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks for your feedback Hequn & Dian.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Dian, I am glad to see that you want help to create the FLIP!
>>>>>>>>>>>>> Everyone will have first time, and I am very willing to help you
>>>>>>>>>> complete
>>>>>>>>>>>>> your first FLIP creation. Here some tips:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> - First I'll give your account write permission for confluence.
>>>>>>>>>>>>> - Before create the FLIP, please have look at the FLIP Template
>>>>>>> [1],
>>>>>>>>>>>> (It's
>>>>>>>>>>>>> better to know more about FLIP by reading [2])
>>>>>>>>>>>>> - Create Flink Python UDFs related JIRAs after completing the
>>> VOTE
>>>>>>> of
>>>>>>>>>>>>> FLIP.(I think you also can bring up the VOTE thread, if you
>>> want!
>>>>> )
>>>>>>>>>>>>> Any problems you encounter during this period,feel free to tell
>>> me
>>>>>>>> that
>>>>>>>>>>>> we
>>>>>>>>>>>>> can solve them together. :)
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Jincheng
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> [1]
>>>>>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
>>>>>>>>>>>>> [2]
>>>>>>>>>>>>> 
>>> https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
>>>>>>>>>>>>> Hequn Cheng  于2019年8月23日周五 上午11:54写道:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> +1 for starting the vote.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Thanks Jincheng a lot for the discussion.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best, Hequn
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <
>>> dian0511...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>>>>> Hi Jincheng,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +1 to start the FLIP create and VOTE on this feature. I'm
>>>>> willing
>>>>>>>> to
>>>>>>>>>>>> help

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-05 Thread Aljoscha Krettek
Hi,

Another thing to consider is the Scope of the FLIP. Currently, we try to 
support (stateful) AggregateFunctions. I have some concerns about whether or 
not DataView/MapView/ListView is a good interface because it requires quite 
some magic from the runners to make it work, such as messing with the 
TypeInformation and injecting objects at runtime. If the FLIP aims for the 
minimum of ScalarFunctions and the whole execution harness, that should be 
easier to agree on.
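
For context, here is a minimal Java sketch of how MapView is used inside an
AggregateFunction accumulator today (roughly the documented count-distinct
pattern); the planner/runner has to swap such fields for state-backed views at
runtime, which is the "magic" referred to above:

    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.table.api.dataview.MapView;
    import org.apache.flink.table.functions.AggregateFunction;

    // Sketch only: a count-distinct aggregate whose accumulator holds a MapView.
    public class CountDistinct extends AggregateFunction<Long, CountDistinct.Accumulator> {

        public static class Accumulator {
            // The runtime replaces this field with a state-backed view.
            public MapView<String, Boolean> seen = new MapView<>(Types.STRING, Types.BOOLEAN);
            public long count = 0L;
        }

        @Override
        public Accumulator createAccumulator() {
            return new Accumulator();
        }

        public void accumulate(Accumulator acc, String value) throws Exception {
            if (!acc.seen.contains(value)) {
                acc.seen.put(value, true);
                acc.count++;
            }
        }

        @Override
        public Long getValue(Accumulator acc) {
            return acc.count;
        }
    }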

Another point is the naming of the new methods. I think Timo hinted at the fact 
that we have to consider catalog support for functions. There is ongoing work 
about differentiating between temporary objects and objects that are stored in 
a catalog (FLIP-64 [1]). With this in mind, the method for registering 
functions should be called register_temporary_function() and so on. Unless we 
want to already think about mixing Python and Java functions in the catalog, 
which is outside the scope of this FLIP, I think.

Best,
Aljoscha

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module


> On 5. Sep 2019, at 05:01, jincheng sun  wrote:
> 
> Hi Aljoscha,
> 
> That's a good point. So far, most of the code will live in the flink-python
> module, the rules and RelNodes will be put into both the Blink and
> Flink planner modules, and some of the common interfaces required by planners
> will be placed in flink-table-common. I think you are right, we should try
> to keep the changes for this feature minimal. We will follow this principle
> when reviewing the PRs.
> 
> Great thanks for your questions and the reminder!
> 
> Best,
> Jincheng
> 
> 
> Aljoscha Krettek  于2019年9月4日周三 下午8:58写道:
> 
>> Hi,
>> 
>> Things looks interesting so far!
>> 
>> I had one question: Where will most of the support code for this live?
>> Will this add the required code to flink-table-common or the different
>> runners? Can we implement this in such a way that only a minimal amount of
>> support code is required in the parts of the Table API (and Table API
>> runners) that  are not python specific?
>> 
>> Best,
>> Aljoscha
>> 
>>> On 4. Sep 2019, at 14:14, Timo Walther  wrote:
>>> 
>>> Hi Jincheng,
>>> 
>>> 2. Serializability of functions: "#2 is very convenient for users" means
>> only until they have the first backwards-compatibility issue, after that
>> they will find it not so convenient anymore and will ask why the framework
>> allowed storing such objects in a persistent storage. I don't want to be
>> picky about it, but wanted to raise awareness that sometimes it is ok to
>> limit use cases to guide users for developing backwards-compatible programs.
>>> 
>>> Thanks for the explanation of the remaining items. It sounds reasonable
>> to me. Regarding the example with `getKind()`, I actually meant
>> `org.apache.flink.table.functions.ScalarFunction#getKind` we don't allow
>> users to override this property. And I think we should do something similar
>> for the getLanguage property.
>>> 
>>> Thanks,
>>> Timo
>>> 
>>> On 03.09.19 15:01, jincheng sun wrote:
>>>> Hi Timo,
>>>> 
>>>> Thanks for the quick reply ! :)
>>>> I have added more example for #3 and #5 to the FLIP. That are great
>>>> suggestions !
>>>> 
>>>> Regarding 2:
>>>> 
>>>> There are two kinds of serialization in CloudPickle (which is different from
>>>> Java):
>>>> 1) For classes and functions which can be imported, CloudPickle only
>>>> serializes the full path of the class or function (just like a Java class
>>>> name).
>>>> 2) For classes and functions which can not be imported, CloudPickle will
>>>> serialize the full content of the class or function.
>>>> For #2, it means that we can not just store the full path of the class and
>>>> function.
>>>> 
>>>> The above serialization is recursive.
>>>> 
>>>> However, there is indeed a problem of backwards compatibility when the
>>>> module path of the parent class changes. But I think this is a rare case
>>>> and acceptable, i.e., for the Flink framework we never change the user
>>>> interface module path if we want to keep backwards compatibility. For user
>>>> code, if they change the interface of a UDF's parent, they should
>>>> re-register their functions.
>>>> 
>>>> If we do not want support #

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-06 Thread Aljoscha Krettek
Hi,

Regarding stateful functions and MapView/DataView/ListView: I think it’s best 
to keep that for a later FLIP and focus on a more basic version. Supporting 
stateful functions, especially with MapView, can potentially be very slow, so we 
have to see what we can do there.

For the method names, I don’t know. If FLIP-64 passes they have to be changed. 
So we could use the final names right away, but I’m also fine with using the 
old method names for now.

Best,
Aljoscha

> On 5. Sep 2019, at 12:40, jincheng sun  wrote:
> 
> Hi Aljoscha,
> 
> Thanks for your comments!
> 
> Regarding the FLIP scope, it seems that we have agreed on the design of
> the stateless function support.
> What do you think about starting the development of the stateless function
> support firstly and continue the discussion of stateful function support?
> Or you think we should split the current FLIP into two FLIPs and discuss
> the stateful function support in another thread?
> 
> Currently, the Python DataView/MapView/ListView interface design follows
> the Java/Scala naming conventions.
> Of course, we can continue to discuss whether there are better solutions,
> i.e. using annotations.
> 
> Regarding the magic logic to support DataView/MapView/ListView, it will
> be done by the framework and is transparent for users.
> Per my understanding, the magic logic is unavoidable no matter what the
> interfaces will be.
> 
> Regarding the catalog support of Python functions: 1) If it's stored in
> memory as temporary object, just as you said, users can call
> TableEnvironment.register_function(will change to
> register_temporary_function in FLIP-64)
> 2) If it's persisted in external storage, users can call
> Catalog.create_function. There will be no API change per my understanding.
> 
> What do you think?
> Best,
> Jincheng
> 
> Aljoscha Krettek  于2019年9月5日周四 下午5:32写道:
> 
>> Hi,
>> 
>> Another thing to consider is the Scope of the FLIP. Currently, we try to
>> support (stateful) AggregateFunctions. I have some concerns about whether
>> or not DataView/MapView/ListView is a good interface because it requires
>> quite some magic from the runners to make it work, such as messing with the
>> TypeInformation and injecting objects at runtime. If the FLIP aims for the
>> minimum of ScalarFunctions and the whole execution harness, that should be
>> easier to agree on.
>> 
>> Another point is the naming of the new methods. I think Timo hinted at the
>> fact that we have to consider catalog support for functions. There is
>> ongoing work about differentiating between temporary objects and objects
>> that are stored in a catalog (FLIP-64 [1]). With this in mind, the method
>> for registering functions should be called register_temporary_function()
>> and so on. Unless we want to already think about mixing Python and Java
>> functions in the catalog, which is outside the scope of this FLIP, I think.
>> 
>> Best,
>> Aljoscha
>> 
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>> 
>> 
>>> On 5. Sep 2019, at 05:01, jincheng sun  wrote:
>>> 
>>> Hi Aljoscha,
>>> 
>>> That's a good points, so far, most of the code will live in flink-python
>>> module, and the rules and relNodes will be put into the both blink and
>>> flink planner modules, some of the common interface of required by
>> planners
>>> will be placed in flink-table-common. I think you are right, we should
>> try
>>> to ensure the changes of this feature is minimal.  For more detail we
>> would
>>> follow this principle when review the PRs.
>>> 
>>> Great thanks for your questions and remind!
>>> 
>>> Best,
>>> Jincheng
>>> 
>>> 
>>> Aljoscha Krettek  于2019年9月4日周三 下午8:58写道:
>>> 
>>>> Hi,
>>>> 
>>>> Things looks interesting so far!
>>>> 
>>>> I had one question: Where will most of the support code for this live?
>>>> Will this add the required code to flink-table-common or the different
>>>> runners? Can we implement this in such a way that only a minimal amount
>> of
>>>> support code is required in the parts of the Table API (and Table API
>>>> runners) that  are not python specific?
>>>> 
>>>> Best,
>>>> Aljoscha
>>>> 
>>>>> On 4. Sep 2019, at 14:14, Timo Walther  wrote:
>>>>> 
>>>>> Hi Jincheng,
>>>>> 
>>>>> 2. Serializability of func

Re: [DISCUSS] Flink Python User-Defined Function for Table API

2019-09-06 Thread Aljoscha Krettek
Hi,

Thanks for the quick response! I think this looks good now and it should be 
something that everyone can agree on as a first step.

Best,
Aljoscha

> On 6. Sep 2019, at 12:22, Dian Fu  wrote:
> 
> Hi all,
> 
> I have updated the FLIP, removed the content related to UDAF and also changed 
> the title of the FLIP to "Flink Python User-Defined Stateless Function for 
> Table". Does it make sense to you? 
> 
> Regards,
> Dian
> 
>> 在 2019年9月6日,下午6:09,Dian Fu  写道:
>> 
>> Hi all,
>> 
>> Thanks a lot for the discussion here. It makes sense to limit the scope of 
>> this FLIP to only ScalarFunction. I'll update the FLIP and remove the 
>> content relating to UDAF.
>> 
>> Thanks,
>> Dian
>> 
>>> 在 2019年9月6日,下午6:02,jincheng sun  写道:
>>> 
>>> Hi,
>>> 
>>> Sure, to ensure the 1.10 release of Flink, let's split the FLIPs, and
>>> FLIP-58 only do the stateless part.
>>> 
>>> Cheers,
>>> Jincheng
>>> 
>>> Aljoscha Krettek  于2019年9月6日周五 下午5:53写道:
>>> 
>>>> Hi,
>>>> 
>>>> Regarding stateful functions and MapView/DataView/ListView: I think it’s
>>>> best to keep that for a later FLIP and focus on a more basic version.
>>>> Supporting stateful functions, especially with MapView can potentially be
>>>> very slow so we have to see what we can do there.
>>>> 
>>>> For the method names, I don’t know. If FLIP-64 passes they have to be
>>>> changed. So we could use the final names right away, but I’m also fine with
>>>> using the old method names for now.
>>>> 
>>>> Best,
>>>> Aljoscha
>>>> 
>>>>> On 5. Sep 2019, at 12:40, jincheng sun  wrote:
>>>>> 
>>>>> Hi Aljoscha,
>>>>> 
>>>>> Thanks for your comments!
>>>>> 
>>>>> Regarding to the FLIP scope, it seems that we have agreed on the design
>>>> of
>>>>> the stateless function support.
>>>>> What do you think about starting the development of the stateless
>>>> function
>>>>> support firstly and continue the discussion of stateful function support?
>>>>> Or you think we should split the current FLIP into two FLIPs and discuss
>>>>> the stateful function support in another thread?
>>>>> 
>>>>> Currently, the Python DataView/MapView/ListView interfaces design follow
>>>>> the Java/Scala naming conversions.
>>>>> Of couse, We can continue to discuss whether there are better solutions,
>>>>> i.e. using annotations.
>>>>> 
>>>>> Regarding to the magic logic to support DataView/MapView/ListView, it
>>>> will
>>>>> be done by the framework and is transparent for users.
>>>>> Per my understanding, the magic logic is unavoidable no matter what the
>>>>> interfaces will be.
>>>>> 
>>>>> Regarding to the catalog support of python function:1) If it's stored in
>>>>> memory as temporary object, just as you said, users can call
>>>>> TableEnvironment.register_function(will change to
>>>>> register_temporary_function in FLIP-64)
>>>>> 2) If it's persisted in external storage, users can call
>>>>> Catalog.create_function. There will be no API change per my
>>>> understanding.
>>>>> 
>>>>> What do you think?
>>>>> Best,Jincheng
>>>>> 
>>>>> Aljoscha Krettek  于2019年9月5日周四 下午5:32写道:
>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> Another thing to consider is the Scope of the FLIP. Currently, we try to
>>>>>> support (stateful) AggregateFunctions. I have some concerns about
>>>> whether
>>>>>> or not DataView/MapView/ListView is a good interface because it requires
>>>>>> quite some magic from the runners to make it work, such as messing with
>>>> the
>>>>>> TypeInformation and injecting objects at runtime. If the FLIP aims for
>>>> the
>>>>>> minimum of ScalarFunctions and the whole execution harness, that should
>>>> be
>>>>>> easier to agree on.
>>>>>> 
>>>>>> Another point is the naming of the new methods. I think Timo hinted at
>>>> the
>>>>>> fact that we have to consider 

Re: Call for approving Elasticsearch 7.x connector

2019-09-11 Thread Aljoscha Krettek
Hi,

Thanks for the heads up! I commented on the issue.

Best,
Aljoscha

> On 9. Sep 2019, at 10:38, vino yang  wrote:
> 
> Hi guys,
> 
> There is an issue about supporting Elasticsearch 7.x.[1]
> Based on our validation and discussion. We found that Elasticsearch 7.x
> does not guarantee API compatibility. Therefore, it does not have the
> ability to provide a universal connector like Kafka. It seems that we have
> to provide a new connector to support Elasticsearch 7.x.
> 
> Consider that Elasticsearch is a widely used system. There have been
> multiple user comments hoping to support Elasticsearch 7.x as soon as
> possible. Therefore, I hope this new connector will be approved as soon as
> possible, so that this work can be started.
> 
> Best,
> Vino
> 
> [1]: https://issues.apache.org/jira/browse/FLINK-13025



Re: [DISCUSS] Flink client api enhancement for downstream project

2019-09-11 Thread Aljoscha Krettek
Hi,

We could try and use the ASF slack for this purpose, that would probably be 
easiest. See https://s.apache.org/slack-invite. We could create a dedicated 
channel for our work and would still use the open ASF infrastructure and people 
can have a look if they are interested because discussion would be public. What 
do you think?

P.S. Committers/PMCs should be able to log in with their Apache ID.

Best,
Aljoscha

> On 6. Sep 2019, at 14:24, Zili Chen  wrote:
> 
> Hi Aljoscha,
> 
> I'd like to gather all the ideas here and among documents, and draft a
> formal FLIP
> that keeps us on the same page. Hopefully I will start a FLIP thread next week.
> 
> For the implementation or said POC part, I'd like to work with you guys who
> proposed
> the concept Executor to make sure that we go in the same direction. I'm
> wondering
> whether a dedicated thread or a Slack group is the proper one. In my opinion
> we can
> involve the team in a Slack group, concurrent with the FLIP process start
> our branch
> and once we reach a consensus on the FLIP, open an umbrella issue about the
> framework
> and start subtasks. What do you think?
> 
> Best,
> tison.
> 
> 
> Aljoscha Krettek  于2019年9月5日周四 下午9:39写道:
> 
>> Hi Tison,
>> 
>> To keep this moving forward, maybe you want to start working on a proof of
>> concept implementation for the new JobClient interface, maybe with a new
>> method executeAsync() in the environment that returns the JobClient and
>> implement the methods to see how that works and to see where we get. Would
>> you be interested in that?
>> 
>> Also, at some point we should collect all the ideas and start forming an
>> actual FLIP.
>> 
>> Best,
>> Aljoscha
>> 
>>> On 4. Sep 2019, at 12:04, Zili Chen  wrote:
>>> 
>>> Thanks for your update Kostas!
>>> 
>>> It looks good to me that clean up existing code paths as first
>>> pass. I'd like to help on review and file subtasks if I find ones.
>>> 
>>> Best,
>>> tison.
>>> 
>>> 
>>> Kostas Kloudas  于2019年9月4日周三 下午5:52写道:
>>> Here is the issue, and I will keep on updating it as I find more issues.
>>> 
>>> https://issues.apache.org/jira/browse/FLINK-13954
>>> 
>>> This will also cover the refactoring of the Executors that we discussed
>>> in this thread, without any additional functionality (such as the job
>> client).
>>> 
>>> Kostas
>>> 
>>> On Wed, Sep 4, 2019 at 11:46 AM Kostas Kloudas 
>> wrote:
>>>> 
>>>> Great idea Tison!
>>>> 
>>>> I will create the umbrella issue and post it here so that we are all
>>>> on the same page!
>>>> 
>>>> Cheers,
>>>> Kostas
>>>> 
>>>> On Wed, Sep 4, 2019 at 11:36 AM Zili Chen 
>> wrote:
>>>>> 
>>>>> Hi Kostas & Aljoscha,
>>>>> 
>>>>> I notice that there is a JIRA(FLINK-13946) which could be included
>>>>> in this refactor thread. Since we agree on most of directions in
>>>>> big picture, is it reasonable that we create an umbrella issue for
>>>>> refactor client APIs and also linked subtasks? It would be a better
>>>>> way that we join forces of our community.
>>>>> 
>>>>> Best,
>>>>> tison.
>>>>> 
>>>>> 
>>>>> Zili Chen  于2019年8月31日周六 下午12:52写道:
>>>>>> 
>>>>>> Great Kostas! Looking forward to your POC!
>>>>>> 
>>>>>> Best,
>>>>>> tison.
>>>>>> 
>>>>>> 
>>>>>> Jeff Zhang  于2019年8月30日周五 下午11:07写道:
>>>>>>> 
>>>>>>> Awesome, @Kostas Looking forward your POC.
>>>>>>> 
>>>>>>> Kostas Kloudas  于2019年8月30日周五 下午8:33写道:
>>>>>>> 
>>>>>>>> Hi all,
>>>>>>>> 
>>>>>>>> I am just writing here to let you know that I am working on a
>> POC that
>>>>>>>> tries to refactor the current state of job submission in Flink.
>>>>>>>> I want to stress out that it introduces NO CHANGES to the current
>>>>>>>> behaviour of Flink. It just re-arranges things and introduces the
>>>>>>>> notion of an Executor, which is the entity responsible for
>> taking the
>>>>>>>> user-code and submit

Re: [DISCUSS] Features for Apache Flink 1.10

2019-09-11 Thread Aljoscha Krettek

Hi,

Thanks for putting together the list! And I’m +1 for the suggested 
release timeline and also for Gary and Yu as the release managers.


Best,
Aljoscha

On 9 Sep 2019, at 7:39, Yu Li wrote:


Hi Xuefu,

If I understand it correctly, the data type support work should be included
in the "Table API improvements->Finish type system" part, please check it
and let us know if anything is missing there. Thanks.

Best Regards,
Yu


On Mon, 9 Sep 2019 at 11:14, Xuefu Z  wrote:

Looking at the feature list, I don't see an item for completing the data type
support. Specifically, high precision timestamp support is needed for Hive
integration, as it's so common. Missing it would damage the completeness of
our Hive effort.

Thanks,
Xuefu

On Sat, Sep 7, 2019 at 7:06 PM Xintong Song  
wrote:


Thanks Gary and Yu for compiling the feature list and kicking off this
discussion.

+1 for Gary and Yu being the release managers for Flink 1.10.

Thank you~

Xintong Song



On Sat, Sep 7, 2019 at 4:58 PM Till Rohrmann 

wrote:


Thanks for compiling the list of 1.10 efforts for the community, Gary. I
think this helps a lot to better understand what the community is currently
working on.
working on.

Thanks for volunteering as the release managers for the next major
release. +1 for Gary and Yu being the RMs for Flink 1.10.

Cheers,
Till

On Sat, Sep 7, 2019 at 7:26 AM Zhu Zhu  wrote:


Thanks Gary for kicking off this discussion.
Really appreciate that you and Yu offer to help manage the 1.10 release.


+1 for Gary and Yu as release managers.

Thanks,
Zhu Zhu

Dian Fu  于2019年9月7日周六 
下午12:26写道:



Hi Gary,

Thanks for kicking off the release schedule of 1.10. +1 for you and Yu Li
as the release managers.

The feature freeze/release time sounds reasonable.

Thanks,
Dian

在 2019年9月7日,上午11:30,Jark Wu  
写道:


Thanks Gary for kicking off the discussion for 1.10 release.

+1 for Gary and Yu as release managers. Thank you for your effort.


Best,
Jark


在 2019年9月7日,00:52,zhijiang 


写道:


Hi Gary,

Thanks for kicking off the features for next release 1.10.  I 
am

very

supportive of you and Yu Li to be the relaese managers.


Just mention another two improvements which want to be covered

in
FLINK-1.10 and I already confirmed with Piotr to reach an 
agreement

before.


1. Data serialize and copy only once for broadcast partition

[1]:

It
would improve the throughput performance greatly in broadcast 
mode

and

was

actually proposed in Flink-1.8. Most of works already done before

and

only

left the last critical jira/PR. It will not take much efforts to

make

it

ready.


2. Let Netty use Flink's buffers directly in credit-based mode

[2] :

It

could avoid memory copy from netty stack to flink managed network

buffer.

The obvious benefit is decreasing the direct memory overhead

greatly

in
large-scale jobs. I also heard of some user cases encounter 
direct

OOM

caused by netty memory overhead. Actually this improvment was

proposed

by
nico in FLINK-1.7 and always no time to focus then. Yun Gao 
already

submitted a PR half an year ago but have not been reviewed yet. I

could

help review the deign and PR codes to make it ready.


And you could make these two items as lowest priority if

possible.


[1] https://issues.apache.org/jira/browse/FLINK-10745
[2] https://issues.apache.org/jira/browse/FLINK-10742

Best,
Zhijiang


--

From:Gary Yao 
Send Time:2019年9月6日(星期五) 17:06
To:dev 
Cc:carp84 
Subject:[DISCUSS] Features for Apache Flink 1.10

Hi community,

Since Apache Flink 1.9.0 has been released more than 2 weeks ago, I want to
start kicking off the discussion about what we want to achieve for the 1.10
release.

Based on discussions with various people as well as observations from mailing
list threads, Yu Li and I have compiled a list of features that we deem
important to be included in the next release. Note that the features presented
here are not meant to be exhaustive. As always, I am sure that there will be
other contributions that will make it into the next release. This email thread
is merely to kick off a discussion, and to give users and contributors an
understanding of where the focus of the next release lies. If there is anything
we have missed that somebody is working on, please reply to this thread.

thread.



** Proposed features and focus

Following the contribution of Blink to Apache Flink, the community released a
preview of the Blink SQL Query Processor, which offers better SQL coverage and
improved performance for batch queries, in Flink 1.9.0. However, the
integration of the Blink query processor is not fully completed yet as there
are still pending tasks, such as implementing full TPC-DS support. With the
next Flink release, we aim at finishing the Blink integration.

Furthermore, there are several ongoing work threads addressing long-standing
issues reported by users, such as improving checkpointing under
issues reported by users, such as improving checkpointing under
backpressu

Re: [PROPOSAL] Merge NewClusterClient into ClusterClient

2019-09-18 Thread Aljoscha Krettek
I agree that NewClusterClient and ClusterClient can be merged now that there is 
no pre-FLIP-6 code base anymore.

Side note, there are a lot of methods in ClusterClient that should not really 
be there, in my opinion:
 - all the getOptimizedPlan*() methods
 - the run() methods. In the end, only submitJob should be required (see the sketch below)
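
To make that concrete, here is a rough sketch of what a slimmed-down client
interface could look like. This is only an illustration; the names, generics
and return types are assumptions for the example, not the actual Flink API:

    import java.util.concurrent.CompletableFuture;

    import org.apache.flink.api.common.JobID;
    import org.apache.flink.api.common.JobSubmissionResult;
    import org.apache.flink.runtime.jobgraph.JobGraph;
    import org.apache.flink.runtime.jobmaster.JobResult;

    /** Hypothetical sketch only -- not the real ClusterClient. */
    public interface SlimClusterClient<ClusterID> extends AutoCloseable {

        /** Identifier of the cluster this client talks to. */
        ClusterID getClusterId();

        /** Submits the given JobGraph; completes once the job has been accepted. */
        CompletableFuture<JobSubmissionResult> submitJob(JobGraph jobGraph);

        /** Requests the final result of a previously submitted job. */
        CompletableFuture<JobResult> requestJobResult(JobID jobId);
    }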

We should also see what Till (cc’ed) says, maybe he has an opinion on why the 
separation should be kept.

Best,
Aljoscha

> On 18. Sep 2019, at 11:54, Zili Chen  wrote:
> 
> Hi Xiaogang,
> 
> Thanks for your reply.
> 
> According to the feature discussion thread[1] client API enhancement is a
> planned
> feature towards 1.10 and thus I think this thread is valid if we can reach
> a consensus
> and introduce new client API in this development cycle.
> 
> Best,
> tison.
> 
> [1]
> https://lists.apache.org/thread.html/22639ca7de62a18f50e90db53e73910bd99b7f00c82f7494f4cb035f@%3Cdev.flink.apache.org%3E
> 
> 
> SHI Xiaogang  于2019年9月18日周三 下午3:03写道:
> 
>> Hi Tison,
>> 
>> Thanks for bringing this.
>> 
>> I think it's fine to break the back compatibility of client API now that
>> ClusterClient is not well designed for public usage.
>> But from my perspective, we should postpone any modification to existing
>> interfaces until we come to an agreement on new client API. Otherwise, our
>> users may adapt their implementation more than once.
>> 
>> Regards,
>> Xiaogang
>> 
>> Jeff Zhang  于2019年9月18日周三 上午10:49写道:
>> 
>>> Thanks for raising this discussion. Overall +1 to merge NewClusterClient
>>> into ClusterClient.
>>> 
>>> 1. I think it is OK to break the backward compatibility. This current
>>> client api is no so clean which already cause issue for downstream
>> project
>>> and flink itself.
>>> In flink scala shell, I notice this kind of non-readable code
>>> Option[Either
>>> [MiniCluster , ClusterClient[_]]])
>>> 
>>> 
>> https://github.com/apache/flink/blob/master/flink-scala-shell/src/main/scala/org/apache/flink/api/scala/FlinkShell.scala#L138
>>> I also created tickets and a PR to try to simplify it.
>>> https://github.com/apache/flink/pull/8546
>>> https://github.com/apache/flink/pull/8533
>>>   Personally I don't think we need to keep backward compatibility for
>>> non-well-designed api, otherwise it will bring lots of unnecessary
>>> overhead.
>>> 
>>> 2. Another concern is that I notice there're many implementation details
>> in
>>> ClusterClient. I think we should just expose a thin interface, so maybe
>> we
>>> can create interface ClusterClient which includes as less methods as
>>> possible, and move all the implementation to AbstractClusterClient.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> Zili Chen  于2019年9月18日周三 上午9:46写道:
>>> 
 Hi devs,
 
 FLINK-14096[1] was created yesterday. It is aimed at merge the bridge
 class NewClusterClient into ClusterClient because with the effort
 under FLINK-10392 this bridge class is no longer necessary.
 
 Technically in current codebase all implementation of interface
 NewClusterClient is subclass of ClusterClient so that the work
 required is no more than moving method declarations. It helps us use
 type signature ClusterClient instead of
 >>> latter if we aren't in a type variable context. This should not affect
 anything internal in Flink scope.
 
 However, as mentioned by Kostas in the JIRA and a previous discussion
 under a commit[2], it seems that we provide some levels of backward
 compatibility for ClusterClient and thus it's better to start a public
 discussion here.
 
 There are two concerns from my side.
 
 1. How much impact this proposal brings to users programming directly
 to ClusterClient?
 
 The specific changes here are add two methods `submitJob` and
 `requestJobResult` which are already implemented by RestClusterClient
 and MiniClusterClient. Users would only be affected if they create
 a class that inherits ClusterClient and doesn't implement these
 methods. Besides, users who create a class that implements
 NewClusterClient would be affected by the removal of NewClusterClient.
 
 If we have to provide backward compatibility and the impact is no
 further than those above, we can deprecate NewClusterClient, merge
 the methods into ClusterClient with a dummy default like throw
 Exception.
 
 2. Why do we provide backward compatibility for ClusterClient?
 
 It already surprises Kostas and me while we think ClusterClient is a
 totally internal class which we can evolve regardless of api
 stability. Our community promises api stability by marking class
 and/or method as @Public/@PublicEvolving. It is wried and even
 dangerous we are somehow enforced to provide backward compatibility
 for classes without any annotation.
 
 Besides, as I mention in [2], users who anyway want to program
 directly to internal classes/interfaces are considered to prepare to
 make

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Aljoscha Krettek
Hi,

I think this discussion and the one for FLIP-64 are very connected. To resolve 
the differences, I think we have to think about the basic principles and find 
consensus there. The basic questions I see are:

 - Do we want to support overriding builtin functions?
 - Do we want to support overriding catalog functions?
 - And then later: should temporary functions be tied to a catalog/database?

I don’t have much to say about these, except that we should somewhat stick to 
what the industry does. But I also understand that the industry is already very 
divided on this.

Best,
Aljoscha

> On 18. Sep 2019, at 11:41, Jark Wu  wrote:
> 
> Hi,
> 
> +1 to strive for reaching consensus on the remaining topics. We are close to 
> the truth. It will waste a lot of time if we resume the topic some time 
> later. 
> 
> +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way to 
> override a catalog function. 
> 
> I’m not sure about “system.system.fun”, it introduces a nonexistent cat & db? 
> And we still need to do special treatment for the dedicated system.system cat 
> & db? 
> 
> Best,
> Jark
> 
> 
>> 在 2019年9月18日,06:54,Timo Walther  写道:
>> 
>> Hi everyone,
>> 
>> @Xuefu: I would like to avoid adding too many things incrementally. Users 
>> should be able to override all catalog objects consistently according to 
>> FLIP-64 (Support for Temporary Objects in Table module). If functions are 
>> treated completely different, we need more code and special cases. From an 
>> implementation perspective, this topic only affects the lookup logic which 
>> is rather low implementation effort which is why I would like to clarify the 
>> remaining items. As you said, we have a slight consenus on overriding 
>> built-in functions; we should also strive for reaching consensus on the 
>> remaining topics.
>> 
>> @Dawid: I like your idea as it ensures registering catalog objects 
>> consistent and the overriding of built-in functions more explicit.
>> 
>> Thanks,
>> Timo
>> 
>> 
>> On 17.09.19 11:59, kai wang wrote:
>>> hi, everyone
>>> I think this FLIP is very meaningful. It supports functions that can be
>>> shared by different catalogs and dbs, reducing the duplication of functions.
>>> 
>>> Our group based on flink's sql parser module implements create function
>>> feature, stores the parsed function metadata and schema into mysql, and
>>> also customizes the catalog, customizes sql-client to support custom
>>> schemas and functions. Loaded, but the function is currently global, and is
>>> not subdivided according to catalog and db.
>>> 
>>> In addition, I very much hope to participate in the development of this
>>> flip, I have been paying attention to the community, but found it is more
>>> difficult to join.
>>> thank you.
>>> 
>>> Xuefu Z  于2019年9月17日周二 上午11:19写道:
>>> 
 Thanks to Timo and Dawid for sharing your thoughts.
 
 It seems to me that there is a general consensus on having temp functions
 that have no namespaces and overwrite built-in functions. (As a side note
 for comparability, the current user defined functions are all temporary and
 having no namespaces.)
 
 Nevertheless, I can also see the merit of having namespaced temp functions
 that can overwrite functions defined in a specific cat/db. However,  this
 idea appears orthogonal to the former and can be added incrementally.
 
 How about we first implement non-namespaced temp functions now and leave
 the door open for namespaced ones for later releases as the requirement
 might become more crystal? This also helps shorten the debate and allow us
 to make some progress along this direction.
 
 As to Dawid's idea of having a dedicated cat/db to host the temporary temp
 functions that don't have namespaces, my only concern is the special
 treatment for a cat/db, which makes code less clean, as evident in treating
 the built-in catalog currently.
 
 Thanks,
 Xuefu
 
 On Mon, Sep 16, 2019 at 5:07 PM Dawid Wysakowicz <
 wysakowicz.da...@gmail.com>
 wrote:
 
> Hi,
> Another idea to consider on top of Timo's suggestion. How about we have a
> special namespace (catalog + database) for built-in objects? This catalog
> would be invisible for users as Xuefu was suggesting.
> 
> Then users could still override built-in functions, if they fully qualify
> object with the built-in namespace, but by default the common logic of
> current dB & cat would be used.
> 
> CREATE TEMPORARY FUNCTION func ...
> registers temporary function in current cat & dB
> 
> CREATE TEMPORARY FUNCTION cat.db.func ...
> registers temporary function in cat db
> 
> CREATE TEMPORARY FUNCTION system.system.func ...
> Overrides built-in function with temporary function
> 
> The built-in/system namespace would not be writable for permanent
 objects.
> WDYT?
> 
> This way I think we can hav

Re: [DISCUSS] Does removing deprecated interfaces needs another FLIP

2020-02-07 Thread Aljoscha Krettek

I would say a ML discussion or even a Jira issue is enough because

a) the methods are already deprecated
b) the methods are @PublicEvolving, which I don't consider a super 
strong guarantee to users (we still shouldn't remove them lightly, but 
we can if we have to...)


Best,
Aljoscha

On 07.02.20 04:40, Kurt Young wrote:

Hi dev,

Currently I want to remove some already deprecated methods from
TableEnvironment which are annotated with @PublicEvolving. And I also created
a discussion thread [1] to both dev and user mailing lists to gather
feedback on that. But I didn't find any matching rule in Flink bylaw [2] to
follow. Since this is definitely a API breaking change, but we already
voted for that back in the FLIP which deprecated these methods.

I'm not sure about how to proceed for now. Looks like I have 2 choices:

1. If no one raise any objections in discuss thread in like 72 hours, I
will create a jira to start working on it.
2. Since this is a API breaking change, I need to open another FLIP to tell
that I want to remove these deprecated methods. This seems a little
redundant with the first FLIP which deprecate the methods.

What do you think?

Best,
Kurt

[1]
https://lists.apache.org/thread.html/r98af66feb531ce9e6b94914e44391609cad857e16ea84db5357c1980%40%3Cdev.flink.apache.org%3E
[2] https://cwiki.apache.org/confluence/display/FLINK/Flink+Bylaws



Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-07 Thread Aljoscha Krettek
If we need it, we can probably beef up the JobListener to allow 
accessing some information about the whole graph or sources and sinks. 
My only concern is that we don't have a stable interface for 
our job graphs/pipelines right now.


Best,
Aljoscha

On 06.02.20 23:00, Gyula Fóra wrote:

Hi Jeff & Till!

Thanks for the feedback, this is exactly the discussion I was looking for.
The JobListener looks very promising if we can expose the JobGraph somehow
(correct me if I am wrong but it is not accessible at the moment).

I did not know about this feature that's why I added my JobSubmission hook
which was pretty similar but only exposing the JobGraph. In general I like
the listener better and I would not like to add anything extra if we can
avoid it.

Actually the bigger part of the integration work that will need more
changes in Flink will be regarding the accessibility of sources/sinks from
the JobGraph and their specific properties. For instance at the moment the
Kafka sources and sinks do not expose anything publicly such as topics,
kafka configs, etc. Same goes for other data connectors that we need to
integrate in the long run. I guess there will be a separate thread on this
once we iron out the initial integration points :)

I will try to play around with the JobListener interface tomorrow and see
if I can extend it to meet our needs.

Cheers,
Gyula

On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:


Hi Gyula,

Flink 1.10 introduced JobListener which is invoked after job submission and
finished. Maybe we can add an API on JobClient to get the info you need for
Atlas integration.


https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46


Gyula Fóra  于2020年2月5日周三 下午7:48写道:


Hi all!

We have started some preliminary work on the Flink - Atlas integration at
Cloudera. It seems that the integration will require some new hook
interfaces at the jobgraph generation and submission phases, so I figured I
will open a discussion thread with my initial ideas to get some early
feedback.

*Minimal background*
Very simply put, Apache Atlas is a data governance framework that stores
metadata for our data and processing logic to track ownership, lineage etc.

It is already integrated with systems like HDFS, Kafka, Hive and many
others.

Adding Flink integration would mean that we can track the input/output data
of our Flink jobs, their owners and how different Flink jobs are connected
to each other through the data they produce (lineage). This seems to be a
very big deal for a lot of companies :)

*Flink - Atlas integration in a nutshell*
In order to integrate with Atlas we basically need 2 things.
  - Flink entity definitions
  - Flink Atlas hook

The entity definition is the easy part. It is a json that contains the
objects (entities) that we want to store for any given Flink job. As a
starter we could have a single FlinkApplication entity that has a set of
inputs and outputs. These inputs/outputs are other Atlas entities that are
already defined, such as a Kafka topic or HBase table.

The Flink Atlas hook will be the logic that creates the entity instance and
uploads it to Atlas when we start a new Flink job. This is the part where
we implement the core logic.

*Job submission hook*
In order to implement the Atlas hook we need a place where we can inspect
the pipeline, create and send the metadata when the job starts. When we
create the FlinkApplication entity we need to be able to easily determine
the sources and sinks (and their properties) of the pipeline.

Unfortunately there is no JobSubmission hook in Flink that could execute
this logic and even if there was one there is a mismatch of abstraction
levels needed to implement the integration.
We could imagine a JobSubmission hook executed in the JobManager runner as
this:

void onSuccessfulSubmission(JobGraph jobGraph, Configuration configuration);

This is nice but the JobGraph makes it super difficult to extract sources
and UDFs to create the metadata entity. The atlas entity however could be
easily created from the StreamGraph object (used to represent the logical
flow) before the JobGraph is generated. To go around this limitation we
could add a JobGraphGeneratorHook interface:

void preProcess(StreamGraph streamGraph);
void postProcess(JobGraph jobGraph);

We could then generate the atlas entity in the preprocess step and add a
jobmission hook in the postprocess step that will simply send the already
baked in entity.

*This kinda works but...*
The approach outlined above seems to work and we have built a POC using it.

Unfortunately it is far from nice as it exposes non-public APIs such as the
StreamGraph. Also it feels a bit weird to have 2 hooks instead of one.

It would be much nicer if we could somehow go back from JobGraph to
StreamGraph or at least have an easy way to access source/sink UDFS.

What do you think?

Cheers,
Gyula




--
Best Regards

Jeff Zhang





Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Aljoscha Krettek

+1 for dropping them, this stuff is quite old by now.

On 10.02.20 15:04, Benchao Li wrote:

+1 for dropping 2.x - 5.x.

FYI currently only 6.x and 7.x ES Connectors are supported by table api.

Flavio Pompermaier  于2020年2月10日周一 下午10:03写道:


+1 for dropping all Elasticsearch connectors < 6.x

On Mon, Feb 10, 2020 at 2:45 PM Dawid Wysakowicz 
wrote:


Hi all,

As described in this https://issues.apache.org/jira/browse/FLINK-11720
ticket our elasticsearch 5.x connector does not work out of the box on
some systems and requires a version bump. This also happens for our e2e.
We cannot bump the version in es 5.x connector, because 5.x connector
shares a common class with 2.x that uses an API that was replaced in 5.2.

Both versions are already long eol: https://www.elastic.co/support/eol

I suggest to drop both connectors 5.x and 2.x. If it is too much to drop
both of them, I would strongly suggest dropping at least 2.x connector
and update the 5.x line to a working es client module.

What do you think? Should we drop both versions? Drop only the 2.x
connector? Or keep them both?

Best,

Dawid









Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Aljoscha Krettek

+1

Best,
Aljoscha

On 11.02.20 11:17, Jingsong Li wrote:

Thanks Dawid for your explanation,

+1 for vote.

So I am a big +1 to accepting java.lang.Object in the Java DSL; without
Scala implicit conversions, a lot of "lit" calls look unfriendly to users.

Best,
Jingsong Lee

On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
wrote:


Hi,

To answer some of the questions:

@Jingsong We use Objects in the java API to make it possible to use raw
Objects without the need to wrap them in literals. If an expression is
passed it is used as is. If anything else is used, it is assumed to be
a literal and is wrapped into a literal. This way we can e.g. write
$("f0").plus(1).

@Jark I think it makes sense to shorten them, I will do it. I hope people
that already voted don't mind.

@Dian That's a valid concern. I would not discard the '$' as a column
expression for java and scala. I think once we introduce the expression
DSL for python we can add another alias to java/scala. Personally I'd be
in favor of col.

On 11/02/2020 10:41, Dian Fu wrote:

Hi Dawid,

Thanks for driving this feature. The design looks very well for me

overall.


I have only one concern: $ is not allowed to be used in the identifier

of Python and so we have to come out with another symbol when aligning this
feature in the Python Table API. I noticed that there are also other
options proposed in the discussion thread, e.g. ref, col, etc. I think it
would be great if the proposed symbol could be supported in both the
Java/Scala and Python Table API. What's your thoughts?


Regards,
Dian


在 2020年2月11日,上午11:13,Jark Wu  写道:

+1 for this.

I have some minor comments:
- I'm +1 to use $ in both Java and Scala API.
- I'm +1 to use lit(), Spark also provides lit() function to create a
literal value.
- Is it possible to have `isGreater` instead of `isGreaterThan` and
`isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in

BaseExpressions?


Best,
Jark

On Tue, 11 Feb 2020 at 10:21, Jingsong Li 

wrote:



Hi Dawid,

Thanks for driving.

- adding $ in scala api looks good to me.
- Just a question, what should be expected to java.lang.Object? literal
object or expression? So the Object is the grammatical sugar of

literal?


Best,
Jingsong Lee

On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 

wrote:



+1 for this.

It will also help in making a TableEnvironment.fromElements() possible
and reduces technical debt. One entry point of TypeInformation less in
the API.

Regards,
Timo


On 10.02.20 08:31, Dawid Wysakowicz wrote:

Hi all,

I wanted to resurrect the thread about introducing a Java Expression
DSL. Please see the updated flip page[1]. Most of the flip was

concluded

in previous discussion thread. The major changes since then are:

* accepting java.lang.Object in the Java DSL

* adding $ interpolation for a column in the Scala DSL

I think it's important to move those changes forward as it makes it
easier to transition to the new type system (Java parser supports

only

the old type system stack for now) that we are working on for the

past

releases.

Because the previous discussion thread was rather conclusive I want

to

start already with a vote. If you think we need another round of
discussion, feel free to say so.


The vote will last for at least 72 hours, following the consensus

voting

process.

FLIP wiki:

[1]




https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL


Discussion thread:





https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E







--
Best, Jingsong Lee








Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-02-13 Thread Aljoscha Krettek
I think exposing the Pipeline should be ok. Using the internal 
StreamGraph might be problematic because this might change/break but 
that's a problem of the external code.


Aljoscha

On 11.02.20 16:26, Gyula Fóra wrote:

Hi All!

I have made a prototype that simply adds a getPipeline() method to the
JobClient interface. Then I could easily implement the Atlas hook using the
JobListener interface. I simply check if Pipeline is instanceof StreamGraph
and do the logic there.
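
To make the idea concrete, a rough sketch of such a hook (getPipeline() here is
only the prototype method described above, it is not part of the current
JobClient API, and the Atlas publishing itself is left out):

    import org.apache.flink.api.common.JobExecutionResult;
    import org.apache.flink.api.dag.Pipeline;
    import org.apache.flink.core.execution.JobClient;
    import org.apache.flink.core.execution.JobListener;
    import org.apache.flink.streaming.api.graph.StreamGraph;

    import javax.annotation.Nullable;

    // Sketch of an Atlas hook built on the JobListener interface.
    public class AtlasJobListener implements JobListener {

        @Override
        public void onJobSubmitted(@Nullable JobClient jobClient, @Nullable Throwable throwable) {
            if (jobClient == null) {
                return; // submission failed, nothing to report
            }
            Pipeline pipeline = jobClient.getPipeline(); // prototype method from the POC
            if (pipeline instanceof StreamGraph) {
                StreamGraph streamGraph = (StreamGraph) pipeline;
                // Inspect streamGraph here and publish source/sink metadata
                // (topics, paths, ...) as Atlas entities.
            }
        }

        @Override
        public void onJobExecuted(@Nullable JobExecutionResult result, @Nullable Throwable throwable) {
            // Could mark the corresponding Atlas entity as finished/failed here.
        }
    }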

I think this is so far the cleanest approach and I much prefer this
compared to working on the JobGraph directly which would expose even more
messy internals.

Unfortunately this change alone is not enough for the integration as we
need to make sure that all Sources/Sinks that we want to integrate to atlas
publicly expose some of their properties:

- Kafka source/sink:
   - Kafka props
   - Topic(s) - this is tricky for sinks
- FS source /sink:
   - Hadoop props
   - Base path for StreamingFileSink
   - Path for ContinuousMonitoringSource

Most of these are straightforward changes, the only question is what we
want to register in Atlas from the available connectors. Ideally users
could also somehow register their own Atlas metadata for custom sources and
sinks, we could probably introduce an interface for that in Atlas.

Cheers,
Gyula

On Fri, Feb 7, 2020 at 10:37 AM Gyula Fóra  wrote:


Maybe we could improve the Pipeline interface in the long run, but as a
temporary solution the JobClient could expose a getPipeline() method.

That way the implementation of the JobListener could check if its a
StreamGraph or a Plan.

How bad does that sound?

Gyula

On Fri, Feb 7, 2020 at 10:19 AM Gyula Fóra  wrote:


Hi Aljoscha!

That's a valid concert but we should try to figure something out, many
users need this before they can use Flink.

I think the closest thing we have right now is the StreamGraph. In
contrast with the JobGraph  the StreamGraph is pretty nice from a metadata
perspective :D
The big downside of exposing the StreamGraph is that we don't have it in
batch. On the other hand we could expose the JobGraph but then the
integration component would still have to do the heavy lifting for batch
and stream specific operators and UDFs.

Instead of exposing either StreamGraph/JobGraph, we could come up with a
metadata like representation for the users but that would be like
implementing Atlas integration itself without Atlas dependencies :D

As a comparison point, this is how it works in Storm:
Every operator (spout/bolt), stores a config map (string->string) with
all the metadata such as operator class, and the operator specific configs.
The Atlas hook works on this map.
This is very fragile and depends on a lot of internals. Kind of like
exposing the JobGraph but much worse. I think we can do better.

Gyula

On Fri, Feb 7, 2020 at 9:55 AM Aljoscha Krettek 
wrote:


If we need it, we can probably beef up the JobListener to allow
accessing some information about the whole graph or sources and sinks.
My only concern right now is that we don't have a stable interface for
our job graphs/pipelines right now.

Best,
Aljoscha

On 06.02.20 23:00, Gyula Fóra wrote:

Hi Jeff & Till!

Thanks for the feedback, this is exactly the discussion I was looking

for.

The JobListener looks very promising if we can expose the JobGraph

somehow

(correct me if I am wrong but it is not accessible at the moment).

I did not know about this feature that's why I added my JobSubmission

hook

which was pretty similar but only exposing the JobGraph. In general I

like

the listener better and I would not like to add anything extra if we

can

avoid it.

Actually the bigger part of the integration work that will need more
changes in Flink will be regarding the accessibility of sources/sinks

from

the JobGraph and their specific properties. For instance at the moment

the

Kafka sources and sinks do not expose anything publicly such as topics,
kafka configs, etc. Same goes for other data connectors that we need to
integrate in the long run. I guess there will be a separate thread on

this

once we iron out the initial integration points :)

I will try to play around with the JobListener interface tomorrow and

see

if I can extend it to meet our needs.

Cheers,
Gyula

On Thu, Feb 6, 2020 at 4:08 PM Jeff Zhang  wrote:


Hi Gyula,

Flink 1.10 introduced JobListener which is invoked after job

submission and

finished.  May we can add api on JobClient to get what info you

needed for

altas integration.




https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/core/execution/JobListener.java#L46



Gyula Fóra  于2020年2月5日周三 下午7:48写道:


Hi all!

We have started some preliminary work on the Flink - Atlas

integration at

Cloudera. It seems that the integration will require some new hook
interfaces at the jobgraph generation and submission phases, so I

figured I

will open a discuss

Re: [DISCUSS] Improve history server with log support

2020-02-13 Thread Aljoscha Krettek

Hi,

what's the difference in approach to the mentioned related Jira Issue 
([1])? I commented there because I'm skeptical about adding 
Hadoop-specific code to the generic cluster components.


Best,
Aljoscha

[1] https://issues.apache.org/jira/browse/FLINK-14317

On 13.02.20 03:47, SHI Xiaogang wrote:

Hi Rong Rong,

Thanks for the proposal. We are also suffering from some pains brought by
history server. To address them, we propose a trace system, which is very
similar to the metric system, for historical information.

A trace is semi-structured information about events in Flink. Useful traces
include:
* job traces: which contain the job graph of submitted jobs.
* schedule traces: A schedule trace is typically composed of the
information of task slots. They are generated when a job finishes, fails,
or is canceled. As a job may restart multiple times, a job typically has
multiple schedule traces.
* checkpoint traces: which are generated when a checkpoint completes or
fails.
* task manager traces: which are generated when a task manager terminates.
Users can access the link to aggregated logs in task manager traces.

Users can use TraceReport to collect traces in Flink and export them to
external storage (e.g., ElasticSearch). By retrieving traces when
exceptions happen, we can improve the user experience in alerting.

Regards,
Xiaogang

Rong Rong  于2020年2月13日周四 上午9:41写道:


Hi All,

Recently we have been experimenting using Flink’s history server as a
centralized debugging service for completed streaming jobs.

Specifically, we dynamically generate links to access log files on the YARN
host; in the meantime, we use the Flink history server to show job graphs,
exceptions and other info of the completed jobs[2].

This causes some pain for our users, namely: It is inconvenient to go to
YARN host to access logs; then go to Flink history server for the other
information.

Thus we would like to propose an improvement to the current Flink history
server:

- To support dynamic links to residual log files from the host machine
  within the retention period [3];
- To support dynamic links to aggregated log files provided by the
  cluster, if supported: such as Hadoop HistoryServer [1], or Kubernetes
  cluster-level logging [4]?
  - Similar integration with Hadoop HistoryServer was already proposed
    before [5] with a slightly different approach.


Any feedback and suggestions are highly appreciated!

--

Rong

[1]

https://hadoop.apache.org/docs/r2.9.2/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/HistoryServerRest.html

[2]

https://ci.apache.org/projects/flink/flink-docs-release-1.9/monitoring/historyserver.html

[3]

https://hadoop.apache.org/docs/r2.9.2/hadoop-yarn/hadoop-yarn-common/yarn-default.xml#yarn.nodemanager.log.retain-seconds

[4]

https://kubernetes.io/docs/concepts/cluster-administration/logging/#cluster-level-logging-architectures
[5] https://issues.apache.org/jira/browse/FLINK-14317





[ANNOUNCE] New Documentation Style Guide

2020-02-14 Thread Aljoscha Krettek

Hi Everyone,

we just merged a new style guide for documentation writing: 
https://flink.apache.org/contributing/docs-style.html.


Anyone who is writing documentation or is planning to do so should check 
this out. Please open a Jira Issue or respond here if you have any 
comments or questions.


Some of the most important points in the style guide are:

 - We should use direct language and address the reader as you instead 
of passive constructions. Please read the guide if you want to 
understand what this means.


 - We should use "alert blocks" instead of simple inline alert tags. 
Again, please refer to the guide to see what this means exactly if 
you're not sure.


There's plenty more and some interesting links about 
technical/documentation writing as well.


Best,
Aljoscha


Re: [ANNOUNCE] New Documentation Style Guide

2020-02-17 Thread Aljoscha Krettek
Just to be clear: I didn't write this style guide. Marta (in cc) wrote 
this, I just finally merged it.


Best,
Aljoscha

On 17.02.20 11:45, jincheng sun wrote:

Thanks for this great job Aljoscha!

Best,
Jincheng



Yu Li  于2020年2月17日周一 下午6:42写道:


I think the guide itself is a great example of the new style! Thanks for
driving this, Aljoscha!

Best Regards,
Yu


On Mon, 17 Feb 2020 at 16:44, Becket Qin  wrote:


The guideline is very practical! I like it! Thanks for putting it

together,

Aljoscha.

Jiangjie (Becket) Qin

On Mon, Feb 17, 2020 at 10:04 AM Xintong Song 
wrote:


Thanks for the summary!

I've read through the guidelines and found them very helpful. Many of the
questions I had while working on the 1.10 docs have their answers in the
guideline. And it also inspires me with questions I had never thought of,
especially the language style part.

Thank you~

Xintong Song



On Sun, Feb 16, 2020 at 12:55 AM Zhijiang
 wrote:


Thanks for bringing this great and valuable document.

I read through the document and was inspired especially by some

sections

in "Voice and Tone" and "General Guiding Principles".
  I think it is not only helpful for writing Flink documents, but also
provides guideline/benefit for other writings.
It also reminded me to extend the Flink glossary if necessary.

Best,
Zhijiang


--
From:Jingsong Li 
Send Time:2020 Feb. 15 (Sat.) 23:21
To:dev 
Subject:Re: [ANNOUNCE] New Documentation Style Guide

Thanks for the great work,

In 1.10, I have modified and reviewed some documents. In that

process,

sometimes there is some confusion, how to write is the standard. How

to

write is correct to the users.
Docs style now tells me. Learned a lot.

Best,
Jingsong Lee

On Sat, Feb 15, 2020 at 10:00 PM Dian Fu 

wrote:



Thanks for the great work! This is very helpful to keep the

documentation

style consistent across the whole project. It's also very helpful

for

non-native English contributors like me.


在 2020年2月15日,下午3:42,Jark Wu  写道:

Great summary! Thanks for adding the translation specification in

it.

I learned a lot from the guide.

Best,
Jark

On Fri, 14 Feb 2020 at 23:39, Aljoscha Krettek <

aljos...@apache.org>

wrote:



Hi Everyone,

we just merged a new style guide for documentation writing:
https://flink.apache.org/contributing/docs-style.html.

Anyone who is writing documentation or is planning to do so

should

check

this out. Please open a Jira Issue or respond here if you have

any

comments or questions.

Some of the most important points in the style guide are:

  - We should use direct language and address the reader as you

instead

of passive constructions. Please read the guide if you want to
understand what this means.

  - We should use "alert blocks" instead of simple inline alert

tags.

Again, please refer to the guide to see what this means exactly

if

you're not sure.

There's plenty more and some interesting links about
technical/documentation writing as well.

Best,
Aljoscha






--
Best, Jingsong Lee












Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-18 Thread Aljoscha Krettek
Wouldn't removing the ES 2.x connector be enough because we can then 
update the ES 5.x connector? It seems there are some users that still 
want to use that one.


Best,
Aljoscha

On 18.02.20 10:42, Robert Metzger wrote:

The ES5 connector is causing some problems on the CI system. It would be
nice if we could make a decision here soon. I don't want to invest time
into fixing it, if we are going to remove it.

I'm still in favor of removing it. If we see that there's demand for the 5.x connector after the 1.11 release, somebody can take the source, contribute it to Apache Bahir or a GitHub account, and then post it to flink-packages.org.


On Thu, Feb 13, 2020 at 3:34 PM Dawid Wysakowicz 
wrote:


Sorry for late reply,

@all I think there is a general consensus that we want to drop ES 2.x
support. I created https://issues.apache.org/jira/browse/FLINK-16046 to
track it.


@Stephan @Chesnay @Itamar In our connectors we use Java High Level Rest
Client. ES promises to maintain compatibility of it with any newer minor
version of ES. So if we have 6.1 client we can use it with any 6.2, 6.3
etc.

ES also provides a low-level REST client which does not include any direct ES dependencies and can work with any version of ES. It does not provide any marshalling/unmarshalling or higher-level features, as Chesnay said.
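To make the difference concrete, here is a minimal, hypothetical sketch of the two clients, written against a recent 7.x client; the host, index name and document body are placeholders:

import org.apache.http.HttpHost;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;

public class EsClientsSketch {
    public static void main(String[] args) throws Exception {
        HttpHost host = new HttpHost("localhost", 9200, "http");

        // Low-level REST client: plain HTTP, no ES type dependencies, no marshalling;
        // it works against any ES version as long as the request body matches that version's API.
        try (RestClient lowLevel = RestClient.builder(host).build()) {
            Request request = new Request("POST", "/my-index/_doc");
            request.setJsonEntity("{\"field\":\"value\"}");
            Response response = lowLevel.performRequest(request);
            System.out.println(response.getStatusLine());
        }

        // High-level REST client: typed requests/responses, but tied to the ES minor
        // version it was built against (the compatibility promise mentioned above).
        try (RestHighLevelClient highLevel =
                new RestHighLevelClient(RestClient.builder(host))) {
            IndexRequest indexRequest = new IndexRequest("my-index")
                    .source("{\"field\":\"value\"}", XContentType.JSON);
            highLevel.index(indexRequest, RequestOptions.DEFAULT);
        }
    }
}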

Correct me if I am wrong @Itamar but your HTTP client is a simplified
version of the ES's high level rest client with a subset of its
features. I think it will still have the same problems as ES's High
Level Rest Client's because ES does not guarantee that newer message
formats will be compatible with older versions of ES or that message
formats are compatible across major versions at all.


@Stephan @Danny As for the 5.x connector. Any ideas how can we get
user's feedback about it? I cross posted on the user mailing list with
no luck so far. Personally I would be in favor of dropping the
connector. Worst case scenario users still have the possibility of
building the connector themselves from source with just bumping the
flink's versions. As far as I can tell there were no changes to the code
base for quite some time.

Best,

Dawid

On 11/02/2020 10:46, Chesnay Schepler wrote:

I suppose the downside in an HTTP ES sink is that you don't get _any_
form of high-level API from ES, and we'd have to manually build an
HTTP request that matches the ES format. Of course you also lose any
client-side verification that the clients did, if there is any (but I
guess the API itself prevented certain errors).

On 11/02/2020 09:32, Stephan Ewen wrote:

+1 to drop ES 2.x - unsure about 5.x (makes sense to get more user input
for that one).

@Itamar - if you would be interested in contributing a "universal" or
"cross version" ES connector, that could be very interesting. Do you
know
if there are known performance issues or feature restrictions with that
approach?
@dawid what do you think about that?


On Tue, Feb 11, 2020 at 6:28 AM Danny Chan 

wrote:



5.x seems to have a lot of users, is the 6.x completely compatible with
5.x ~

Best,
Danny Chan
在 2020年2月10日 +0800 PM9:45,Dawid Wysakowicz
,写道:

Hi all,

As described in this ticket, https://issues.apache.org/jira/browse/FLINK-11720, our elasticsearch 5.x connector does not work out of the box on some systems and requires a version bump. This also happens for our e2e tests. We cannot bump the version in the es 5.x connector, because the 5.x connector shares a common class with 2.x that uses an API that was replaced in 5.2.

Both versions are already long eol:

https://www.elastic.co/support/eol


I suggest to drop both connectors, 5.x and 2.x. If it is too much to drop both of them, I would strongly suggest dropping at least the 2.x connector and updating the 5.x line to a working es client module.

What do you think? Should we drop both versions? Drop only the 2.x
connector? Or keep them both?

Best,

Dawid











Re: [DISCUSS] Improvements on FLIP Process

2020-02-18 Thread Aljoscha Krettek

Hi,

thanks for starting this discussion!

However, I have a somewhat opposing opinion to this: we should disallow 
using Google Docs for FLIPs and FLIP discussions and follow the already 
established process more strictly.


My reasons for this are:
 - discussions on the Google Doc are not reflected in Apache infrastructure
 - discussions on Google Docs are non-linear and hard to follow
 - when discussions on Google Docs are resolved these discussions are 
not visible/re-readable anymore (I know there's history, but meh)
 - if discussion is kept purely to the ML this is easily observable for any interested parties and it's there if someone wants to recheck the discussion in the future
 - going from Google Doc to Wiki is an extra step that seems 
unnecessary to me (but that's just personal opinion, I mean, I don't 
have to do the extra work here...)


Best,
Aljoscha

On 18.02.20 09:02, Hequn Cheng wrote:

Hi everyone,

Currently, when we create a FLIP we follow the FLIP process in the Flink
Improvement Proposals wiki[1]. The process mainly includes the following
steps:
1. Create a FLIP wiki page.
2. Raise the discussion on the mailing list.
3. Once the proposal is finalized, call a vote to have the proposal adopted.
There is also a discussion[2] on the FLIP process which may be helpful for
you.

As it is not allowed to comment on the wiki, we usually have a google doc for the discussion at step 2, and whenever there is a change we need to pick it over to the wiki page. This makes things somewhat redundant. To solve this, we can rearrange the steps a little bit and avoid the pick:
1. Raise the discussion on the mailing list. The subject of the thread is
of the format [DISCUSS][FLIP] {your FLIP heading}. Also, the design doc
should follow the FLIP-Template strictly. (The [FLIP] tag is used to inform
people that it is a FLIP discussion and more attention should be paid.)
2. Create a FLIP wiki page once we reached an agreement on the discussion.
We can simply copy the google doc into the FLIP wiki page.
3. Once the proposal is finalized, call a vote to have the proposal
adopted. It should be noted that we should always vote on a FLIP wiki page
instead of a google doc. The wiki page is the final version of the google
doc.

This can bring some benefits:
1. Make the discussion more effective as we force people to write and
discuss on a google doc that follows the FLIP template which
includes necessary information such as Motivation, Interfaces, Proposed
changes, etc.
2. Avoid redundant pick from google doc to Flink wiki page. Once we reached
an agreement on the discussion, we can simply copy the google doc into the
FLIP wiki page.
3. As adopted FLIP should mostly be "immutable", we can even make the wiki
page PMC or committer editable since it just needs a simple copy from the
google doc.

Looking forward to your feedback!

Best,
Hequn

[1]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-META-FLIP-Sticking-or-not-to-a-strict-FLIP-voting-process-td29978.html#a29988



Re: Hotfixes on the master

2020-02-19 Thread Aljoscha Krettek
used akka-testkit dependencies
dd34b050e8e7bd4b03ad0870a432b1631e1c0e9d [hotfix][build] Remove unused shaded-asm7 dependencies

// dependency changes
244d2db78307cd7dff1c60a664046adb6fe5c405 [hotfix][web][build] Cleanup dependencies

In my opinion, everything that is not a typo, a compile error (breaking the master) or something generated (like parts of the docs) should go through a quick pull request.
Why? I don't think many people review changes in the commit log in a way they review pull request changes.

In addition to that, I propose to prefix hotfixes that have been added as part of a ticket with that ticket number.
So instead of "[hotfix] Harden kubernetes test", we do "[FLINK-13978][minor] Harden kubernetes test".
Why? For people checking the commit history, it is much easier to see if a hotfix has been reviewed as part of a JIRA ticket review, or whether it is a "hotpush" hotfix.

For changes that are too small for a JIRA ticket, but need a review, I propose to use the "[minor]" tag. A good example of such a change is this:

https://github.com/apache/flink/commit/0dc4e767c9c48ac58430a59d05185f2b071f53f5

By tagging minor changes accordingly in the pull requests, it is also easier for fellow committers to quickly check them.

Summary:
[FLINK-XXXXX]: regular, reviewed change
[FLINK-XXXXX][minor]: minor, unrelated changes reviewed with a regular ticket
[minor]: minor, reviewed change
[hotfix]: unreviewed change that fixes a typo, compile error or something generated


What's your opinion on this?


On Sat, May 28, 2016 at 1:36 PM Vasiliki Kalavri <
vasilikikala...@gmail.com>
wrote:


Hi all,

in principle I agree with Max. I personally avoid hotfixes and always open a PR, even for javadoc improvements.

I believe the main problem is that we don't have a clear definition of what constitutes a "hotfix". Ideally, even cosmetic changes and documentation should be reviewed; I've seen documentation added as a hotfix that had spelling mistakes, which led to another hotfix... Using hotfixes to do major refactoring or add features is absolutely unacceptable, in my view.

On the other hand, with the current PR load it's not practical to ban hotfixes all together.

I would suggest to update our contribution guidelines with some definition of a hotfix. We could add a list of questions to ask before pushing one.

e.g.:
- does the change fix a spelling mistake in the docs? => hotfix
- does the change add a missing javadoc? => hotfix
- does the change improve a comment? => hotfix?
- is the change a small refactoring in a code component you are maintainer of? => hotfix
- did you change code in a component you are not very familiar with / not the maintainer of? => open PR
- is this major refactoring? (e.g. more than X lines of code) => open PR
- does it fix a trivial bug? => open JIRA and PR

and so on...

What do you think?

Cheers,
-V.

On 27 May 2016 at 17:40, Greg Hogan  wrote:


Max,

I certainly agree that hotfixes are not ideal for large refactorings and new features. Some thoughts ...

A hotfix should be maven verified, as should a rebased PR. Travis is often backed up for half a day or more.

Is our Jira and PR process sufficiently agile to handle these hotfixes?

Will committers simply include hotfixes with other PRs, and would it be better to retain these as smaller, separate commits?

For these cosmetic changes and small updates, will the Jira and PR yield beneficial documentation in addition to what is provided in the commit message?

Counting hotfixes by contributor, the top of the list looks as I would expect.

Greg

Note: this summary is rather naive and includes non-squashed hotfix commits included in a PR
$ git shortlog --grep 'hotfix' -s -n release-0.9.0..
 94  Stephan Ewen
 42  Aljoscha Krettek
 20  Till Rohrmann
 16  Robert Metzger
 13  Ufuk Celebi
  9  Fabian Hueske
  9  Maximilian Michels
  6  Greg Hogan
  5  Stefano Baghino
  3  smarthi
  2  Andrea Sella
  2  Gyula Fora
  2  Jun Aoki
  2  Sachin Goel
  2  mjsax
  2  zentol
  1  Alexander Alexandrov
  1  Gabor Gevay
  1  Prez Cannady
  1  Steve Cosenza
  1  Suminda Dharmasena
  1  chengxiang li
  1  jaoki
  1  kl0u
  1  qingmeng.wyh
  1  sksamuel
  1  vasia

On Fri, May 27, 2016 at 6:10 AM, Maximilian Michels 
wrote:


Hi Flinksters,

I'd like to address an issue that has been concerning me for a while.

In the Flink community we like to push "hotfixes" to the master.
Hotfixes come in various shapes: From very small cosmetic changes
(JavaDoc) to major refactoring and even new features.

IMHO we should move away from these hotfixes. Here's why:

1) They tend to break the master because they lack test coverage
2) They are usually not communicated with the maintainer or person
wo

Re: [DISCUSS] Kicking off the 1.11 release cycle

2020-02-19 Thread Aljoscha Krettek

+1

Although I would hope that it can be more than just "anticipated".

Best,
Aljoscha

On 19.02.20 15:40, Till Rohrmann wrote:

Thanks for volunteering as one of our release managers Zhijiang.

+1 for the *anticipated feature freeze date* end of April. As we go along
and collect more data points we might be able to strengthen our
initial anticipation.

Cheers,
Till

On Wed, Feb 19, 2020 at 4:44 AM Zhijiang 
wrote:


Thanks for kicking off the next release and the introduction, @Stephan!

It's my pleasure to become the release manager and be involved in other community work. I have been working together with @Piotr for a long time, so I am very happy to cooperate on the release manager work again. The previous release work was always well done, and I can learn a lot from these rich experiences.

+1 for the "feature freeze date" around end of April.
  Although we have the FF SF in the meantime, fortunately there are no long
public holidays during this period in China.

Best,
Zhijiang


--
From:Stephan Ewen 
Send Time:2020 Feb. 19 (Wed.) 01:15
To:dev 
Cc:zhijiang ; pnowojski 
Subject:[DISCUSS] Kicking off the 1.11 release cycle

Hi all!

Now that the 1.10 release is out (congrats again, everyone!), I wanted to
bring up some questions about organizing the next release cycle.

The most important questions at the beginning would be
   - Who would volunteer as Release Managers
   - When would be the release date.

For the release managers, Piotrek and Zhijiang have mentioned previously
that they would be interested, so I am copying them here to chime in.
@Piotr and @Zhijiang could you confirm if you are interested in helping
out with this?

About the release date: By our original "3 months release cycle"
assumption, we should aim for a release **Mid May**, meaning a feature
freeze no later than end of April.
That would be indeed a shorter release cycle than 1.10, and the assumption
of a shorter testing period. But aiming for a shorter release cycle than
1.10 would actually be nice, in my opinion. 1.10 grew very big in the end,
which caused also a very long testing period (plus Christmas and Chinese
New Year are also partially to blame).

The exact feature freeze date is anyways a community discussion later, but
what do you think about starting with an "anticipated feature freeze date"
around end of April, so that committers, contributors, and users know
roughly what to expect?

Best,
Stephan







[DISCUSS] Extend (or maintain) "shell" script support for Windows

2020-02-19 Thread Aljoscha Krettek

Hi,

the background is this series of Jira Issues and PRs around extending 
the .bat scripts for windows: 
https://issues.apache.org/jira/browse/FLINK-5333.


I would like to resolve this, by either closing the Jira Issues as 
"Won't Do" or finally merging these PRs. The questions I have are:


 - Should we add more full-featured (complicated?) windows scripts that 
are essentially re-implementations of our existing "unix" scripts?
 - Would windows users use these or would they, by now, use the linux 
subsystem for windows or cygwin?

 - Should we even remove the existing .bat scripts that we have?

Maintaining the windows scripts is hard because we only have one (I 
think, Chesnay) developer on windows and no CI for windows.


What do you think?

Best,
Aljoscha


Re: [PROPOSAL] Reverse the dependency from flink-streaming-java to flink-client

2020-03-05 Thread Aljoscha Krettek

Hi,

thanks for starting the discussion, Tison!

I'd like to fix this dependency mess rather sooner than later, but we do have to consider the fact that we are breaking the dependency setup of users. If they only had a dependency on flink-streaming-java before but used classes from flink-clients, they would have to explicitly add this dependency now.


Let's see what others think.

Best,
Aljoscha

On 05.03.20 02:53, tison wrote:

Hi devs,

Here is a proposal to reverse the dependency from flink-streaming-java to
flink-client, for a proper
module dependency graph. Since it changes current structure, it should be
discussed publicly.

The original idea comes from the fact that flink-streaming-java should act as an API-only module, just as its batch companion flink-java does. If a Flink user wants to write a minimal DataStream program, the only dependency should be flink-streaming-java.

However, as it is currently implemented, flink-client and even flink-runtime are transitively polluted in when a user depends on flink-streaming-java. These dependencies are pulled in as:

flink-client:
   - previously, ClusterClient, which is removed by FLIP-73 Executors
   - accidentally, ProgramInvocationException, we just throw in place as it
is accessible.
   - transitively, flink-optimizer, for one utility.
   - transitively, flink-java, for several utilities.
flink-runtime:
   - mainly for JobGraph generating.

With a previous discussion with @Aljoscha Krettek  our
goal is briefly making flink-streaming-java
an API only module. As a first step we can break the dependency from
flink-streaming-java to
flink-client[1][2].

With this first step, continuously we factor out common utilities in
flink-java to
flink-core and eventually eliminate dependencies from streaming to batch;
while
orthogonally, we factor out job compilation logic into
flink-streaming-compiler module and
break the dependency to flink-runtime. The final dependency graph will be:


flink-client -> flink-streaming-compiler -> flink-runtime
             \-> flink-streaming-java

Looking forward to your feedback. Basically whether or not it is in a right
direction, and if so,
how the community integrates this proposal.

Best,
tison.

[1] https://issues.apache.org/jira/browse/FLINK-15090
[2] https://issues.apache.org/jira/browse/FLINK-16427



Re: [DISCUSS] Disable "Squash and merge" button for Flink repository on GitHub

2020-03-06 Thread Aljoscha Krettek
If there is a noreply email address that could be on purpose. This 
happens when you configure github to not show your real e-mail address. 
This also happens when contributors open a PR and don't want to show 
their real e-mail address. I talked to at least 1 person that had it set 
up like this on purpose.


Best,
Aljoscha

On 05.03.20 17:37, Stephan Ewen wrote:

It looks like this feature still messes up email addresses, for example if
you do a "git log | grep noreply" in the repo.

Don't most PRs consist anyways of multiple commits where we want to
preserve "refactor" and "feature" differentiation in the history, rather
than squash everything?

On Thu, Mar 5, 2020 at 4:54 PM Piotr Nowojski  wrote:


Hi,

If it’s really not preserving ownership (I didn’t notice the problem
before), +1 for removing “squash and merge”.

However -1 for removing “rebase and merge”. I didn’t see any issues with
it and I’m using it constantly.

Piotrek


On 5 Mar 2020, at 16:40, Jark Wu  wrote:

Hi all,

Thanks for the feedbacks. But I want to clarify the motivation to disable
"Squash and merge" is just because of the regression/bug of the missing
author information.
If GitHub fixes this later, I think it makes sense to bring this button
back.

Hi Stephan & Zhijiang,

To be honest, I love the "Squash and merge" button and often use it. It saves me a lot of time to merge PRs, because pulling and pushing commits in China is very unstable.

I don't think the potential problems you mentioned are a "problem". For "Squash and merge":
- "Merge commits": there are no "merge" commits, because GitHub will squash the commits, rebase them and then add them to the master branch.
- "This closes #" line to track back: when you click "Squash and merge", it allows you to edit the title and description, so you can add the "This closes #" message to the description, the same as in the local git. Besides, GitHub automatically appends "(#)" after the title, which is also helpful for tracking.

Best,
Jark

On Thu, 5 Mar 2020 at 23:36, Robert Metzger  wrote:


+1 for disabling this feature for now.

Thanks a lot for spotting this!

On Thu, Mar 5, 2020 at 3:54 PM Zhijiang 
wrote:


+1 for disabling "Squash and merge" if feasible to do that.

The possible benefit of this button is saving some effort squashing intermediate "[fixup]" commits during PR review. But it would bring more potential problems, as mentioned below: missing author information, the missing "This closes #" message, etc. It might even cause an unexpected format of long commit descriptions if not handled carefully in the text box.

Best,
Zhijiang


--
From:tison 
Send Time:2020 Mar. 5 (Thu.) 21:34
To:dev 
Subject:Re: [DISCUSS] Disable "Squash and merge" button for Flink
repository on GitHub

Hi Yadong,

Maybe we first reach out to the INFRA team and see the reply from their side. Since the actual operator is the INFRA team, in the dev mailing list we can focus on the motivation and wait for the reply.

Best,
tison.


Yadong Xie  于2020年3月5日周四 下午9:29写道:


Hi Jark

I think the GitHub UI can not disable both the "Squash and merge" button and "Rebase and merge" at the same time if there exists any protected branch in the repository (according to github rules).

If we only leave the "merge and commits" button, it will go against the "requiring a linear commit history" rule here:

https://help.github.com/en/github/administering-a-repository/requiring-a-linear-commit-history


tison  于2020年3月5日周四 下午9:04写道:


To implement it, file a JIRA ticket with INFRA [1]

Best,
tison.
[1] https://issues.apache.org/jira/projects/INFRA


Stephan Ewen  于2020年3月5日周四 下午8:57写道:


Big +1 to disable it.

I have never been a fan, it has always caused problems:
  - Merge commits
  - weird alias emails
  - lost author information
  - commit message misses the "This closes #" line to track back commits to PRs/reviews.

The button goes against best practice, it should go away.

Best,
Stephan


On Thu, Mar 5, 2020 at 1:51 PM Yadong Xie 

wrote:



Hi Jark
There is a conversation about this here:
https://github.community/t5/How-to-use-Git-and-GitHub/Authorship-of-merge-commits-made-by-Github-Apps-changed/td-p/48797

I think GitHub will fix it soon, it is a bug, not a feature :).

Jingsong Li  于2020年3月5日周四 下午8:32写道:


Thanks for the deep investigation.

+1 to disable the "Squash and merge" button now.
But I think this is a very serious problem, it affects too many GitHub workers. GitHub should deal with it quickly?

Best,
Jingsong Lee

On Thu, Mar 5, 2020 at 7:21 PM Xingbo Huang <

hxbks...@gmail.com>

wrote:



Hi Jark,

Thanks for bringing up this discussion. Good catch. Agree that we can disable "Squash and merge" (also the other buttons) for now.

There is a guideline on how to do that in
https://help.github.com/en/github/administering-a-repository/configuring-commit-squashing-for-pull-requests

Best,
Xingbo

Jark Wu  于2020年3月5日周

Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

2020-03-09 Thread Aljoscha Krettek

> For the -R flag, this was in the PoC that I published just as a quick
> implementation, so that I can move fast to the entrypoint part.
> Personally, I would not even be against having a separate command in
> the CLI for this, sth like run-on-cluster or something along those
> lines.
> What do you think?

I would be in favour of something like "bin/flink run-application", 
maybe we should even have "run-job" in the future to differentiate.


> For fetching jars, in the FLIP we say that as a first implementation
> we can have Local and DFS. I was wondering if in the case of YARN,
> both could be somehow implemented
> using LocalResources, and let Yarn do the actual fetch. But I have not
> investigated it further. Do you have any opinion on this?

By now I'm 99 % sure that we should use YARN for that, i.e. use 
LocalResource. Then YARN does the fetching. This is also how the current 
per-job cluster deployment does it, the Flink code uploads local files 
to (H)DFS and then sets the remote paths as a local resource that the 
entrypoint then uses.
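To illustrate the mechanism, here is a rough sketch of the plain YARN API side (not the actual Flink code; the HDFS path and the "job.jar" link name are made up):

import java.util.Collections;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.LocalResource;
import org.apache.hadoop.yarn.api.records.LocalResourceType;
import org.apache.hadoop.yarn.api.records.LocalResourceVisibility;
import org.apache.hadoop.yarn.util.ConverterUtils;

public class LocalResourceSketch {

    public static void register(ContainerLaunchContext launchContext) throws Exception {
        Configuration conf = new Configuration();
        // The jar was uploaded to a distributed file system beforehand (e.g. by the client).
        Path remoteJar = new Path("hdfs:///user/flink/application/job.jar");
        FileStatus status = remoteJar.getFileSystem(conf).getFileStatus(remoteJar);

        LocalResource resource = LocalResource.newInstance(
                ConverterUtils.getYarnUrlFromPath(remoteJar),
                LocalResourceType.FILE,
                LocalResourceVisibility.APPLICATION,
                status.getLen(),
                status.getModificationTime());

        // YARN downloads ("localizes") the jar into the container's working directory
        // under the given link name before the entrypoint process is started.
        launchContext.setLocalResources(Collections.singletonMap("job.jar", resource));
    }
}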


Best,
Aljoscha


Re: [PROPOSAL] Reverse the dependency from flink-streaming-java to flink-client

2020-03-09 Thread Aljoscha Krettek

On 09.03.20 03:15, tison wrote:


So far, there is a PR[1] that implements the proposal in this thread.

I look forward to your reviews or start a vote if required.


Nice, I'll try and get to review that this week.

Best,
Aljoscha


Re: [DISCUSS] Extend (or maintain) "shell" script support for Windows

2020-03-10 Thread Aljoscha Krettek
Since there was no-one that said we should keep the windows scripts and 
no-one responded on the user ML thread I'll close the Jira issues/PRs 
about extending the scripts.


Aljoscha


Re: [DISCUSS] FLIP-85: Delayed Job Graph Generation

2020-03-10 Thread Aljoscha Krettek

On 10.03.20 03:31, Yang Wang wrote:

For the "run-job", do you mean to submit a Flink job to an existing session
or
just like the current per-job to start a dedicated Flink cluster? Then will
"flink run" be deprecated?


I was talking about the per-job mode that starts a dedicated Flink 
cluster. This was more thinking about the future but it might make sense 
to separate these modes more. "flink run" would then only be used for 
submitting to a session cluster, on standalone or K8s or whatnot.



On Yarn deployment, we could register the local or HDFS jar/files
as LocalResource.
And let Yarn to localize the resource to workdir, when the entrypoint is
launched, all
the jars and dependencies exist locally. So the entrypoint will *NOT* do
the real fetching,
do i understand correctly?


Yes, this is exactly what I meant.

Best,
Aljoscha


Re: [DISCUSS] Extend (or maintain) "shell" script support for Windows

2020-03-10 Thread Aljoscha Krettek

On 10.03.20 14:35, Robert Metzger wrote:

I'm wondering whether we should file a ticket to remove the *.bat files in
bin/ ?


We can leave them there because they're not doing much harm, and 
removing them might actively break some existing setup.


Best,
Aljoscha



Re: [Discussion] Job generation / submission hooks & Atlas integration

2020-03-11 Thread Aljoscha Krettek

Thanks! I'm reading the document now and will get back to you.

Best,
Aljoscha


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-03-11 Thread Aljoscha Krettek

Hi,

I don't understand this discussion. Hints, as I understand them, should 
work like this:


- hints are *optional* advice for the optimizer to try and help it to 
find a good execution strategy
- hints should not change query semantics, i.e. they should not change connector properties; executing a query taking into account the hints *must* produce the same result as executing the query without taking into account the hints


From these simple requirements you can derive a solution that makes 
sense. I don't have a strong preference for the syntax but we should 
strive to be in line with prior work.


Best,
Aljoscha

On 11.03.20 11:53, Danny Chan wrote:

Thanks Timo for summarize the 3 options ~

I agree with Kurt that option2 is too complicated to use because:

• As a Kafka topic consumer, the user must define both the virtual column for 
start offset and he must apply a special filter predicate after each query
• And for the internal implementation, the metadata column push down is another 
hard topic, each kind of message queue may have its offset attribute, we need 
to consider the expression type for different kind; the source also need to 
recognize the constant column as a config option(which is weird because usually 
what we pushed down is a table column)

For option 1 and option 3, I think there is no difference; option 1 is also a hint syntax, which was introduced in Sybase and later referenced and then deprecated by MS-SQL in the 199X years because of its ambiguity. Personally I prefer the /*+ */ style table hint over the WITH keyword for these reasons:

• We do not break the standard SQL, the hints are nested in SQL comments
• We do not need to introduce additional WITH keyword which may appear in a 
query if we use that because a table can be referenced in all kinds of SQL 
contexts: INSERT/DELETE/FROM/JOIN …. That would make our sql query break too 
much of the SQL from standard
• We would have uniform syntax for hints as query hint, one syntax fits all and 
more easy to use


And here is the reason why we choose a uniform Oracle style query hint syntax 
which is addressed by Julian Hyde when we design the syntax from the Calcite 
community:

I don’t much like the MSSQL-style syntax for table hints. It adds a new use of 
the WITH keyword that is unrelated to the use of WITH for common-table 
expressions.

A historical note. Microsoft SQL Server inherited its hint syntax from Sybase a 
very long time ago. (See “Transact SQL Programming”[1], page 632, “Optimizer 
hints”. The book was written in 1999, and covers Microsoft SQL Server 6.5 / 7.0 
and Sybase Adaptive Server 11.5, but the syntax very likely predates Sybase 
4.3, from which Microsoft SQL Server was forked in 1993.)

Microsoft later added the WITH keyword to make it less ambiguous, and has now 
deprecated the syntax that does not use WITH.

They are forced to keep the syntax for backwards compatibility but that doesn’t 
mean that we should shoulder their burden.

I think formatted comments are the right container for hints because it allows 
us to change the hint syntax without changing the SQL parser, and makes clear 
that we are at liberty to ignore hints entirely.

Julian

[1] https://www.amazon.com/s?k=9781565924017 


Best,
Danny Chan
在 2020年3月11日 +0800 PM4:03,Timo Walther ,写道:

Hi Danny,

it is true that our DDL is not standard compliant by using the WITH
clause. Nevertheless, we aim for not diverging too much and the LIKE
clause is an example of that. It will solve things like overwriting
WATERMARKs, add additional/modifying properties and inherit schema.

Bowen is right that Flink's DDL is mixing 3 types definition together.
We are not the first ones that try to solve this. There is also the SQL
MED standard [1] that tried to tackle this problem. I think it was not
considered when designing the current DDL.

Currently, I see 3 options for handling Kafka offsets. I will give some
examples and look forward to feedback here:

*Option 1* Runtime and semantic parms as part of the query

`SELECT * FROM MyTable('offset'=123)`

Pros:
- Easy to add
- Parameters are part of the main query
- No complicated hinting syntax

Cons:
- Not SQL compliant

*Option 2* Use metadata in query

`CREATE TABLE MyTable (id INT, offset AS SYSTEM_METADATA('offset'))`

`SELECT * FROM MyTable WHERE offset > TIMESTAMP '2012-12-12 12:34:22'`

Pros:
- SQL compliant in the query
- Access of metadata in the DDL which is required anyway
- Regular pushdown rules apply

Cons:
- Users need to add an additional column in the DDL

*Option 3*: Use hints for properties

`
SELECT *
FROM MyTable /*+ PROPERTIES('offset'=123) */
`

Pros:
- Easy to add

Cons:
- Parameters are not part of the main query
- Cryptic syntax for new users
- Not standard compliant.

If we go with this option, I would suggest to make it available in a
separate map and don't mix it with statically defined properties. Such
that the factory can decide which properties 

Re: Flink Kafka consumer auto-commit timeout

2020-03-11 Thread Aljoscha Krettek

On 09.03.20 06:10, Rong Rong wrote:

- Is this feature (disabling checkpoint and restarting job from Kafka
committed GROUP_OFFSET) not supported?


I believe the Flink community never put much (any?) effort into this 
because the Flink Kafka Consumer does its own offset handling. Starting 
from the committed offsets should work fine, though, the default startup 
mode is even StartupMode.GROUP_OFFSETS.
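As a small illustration, a job that relies purely on the committed group offsets could look roughly like this; this is only a sketch against the 1.10-era DataStream API, and the topic, group id and bootstrap servers are placeholders:

import java.util.Properties;

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

public class GroupOffsetsExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // No checkpointing enabled here, so offsets are only committed by the Kafka
        // client itself (if auto-commit is on), not by Flink.

        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092");
        props.setProperty("group.id", "my-consumer-group");
        props.setProperty("enable.auto.commit", "true");

        FlinkKafkaConsumer<String> consumer =
                new FlinkKafkaConsumer<>("my-topic", new SimpleStringSchema(), props);
        // This is the default, stated explicitly: resume from the committed group offsets.
        consumer.setStartFromGroupOffsets();

        env.addSource(consumer).print();
        env.execute("group-offsets-example");
    }
}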



- How does Flink-Kafka actually handles auto-commit to coordinator given
the fact that Flink ignores the coordinator assignments and uses
self-assigning partitions instead?


I think we don't do anything for this case; the Kafka Consumer code will do the committing if 'enable.auto.commit' is set. I don't know how this will play with our code because we disable the automatic group handling.


Do you think letting Kafka do the auto committing is ever useful? If you have a Flink job that does checkpoints you will get the correct offset committing and you can start a job from the committed offsets. In what cases would you want to use the builtin Kafka offset committing?


Best,
Aljoscha


Re: [VOTE] [FLIP-85] Flink Application Mode

2020-03-12 Thread Aljoscha Krettek

+1 (binding)

Aljoscha


Re: [VOTE] [FLIP-85] Flink Application Mode

2020-03-12 Thread Aljoscha Krettek

On 12.03.20 14:05, Flavio Pompermaier wrote:


There's also a related issue that I opened a long time ago
https://issues.apache.org/jira/browse/FLINK-10879 that could be closed once
implemented this FLIP (or closed immediately and referenced as duplicated
by the new JIRA ticket that would be created


Thanks for the pointer! I'm closing that for now.



Re: [DISCUSS] Drop Bucketing Sink

2020-03-12 Thread Aljoscha Krettek

+1

Aljoscha


Re: Flink Kafka consumer auto-commit timeout

2020-03-13 Thread Aljoscha Krettek

Thanks for the update!

On 13.03.20 13:47, Rong Rong wrote:

1. I think we have finally pinpointed what the root cause to this issue is:
When partitions are assigned manually (e.g. with assign() API instead
subscribe() API) the client will not try to rediscover the coordinator if
it dies [1]. This seems to no longer be an issue after Kafka 1.1.0
After cherry-picking the PR[2] back to Kafka 0.11.x branch and package it
with our Flink application, we haven't seen this issue re-occurred so far.


So the solution to this thread is: we don't do anything because it is a 
Kafka bug that was fixed?



2. The GROUP_OFFSETS is in fact the default startup mode if Checkpoint is
not enable - that's why I was a bit surprise that this problem was reported
so many times.
To follow up on the question "whether resuming from GROUP_OFFSETS are
useful": there are definitely use cases where users don't want to use
checkpointing (e.g. due to resource constraint, storage cost consideration,
etc), but somehow still want to avoid a certain amount of data loss. Most
of our analytics use cases falls into this category.


Yes, this is what I assumed. I was not suggesting to remove the feature. 
We also just leave it as is, right?


Best,
Aljoscha


Re: PackagedProgram and ProgramDescription

2020-03-30 Thread Aljoscha Krettek

On 18.03.20 14:45, Flavio Pompermaier wrote:

what do you think if we exploit this job-submission sprint to address also
the problem discussed in https://issues.apache.org/jira/browse/FLINK-10862?


That's a good idea! What should we do? It seems that most committers on 
the issue were in favour of deprecating/removing ProgramDescription.


Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread Aljoscha Krettek

Agreed to what Dawid and Timo said.

To answer your question about multi line SQL: no, we don't think we need 
this in Flink 1.11, we only wanted to make sure that the interfaces that 
we now put in place will potentially allow this in the future.
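To illustrate what that could enable later, here is a sketch of how a caller might drive it; note that executeMultilineSql is only the method proposed in the quoted discussion below and does not exist yet, and tableEnv and the statements are placeholders:

// Sketch only: executeMultilineSql(...) is the method proposed in the quoted
// discussion below; it does not exist in the current API.
Iterator<TableResult> results = tableEnv.executeMultilineSql(
        "CREATE TEMPORARY VIEW v AS SELECT 1 AS x;\n"
                + "INSERT INTO sink_table SELECT x FROM v;");

while (results.hasNext()) {
    // next() is what submits the next statement, so the caller (e.g. the SQL Client)
    // decides when each statement goes to the cluster and can handle each result.
    TableResult result = results.next();
    result.print();
}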


Best,
Aljoscha

On 01.04.20 09:31, godfrey he wrote:

Hi, Timo & Dawid,

Thanks so much for the effort of `multiline statements supporting`,
I have a few questions about this method:

1. users can well control the execution logic through the proposed method
  if they know what the statements are (a statement is a DDL, a DML or
others).
but if a statement is from a file, that means users do not know what the
statements are,
the execution behavior is unclear.
As a platform user, I think this method is hard to use, unless the platform defines a set of rules about the statement order, such as: no select in the middle, dml must be at the tail of the sql file (which may be the most common case in a production env).
Otherwise the platform must parse the sql first, then know what the
statements are.
If do like that, the platform can handle all cases through `executeSql` and
`StatementSet`.

2. The SQL client also can't use `executeMultilineSql` to support multiline statements, because there are some special commands introduced in the SQL client, such as `quit`, `source`, `load jar` (which does not exist now, but maybe we need this command to support dynamic table sources and udfs).
Does TableEnvironment also support those commands?

3. btw, must we have this feature in release-1.11? I find there are a few user cases in the feedback document whose behavior is unclear now.

regarding to "change the return value from `Iterable 于2020年4月1日周三 上午3:14写道:


Thank you Timo for the great summary! It covers (almost) all the topics.
Even though in the end we are not suggesting much changes to the current
state of FLIP I think it is important to lay out all possible use cases
so that we do not change the execution model every release.

There is one additional thing we discussed. Could we change the result
type of TableResult#collect to Iterator? Even though those
interfaces do not differ much. I think Iterator better describes that
the results might not be materialized on the client side, but can be
retrieved on a per record basis. The contract of the Iterable#iterator
is that it returns a new iterator each time, which effectively means we
can iterate the results multiple times. Iterating the results is not
possible when we don't retrieve all the results from the cluster at once.

I think we should also use Iterator for
TableEnvironment#executeMultilineSql(String statements):
Iterator.

Best,

Dawid

On 31/03/2020 19:27, Timo Walther wrote:

Hi Godfrey,

Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In
particular, we discussed how the current status of the FLIP and the
future requirements around multiline statements, async/sync, collect()
fit together.

We also updated the FLIP-84 Feedback Summary document [1] with some
use cases.

We believe that we found a good solution that also fits to what is in
the current FLIP. So no bigger changes necessary, which is great!

Our findings were:

1. Async vs sync submission of Flink jobs:

Having a blocking `execute()` in DataStream API was rather a mistake.
Instead all submissions should be async because this allows supporting
both modes if necessary. Thus, submitting all queries async sounds
good to us. If users want to run a job sync, they can use the
JobClient and wait for completion (or collect() in case of batch jobs).

2. Multi-statement execution:

For the multi-statement execution, we don't see a contradiction with the async execution behavior. We imagine a method like:

TableEnvironment#executeMultilineSql(String statements): Iterable<TableResult>

Where the `Iterator#next()` method would trigger the next statement
submission. This allows a caller to decide synchronously when to
submit statements async to the cluster. Thus, a service such as the
SQL Client can handle the result of each statement individually and
process statement by statement sequentially.

3. The role of TableResult and result retrieval in general

`TableResult` is similar to `JobClient`. Instead of returning a
`CompletableFuture` of something, it is a concrete util class where
some methods have the behavior of completable future (e.g. collect(),
print()) and some are already completed (getTableSchema(),
getResultKind()).

`StatementSet#execute()` returns a single `TableResult` because the
order is undefined in a set and all statements have the same schema.
Its `collect()` will return a row for each executed `INSERT INTO` in
the order of statement definition.

For simple `SELECT * FROM ...`, the query execution might block until `collect()` is called to pull buffered rows from the job (from socket/REST API, whatever we will use in the future). We can say that a statement finished successfully when the `collect#Iterator#hasNext` has returned false.

I hope this summarizes our discussion @D

Re: [DISCUSS] Change default planner to blink planner in 1.11

2020-04-01 Thread Aljoscha Krettek
+1 to making Blink the default planner, we definitely don't want to 
maintain two planners for much longer.


Best,
Aljoscha


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-02 Thread Aljoscha Krettek
I think we're designing ourselves into ever more complicated corners 
here. Maybe we need to take a step back and reconsider. What would you 
think about this (somewhat) simpler proposal:


- introduce a hint called CONNECTOR_OPTIONS(k=v, ...) or CONNECTOR_PROPERTIES, depending on what naming we want to have for this in the future. This will simply overwrite all connector properties; the table factories don't know about hints but simply work with the properties that they are given


- this special hint is disabled by default and can be activated with a 
global option "foo.bazzle.connector-hints" (or something like this) 
which has a warning that describes that this can change query semantics etc.


That's it. This makes connector implementations a lot easier while still 
allowing inline configuration.


I still don't like using hint syntax at all for this, because I strongly maintain that hints should not change query semantics. In general using hints should be kept to a minimum because they usually point to shortcomings in the system.


Best,
Aljoscha

On 02.04.20 06:06, Jingsong Li wrote:

Hi Dawid,


When a factory is instantiated it has access to the CatalogTable,

therefore it has access to all the original properties. In turn it knows
the original format and can call FormatFactory#supportedHintOptions().

Factory can only get CatalogTable when creating source or sink,
right? IIUC, TableFactory may be stateless too.
When invoking SourceFactory#supportedHintOptions(), it can not
get CatalogTable, so it is impossible to create FormatFactory?

Best,
Jingsong Lee

On Wed, Apr 1, 2020 at 10:05 PM Danny Chan  wrote:


Hi, Dawid, thanks so much for the detail suggestions ~

1. Regarding the motivation:

I agree it's not a good suggested way, based on the fact that we have a better solution, but I think we can support overriding that as long as it exists as one of the table options. I would remove it from the motivation part.

2. The options are passed around during sql-to-rel conversion, right after we generate the RelOptTable (CatalogSourceTable#toRel in Flink). This is indeed a push-based method, at least in the RelOptTable layer; then we hand over the options to TableSourceFactory with our own context, which is fine because TableSourceFactory#Context is the contract to pass around these table-related variables.

3. "We should not end up with an extreme example where we can change the
connector type", i totally agree that, and i have listed the
"connector.type" as forbidden attribute in the WIKI. As for the format, i
think the connector itself can/should control whether to override the
"format.type", that is one of the reason i change the
TableSourceFactory#supportedHintOpitons to
TableSourceFactory#forbiddenHintOpitons, use a blacklist, we can limit the
format keys we want conveniently.

4. SQL Hints syntax.


I think the k and v in the hint_item should be QUOTED_STRING (not sure

if it is equivalent to string_literal).

I disagree; we should at least keep in sync with our DDL: use the string literal as the key. We also support the simple identifier because this is the common hint syntax from Calcite; it does not hurt anything for the OPTIONS hint, the unsupported keys would simply fail validation. (If you think that may cause some confusion, I can make the syntax pluggable for each hint in CALCITE 1.23.)

We only support the OPTIONS hint in the FLIP, and I have changed the title to "Supports dynamic table options" to make it more clear in the WIKI.

5. Yes, we also had these concerns in our offline discussion; that is one of the reasons why I changed TableSourceFactory#supportedHintOptions to TableSourceFactory#forbiddenHintOptions. I did not choose Set supportedHintOptionKeys() with wildcard support for 2 reasons:

   - The wildcard is still not descriptive, we still can not forbid one of the properties among the wildcard properties, we can not enable or disable them totally
   - ConfigOption is our new structure for keys, and it does not support wildcards yet.

Best,
Danny Chan
在 2020年4月1日 +0800 PM9:12,Dawid Wysakowicz ,写道:

Hi,
Few comments from my side:
1. Regarding the motivation:
I think the example with changing the update-mode is not a good one. In

the long term this should be done with EMIT CHANGELOG (discussed in
FLIP-105).

Nitpicking: I would mention that it is rather for debugging/ad-hoc

solution. I think this should not be a recommended way for production use
cases as it bypasses the Catalog, which should be the source of truth.

2. I could not understand how the additional options will be passed to

the TableSourceFactory. Could you elaborate a bit more on that? I see there
is a Context interface that gives the options. But cannot find a way to get
the context itself in the factory. Moreover I think it would make more
sense to have rather a push based approach here. Something like
applyOptions(ReadableConfig) method.

3. As for the concerns Jingsong raised in the voting thread. I think it

is not a big pro

[PSA] Please report all occurrences of test failures

2020-04-03 Thread Aljoscha Krettek

Hi All,

we're currently struggling a bit with test stability, it seems 
especially on Azure. If you encounter a test failure in a PR or anywhere 
else, please take the time to check if there is already a Jira issue or 
create a new one. If there is already an Issue, please report the 
additional failure and also post a link to the log. This will help us 
gauge which test-stability issues are still relevant or which need the 
highest priority.


I usually use this JQL query to search for test failures:

 project = FLINK AND resolution = Unresolved AND summary ~ 
"FooBarTest*" ORDER BY priority DESC, updated DESC


I even have a Google Chrome search engine set up with that query and a 
placeholder:



https://issues.apache.org/jira/issues/?jql=project%20%3D%20FLINK%20AND%20resolution%20%3D%20%20Unresolved%20AND%20summary%20~%20%22%s%22%20ORDER%20BY%20priority%20DESC%2C%20updated%20DESC

You can set up a similar thing in Firefox and other browsers.

Best,
Aljoscha


Re: [VOTE] FLIP-110: Support LIKE clause in CREATE TABLE

2020-04-03 Thread Aljoscha Krettek

+1

Aljoscha


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-06 Thread Aljoscha Krettek
The reason I'm saying it should be disabled by default is that this uses 
hint syntax, and hints should really not change query semantics.


I'm quite strongly against hints that change query semantics, but if we 
disable this by default I would be (reluctantly) OK with the feature. 
Companies that create deployments or set up the SQL environment for 
users can enable the feature if they want.


But yes, I also agree that we don't need whitelisting/blacklisting, 
which makes this a lot easier to do.


Best,
Aljoscha

On 06.04.20 04:27, Danny Chan wrote:

Hi, everyone ~

@Aljoscha @Timo


I think we're designing ourselves into ever more complicated corners

here

I kindly agree with that; personally I didn't see strong reasons why we should put limits on each connector's properties:

• we can define any table options for CREATE TABLE, so why treat the dynamic options differently? We never consider any security problems when creating a table, and we should not for dynamic table options either
• If we do not have whitelist or blacklist properties, the table source creation work would be much easier: just use the merged options. There is no need to modify each connector to decide which options could be overridden and how we merge them (the merge work is redundant).
• @Timo, how about we support another interface `TableSourceFactory#Context.getExecutionOptions`? We always use this interface to get the options to create our table source. There is no need to copy the catalog table itself, we just need to generate our Context correctly.
• @Aljoscha I agree to have a global config option, but I disagree with disabling it by default; a global default config would break the user experience too much, especially when users want to modify the options in an ad-hoc way.



I suggest to remove `TableSourceFactory#supportedHintOptions` or `TableSourceFactory#forbiddenHintOptions` based on the fact that we do not have a black/white list for CREATE TABLE at all, at least in the current codebase.


@Timo (I have replied offline but allow me to repeat it here again)

The `TableSourceFactory#supportedHintOptions` doesn't work well, for 3 reasons, compared to `TableSourceFactory#forbiddenHintOptions`:
1. For keys with a wildcard, like connector.property.*, using a blacklist gives us the ability to disable some of the keys under it, e.g. connector.property.key1; a whitelist can only match by prefix.

2. We want connectors to have the ability to disable the format type switch format.type but allow all the other properties, e.g. format.* without format.type (let's call it SET_B). If we use the whitelist, we have to enumerate all the specific format keys starting with format (SET_B), but with the old connector factories we have no idea which specific format keys are supported (there is either a format.* or nothing).

3. Except for the cases in 1 and 2, for normal keys (no wildcard) the blacklist and whitelist have the same expressiveness; using a blacklist makes the code less verbose because we don't enumerate all the duplicate keys with #supportedKeys. (Not a very strong reason, but I think as a connector developer it makes sense.)

Best,
Danny Chan
在 2020年4月3日 +0800 PM11:28,Timo Walther ,写道:

Hi everyone,

@Aljoscha: I disagree with your approach because a `Catalog` can return
a custom factory that is not using any properties. The hinting must be
transparent to a factory. We should NOT modify the metadata
`CatalogTable` at any point in time after the catalog.

@Danny, @Jingsong: How about we stick to the original design that we
wanted to vote on but use:

Set supportedHintProperties()

This fits better to the old factory design. And for the new FLIP-95
factories we will use `ConfigOption` and provide good utilities for
merging with hints etc.

We can allow `"format.*"` in `supportedHintProperties()` to allow
hinting in formats.

What do you think?

Regards,
Timo


On 02.04.20 16:24, Aljoscha Krettek wrote:

I think we're designing ourselves into ever more complicated corners
here. Maybe we need to take a step back and reconsider. What would you
think about this (somewhat) simpler proposal:

- introduce a hint called CONNECTOR_OPTIONS(k=v,...). or
CONNECTOR_PROPERTIES, depending on what naming we want to have for this
in the future. This will simply overwrite all connector properties, the
table factories don't know about hints but simply work with the
properties that they are given

- this special hint is disabled by default and can be activated with a
global option "foo.bazzle.connector-hints" (or something like this)
which has a warning that describes that this can change query semantics
etc.

That's it. This makes connector implementations a lot easier while still
allowing inline configuration.

I still don't like using hint syntax at all for this, because I strongly
maintain that hints should not change query syntax. In general using
hints should be kept to a minimum be

Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-07 Thread Aljoscha Krettek

On 07.04.20 08:45, Dawid Wysakowicz wrote:


@Jark I was aware of the implementation of SinkFunction, but it was a
conscious choice to not do it that way.

Personally I am against giving a default implementation to both the new
and old methods. This results in an interface that by default does
nothing or notifies the user only in the runtime, that he/she has not
implemented a method of the interface, which does not sound like a good
practice to me. Moreover I believe the method without a Collector will
still be the preferred method by many users. Plus it communicates
explicitly what is the minimal functionality required by the interface.
Nevertheless I am happy to hear other opinions.


Dawid and I discussed this before. I did the extension of the 
SinkFunction but by now I think it's better to do it this way, because 
otherwise you can implement the interface without implementing any methods.
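Concretely, the shape I mean is roughly the following; this is a simplified sketch, not the exact interface from the FLIP. Only the new Collector-based method gets a default implementation, which delegates to the old single-record method, so implementors still have to provide at least one real method:

// Simplified sketch of the discussed interface shape, not the exact FLIP-124 interface.
public interface DeserializationSchema<T> extends java.io.Serializable {

    // Existing single-record method: no default, so every implementation must provide it.
    T deserialize(byte[] message) throws java.io.IOException;

    // New method: gets a default that delegates to the old one, so existing
    // implementations keep working, but you cannot implement only defaults.
    default void deserialize(byte[] message, org.apache.flink.util.Collector<T> out)
            throws java.io.IOException {
        T record = deserialize(message);
        if (record != null) {
            out.collect(record);
        }
    }
}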



@all I also prefer the buffering approach. Let's wait a day or two more
to see if others think differently.


I'm also in favour of buffering outside the lock.

Also, +1 to this FLIP.

Best,
Aljoscha


Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-08 Thread Aljoscha Krettek

On 08.04.20 04:27, Jark Wu wrote:


I have a minor concern about the global configuration
`table.optimizer.dynamic-table-options.enabled`, does it belong to
optimizer?
 From my point of view, it is just an API to set table options and uses
Calcite in the implementation.
I'm also thinking about what's the name of other configurations, e.g
time-zone, code-gen length, state ttl.
Should they prefix with "optimizer" or "exec" or something else or nothing?


I agree that this is probably not the right option name. Could we just 
have `table.dynamic-table-options.enabled`?
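For illustration, usage could then look roughly like this; this is only a sketch, the option name is the one proposed above, and the table name and the connector key inside the hint are made-up examples:

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class DynamicOptionsSketch {
    public static void main(String[] args) {
        TableEnvironment tableEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Gate the feature behind the global option proposed above.
        tableEnv.getConfig().getConfiguration()
                .setBoolean("table.dynamic-table-options.enabled", true);

        // Override a single connector property just for this query via the OPTIONS hint;
        // 'MyKafkaTable' and 'scan.startup.mode' are made-up examples.
        Table result = tableEnv.sqlQuery(
                "SELECT * FROM MyKafkaTable "
                        + "/*+ OPTIONS('scan.startup.mode'='earliest-offset') */");
    }
}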


[PSA] Please check your github email configuration when merging on Github

2020-04-08 Thread Aljoscha Krettek

Hi Everyone,

we have a lot of commits recently that were committed by "GitHub <noreply@github.com>". This happens when your GitHub account is not 
configured correctly with respect to your email address. Please make 
sure that your commits somehow show who is the committer.


For reference, check out 
https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/merging-a-pull-request. 
Especially point 5.


Best,
Aljoscha


Re: Configuring autolinks to Flink JIRA ticket in github repos

2020-04-09 Thread Aljoscha Krettek

That is very nice! Thanks for taking care of this ~3q

On 09.04.20 11:08, Dian Fu wrote:

Cool! Thanks Yun for this effort. Very useful feature.

Regards,
Dian


在 2020年4月9日,下午4:32,Yu Li  写道:

Great! Thanks for the efforts Yun.

Best Regards,
Yu


On Thu, 9 Apr 2020 at 16:15, Jark Wu  wrote:


Thanks Yun,

This's a great feature! I was surprised by the autolink feature yesterday
(didn't know your work at that time).

Best,
Jark

On Thu, 9 Apr 2020 at 16:12, Yun Tang  wrote:


Hi community

The autolink to Flink JIRA tickets has taken effect. You can refer to the commit details page [1] to see that all Flink JIRA titles within commits have the hyperlink underline. Moreover, you don't need to use markdown language to create a hyperlink to a Flink JIRA ticket when discussing in pull requests, e.g. FLINK-16850 could point to the link instead of [FLINK-16850](https://issues.apache.org/jira/browse/FLINK-16850)


[1] https://github.com/apache/flink/commits/master

Best
Yun Tang


From: Till Rohrmann 
Sent: Thursday, April 2, 2020 23:11
To: dev 
Subject: Re: Configuring autolinks to Flink JIRA ticket in github repos

Nice, this is a cool feature. Thanks for asking INFRA for it.

Cheers,
Till

On Wed, Apr 1, 2020 at 6:52 PM Yun Tang  wrote:


Hi community.

I noticed that Github added support for autolink references recently [1]. This is helpful to allow developers to open the Jira ticket link directly from the pull request title when accessing the github repo.

I have already created INFRA-20055 [2] to ask for configuration of the seven Flink related github repos. Hope it could be resolved soon 🙂


[1]




https://help.github.com/en/github/administering-a-repository/configuring-autolinks-to-reference-external-resources

[2] https://issues.apache.org/jira/browse/INFRA-20055

Best
Yun Tang











Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-14 Thread Aljoscha Krettek

On 10.04.20 17:35, Jark Wu wrote:

1) For correctness, it is necessary to perform the watermark generation as
early as possible in order to be close to the actual data
  generation within a source's data partition. This is also the purpose of
per-partition watermark and event-time alignment.
  Many ongoing FLIPs (e.g. FLIP-27, FLIP-95) work a lot on this effort. Deserializing records and generating watermarks in a chained ProcessFunction makes it difficult to do per-partition watermarks in the future.


For me, this is the main reason for this, i.e. we need to extract the records in the source so that we can correctly generate per-partition watermarks.


Best,
Aljoscha


[DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-15 Thread Aljoscha Krettek

Hi Everyone,

I'd like to discuss about releasing a more full-featured Flink 
distribution. The motivation is that there is friction for SQL/Table API 
users that want to use Table connectors which are not there in the 
current Flink Distribution. For these users the workflow is currently 
roughly:


 - download Flink dist
 - configure csv/Kafka/json connectors per configuration
 - run SQL client or program
 - decrypt error message and research the solution
 - download additional connector jars
 - program works correctly

I realize that this can be made to work but if every SQL user has this 
as their first experience that doesn't seem good to me.


My proposal is to provide two versions of the Flink Distribution in the 
future: "fat" and "slim" (names to be discussed):


 - slim would be even trimmer than today's distribution
 - fat would contain a lot of convenience connectors (yet to be 
determined which ones)


And yes, I realize that there are already more dimensions of Flink 
releases (Scala version and Java version).


For background, our current Flink dist has these in the opt directory:

 - flink-azure-fs-hadoop-1.10.0.jar
 - flink-cep-scala_2.12-1.10.0.jar
 - flink-cep_2.12-1.10.0.jar
 - flink-gelly-scala_2.12-1.10.0.jar
 - flink-gelly_2.12-1.10.0.jar
 - flink-metrics-datadog-1.10.0.jar
 - flink-metrics-graphite-1.10.0.jar
 - flink-metrics-influxdb-1.10.0.jar
 - flink-metrics-prometheus-1.10.0.jar
 - flink-metrics-slf4j-1.10.0.jar
 - flink-metrics-statsd-1.10.0.jar
 - flink-oss-fs-hadoop-1.10.0.jar
 - flink-python_2.12-1.10.0.jar
 - flink-queryable-state-runtime_2.12-1.10.0.jar
 - flink-s3-fs-hadoop-1.10.0.jar
 - flink-s3-fs-presto-1.10.0.jar
 - flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
 - flink-sql-client_2.12-1.10.0.jar
 - flink-state-processor-api_2.12-1.10.0.jar
 - flink-swift-fs-hadoop-1.10.0.jar

Current Flink dist is 267M. If we removed everything from opt we would 
go down to 126M. I would recommend this, because the large majority of 
the files in opt are probably unused.


What do you think?

Best,
Aljoscha



Re: [VOTE] FLIP-108: Add GPU support in Flink

2020-04-15 Thread Aljoscha Krettek
Is the only really new method on the public APIs 
getExternalResourceInfos(..) on the RuntimeContext? I'm generally quite 
skeptical about adding anything to that interface but the method seems ok.


Side note for the configuration keys: the pattern is similar to the metrics 
configuration. There we have "metrics.reporters" = <list of names> and then 
"metrics.reporter.<name>.<option>". Your proposal is 
slightly different in that it uses "external-resource.list". Keeping 
this in line with the metrics configuration would suggest using 
"external-resources", and then "external-resource".<name>.<option>. What 
do you think?


Also, why is there this long discussion in a [VOTE] thread?

Best,
Aljoscha

On 15.04.20 10:32, Yangze Guo wrote:

Thanks for the explanation. I do not have a strong opinion regarding
this interface. So, if it is better from your perspective, I'm +1 for
it. I'm just saying it may not help a lot regarding type safety.

Regarding the bounded wildcard type, yes, it's an implementation
detail. If it won't make a difference for the user, I'm also +1 for not
using a bounded wildcard type there.

Best,
Yangze Guo

On Wed, Apr 15, 2020 at 4:23 PM Till Rohrmann  wrote:


I think <T extends ExternalResourceInfo> Set<T>
getExternalResourceInfos(String resourceName, Class<T>
externalResourceType) is not less flexible than the other API since you can
always pass in ExternalResourceInfo.class as the second argument.

The benefit I see for the user is that he does not have to do the
instanceof checks and type casts himself. This is admittedly not a big deal
but still a better API imo.

I think the interface of the Driver and what is returned by the
RuntimeContext don't have to have the same type because you can cast it or
repack it. If the current implementation simply stores what the Driver
returns and RuntimeContext returns this map, then it might seem that there
is a connection. But this should be an implementation detail rather than a
necessity.

Maybe we could also pull in someone from the SDK team to give us his
opinion on the user facing API.

Cheers,
Till

On Wed, Apr 15, 2020 at 10:13 AM Xintong Song  wrote:



I agree that such an interface won't give compile time checks but I think
that it could be easier to use from a user's perspective because there is
no explicit casting required.
public interface RuntimeContext {
  <T extends ExternalResourceInfo> Set<T> getExternalResourceInfos(String resourceName, Class<T> externalResourceType);
}



I'm not sure how much less effort is required from users to pass in an
`externalResourceType` compared to doing an explicit type cast.
A potential side effect of passing in an `externalResourceType` is that it
requires the user (e.g. the operator) to know in advance which specific type
should be returned, which may limit flexibility.

E.g., we might have an operator that can work with multiple different
implementations of `ExternalResourceInfo`. It may decide its behavior based
on the actual type returned by `getExternalResourceInfos` at runtime.
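
To illustrate the two shapes being compared, here is a small self-contained 
sketch (simplified stand-in names; GpuInfo is a hypothetical user type, not 
part of the proposal):

import java.util.Set;

// Simplified stand-ins for the interfaces under discussion.
interface ExternalResourceInfo {}

// Hypothetical concrete info type that a driver might return.
class GpuInfo implements ExternalResourceInfo {
    final int index;
    GpuInfo(int index) { this.index = index; }
}

interface RuntimeContextSketch {
    // Variant A: untyped lookup, the operator author casts explicitly.
    Set<ExternalResourceInfo> getExternalResourceInfos(String resourceName);

    // Variant B: typed lookup, the expected type is passed in and no casts
    // appear in user code.
    <T extends ExternalResourceInfo> Set<T> getExternalResourceInfos(
            String resourceName, Class<T> externalResourceType);
}

class Usage {
    static void example(RuntimeContextSketch ctx) {
        // Variant A: instanceof check plus cast in the operator.
        for (ExternalResourceInfo info : ctx.getExternalResourceInfos("gpu")) {
            if (info instanceof GpuInfo) {
                GpuInfo gpu = (GpuInfo) info;
                System.out.println("GPU index: " + gpu.index);
            }
        }
        // Variant B: the cast happens behind the interface.
        for (GpuInfo gpu : ctx.getExternalResourceInfos("gpu", GpuInfo.class)) {
            System.out.println("GPU index: " + gpu.index);
        }
    }
}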


Thank you~

Xintong Song



On Wed, Apr 15, 2020 at 4:09 PM Yangze Guo  wrote:


@Till
If we add "Class externalResourceType" param, what if there are
multiple subtypes in the ExternalResourceInfos set of one external
resource? It seems user has to set the T to ExternalResourceInfo and
the mechanism is useless at this case.

Best,
Yangze Guo

On Wed, Apr 15, 2020 at 3:57 PM Till Rohrmann 
wrote:


Ok, if there can be multiple resources of the same type then we definitely
need the name as a differentiator.

I agree that such an interface won't give compile time checks but I think
that it could be easier to use from a user's perspective because there is
no explicit casting required.

public interface RuntimeContext {
  <T extends ExternalResourceInfo> Set<T> getExternalResourceInfos(String resourceName, Class<T> externalResourceType);
}

One minor note: I think the value of the returned map does not need to use
a bounded wildcard type because for the user it won't make a difference.


Cheers,
Till

On Wed, Apr 15, 2020 at 8:20 AM Yangze Guo  wrote:


Hi Till,


ExternalResourceDriver could return a Set<ExternalResourceInfo>.
It sounds good.


then one could make the interface type-safe by changing it to
public interface RuntimeContext {
  <T extends ExternalResourceInfo> Set<T> getExternalResourceInfo(Class<T> externalResourceType);
}

I think it may not help.
- I think the assumption of "there is always only one resource of a
specific type" is too strong. The external resource framework should
only assume it gets a set of ExternalResourceInfo from the driver. The
concrete implementation is provided by the user. So, if we make such an
assumption, it would hurt flexibility. There could be multiple
types in the returned ExternalResourceInfo set. There could also be
different types returned from different driver implementations or
versions. The contract about the return type between Driver and
Operator should be guaranteed by the user.
- Since the Drivers are loaded dynamically at runtime, if there is a
type mismatch, the job would fail at runtime instead of at compile
time, no matter the type extra

Re: [DISCUSS] Integration of training materials into Apache Flink

2020-04-16 Thread Aljoscha Krettek
I'd be very happy if someone took over that part of the documentation! 
There are open issues for the TODOs in the concepts section here: 
https://issues.apache.org/jira/browse/FLINK-12639. But feel free to 
comment there/close/re-arrange as you see fit. Maybe we use this thread 
and Jira to coordinate the efforts.


Aljoscha

On 16.04.20 10:54, David Anderson wrote:

I am happy to get the repo created for you.


Thank you, @seth. I think we are also going to want a new flink-training
component in Jira. Maybe you can help there too?


If we go with the documentation (vs flink.apache.org) do you think we
should remove any of the existing content? There is already a getting
started section with quickstarts and walkthroughs and a concepts section.
In particular, the concepts section today is not complete and almost every
page on master contains multiple TODOs.


I'll look at this, and also coordinate with what Aljoscha is doing there.
But yes, there is room for improvement in this part of the docs, so I'm
expecting to be able to help with that.

David

On Wed, Apr 15, 2020 at 9:20 PM Seth Wiesman  wrote:


Hi David,

I am happy to get the repo created for you.

If we go with the documentation (vs flink.apache.org) do you think we
should remove any of the existing content? There is already a getting
started section with quickstarts and walkthroughs and a concepts section.
In particular, the concepts section today is not complete and almost every
page on master contains multiple TODOs. I don't believe anyone is working
on these.  What do you think about replacing the current concepts section
with the training material? I just re-examined the training site and I
believe it covers the same material as concepts but better. Of course, we
would salvage anything worth keeping, like the glossary.

Seth

On Wed, Apr 15, 2020 at 2:02 PM David Anderson 
wrote:


Thank you all for the very positive response to our proposal to contribute
the training materials that have been at training.ververica.com to the
Apache Flink project. Now I’d like to begin the more detailed discussion of
how to go about this.

In that earlier thread I mentioned that we were thinking of merging the
markdown-based web pages into flink.apache.org, and to add the exercises to
flink-playgrounds. This was based on thinking that it would be something of
a maintenance headache to add the website content into the docs, where it
would have to be versioned.

Since then, a better approach has been suggested:

We already have quite a bit of “getting started” material in the docs: Code
Walkthroughs, Docker Playgrounds, Tutorials, and Examples. Having a second
location (namely flink.apache.org) where this kind of content could be
found doesn’t seem ideal. So let’s go ahead and add the expository content
from the training materials to the documentation, with pointers into the
rest of the docs (which are already present in the training), and with
pointers to the exercises (rather than including the exercise descriptions
in the docs). This will keep the content that will need more frequent
revision out of the documentation.

Then let’s create a new repo -- named flink-training -- that contains the
exercises, the solutions, and the tests that go with them, PLUS all of the
material that describes how to get set up to do the exercises, the
explanations for each exercise, and accompanying discussion material that
should be read after doing each exercise. Note that the exercise solutions
already have tests, and Travis is already being used for CI on the existing
project, so CI shouldn’t be an issue.

Action Item: would a committer or PMC member kindly volunteer to help with
creating this new flink-training repo?

With the content refactored in this way, I believe ongoing maintenance
won’t be much trouble. With each new Flink release I’ve been updating the
exercises to build against the latest release, and to avoid any newly
deprecated parts of the API. But since these exercises are focused on the
most basic parts of the API, that hasn’t been difficult. As for content
from the training website that would move into the docs, this content is
much more stable, and has only needed a gentle revision every year or two.


Regards,
David









Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-16 Thread Aljoscha Krettek
cted" use-cases.
Then we might finally address this issue properly, i.e., tooling to
assemble custom distributions and/or better error messages if
Flink-provided extensions cannot be found.

On 15/04/2020 15:23, Kurt Young wrote:

Regarding the specific solution, I'm not sure about the "fat" and "slim"
solution though. I get the idea
that we can make the slim one even more lightweight than the current
distribution, but what about the "fat"
one? Do you mean that we would package all connectors and formats into
this? I'm not sure if this is
feasible. For example, we can't put all versions of the kafka and hive
connector jars into the lib directory, and
we also might need hadoop jars when using the filesystem connector to access
data from HDFS.

So my guess would be that we hand-pick some of the most frequently used
connectors and formats
into our "lib" directory, like the kafka, csv and json ones mentioned above,
and still leave some other connectors out of it.
If this is the case, then why don't we just provide this one distribution to
users? I'm not sure I get the benefit of
providing another super "slim" distribution (we have to pay some cost to
provide another variant of the distribution).

What do you think?

Best,
Kurt


On Wed, Apr 15, 2020 at 7:08 PM Jingsong Li 

wrote:



Big +1.

I like "fat" and "slim".

For csv and json, like Jark said, they are quite small and don't have other
dependencies. They are important to the kafka connector, and important
to the upcoming file system connector too.
So can we include them in both "fat" and "slim"? They're so important, and
they're so lightweight.

Best,
Jingsong Lee

On Wed, Apr 15, 2020 at 4:53 PM godfrey he 

wrote:



Big +1.
This will improve the user experience (especially for new Flink users).
We answered so many questions about "class not found".

Best,
Godfrey

Dian Fu  wrote on Wed, Apr 15, 2020 at 4:30 PM:


+1 to this proposal.

Missing connector jars is also a big problem for PyFlink users. Currently,
after a Python user has installed PyFlink using `pip`, he has to manually
copy the connector fat jars to the PyFlink installation directory for the
connectors to be usable when running jobs locally. This process is very
confusing for users and affects the experience a lot.

Regards,
Dian


On 15 Apr 2020, at 15:51, Jark Wu  wrote:

+1 to the proposal. I also found the "download additional jar" step really
verbose when preparing webinars.

At least, I think flink-csv and flink-json should be in the distribution;
they are quite small and don't have other dependencies.

Best,
Jark

On Wed, 15 Apr 2020 at 15:44, Jeff Zhang 

wrote:



Hi Aljoscha,

Big +1 for the fat flink distribution. Where do you plan to put these
connectors? opt or lib?

Aljoscha Krettek  wrote on Wed, Apr 15, 2020 at 3:30 PM:


Hi Everyone,

I'd like to discuss releasing a more full-featured Flink
distribution. The motivation is that there is friction for SQL/Table API
users who want to use Table connectors which are not in the
current Flink distribution. For these users the workflow is currently
roughly:

   - download Flink dist
   - configure csv/Kafka/json connectors per configuration
   - run SQL client or program
   - decrypt error message and research the solution
   - download additional connector jars
   - program works correctly

I realize that this can be made to work but if every SQL user has this
as their first experience that doesn't seem good to me.

My proposal is to provide two versions of the Flink Distribution in the
future: "fat" and "slim" (names to be discussed):

   - slim would be even trimmer than today's distribution
   - fat would contain a lot of convenience connectors (yet to be
determined which ones)

And yes, I realize that there are already more dimensions of Flink
releases (Scala version and Java version).

For background, our current Flink dist has these in the opt directory:

   - flink-azure-fs-hadoop-1.10.0.jar
   - flink-cep-scala_2.12-1.10.0.jar
   - flink-cep_2.12-1.10.0.jar
   - flink-gelly-scala_2.12-1.10.0.jar
   - flink-gelly_2.12-1.10.0.jar
   - flink-metrics-datadog-1.10.0.jar
   - flink-metrics-graphite-1.10.0.jar
   - flink-metrics-influxdb-1.10.0.jar
   - flink-metrics-prometheus-1.10.0.jar
   - flink-metrics-slf4j-1.10.0.jar
   - flink-metrics-statsd-1.10.0.jar
   - flink-oss-fs-hadoop-1.10.0.jar
   - flink-python_2.12-1.10.0.jar
   - flink-queryable-state-runtime_2.12-1.10.0.jar
   - flink-s3-fs-hadoop-1.10.0.jar
   - flink-s3-fs-presto-1.10.0.jar
   - flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
   - flink-sql-client_2.12-1.10.0.jar
   - flink-state-processor-api_2.12-1.10.0.jar
   - flink-swift-fs-hadoop-1.10.0.jar

Current Flink dist is 267M. If we removed everything from opt we would
go down to 126M. I would recommend this, because the large majority of
the files in opt are probably unused.

What do you think?

Best,
Aljoscha



--
Best Regards

Jeff Zhang





--
Best, Jingsong Lee







Re: [VOTE] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-16 Thread Aljoscha Krettek

+1 (binding)

Aljoscha


Re: [PROPOSAL] Google Season of Docs 2020.

2020-04-17 Thread Aljoscha Krettek

Hi,

first, excellent that you're driving this, Marta!

By now I have made quite some progress on the FLIP-42 restructuring so 
that is not a good effort for someone to join now. Plus there is also 
[1], which is about incorporating the existing Flink Training material 
into the concepts section of the Flink doc.


What I think would be very good is working on the Table API/SQL 
documentation [2]. We don't necessarily have to take the FLIP as a basis 
but we can, or we can start from a blank slate. I think the current 
structure as well as the content is sub-optimal (not good, really). It 
would be ideal to have someone get to know the system and then write 
documentation for that part of Flink that has both good structure and 
content and nicely guides new users.


I would be very happy to mentor that effort.

Best,
Aljoscha

[1] 
https://lists.apache.org/thread.html/rea9cbfffd9a1c0ca6f78f7e8c8497d81f32ed4a7b9e7a05d59f3b2e9%40%3Cdev.flink.apache.org%3E


[2] 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685


On 17.04.20 09:21, Robert Metzger wrote:

Thanks a lot for volunteering to drive an application for the Flink project!

Last year, we discussed finishing the chinese translation as a potential
project. I believe there's still a need for this.
Since the work on the project starts pretty far in the future (September),
the translation project is a good fit as well (there's currently no major
effort on the translation, rather a constant flow of PRs, but I don't think
that is enough to finish the translation).


On Fri, Apr 17, 2020 at 9:15 AM Konstantin Knauf  wrote:


Hi Marta,

Thanks for kicking off the discussion. Aljoscha has recently revived the
implementation of the FLIP-42 and has already moved things around quite a
bit. [1]

There are a lot of areas that can be improved of course, but a lot of them
require very deep knowledge about the system (e.g. the "Deployment" or
"Concepts" section). One area that I could imagine working well in such a
format is to work on the "Connectors" section. Aljoscha has already moved
this to the top-level, but besides that it has not been touched yet in
the course of FLIP-42. The documentation project could be around
restructuring, standardization and generally improving the documentation of
our connectors for both Datastream as well as Table API/SQL.

Cheers,

Konstantin

[1] https://ci.apache.org/projects/flink/flink-docs-master/

On Wed, Apr 15, 2020 at 12:11 PM Marta Paes Moreira 
wrote:


Hi, Everyone.

Google is running its Season of Docs [1] program again this year. The

goal

of the program is to pair open source organizations/projects with
professional technical writers to improve their documentation.

The Flink community submitted an application in 2019 (led by Konstantin)
[2,3], but was unfortunately not accepted into the program. This year,

I'm

volunteering to write and submit the proposal in the upcoming weeks. To
achieve this, there are a few things that need to be sorted out in

advance:


- *Mentors*
Each proposed project idea requires at least two volunteers to mentor
technical writers through the process. *Who would like to participate as a
mentor*? You can read about the responsibilities here [4].


- *Project Ideas*
We can submit as many project ideas as we'd like, but it's unlikely that
more than 2 are accepted. *What would you consider a priority for
documentation improvement*? In my opinion, reorganizing the documentation
to make it easier to navigate and more accessible to newcomers would be a
top priority. You can check FLIP-42/FLINK-12639 [5] for improvements that
are already under consideration and [6] for last year's mailing list
discussion.


- *Alternative Organization Administrator*
I volunteer as an administrator, but Google requires two. *Who would
like to join me as an application administrator*?

The deadline is *May 4th* and the accepted projects would kick off the work
with technical writers on *September 14th*. Let me know if you have any
questions!

Thanks,

Marta

[1] https://developers.google.com/season-of-docs
[2]



https://lists.apache.org/thread.html/3c789b6187da23ad158df59bbc598543b652e3cfc1010a14e294e16a@%3Cdev.flink.apache.org%3E

[3]



https://docs.google.com/document/d/1Up53jNsLztApn-mP76AB6xWUVGt3nwS9p6xQTiceKXo/edit?usp=sharing

[4] https://developers.google.com/season-of-docs/docs/mentor-guide
[5] https://issues.apache.org/jira/browse/FLINK-12639
[6]



https://lists.apache.org/thread.html/3c789b6187da23ad158df59bbc598543b652e3cfc1010a14e294e16a@%3Cdev.flink.apache.org%3E





--

Konstantin Knauf







Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-17 Thread Aljoscha Krettek
 maven); this should work in China, no?

A web tool would of course be fancy, but I don't know how feasible this is
with the ASF infrastructure.
You wouldn't be able to mirror the distribution, so the load can't be
distributed. I doubt INFRA would like this.

Note that third-parties could also start distributing use-case oriented
distributions, which would be perfectly fine as far as I'm concerned.

On 16/04/2020 16:57, Kurt Young wrote:

I'm not so sure about the web tool solution though. The concern I have for
this approach is that the final generated
distribution is kind of non-deterministic. We might generate too many
different combinations when users try to
package different types of connectors, formats, and maybe even hadoop
releases. As far as I can tell, most open
source projects and apache projects only release some
pre-defined distributions, which most users are already
familiar with, and that is hard to change IMO. I have also seen cases where
users try to re-distribute
the release package because of the unstable network to the apache website
from China. With the web tool solution, I don't
think this kind of re-distribution would be possible anymore.

In the meantime, I also have a concern that we will fall back into our trap
again if we try to offer this smart & flexible
solution, because it needs users to cooperate with such a mechanism. It's
exactly the situation we currently fell
into:
1. We offered a smart solution.
2. We hope users will follow the correct instructions.
3. Everything will work as expected if users follow the right
instructions.

In reality, I suspect not all users will do the second step correctly. And
for new users who are only trying to have a quick
experience with Flink, I would bet most will do it wrong.

So, my proposal would be one of the following 2 options:
1. Provide a slim distribution for advanced production users and provide a
distribution which has some popular built-in jars.
2. Only provide a distribution which has some popular built-in jars.

If we are trying to reduce the distributions we release, I would prefer 2
over 1.

Best,
Kurt


On Thu, Apr 16, 2020 at 9:33 PM Till Rohrmann <trohrm...@apache.org> wrote:



I think what Chesnay and Dawid proposed would be the ideal solution.
Ideally, we would also have a nice web tool for the website which generates
the corresponding distribution for download.

To get things started we could start with only supporting downloading/creating
the "fat" version with the script. The fat version would then consist of the
slim distribution and whatever we deem important for new users to get started.

Cheers,
Till

On Thu, Apr 16, 2020 at 11:33 AM Dawid Wysakowicz <dwysakow...@apache.org> wrote:


Hi all,

Few points from my side:

1. I like the idea of simplifying the experience for first time users.
As for production use cases I share Jark's opinion that in this case I
would expect users to combine their distribution manually. I think in
such scenarios it is important to understand interconnections.
Personally I'd expect the slimmest possible distribution that I can
extend further with what I need in my production scenario.

2. I think there is also the problem that the matrix of possible
combinations that can be useful is already big. Do we want to have a
distribution for:

 SQL users: which connectors should we include? should we include
hive? which other catalog?

 DataStream users: which connectors should we include?

For both of the above should we include yarn/kubernetes?

I would opt for providing only the "slim" distribution as a release
artifact.

3. However, as I said, I think it's worth investigating how we can improve
the user experience. What do you think of providing a tool, e.g. a shell
script, that constructs a distribution based on the user's choice? I think
that is also what Chesnay mentioned as "tooling to assemble custom
distributions". In the end, the difference between a slim and fat
distribution is which jars we put into lib, right? It could have a few
"screens".

1. Which API are you interested in:
a. SQL API
b. DataStream API


2. [SQL] Which connectors do you want to use? [multichoice]:
a. Kafka
b. Elasticsearch
...

3. [SQL] Which catalog do you want to use?

...

Such a tool would download all the dependencies from maven and put them
into the correct folder. In the future we can extend it with additional
rules e.g. kafka-0.9 cannot be chosen at the same time with
kafka-universal etc.

The benefit of it would be that the distribution that we release could
remain "slim" or we could even make it slimmer. I might be missing
something here though.

Best,

Dawid

On 16/04/2020 11:02, Aljoscha Krettek wrote:

I want to reinforce my opinion from earlier: This is about improving
the situation both for first-time users and

[DISCUSS] FLIP-126: Unify (and separate) Watermark Assigners

2020-04-20 Thread Aljoscha Krettek

Hi Everyone!

We would like to start a discussion on "FLIP-126: Unify (and separate) 
Watermark Assigners" [1]. This work was started by Stephan in an 
experimental branch. I expanded on that work to provide a PoC for the 
changes proposed in this FLIP: [2].


Currently, we have two different flavours of watermark 
assigners: AssignerWithPunctuatedWatermarks 
and AssignerWithPeriodicWatermarks. Both of them extend 
from TimestampAssigner. This means that sources that want to support 
watermark assignment/extraction in the source need to support two 
separate interfaces, and we have two operator implementations for the 
different flavours. Also, this makes features such as generic support 
for idleness detection more complicated to implement because we again 
have to support two types of watermark assigners.


In this FLIP we propose two things:

 - Unify the watermark assigners into one interface, WatermarkGenerator
 - Separate this new interface from the TimestampAssigner

The motivation for the first is to simplify future implementations and 
reduce code duplication. The motivation for the second point is again code 
deduplication: most assigners currently have to extend from some base 
timestamp extractor or duplicate the extraction logic, or users have to 
override an abstract method of the watermark assigner to provide the 
timestamp extraction logic.


Additionally, we propose to add a generic wrapping WatermarkGenerator 
that provides idleness detection, i.e. it can mark a stream/partition as 
idle if no data arrives after a configured timeout.


The "unify and separate" part refers to the fact that we want to unify 
punctuated and periodic assigners but at the same time split the 
timestamp assigner from the watermark generator.
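
As a rough self-contained sketch of the unified-generator idea (simplified 
names; see the FLIP [1] for the actual proposed interfaces), one interface 
covers both styles: a punctuated generator emits in onEvent, a periodic one 
in onPeriodicEmit.

interface WatermarkOutputSketch {
    void emitWatermark(long watermarkTimestamp);
}

interface WatermarkGeneratorSketch<T> {
    // Called for every event; a punctuated generator would emit here.
    void onEvent(T event, long eventTimestamp, WatermarkOutputSketch output);

    // Called on a fixed interval; a periodic generator would emit here.
    void onPeriodicEmit(WatermarkOutputSketch output);
}

// A periodic bounded-out-of-orderness generator as one possible implementation:
class BoundedOutOfOrdernessSketch<T> implements WatermarkGeneratorSketch<T> {
    private final long maxOutOfOrderness;
    private long maxTimestamp;

    BoundedOutOfOrdernessSketch(long maxOutOfOrderness) {
        this.maxOutOfOrderness = maxOutOfOrderness;
        // Start low enough that subtracting the bound below cannot underflow.
        this.maxTimestamp = Long.MIN_VALUE + maxOutOfOrderness + 1;
    }

    @Override
    public void onEvent(T event, long eventTimestamp, WatermarkOutputSketch output) {
        maxTimestamp = Math.max(maxTimestamp, eventTimestamp);
    }

    @Override
    public void onPeriodicEmit(WatermarkOutputSketch output) {
        output.emitWatermark(maxTimestamp - maxOutOfOrderness - 1);
    }
}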


Please find more details in the FLIP [1]. Looking forward to
your feedback.

Best,
Aljoscha

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-126%3A+Unify+%28and+separate%29+Watermark+Assigners


[2] https://github.com/aljoscha/flink/tree/stephan-event-time


Re: [DISCUSS] Adding support for Hadoop 3 and removing flink-shaded-hadoop

2020-04-22 Thread Aljoscha Krettek
+1 to getting rid of flink-shaded-hadoop. But we need to document how 
people can now get a Flink dist that works with Hadoop. Currently, when 
you download the single shaded jar you immediately get support for 
submitting to YARN via bin/flink run.


Aljoscha


On 22.04.20 09:08, Till Rohrmann wrote:

Hi Robert,

I think it would be a helpful simplification of Flink's build setup if we
can get rid of flink-shaded-hadoop. Moreover relying only on the vanilla
Hadoop dependencies for the modules which interact with Hadoop/Yarn sounds
like a good idea to me.

Adding support for Hadoop 3 would also be nice. I'm not sure, though, how
Hadoop's APIs have changed between 2 and 3. It might be necessary to
introduce some bridges in order to make it work.

Cheers,
Till

On Tue, Apr 21, 2020 at 4:37 PM Robert Metzger  wrote:


Hi all,

for the upcoming 1.11 release, I started looking into adding support for
Hadoop 3[1] for Flink. I have explored a little bit already into adding a
shaded hadoop 3 into “flink-shaded”, and some mechanisms for switching
between Hadoop 2 and 3 dependencies in the Flink build.

However, Chesnay made me aware that we could also go a different route: We
let Flink depend on vanilla Hadoop dependencies and stop providing shaded
fat jars for Hadoop through “flink-shaded”.

Why?
- Maintaining properly shaded Hadoop fat jars is a lot of work (we have
insufficient test coverage for all kinds of Hadoop features)
- For Hadoop 2, there are already some known and unresolved issues with our
shaded jars that we didn’t manage to fix

Users will have to use Flink with Hadoop by relying on vanilla or
vendor-provided Hadoop dependencies.

What do you think?

Best,
Robert

[1] https://issues.apache.org/jira/browse/FLINK-11086







Re: [DISCUSS] Exact feature freeze date

2020-04-23 Thread Aljoscha Krettek

+1

Aljoscha

On 23.04.20 15:23, Till Rohrmann wrote:

+1 for extending the feature freeze until May 15th.

Cheers,
Till

On Thu, Apr 23, 2020 at 1:00 PM Piotr Nowojski  wrote:


Hi Stephan,

As release manager I’ve seen that quite a few features could use the
extra couple of weeks. This also includes some features that I’m involved
with, like FLIP-76, or limiting the in-flight buffers.

+1 From my side for extending the feature freeze until May 15th.

Piotrek


On 23 Apr 2020, at 10:10, Stephan Ewen  wrote:

Hi all!

I want to bring up a discussion about when we want to do the feature

freeze

for 1.11.

When kicking off the release cycle, we tentatively set the date to end of
April, which would be in one week.

I can say from the features I am involved with (FLIP-27, FLIP-115,
reviewing some state backend improvements, etc.) that it would be helpful
to have two additional weeks.

When looking at various other feature threads, my feeling is that there

are

more contributors and committers that could use a few more days.
The last two months were quite exceptional, and we did lose a bit of
development speed here and there.

How do you think about making *May 15th* the feature freeze?

Best,
Stephan









Re: [DISCUSS] Removing deprecated state methods in 1.11

2020-04-23 Thread Aljoscha Krettek
Definitely +1! I'm always game for decreasing the API surface if it 
doesn't decrease functionality.


Aljoscha

On 23.04.20 14:18, DONG, Weike wrote:

Hi Stephan,

+1 for the removal, as there are so many deprecated methods scattered
around, making APIs a little bit messy and confusing.

Best,
Weike

Stephan Ewen wrote on Thu, Apr 23, 2020 at 8:07 PM:


Hi all!

There are a bunch of deprecated methods for state access:

   - RuntimeContext.getFoldingState(...)
   - OperatorStateStore.getSerializableListState(...)
   -  OperatorStateStore.getOperatorState(...)

All of them have been deprecated for around three years now. All have good
alternatives.
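
For reference, here is a sketch of what migrating to the alternatives could
look like, as I understand them (descriptor names and types below are only
examples): FoldingState is replaced by AggregatingState, and the deprecated
operator-state getters by getListState with an explicit descriptor.

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.common.state.AggregatingStateDescriptor;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.api.common.state.OperatorStateStore;
import org.apache.flink.api.common.typeinfo.Types;

class StateAlternatives {
    // Instead of RuntimeContext.getFoldingState(...): pass an
    // AggregatingStateDescriptor to RuntimeContext#getAggregatingState.
    static AggregatingStateDescriptor<Integer, Long, Long> sumDescriptor() {
        return new AggregatingStateDescriptor<>(
                "sum",
                new AggregateFunction<Integer, Long, Long>() {
                    @Override public Long createAccumulator() { return 0L; }
                    @Override public Long add(Integer value, Long acc) { return acc + value; }
                    @Override public Long getResult(Long acc) { return acc; }
                    @Override public Long merge(Long a, Long b) { return a + b; }
                },
                Types.LONG);
    }

    // Instead of getSerializableListState(...) / getOperatorState(...):
    // use getListState with an explicit descriptor (and thus an explicit serializer).
    static ListState<String> bufferedElements(OperatorStateStore store) throws Exception {
        return store.getListState(new ListStateDescriptor<>("buffered-elements", Types.STRING));
    }
}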

Time to remove them in 1.11?

Best,
Stephan







Re: [DISCUSS] Releasing "fat" and "slim" Flink distributions

2020-04-24 Thread Aljoscha Krettek
re (1): I don't know about that, probably the people that did the 
metrics reporter plugin support had some thoughts about that.


re (2): I agree, that's why I initially suggested to split it into 
"slim" and "fat" because our current "medium fat" selection of jars in 
Flink dist does not serve anyone too well. It's too fat for people that 
want to build lean application images. It's to lean for people that want 
a good first out-of-box experience.


Aljoscha

On 17.04.20 16:38, Stephan Ewen wrote:

@Aljoscha I think that is an interesting line of thinking. The swift-fs may
be rarely enough used to move it to an optional download.

I would still drop two more thoughts:

(1) Now that we have plugins support, is there a reason to have a metrics
reporter or file system in /opt instead of /plugins? They don't spoil the
class path any more.

(2) I can imagine there still being a desire to have a "minimal" docker
file, for users that want to keep the container images as small as
possible, to speed up deployment. It is fine if that would not be the
default, though.


On Fri, Apr 17, 2020 at 12:16 PM Aljoscha Krettek 
wrote:


I think having such tools and/or tailor-made distributions can be nice
but I also think the discussion is missing the main point: The initial
observation/motivation is that apparently a lot of users (Kurt and I
talked about this) on the chinese DingTalk support groups, and other
support channels have problems when first using the SQL client because
of these missing connectors/formats. For these, having additional tools
would not solve anything because they would also not take that extra
step. I think that even tiny friction should be avoided because the
annoyance from it accumulates of the (hopefully) many users that we want
to have.

Maybe we should take a step back from discussing the "fat"/"slim" idea
and instead think about the composition of the current dist. As
mentioned we have these jars in opt/:

   17M flink-azure-fs-hadoop-1.10.0.jar
   52K flink-cep-scala_2.11-1.10.0.jar
180K flink-cep_2.11-1.10.0.jar
746K flink-gelly-scala_2.11-1.10.0.jar
626K flink-gelly_2.11-1.10.0.jar
512K flink-metrics-datadog-1.10.0.jar
159K flink-metrics-graphite-1.10.0.jar
1.0M flink-metrics-influxdb-1.10.0.jar
102K flink-metrics-prometheus-1.10.0.jar
   10K flink-metrics-slf4j-1.10.0.jar
   12K flink-metrics-statsd-1.10.0.jar
   36M flink-oss-fs-hadoop-1.10.0.jar
   28M flink-python_2.11-1.10.0.jar
   22K flink-queryable-state-runtime_2.11-1.10.0.jar
   18M flink-s3-fs-hadoop-1.10.0.jar
   31M flink-s3-fs-presto-1.10.0.jar
196K flink-shaded-netty-tcnative-dynamic-2.0.25.Final-9.0.jar
518K flink-sql-client_2.11-1.10.0.jar
   99K flink-state-processor-api_2.11-1.10.0.jar
   25M flink-swift-fs-hadoop-1.10.0.jar
160M opt

The "filesystem" connectors ar ethe heavy hitters, there.

I downloaded most of the SQL connectors/formats and this is what I got:

   73K flink-avro-1.10.0.jar
   36K flink-csv-1.10.0.jar
   55K flink-hbase_2.11-1.10.0.jar
   88K flink-jdbc_2.11-1.10.0.jar
   42K flink-json-1.10.0.jar
   20M flink-sql-connector-elasticsearch6_2.11-1.10.0.jar
2.8M flink-sql-connector-kafka_2.11-1.10.0.jar
   24M sql-connectors-formats

We could just add these to the Flink distribution without blowing it up
by much. We could drop any of the existing "filesystem" connectors from
opt and add the SQL connectors/formats and not change the size of Flink
dist. So maybe we should do that instead?

We would need some tooling for the sql-client shell script to pick up
the connectors/formats from opt/ because we don't want to add them to
lib/. We're already doing that for finding the flink-sql-client jar,
which is also not in lib/.

What do you think?

Best,
Aljoscha

On 17.04.20 05:22, Jark Wu wrote:

Hi,

I like the idea of web tool to assemble fat distribution. And the
https://code.quarkus.io/ looks very nice.
All users need to do is just select what they need (I think this step
can't be omitted anyway).
We can also provide a default fat distribution on the web which default
selects some popular connectors.

Best,
Jark

On Fri, 17 Apr 2020 at 02:29, Rafi Aroch  wrote:


As a reference for a nice first-experience I had, take a look at
https://code.quarkus.io/
You reach this page after you click "Start Coding" at the project

homepage.


Rafi


On Thu, Apr 16, 2020 at 6:53 PM Kurt Young  wrote:


I'm not saying pre-bundling some jars will make this problem go away, and
you're right that it only hides the problem for
some users. But what if this solution can hide the problem for 90% of users?
Wouldn't that be good enough for us to try?

Regarding whether users following instructions would really be such a big
problem:
I'm afraid yes. Otherwise I wouldn't have answered such questions at least a
dozen times, and I wouldn't see such questions coming
up from time to time.

Re: Multiple rebalances are incorrectly ignored in some cases.

2020-04-27 Thread Aljoscha Krettek

On 27.04.20 09:34, David Morávek wrote:


When we include `flatMap` in between rebalances ->
`.rebalance().flatMap(...).rebalance()`, we need to reshuffle again,
because the dataset distribution may have changed (e.g. you can possibly emit
an unbounded stream from a single element). Unfortunately the `flatMap` output
is still incorrectly marked as `FORCED_REBALANCED` and the second reshuffle
gets ignored.


This indeed seems incorrect. Did you look into the Flink code to see why 
the output of the flatMap is `FORCED_REBALANCED`?


Aljoscha


Re: A query on codebase exploration

2020-04-27 Thread Aljoscha Krettek

Hi Manish,

welcome to the community! You could start from a user program example 
and then try and figure out how that leads to job execution. So probably 
start with the DataStream WordCount example, figure out what the methods 
on DataStream do, that is how they build up a graph of Transformations. 
Then try and follow what happens when you call execute(), how the graph 
of Transformations is translated eventually to a StreamGraph and then a 
JobGraph that is then executed/submitted.
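
For reference, a minimal DataStream WordCount to start from could look
roughly like this (a sketch, not the exact bundled example):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Each API call below adds a Transformation to the environment.
        env.fromElements("to be or not to be")
                .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                    for (String word : line.split("\\s+")) {
                        out.collect(Tuple2.of(word, 1));
                    }
                })
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .keyBy(t -> t.f0)
                .sum(1)
                .print();

        // Following this call shows how the Transformations become a StreamGraph
        // and then a JobGraph that gets submitted for execution.
        env.execute("WordCount");
    }
}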


Feel free to ask if you get stuck or have more questions on this.

Aljoscha

On 27.04.20 12:22, Manish G wrote:

Hi,

Looking into the codebase, it's quite huge.

Any suggestions/guidelines on which parts one should explore first, while
still keeping the whole picture in mind?

Manish





Re: [DISCUSS] FLIP-73: Introducing Executors for job submission

2019-09-25 Thread Aljoscha Krettek
Hi,

I’m fine with either signature for the new execute() method but I think we 
should focus on the executor discovery and executor configuration part in this 
FLIP while FLIP-74 is about the evolution of the method signature to return a 
future.
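
To keep the two flavours being discussed side by side, here is a purely 
illustrative sketch (placeholder types, not the FLIP-73/74 signatures):

import java.util.concurrent.CompletableFuture;

// Placeholder types, for illustration only.
interface PipelineSketch {}
interface JobExecutionResultSketch {}
interface JobClientSketch {
    CompletableFuture<Void> cancel();
}

interface ExecutorSketch {
    // Blocking flavour, mirroring the existing env.execute() semantics.
    JobExecutionResultSketch execute(PipelineSketch pipeline) throws Exception;

    // Asynchronous flavour: returns once the job has been submitted, with a
    // client handle to interact with the running job.
    CompletableFuture<JobClientSketch> executeAsync(PipelineSketch pipeline) throws Exception;
}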

I understand that it’s a bit weird, that this FLIP introduces a new interface 
only to be changed within the same Flink release in a follow-up FLIP. But I 
think we can still do it. What do you think?

Best,
Aljoscha

> On 25. Sep 2019, at 10:11, Kostas Kloudas  wrote:
> 
> Hi Thomas and Zili,
> 
> As you both said the Executor is a new addition so there are no
> compatibility concerns.
> Backwards compatibility comes into play on the
> (Stream)ExecutionEnvironment#execute().
> 
> This method has to stay and keep having the same (blocking) semantics,
> but we can
> add a new one, sth along the lines of executeAsync() that will return
> the JobClient and
> will allow the caller to interact with the job.
> 
> Cheers,
> Kostas
> 
> On Wed, Sep 25, 2019 at 2:44 AM Zili Chen  wrote:
>> 
>>> Since Executor is a new interface, why is backward compatibility a concern?
>> 
>> For backward compatibility, it is on (Stream)ExecutionEnvironment#execute.
>> You're right that we don't stick to blocking to return a JobExecutionResult 
>> in
>> Executor aspect but implementing env.execute with a unique
>> 
>> Executor#execute(or with suffix Async): CompletableFuture<JobClient>
>> 
>> what do you think @Kostas Kloudas?
>> 
>>> I could see that become an issue later when replacing Executor execute with
>>> executeAsync. Or are both targeted for 1.10?
>> 
>> IIUC both Executors and JobClient are targeted for 1.10.
>> 
>> 
>> Thomas Weise  wrote on Wed, Sep 25, 2019 at 2:39 AM:
>>> 
>>> Since Executor is a new interface, why is backward compatibility a concern?
>>> 
>>> I could see that become an issue later when replacing Executor execute with
>>> executeAsync. Or are both targeted for 1.10?
>>> 
>>> 
>>> On Tue, Sep 24, 2019 at 10:24 AM Zili Chen  wrote:
>>> 
 Hi Thomas,
 
> Should the new Executor execute method be defined as asynchronous? It
 could
> return a job handle to interact with the job and the legacy environments
> can still block to retain their semantics.
 
 During our discussion there will be a method
 
 executeAsync(...): CompletableFuture<JobClient>
 
 where JobClient can be regarded as job handle in your context.
 
 I think we keep
 
 execute(...): JobExecutionResult
 
 just for backward compatibility because this effort targets 1.10, which is
 not a
 major version bump.
 
 BTW, I am drafting details of JobClient(as FLIP-74). Will start a separated
 discussion
 thread on that interface as soon as I finish an early version.
 
 Best,
 tison.
 
 
 Thomas Weise  wrote on Wed, Sep 25, 2019 at 1:17 AM:
 
> Thanks for the proposal. These changes will make it significantly easier
 to
> programmatically use Flink in downstream frameworks.
> 
> Should the new Executor execute method be defined as asynchronous? It
 could
> return a job handle to interact with the job and the legacy environments
> can still block to retain their semantics.
> 
> (The blocking execution has also made things more difficult in Beam, we
> could simply switch to use Executor directly.)
> 
> Thomas
> 
> 
> On Tue, Sep 24, 2019 at 6:48 AM Kostas Kloudas 
> wrote:
> 
>> Hi all,
>> 
>> In the context of the discussion about introducing the Job Client API
> [1],
>> there was a side-discussion about refactoring the way users submit jobs
> in
>> Flink. There were many different interesting ideas on the topic and 3
>> design documents that were trying to tackle both the issue about code
>> submission and the Job Client API.
>> 
>> This discussion thread aims at the job submission part and proposes the
>> approach of introducing the Executor abstraction which will abstract
 the
>> job submission logic from the Environments and will make it API
 agnostic.
>> 
>> The FLIP can be found at [2].
>> 
>> Please keep the discussion here, in the mailing list.
>> 
>> Looking forward to your opinions,
>> Kostas
>> 
>> [1]
>> 
>> 
> 
 https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
>> [2]
>> 
>> 
> 
 https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
>> 
> 
 



Re: [DISCUSS] FLIP-74: Flink JobClient API

2019-09-25 Thread Aljoscha Krettek
Hi Tison,

Thanks for proposing the document! I had some comments on the document.

I think the only complex thing that we still need to figure out is how to get a 
JobClient for a job that is already running, as you mentioned in the document. 
Currently I’m thinking that it’s ok to add a method to Executor for retrieving a 
JobClient for a running job by providing an ID. Let’s see what Kostas has to 
say on the topic.
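
One possible shape for that retrieval method, purely as an illustration 
(placeholder names, not the FLIP-74 API):

import java.util.concurrent.CompletableFuture;

// Placeholder types, for illustration only.
interface RunningJobClientSketch {
    CompletableFuture<String> getJobStatus();
    CompletableFuture<Void> cancel();
}

interface ExecutorWithRetrievalSketch {
    // Attach to a job that is already running, identified by its job ID.
    CompletableFuture<RunningJobClientSketch> retrieveJob(String jobId) throws Exception;
}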

Best,
Aljoscha

> On 25. Sep 2019, at 12:31, Zili Chen  wrote:
> 
> Hi all,
> 
> Following the discussion about introducing the Flink JobClient API [1], we
> drafted FLIP-74 [2] to
> gather thoughts and work towards standard public user-facing interfaces.
> 
> This discussion thread aims at standardizing the job-level client API. But I'd
> like to emphasize that
> how to retrieve a JobClient may cause further discussion on the different
> levels of clients exposed by
> Flink, so a follow-up thread will be started later to coordinate
> FLIP-73 and FLIP-74 on
> that question.
> 
> Looking forward to your opinions.
> 
> Best,
> tison.
> 
> [1]
> https://lists.apache.org/thread.html/ce99cba4a10b9dc40eb729d39910f315ae41d80ec74f09a356c73938@%3Cdev.flink.apache.org%3E
> [2]
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-74%3A+Flink+JobClient+API



Re: REST API / JarRunHandler: More flexibility for launching jobs

2019-09-26 Thread Aljoscha Krettek
Hi,

Regarding the original proposal: I don’t think spawning another process inside 
the JarHandler.runJar() is the way to go here. Looking at the bigger picture, 
the proposal would get us to roughly this situation:

1. Spawn Kubernetes containers (JobManager and TaskManagers)
2. User does a REST call to JobManager.runJar() to submit the user job
3. JobManager.runJar() opens a port that waits for job submission
4. JobMananger.runJar() invokes UserProgram.main()
5. UserProgram.main() launches a process (BeamJobService) that opens a port to 
wait for a Python process to connect to it
6. UserProgram.main() launches another process (the Python code, or any 
language, really) that connects to BeamJobService to submit the Pipeline
7. BeamJobService receives the Pipeline and talks to the port open on 
JobManager (via REST service, maybe) to submit the Job
8. Job is executed
9. Where is UserProgram.main() at this point?

I think that even running UserProgram.main() in the JobManager is already too 
much. A JobManager should accept JobGraphs (or something) and execute them, 
nothing more. Running UserProgram.main() makes some things complicated or 
weird. For example, what happens when that UserProgram.main() creates a 
RemoteEnvironment and uses that? What happens when the user code calls 
execute() multiple times?

I think a good solution for the motivating use case is to

a) run BeamJobService as a separate service that talks to a running JobManager 
via REST for submitting jobs that it receives

b) Spawning a JobManager inside the BeamJobService, i.e. the BeamJobService is 
like the entry point in a per-job Kubernetes model. Something that the new 
Executor work ([1], [2]) will enable.

Any thoughts? I’m happy to jump on a call about this because these things are 
very tricky to figure out and I might be wrong.

Best,
Aljoscha

[1] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-73%3A+Introducing+Executors+for+job+submission
[2] 
https://docs.google.com/document/d/1E-8UjOLz4QPUTxetGWbU23OlsIH9VIdodpTsxwoQTs0/edit?ts=5d88e631

> On 6. Aug 2019, at 09:51, Till Rohrmann  wrote:
> 
> I think there was the idea to make the JobGraph a "public"/stable interface
> other projects can rely on at some point. If I remember correctly, then we
> wanted to define a proto buf definition for the JobGraph so that clients
> written in a different language can submit JobGraphs and we could extend
> the data structure. As far as I know, this effort hasn't been started yet
> and is still in the backlog (I think there doesn't exist a JIRA issue yet).
> 
> The problem came up when discussing additions to the JobGraph because they
> need to be backwards compatible otherwise newer version of Flink would not
> be able to recover jobs. I think so far Flink provides backwards
> compatibility between different versions of the JobGraph. However, this is
> not officially guaranteed.
> 
> Cheers,
> Till
> 
> On Tue, Aug 6, 2019 at 3:56 AM Zili Chen  wrote:
> 
>> It sounds like a request to change the interface Program into
>> 
>> public interface Program {
>>  JobGraph getJobGraph(String... args);
>> }
>> 
>> Also, given that JobGraph is said as internal interface or
>> cannot be relied on, we might introduce and use a
>> representation that allows for cross version compatibility.
>> 
>> 
>> Thomas Weise  wrote on Tue, Aug 6, 2019 at 12:11 AM:
>> 
>>> If the goal is to keep job creation and job submission separate and we
>>> agree that there should be more flexibility for the job construction,
>> then
>>> JobGraph and friends should be stable API that the user can depend on. If
>>> that's the case, the path Chesnay pointed to may become viable.
>>> 
>>> There was discussion in the past that JobGraph cannot be relied on WRT
>>> backward compatibility and I would expect that at some point we want to
>>> move to a representation that allows for cross version compatibility.
>> Beam
>>> is an example how this could be accomplished (with its pipeline proto).
>>> 
>>> So if the Beam job server was able to produce the JobGraph, is there
>>> agreement that we should provide a mechanism that allows the program
>> entry
>>> point to return the JobGraph directly (without using the
>>> ExecutionEnvironment to build it)?
>>> 
>>> 
>>> On Mon, Aug 5, 2019 at 2:10 AM Zili Chen  wrote:
>>> 
 Hi Thomas,
 
 If REST handler calls main(), the behavior inside main() is
 unpredictable.
 
 Now the jar run handler extracts the job graph and submits
 it with the job id configured in the REST request. If the REST
 handler calls main() we can hardly even know how many
 jobs are executed.
 
 A new environment, as you said,
 ExtractJobGraphAndSubmitToDispatcherEnvironment can be
 added to satisfy your requirement. However, it is a bit
 out of Flink scope. It might be better to write your own
 REST handler.
 
 WebMonitorExtension is for extending REST handlers but
 it seems also unable to customize...
 
 Best,

  1   2   3   4   5   6   7   8   9   10   >