Re: [DISCUSS] Flink project bylaws

2019-07-17 Thread Dawid Wysakowicz
Hi all,

Sorry for joining late. I just wanted to say that I really like the
proposed bylaws!

One comment: I also have the same concerns, expressed by a few people
before, that the "committer +1" on code changes might be hard to achieve
currently. On the other hand, I would say it would be beneficial for
the quality/uniformity of our codebase and for knowledge sharing.

I was just wondering what the next step for this should be? I think it
would make sense to put these bylaws to a PMC vote so we can start using them.

Best,

Dawid

On 12/07/2019 13:35, Piotr Nowojski wrote:
> Hi Aljoscha and Becket
>
> Right, 3 days for FLIP voting is fine I think.
>
>>> I’m missing this stated somewhere clearly. If we are stating that a single
>>> committers +1 is good enough for code review, with 0 hours delay (de facto
>>> the current state), we should also write down that this is subject to the
>>> best judgement of the committer to respect the components expertise and
>>> ongoing development plans (also the de facto current state).
>> Adding the statement would help clarify the intention, but it may be a
>> little difficult to enforce and follow..
> I would be fine with that, it’s a soft/vague rule anyway, intended to be used 
> with your “best judgement”. I would just like to avoid a situation where 
> someone violates the current uncodified state and refers to the bylaws, which 
> say that merging with any committer +1 is always fine (like my +1 for 
> flink-python or flink-ml). 
>
> Piotrek
>
>> On 12 Jul 2019, at 11:29, Aljoscha Krettek  wrote:
>>
>> @Piotr regarding the 3 days of voting on a FLIP: this is just about the 
>> voting; before that there needs to be the discussion thread. If three days 
>> have passed on a vote and there is consensus (i.e. 3 committers/PMCs have 
>> voted +1), that seems a high enough bar for me. So far, we have rarely seen 
>> any FLIPs pass that formal bar.
>>
>> According to the recent META-FLIP thread, we want to use "lazy majority" for 
>> the FLIP voting process. I think that should be changed from “consensus” in 
>> the proposed bylaws.
>>
>> Regarding the current state: do we have a current state that we all agree 
>> on? I have the feeling that if we try to come up with something that 
>> reflects the common state, according to PMCs/committers, that might take a 
>> very long time. In that case I think it’s better to adopt something that we 
>> all like, rather than trying to capture how we do it now.
>>
>> Aljoscha
>>
>>> On 12. Jul 2019, at 11:07, Piotr Nowojski  wrote:
>>>
>>> Hi,
>>>
>>> Thanks for the proposal. Generally speaking +1 from my side to the general 
>>> idea and most of the content. I also see merit to the Chesney's proposal to 
>>> start from the current state. I think either would be fine for me.
>>>
>>> Couple of comments:
>>>
>>> 1. 
>>>
>>> I also think that requiring a +1 from another committer would slow down and 
>>> put even more strain on the current reviewing bottleneck that we are 
>>> having. Even if the change is clear and simple, the context-switch cost is quite 
>>> high, and that’s just one less PR that the second “cross” committer could 
>>> have reviewed somewhere else in that time. Besides, the current setup that we 
>>> have (with no cross +1 from another committer required) works quite well 
>>> and I do not feel that it’s causing trouble. The reviewing bottleneck, on 
>>> the other hand, is. 
>>>
>>> 2.
>>>
 I think a committer should know when to ask another committer for feedback 
 or not.
>>> I’m missing this stated somewhere clearly. If we are stating that a single 
>>> committers +1 is good enough for code review, with 0 hours delay (de facto 
>>> the current state), we should also write down that this is subject to the 
>>> best judgement of the committer to respect the components expertise and 
>>> ongoing development plans (also the de facto current state).
>>>
>>> 3.
>>>
>>> The minimum length of 3 days for FLIP voting might currently be 
>>> problematic/too quick and can lead to problems if followed to the letter. 
>>> Again, I think it depends highly on whether the committers with the highest 
>>> expertise in the affected components managed to respond or not. 
>>>
>>> Piotrek
>>>
 On 12 Jul 2019, at 09:42, Chesnay Schepler  wrote:

 I'm wondering whether we shouldn't first write down Bylaws that reflect 
 the current state, and then have separate discussions for individual 
 amendments. My gut feeling is that this discussion will quickly become a 
 chaotic mess with plenty of points being discussed at once.

 On 11/07/2019 20:03, Bowen Li wrote:
> On Thu, Jul 11, 2019 at 10:38 AM Becket Qin  wrote:
>
>> Thanks everyone for all the comments and feedback. Please see the replies
>> below:
>>
>> 
>> Re: Konstantin
>>
>>> * In addition to a simple "Code Change" we could also add a row for 
>>> "Code
>>> Change requiring a FLIP" with a reference to the FLIP proces

Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a committer to the Flink project

2019-07-18 Thread Dawid Wysakowicz
Congratulations Becket! Good to have you onboard!

On 18/07/2019 10:56, Till Rohrmann wrote:
> Congrats Becket!
>
> On Thu, Jul 18, 2019 at 10:52 AM Jeff Zhang  wrote:
>
>> Congratulations Becket!
>>
>> Xu Forward wrote on Thu, 18 Jul 2019 at 16:39:
>>
>>> Congratulations Becket! Well deserved.
>>>
>>>
>>> Cheers,
>>>
>>> forward
>>>
>>> Kurt Young wrote on Thu, 18 Jul 2019 at 16:20:
>>>
 Congrats Becket!

 Best,
 Kurt


 On Thu, Jul 18, 2019 at 4:12 PM JingsongLee wrote:

> Congratulations Becket!
>
> Best, Jingsong Lee
>
>
> --
> From:Congxian Qiu 
> Send Time: Thursday, 18 July 2019, 16:09
> To:dev@flink.apache.org 
> Subject:Re: [ANNOUNCE] Jiangjie (Becket) Qin has been added as a
 committer
> to the Flink project
>
> Congratulations Becket! Well deserved.
>
> Best,
> Congxian
>
>
> Jark Wu wrote on Thu, 18 Jul 2019 at 16:03:
>
>> Congratulations Becket! Well deserved.
>>
>> Cheers,
>> Jark
>>
>> On Thu, 18 Jul 2019 at 15:56, Paul Lam 
>>> wrote:
>>> Congrats Becket!
>>>
>>> Best,
>>> Paul Lam
>>>
 On 18 Jul 2019, at 15:41, Robert Metzger wrote:

 Hi all,

 I'm excited to announce that Jiangjie (Becket) Qin just became
>> a
> Flink
 committer!

 Congratulations Becket!

 Best,
 Robert (on behalf of the Flink PMC)
>>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>





[DISCUSS] Support temporary tables in SQL API

2019-07-22 Thread Dawid Wysakowicz
Hi all,

When working on FLINK-13279[1] we realized we could benefit from better
support for temporary objects in the Catalog API/Table API.
Unfortunately, we are already long past the feature freeze, which is why I
wanted to get some opinions from the community on how we should proceed
with this topic. I tried to prepare a summary of the current state and three
different approaches that we could take. Please see the
attached document[2].

I would appreciate your thoughts!


[1] https://issues.apache.org/jira/browse/FLINK-13279

[2]
https://docs.google.com/document/d/1RxLj4tDB9GXVjF5qrkM38SKUPkvJt_BSefGYTQ-cVX4/edit?usp=sharing






Re: [DISCUSS] Support temporary tables in SQL API

2019-07-23 Thread Dawid Wysakowicz
I think we all agree so far that we should implement one of the short
term solutions for 1.9 release (#2 or #3) and continue the discussion on
option #1 for the next release. Personally I prefer option #2, because
it is closest to the current behavior and as Kurt said it is the most
intuitive one, but I am also fine with option #3

To sum up the opinions so far:

option #2: 3 votes (Kurt, Aljoscha, me)

option #3: 2 votes (Timo, Jingsong)

I wasn't sure which option out of the two Xuefu prefers.

I would like to conclude the discussion by the end of tomorrow, so that
we can prepare a proper fix as soon as possible. Therefore, I would
suggest proceeding with the option that has the most votes by tomorrow
(*July 24th 12:00 CET*), unless there are some hard objections.


Comment on option #1 concerns:

I agree with Jingsong on that. I think there are some benefits to the
approach, as it puts Flink in control of the temporary tables.

1. We have unified behavior across all catalogs, including catalogs
that do not support temporary tables natively.

2. As Flink is in control of the temporary tables, it is easier to
control their lifecycle.

Best,

Dawid

On 23/07/2019 11:40, JingsongLee wrote:
> And I think we should recommend that users use the catalog API to
>  createTable and createFunction (I guess most scenarios do
>  not use temporary objects); in that light, option #3 is a good fit
>
> Best, JingsongLee
>
>
> --
> From:JingsongLee 
> Send Time: Tuesday, 23 July 2019, 17:35
> To:dev 
> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>
> Thanks Dawid and other people.
> +1 for using option #3 for 1.9.0 and go with option #1
>  in 1.10.0.
>
> Regarding Xuefu's concern, I don't know how necessary it is for each catalog
> to deal with temporary views. I think a Catalog is different from a DB; we can
> have a single concept for temporary views, which is easier for users to understand.
>
> Regarding option #2, it is hard to use if we make users use fully qualified
> names for the Hive catalog. Would this experience be too bad?
>
> Best, Jingsong Lee
>
>
> --
> From:Kurt Young 
> Send Time: Tuesday, 23 July 2019, 17:03
> To:dev 
> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>
> Thanks Dawid for driving this discussion.
> Personally, I would +1 for using option #2 for 1.9.0 and go with option #1
> in 1.10.0.
>
> Regarding Xuefu's concern about option #1, I think we could also try to
> reuse the in-memory catalog
> for the builtin temporary table storage.
>
> Regarding option #2 and option #3, from the user's perspective, IIUC option
> #2 allows users to use a simple name to reference a temporary table, and
> requires fully qualified names for external catalogs.
> But option #3 provides the opposite behavior: users can use simple names for
> external tables after they have changed the current catalog and current
> database, but have to use fully qualified names for temporary
> tables. IMO, option #2 is more straightforward.
>
> Best,
> Kurt
>
>
> On Tue, Jul 23, 2019 at 4:01 PM Aljoscha Krettek 
> wrote:
>
>> I would be fine with option 3) but I think option 2) is the more implicit
>> solution that has less surprising behaviour.
>>
>> Aljoscha
>>
>>> On 22. Jul 2019, at 23:59, Xuefu Zhang  wrote:
>>>
>>> Thanks to Dawid for initiating the discussion. Overall, I agree with Timo
>>> that for 1.9 we should have some quick and simple solution, leaving time
>>> for more thorough discussions for 1.10.
>>>
>>> In particular, I'm not fully on board with solution #1. For one thing, it seems
>>> to propose storing all temporary objects in a memory map in CatalogManager,
>>> and that memory map duplicates the functionality of the in-memory catalog,
>>> which also stores temporary objects. For another, as pointed out by the
>>> Google doc, different DBs may handle temporary tables differently, and
>>> accordingly it may make more sense to let each catalog handle its own
>>> temporary objects.
>>>
>>> Therefore, postponing the fix buys us time to flush out all the details.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Mon, Jul 22, 2019 at 7:19 AM Timo Walther  wrote:
>>>
>>>> Thanks for summarizing our offline discussion Dawid! Even though I would
>>>> prefer solution 1 instead of releasing half-baked features, I also
>>>> understand that the Table API should not further block the next release.
>>>> Therefore, I would be fine with solution 3 but introduce 

Re: [ANNOUNCE] Kete Young is now part of the Flink PMC

2019-07-23 Thread Dawid Wysakowicz
Congratulations!

Best,

Dawid

On 23/07/2019 13:39, Hequn Cheng wrote:
> Congratulations Kurt!
>
> Best, Hequn
>
> On Tue, Jul 23, 2019 at 7:27 PM vino yang  wrote:
>
>> Congratulations Kurt!
>>
>> Bo WANG wrote on Tue, 23 Jul 2019 at 19:13:
>>
>>> Congratulations Kurt!
>>>
>>>
>>> Best,
>>>
>>> Bo WANG
>>>
>>>
>>> On Tue, Jul 23, 2019 at 5:24 PM Robert Metzger 
>>> wrote:
>>>
 Hi all,

 On behalf of the Flink PMC, I'm happy to announce that Kete Young is
>> now
 part of the Apache Flink Project Management Committee (PMC).

 Kete has been a committer since February 2017, working a lot on Table
>>> API /
 SQL. He's currently co-managing the 1.9 release! Thanks a lot for your
>>> work
 for Flink!

 Congratulations & Welcome Kurt!

 Best,
 Robert






Re: [ANNOUNCE] Zhijiang Wang has been added as a committer to the Flink project

2019-07-23 Thread Dawid Wysakowicz
Congratulations!

Best,

Dawid

On 23/07/2019 12:03, Danny Chan wrote:
> Congratulations Zhijiang!
>
> Best,
> Danny Chan
> On 22 Jul 2019 at 22:12 +0800, dev@flink.apache.org wrote:
>> Congratulations Zhijiang!




Re: [DISCUSS] Support temporary tables in SQL API

2019-07-24 Thread Dawid Wysakowicz
Hi all,

Thank you Xuefu for clarifying your opinion. Now we have 3 votes for
both of the options. To conclude this discussion I am willing to change
my vote to option 3 as I had only a slight preference towards option 2.

Therefore the final results of the poll are as follows:

option #2: 2 votes (Kurt, Aljoscha)

option #3: 4 votes (Timo, Jingsong, Xuefu, me)

I will prepare appropriate PRs according to the decision (unless
somebody objects). We will revisit the long-term solution in a separate
thread as part of the 1.10 release after 1.9 is released.

Thank you all for your opinions!

Best,

Dawid

On 24/07/2019 09:35, Aljoscha Krettek wrote:
> Isn’t https://issues.apache.org/jira/browse/FLINK-13279 already a sign that there 
> are surprises for users if we go with option #3?
>
> Aljoscha
>
>> On 24. Jul 2019, at 00:33, Xuefu Z  wrote:
>>
>> I favored #3, if that wasn't obvious.
>>
>> The usability issue with #2 makes Hive too hard to use. #3 keeps the old
>> behavior for existing users who don't have Hive and thus have only one,
>> in-memory catalog. If a user does register Hive, he/she understands that
>> there are multiple catalogs and that fully qualified table names are
>> necessary. Thus, #3 has no impact (and no surprises) for existing users,
>> and the new requirement of fully qualified names applies only to users of the
>> new feature (multiple catalogs), which seems very natural.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Jul 23, 2019 at 5:47 AM Dawid Wysakowicz wrote:
>>
>>> I think we all agree so far that we should implement one of the short term
>>> solutions for 1.9 release (#2 or #3) and continue the discussion on option
>>> #1 for the next release. Personally I prefer option #2, because it is
>>> closest to the current behavior and as Kurt said it is the most intuitive
>>> one, but I am also fine with option #3
>>>
>>> To sum up the opinions so far:
>>>
>>> *option #2: 3 votes(Kurt, Aljoscha, me)*
>>>
>>> *option #3: 2 votes(Timo, Jingsong)*
>>>
>>> I wasn't sure which option out of the two Xuefu prefers.
>>>
>>> I would like to conclude the discussion by the end of tomorrow, so that we
>>> can prepare a proper fix as soon as possible. Therefore I would suggest to
>>> proceed with the option that gets the most votes until tomorrow (*July
>>> 24th 12:00 CET*), unless there are some hard objections.
>>>
>>>
>>> Comment on option #1 concerns:
>>>
>>> I agree with Jingsong on that. I think there are some benefits of the
>>> approach, as it makes Flink in control of the temporary tables.
>>>
>>> 1. We have a unified behavior across all catalogs. Also for the catalogs
>>> that do not support temporary tables natively.
>>>
>>> 2. As Flink is in control of the temporary tables it makes it easier to
>>> control their lifecycle.
>>>
>>> Best,
>>>
>>> Dawid
>>> On 23/07/2019 11:40, JingsongLee wrote:
>>>
>>> And I think we should recommend user to use catalog api to
>>> createTable and createFunction,(I guess most scenarios do
>>> not use temporary objects) in this way, it is good to option #3
>>>
>>> Best, JingsongLee
>>>
>>>
>>> --
>>> From: JingsongLee
>>> Send Time: Tuesday, 23 July 2019, 17:35
>>> To: dev@flink.apache.org
>>> Subject:Re: [DISCUSS] Support temporary tables in SQL API
>>>
>>> Thanks Dawid and other people.
>>> +1 for using option #3 for 1.9.0 and go with option #1
>>> in 1.10.0.
>>>
>>> Regarding Xuefu's concern, I don't know how necessary it is for each 
>>> catalog to
>>> deal with tmpView. I think Catalog is different from DB, we can have single 
>>> concept for tmpView, that make user easier to understand.
>>>
>>> Regarding option #2, It is hard to use if we let user to use fully 
>>> qualified name for hive catalog. Would this experience be too bad to use?
>>>
>>> Best, Jingsong Lee
>>>
>>>
>>> --
>>> From:Kurt Young mailto:ykt...

Re: [DISCUSS] Repository split

2019-08-08 Thread Dawid Wysakowicz
First of all, I don't have much (if any) experience working with a
multi-repository project of Flink's size. I would like to mention
a few thoughts of mine, though. In general I am slightly against
splitting the repository. I fear that what we would actually do is
introduce double standards for different modules with the repository split.

As I understand there are two issues we want to solve with the split:

1) long build/testing time

2) increasing number of PRs

Ad. 1 I agree this is a problem and that we don't necessarily need to
run all the tests with every change or build the whole project all the
time. However, I think we could achieve that in a single repository and
at the same time keep the option to build all modules at once. If I am
not mistaken, this is the approach that the Apache Beam community decided to
take (see e.g.
https://github.com/apache/beam/blob/master/.test-infra/jenkins/job_PreCommit_Java.groovy
where they define paths to files that, if changed, trigger the
corresponding CI job). Maybe we could make it easier if we restructured
the repository? To something like:

   flink/
   |--flink-main/
   |--flink-core/
   |--flink-runtime/
   ...
   |--flink-connectors/
   ...
   |--flink-filesystems.../
   ...

   |--root.pom

In my opinion, the Releases section from Chesnay's message shows well
that it might not be the best option to split the repository. Option
a) looks to me equivalent to what I suggested above, but with a split.
Option b) looks super complicated to me and I can see no benefit
over option a). Option c) would be the most reasonable one if we
decided to split the repository, if you ask me. The problem with this
approach is the compatibility matrix (which versions of connectors work
with which versions of Flink?). Moreover, for me it is an indicator of
what I mentioned: that we introduce double standards for those modules. I
am not saying that I am totally against that, but I think this should be
a conscious decision.

Ad. 2 I can't see how a repository split could help with that, other than by
moving some of the PRs to a separate list (that probably even fewer
people would look into). Also, I think we can achieve something like that
already with GitHub filters, no?

To sum up my thoughts:

 1. I think it is a good idea to split our CI builds into sub-modules
(connectors being the first candidate) that would trigger on a
changed-path basis, but without splitting the repo.
 2. My feeling is that the real question is whether we want to change our
stability guarantees for certain modules to be "just best effort".
 3. If we were to vote on this proposal I would vote -0. I am slightly
against this change, but wouldn't oppose it.

Best,

Dawid

On 08/08/2019 13:23, Chesnay Schepler wrote:
> >  I would like to also raise an additional issue: currently quite
> some bugs (like release blockers [1]) are being discovered by ITCases
> of the connectors. It means that at least initially, the main
> repository will lose some test coverage.
>
> True, but I think this is more a symptom of us not properly testing
> the contracts that are exposed to connectors.
> That we lose test coverage is already a big red flag as it
> implies that issues were fixed and are now verified by a connector
> test, and not by a test in the Flink core.
> We could also look into tooling surrounding the CI bot for running the
> connectors tests on-demand, although this is very much long-term.
>
> On 08/08/2019 13:14, Piotr Nowojski wrote:
>> Hi,
>>
>> Thanks for proposing and writing this down Chesney.
>>
>> Generally speaking +1 from my side for the idea. It will create
>> additional pain for cross-repository development, like a new
>> feature in connectors that needs some change in the main repository.
>> I’ve worked in such a setup before and the teams then regretted having
>> such a split. But I agree that we should try this to solve the
>> stability/build time issues.
>>
>> I have no experience in making such kind of splits so I can not help
>> here.
>>
>> I would like to also raise an additional issue: currently quite some
>> bugs (like release blockers [1]) are being discovered by ITCases of
>> the connectors. It means that at least initially, the main repository
>> will lose some test coverage.
>>
>> Piotrek
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13593
>> 
>>
>>> On 7 Aug 2019, at 13:14, Chesnay Schepler  wrote:
>>>
>>> Hello everyone,
>>>
>>> The Flink project sees an ever-increasing amount of dev activity,
>>> both in terms of reworked and new features.
>>>
>>> This is of course an excellent situation to be in, but we are
>>> getting to a point where the associate downsides are becoming
>>> increasingly troublesome.
>>>
>>> The ever increasing build times, in addition to unstable tests,
>>> significantly slow down the development process.
>>> Additionally, pull req

Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-12 Thread Dawid Wysakowicz
Hi Gyula,

As for the issues with the MapR Maven repository, you might have a look at
this message:
https://lists.apache.org/thread.html/77f4db930216e6da0d6121065149cef43ff3ea33c9ffe9b1a3047210@%3Cdev.flink.apache.org%3E

Try using the "unsafe-mapr-repo" profile.
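If I remember correctly, it should be enough to activate it next to the profiles you already use, e.g. something along the lines of `mvn clean install -DskipTests -Dhadoop.version=2.6.0 -Pinclude-hadoop,vendor-repos,unsafe-mapr-repo` (the exact set of profiles may differ for your setup, so please treat this only as a rough hint).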

Best,

Dawid

On 11/08/2019 19:31, Gyula Fóra wrote:
> Hi again,
>
> How do I build the RC locally with the hadoop version specified? Seems like
> no matter what I do I run into dependency problems with the shaded hadoop
> dependencies.
> This seems to have worked in the past.
>
> There might be some documentation somewhere that I couldn't find, so I would
> appreciate any pointers :)
>
> Thanks!
> Gyula
>
> On Sun, Aug 11, 2019 at 6:57 PM Gyula Fóra  wrote:
>
>> Hi!
>>
>> I am trying to build 1.9.0-rc2 with the -Pvendor-repos profile enabled. I
>> get the following error:
>>
>> mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=2.6.0
>> -Pinclude-hadoop (ignore that the hadoop version is not a vendor hadoop
>> version)
>>
>> [ERROR] Failed to execute goal on project flink-hadoop-fs: Could not
>> resolve dependencies for project
>> org.apache.flink:flink-hadoop-fs:jar:1.9.0: Failed to collect dependencies
>> at org.apache.flink:flink-shaded-hadoop-2:jar:2.6.0-7.0: Failed to read
>> artifact descriptor for
>> org.apache.flink:flink-shaded-hadoop-2:jar:2.6.0-7.0: Could not transfer
>> artifact org.apache.flink:flink-shaded-hadoop-2:pom:2.6.0-7.0 from/to
>> mapr-releases (https://repository.mapr.com/maven/):
>> sun.security.validator.ValidatorException: PKIX path building failed:
>> sun.security.provider.certpath.SunCertPathBuilderException: unable to find
>> valid certification path to requested target -> [Help 1]
>>
>> This looks like a TLS error. Might not be related to the release but it
>> could be good to know.
>>
>> Cheers,
>> Gyula
>>
>> On Fri, Aug 9, 2019 at 6:26 PM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Please note that the unresolved issues that are still tagged with a fix
>>> version "1.9.0", as seen in the JIRA release notes [1], are issues to
>>> update documents for new features.
>>> I've left them still associated with 1.9.0 since these should still be
>>> updated for 1.9.0 soon along with the official release.
>>>
>>> [1]
>>>
>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
>>>
>>> On Fri, Aug 9, 2019 at 6:17 PM Tzu-Li (Gordon) Tai 
>>> wrote:
>>>
 Hi all,

 Release candidate #2 for Apache Flink 1.9.0 is now ready for your
>>> review.
 This is the first voting candidate for 1.9.0, following the preview
 candidates RC0 and RC1.

 Please review and vote on release candidate #2 for version 1.9.0, as
 follows:
 [ ] +1, Approve the release
 [ ] -1, Do not approve the release (please provide specific comments)

 The complete staging area is available for your review, which includes:
 * JIRA release notes [1],
 * the official Apache source release and binary convenience releases to
>>> be
 deployed to dist.apache.org [2], which are signed with the key with
 fingerprint 1C1E2394D3194E1944613488F320986D35C33D6A [3],
 * all artifacts to be deployed to the Maven Central Repository [4],
 * source code tag “release-1.9.0-rc2” [5].

 Robert is also preparing a pull request for the announcement blog post
>>> in
 the works, and will update this voting thread with a link to the pull
 request shortly afterwards.

 The vote will be open for *at least 72 hours*.
 Please cast your votes before *Aug. 14th (Wed.) 2019, 17:00 PM CET*.It
>>> is
 adopted by majority approval, with at least 3 PMC affirmative votes.
 Thanks,
 Gordon[1]

>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12344601
 [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.9.0-rc2/
 [3] https://dist.apache.org/repos/dist/release/flink/KEYS
 [4]
>>> https://repository.apache.org/content/repositories/orgapacheflink-1234
 [5]

>>> https://gitbox.apache.org/repos/asf?p=flink.git;a=tag;h=refs/tags/release-1.9.0-rc2





Re: [VOTE] Flink Project Bylaws

2019-08-13 Thread Dawid Wysakowicz
+1(non-binding)

I really think this will improve many aspects of how our community
operates. Thank you Becket for kicking it off!

Best,

Dawid

On 11/08/2019 10:07, Becket Qin wrote:
> Hi all,
>
> I would like to start a voting thread on the project bylaws of Flink. It
> aims to help the community coordinate more smoothly. Please see the bylaws
> wiki page below for details.
>
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
>
> The discussion thread is following:
>
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-project-bylaws-td30409.html
>
> The vote will be open for at least 6 days. PMC members' votes are
> considered as binding. The vote requires 2/3 majority of the binding +1s to
> pass.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>





Re: [VOTE] Apache Flink Release 1.9.0, release candidate #2

2019-08-15 Thread Dawid Wysakowicz
> Hi all,
>
> I just found an issue when testing connector DDLs against the blink planner
> for rc2. This issue leads to DDL not working when it contains
> timestamp/date/time types. I have created an issue FLINK-13699[1] and a pull
> request for this.
> IMO, this can be a blocker issue for the 1.9 release, because
> timestamp/date/time are primitive types and this will break the DDL feature.
> However, I want to hear more thoughts from the community on whether we should
> recognize it as a blocker.
>
> Thanks,
> Jark
>
> [1]: https://issues.apache.org/jira/browse/FLINK-13699
>
> On Mon, 12 Aug 2019 at 22:46, Becket Qin <becket@gmail.com> wrote:
>
>> Thanks Gordon, will do that.
>>
>> On Mon, Aug 12, 2019 at 4:42 PM Tzu-Li (Gordon) Tai <tzuli...@apache.org>
>> wrote:
>>
>>> Concerning FLINK-13231:
>>>
>>> Since this is a @PublicEvolving interface, technically it is ok to break
>>> it across releases (including across bugfix releases?).
>>> So, @Becket if you do merge it now, please mark the fix version as 1.9.1.
>>> During the voting process, in the case a new RC is created, we usually
>>> check the list of changes compared to the previous RC, and correct the
>>> "Fix Version" of the corresponding JIRAs to be the right version (in the
>>> case, it would be corrected to 1.9.0 instead of 1.9.1).
>>>
>>> On Mon, Aug 12, 2019 at 4:25 PM Till Rohrmann <trohrm...@apache.org>
>>> wrote:
>>>
>>>> I agree that it would be nicer. Not sure whet

Re: [VOTE] FLIP-51: Rework of the Expression Design

2019-08-16 Thread Dawid Wysakowicz
+1 from my side

Best,

Dawid

On 16/08/2019 10:31, Jark Wu wrote:
> +1 from my side.
>
> Thanks Jingsong for driving this.
>
> Best,
> Jark
>
> On Thu, 15 Aug 2019 at 22:09, Timo Walther  wrote:
>
>> +1 for this.
>>
>> Thanks,
>> Timo
>>
>> Am 15.08.19 um 15:57 schrieb JingsongLee:
>>> Hi Flink devs,
>>>
>>> I would like to start the voting for FLIP-51 Rework of the Expression
>>>   Design.
>>>
>>> FLIP wiki:
>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-51%3A+Rework+of+the+Expression+Design
>>> Discussion thread:
>>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-51-Rework-of-the-Expression-Design-td31653.html
>>> Google Doc:
>>>
>> https://docs.google.com/document/d/1yFDyquMo_-VZ59vyhaMshpPtg7p87b9IYdAtMXv5XmM/edit?usp=sharing
>>> Thanks,
>>>
>>> Best,
>>> Jingsong Lee
>>
>>





Re: [VOTE] Flink Project Bylaws

2019-08-16 Thread Dawid Wysakowicz
AFAIK this voting scheme is described in the "Modifying Bylaws" section; in
the end, introducing bylaws is a modify operation ;). I think it is a
valid point to CC priv...@flink.apache.org in the future. I wouldn't say
it is a must, though. The voting scheme requires that every PMC member
be reached out to directly, via a private address, if he/she did not
vote in the thread. So every PMC member should be aware of the voting thread.

Best,

Dawid

On 16/08/2019 12:38, Chesnay Schepler wrote:
> I'm very late to the party, but isn't it a bit weird that we're using
> a voting scheme that isn't laid out in the bylaws?
>
> Additionally, I would heavily suggest to CC priv...@flink.apache.org,
> as we want as many PMC as possible to look at this.
> (I would regard this point as a reason for delaying the vote
> conclusion)
>
> On 11/08/2019 10:07, Becket Qin wrote:
>> Hi all,
>>
>> I would like to start a voting thread on the project bylaws of Flink. It
>> aims to help the community coordinate more smoothly. Please see the
>> bylaws
>> wiki page below for details.
>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=120731026
>>
>>
>> The discussion thread is following:
>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-project-bylaws-td30409.html
>>
>>
>> The vote will be open for at least 6 days. PMC members' votes are
>> considered as binding. The vote requires 2/3 majority of the binding
>> +1s to
>> pass.
>>
>> Thanks,
>>
>> Jiangjie (Becket) Qin
>>
>





Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-08-18 Thread Dawid Wysakowicz
Hi Stephan,

Thank you for your opinion.

Actually, list/composite types are the topic we spent most of the
time on. I understand that from the perspective of a full-blown type system,
a field like isList may look weird. Please let me elaborate a bit more
on the reason behind it though. Maybe we weren't clear enough about it
in the FLIP. The key feature of all the config options is that they must
have a string representation, as they might come from a configuration
file. Moreover, it must be a human-readable format, so that the values
can be manually adjusted. With that in mind, we did not want to add
support for arbitrary nesting and we decided to allow for lists only
(and flat objects - I think though that in the current design there is a
mistake around the Configurable interface). I think you have a point here,
though, and it would be better to have a ListConfigOption instead of
this field. Does that make sense to you?
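Just to illustrate what I mean by keeping the string form human readable, here is a tiny, self-contained snippet (only an illustration of the flat format, not the proposed API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class FlatListFormatExample {
    public static void main(String[] args) {
        // A flat list of atomic values keeps a string form a user can type by hand,
        // e.g. in flink-conf.yaml; arbitrary nesting would quickly lose that property.
        String raw = "12;13;42";

        List<Integer> values = Arrays.stream(raw.split(";"))
                .map(String::trim)
                .map(Integer::valueOf)
                .collect(Collectors.toList());

        System.out.println(values); // prints [12, 13, 42]
    }
}
```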

As for the second part of your message, I am not sure I understood
it. The validators work with parsed/deserialized values from the
Configuration, which means they can be bound to the generic parameter of
the ConfigOption. You can have a RangeValidator. I don't think the type
hierarchy in the ConfigOption has anything to do with the validation
logic. Could you elaborate a bit more on what you meant?
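To be more concrete about what I had in mind, a validator bound to the option's value type could look roughly like the following (a sketch for illustration only, not code from the FLIP):

```java
// Sketch only: the class name and shape are assumptions used to illustrate the idea
// of validating parsed values independently of the ConfigOption type hierarchy.
final class RangeValidator<T extends Comparable<T>> {

    private final T min;
    private final T max;

    RangeValidator(T min, T max) {
        this.min = min;
        this.max = max;
    }

    /** Returns true if the parsed value lies within [min, max]. */
    boolean isValid(T value) {
        return min.compareTo(value) <= 0 && max.compareTo(value) >= 0;
    }
}
```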

Best,

Dawid

On 18/08/2019 16:42, Stephan Ewen wrote:
> I like the idea of enhancing the configuration and to do early validation.
>
> I feel that some of the ideas in the FLIP seem a bit ad hoc, though. For
> example, having a boolean "isList" is a clear indication of not having
> thought through the type/category system.
> Also, having a more clear category system makes validation simpler.
>
> For example, I have seen systems distinguishing between numeric parameters
> (valid ranges), category parameters (set of possible values), quantities
> like duration and memory size (need measure and unit), which results in an
> elegant system for validation.
>
>
> On Fri, Aug 16, 2019 at 5:22 PM JingsongLee 
> wrote:
>
>> +1 to this, thanks Timo and Dawid for the design.
>> This allows the currently cluttered configuration of various
>> modules to be unified.
>> This is also a first step towards making the new unified
>> TableEnvironment available for production.
>>
>> Previously, we did encounter complex configurations, such as
>> specifying the skewed values of a column in DDL. The skew may
>> be a single field or a combination of multiple fields, so the
>> configuration is very troublesome. We used a JSON string to
>> configure it.
>>
>> Best,
>> Jingsong Lee
>>
>>
>>
>> --
>> From:Jark Wu 
>> Send Time: Friday, 16 August 2019, 16:44
>> To:dev 
>> Subject:Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration
>>
>> Thanks for starting this design Timo and Dawid,
>>
>> Improving ConfigOption has been hovering in my mind for a long time.
>> We have seen the benefit when developing blink configurations and connector
>> properties in 1.9 release.
>> Thanks for bringing it up and make such a detailed design.
>> I will leave my thoughts and comments there.
>>
>> Cheers,
>> Jark
>>
>>
>> On Fri, 16 Aug 2019 at 22:30, Zili Chen  wrote:
>>
>>> Hi Timo,
>>>
>>> It looks interesting. Thanks for preparing this FLIP!
>>>
>>> Client API enhancement benefit from this evolution which
>>> hopefully provides a better view of configuration of Flink.
>>> In client API enhancement, we likely make the deployment
>>> of cluster and submission of job totally defined by configuration.
>>>
>>> Will take a look at the document in days.
>>>
>>> Best,
>>> tison.
>>>
>>>
>>> Timo Walther wrote on Fri, 16 Aug 2019 at 22:12:
>>>
 Hi everyone,

 Dawid and I are working on making parts of ExecutionConfig and
 TableConfig configurable via config options. This is necessary to make
 all properties also available in SQL. Additionally, with the new SQL
>> DDL
 based on properties as well as more connectors and formats coming up,
 unified configuration becomes more important.

 We need more features around string-based configuration in the future,
 which is why Dawid and I would like to propose FLIP-54 for evolving the
 ConfigOption and Configuration classes:



>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit
 In summary it adds:
 - documented types and validation
 - more common types such as memory size, duration, list
 - simple non-nested object types

 Looking forward to your feedback,
 Timo


>>





Re: [DISCUSS][CODE STYLE] Usage of Java Optional

2019-08-20 Thread Dawid Wysakowicz
Hi Andrey,

Just wanted to quickly elaborate on my opinion. I wouldn't say I am -1,
just -0 for Optionals in private methods. I am OK with not
forbidding them there. I just think that in all cases there is a better
solution than passing Optionals around, even in private methods. I
just hope the outcome of the discussion won't be that it is no longer
allowed to suggest such alternatives during a review.
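To illustrate the kind of alternative I mean, here is a small, made-up example (all names are invented; this is not Flink code):

```java
import java.util.Optional;
import javax.annotation.Nullable;

class OptionalParameterExample {

    // The debated style: an Optional parameter in a private helper.
    private static String describeWithOptional(long timestamp, Optional<String> tag) {
        return timestamp + "/" + tag.orElse("untagged");
    }

    // An alternative: an overload plus a @Nullable parameter,
    // so callers never have to wrap values in Optional.
    private static String describe(long timestamp) {
        return describe(timestamp, null);
    }

    private static String describe(long timestamp, @Nullable String tag) {
        return timestamp + "/" + (tag == null ? "untagged" : tag);
    }

    public static void main(String[] args) {
        System.out.println(describeWithOptional(42L, Optional.of("checkpoint")));
        System.out.println(describe(42L));
        System.out.println(describe(42L, "checkpoint"));
    }
}
```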

Best,

Dawid

On 19/08/2019 17:53, Andrey Zagrebin wrote:
> Hi all,
>
> Sorry for not getting back to this discussion for some time.
> It looks like in general we agree on the initially suggested points:
>
>- Always use Optional only to return nullable values in the API/public
>methods
>   - Only if you can prove that Optional usage would lead to a
>   performance degradation in critical code then return nullable value
>   directly and annotate it with @Nullable
>- Passing an Optional argument to a method can be allowed if it is
>within a private helper method and simplifies the code
>- Optional should not be used for class fields
>
> The first point can also be elaborated by explicitly forbidding
> Optional/Nullable parameters in public methods.
> In general we can always avoid Optional parameters by either overloading
> the method or using a pojo with a builder to pass a set of parameters.
>
> The third point does not prevent from using @Nullable/@Nonnull for class
> fields.
> If we agree to not use Optional for fields then not sure I see any use case
> for SerializableOptional (please comment on that if you have more details).
>
> @Jingsong Lee
> Using Optional in Maps.
> I can see this as a possible use case.
> I would leave this decision to the developer/reviewer to reason about it
> and keep the scope of this discussion to the variables/parameters as it
> seems to be the biggest point of friction atm.
>
> Finally, I see a split regarding the second point: "Passing an Optional
> argument to a method can be allowed if it is within a private helper method
> and simplifies the code".
> There are people who have a strong opinion against allowing it.
> Let's vote then for whether to allow it or not.
> So far as I see we have the following votes (correct me if wrong and add
> more if you want):
> Piotr+1
> Biao+1
> Timo   -1
> Yu   -1
> Aljoscha -1
> Till  +1
> Igal+1
> Dawid-1
> Me +1
>
> So far: +5 / -4 (Optional arguments in private methods)
>
> Best,
> Andrey
>
>
> On Tue, Aug 6, 2019 at 8:53 AM Piotr Nowojski  wrote:
>
>> Hi Qi,
>>
>>> For example, SingleInputGate is already creating Optional for every
>> BufferOrEvent in getNextBufferOrEvent(). How much performance gain would we
>> get if it’s replaced by null check?
>>
>> When I introduced it there, I micro-benchmarked this change, and
>> there was no visible throughput change in a pure network-only micro
>> benchmark (with the whole of Flink running around it, any changes would be even
>> less visible).
>>
>> Piotrek
>>
>>> On 5 Aug 2019, at 15:20, Till Rohrmann  wrote:
>>>
>>> I'd be in favour of
>>>
>>> - Optional for method return values if not performance critical
>>> - Optional can be used for internal methods if it makes sense
>>> - No optional fields
>>>
>>> Cheers,
>>> Till
>>>
>>> On Mon, Aug 5, 2019 at 12:07 PM Aljoscha Krettek 
>>> wrote:
>>>
 Hi,

 I’m also in favour of using Optional only for method return values. I
 could be persuaded to allow them for parameters of internal methods but
>> I’m
 sceptical about that.

 Aljoscha

> On 2. Aug 2019, at 15:35, Yu Li  wrote:
>
> TL; DR: I second Timo that we should use Optional only as method return
> type for non-performance critical code.
>
> From the example given on our AvroFactory [1] I also noticed that
 Jetbrains
> marks the OptionalUsedAsFieldOrParameterType inspection as a warning.
 It's
> relatively easy to understand why it's not suggested to use (java.util)
> Optional as a field since it's not serializable. What made me feel
 curious
> is that why we shouldn't use it as a parameter type, so I did some
> investigation and here is what I found:
>
> There's a JB blog talking about java8 top tips [2] where we could find
 the
> advice around Optional, there I found another blog telling about the
> pragmatic approach of using Optional [3]. Reading further we could see
 the
> reason why we shouldn't use Optional as parameter type, please allow me
 to
> quote here:
>
> It is often the case that domain objects hang about in memory for a
>> fair
> while, as processing in the application occurs, making each optional
> instance rather long-lived (tied to the lifetime of the domain object).
 By
> contrast, the Optionalinstance returned from the getter is likely to be
> very short-lived. The caller will call the getter, interpret the
>> result,
> and then move on. If you know anything a

Re: [CODE-STYLE] Builder pattern

2019-08-26 Thread Dawid Wysakowicz
Hi Gyula,

A few comments from my side.

Ad. 1 Personally I also prefer a static method in the "built" class. Not
sure if I would be that strict about the "Builder" suffix, though. It is
usually quite easy to realize the method returns a builder rather than
the object itself. In my opinion the suffix might be redundant at times.

Ad. 2 Here I also don't have a strict preference. I like the b) approach
the least, though. Whenever I see setXXX I expect it to be an old-style
setter without a return type. For that reason, for a generic name I always
prefer withXXX. I do see lots of benefits in option c), though, as it
might be the most descriptive option. Some examples where I like the
usage of option c) are:

* org.apache.flink.api.common.state.StateTtlConfig.Builder#useProcessingTime

*
org.apache.flink.api.common.state.StateTtlConfig.Builder#cleanupInBackground

*
org.apache.flink.api.common.state.StateTtlConfig.Builder#cleanupInRocksdbCompactFilter()

To sum up on this topic I would not enforce a strict policy on this
topic. But I am ok with one, if the community prefers it.

Ad. 3 I agree that mutable Builders are much easier to implement, and
usually it does no harm to have them mutable as long as they are not
passed around.


Some other topics that I think are worth considering:

4. Setting the default values:

a) The static method/constructor should accept all the required
parameters (those that don't have reasonable defaults), so that
newBuilder(...).build() constructs a valid object.

b) Validation in the build() method. newBuilder().build() might throw an
Exception if some values were not set.

Personally, I think option a) should be strongly preferred. However, I
believe there are cases where b) could be acceptable.

5. Building the end object:

a) Always with a build() method

b) Allow building the object with arbitrary methods

I think option a) is the most common approach. I think option
b) should also be allowed, though, if there is a good reason for it. What I can
imagine is when we need some additional information for the build
method (e.g. a generic type) or when the method modifies the internal state
of the Builder in a way that makes it unsafe to continue setting values on
the Builder.

An overall comment: I think it is good to share opinions on this topic,
but I am afraid there are too many sides to the issue to standardize it
very strictly. I might be wrong though. Really looking forward to the
outcome of this discussion.
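To make the combination I would personally lean towards a bit more concrete, here is a minimal, made-up sketch (RetryPolicy and its fields are invented purely for illustration):

```java
final class RetryPolicy {

    private final int maxAttempts;   // required, no reasonable default (see 4a)
    private final long delayMillis;  // optional, has a default

    private RetryPolicy(int maxAttempts, long delayMillis) {
        this.maxAttempts = maxAttempts;
        this.delayMillis = delayMillis;
    }

    // 1a) static creation method in the "built" class
    public static Builder newBuilder(int maxAttempts) {
        return new Builder(maxAttempts);
    }

    public static final class Builder {

        private final int maxAttempts;
        private long delayMillis = 1_000L;

        private Builder(int maxAttempts) {
            this.maxAttempts = maxAttempts;
        }

        // 2a) withXXX naming, 3b) mutable builder returning this
        public Builder withDelayMillis(long delayMillis) {
            this.delayMillis = delayMillis;
            return this;
        }

        // 5a) building always via build(); newBuilder(3).build() is already valid
        public RetryPolicy build() {
            return new RetryPolicy(maxAttempts, delayMillis);
        }
    }
}
```

Usage would then look like `RetryPolicy.newBuilder(3).withDelayMillis(500L).build()`.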

Best,

Dawid

On 26/08/2019 14:18, Gyula Fóra wrote:
> Hi All!
>
> I would like to start a code-style related discussion regarding how we
> implement the builder pattern in the Flink project.
>
> It would be best to have a common approach. There are some aspects of
> the pattern that come to my mind; please feel free to add anything I missed:
>
> 1. Creating the builder objects:
>
> a) Always create using static method in "built" class:
>Here we should have naming guidelines: .builder(..) or
> .xyzBuilder(...)
> b) Always use builder class constructor
> c) Mix: Again we should have some guidelines when to use which
>
> I personally prefer option a) to always have a static method to create the
> builder with static method names that end in builder.
>
> 2. Setting properties on the builder:
>
>  a) withSomething(...)
>  b) setSomething(...)
>  c) other
>
> I don't really have a preference but either a or b for consistency.
>
>
> 3. Implementing the builder object:
>
>  a) Immutable -> Creates a new builder object after setting a property
>  b) Mutable -> Returns (this) after setting the property
>
> I personally prefer the mutable version as it keeps the builder
> implementation much simpler and it seems to be a very common way of doing
> it.
>
> What do you all think?
>
> Regards,
> Gyula
>





Re: [VOTE] FLIP-54: Evolve ConfigOption and Configuration

2019-08-27 Thread Dawid Wysakowicz
+1 to the FLIP

Also, I think we should mention that the voting will last at least 72
hours, as required by the bylaws, i.e. until 30 Aug 14:00 CEST. (Correct me if
I am wrong, Timo.)

On 27/08/2019 13:32, Jark Wu wrote:
> +1 to the FLIP.
>
>
> Regards,
> Jark
>
>> On 27 Aug 2019, at 19:28, Timo Walther wrote:
>>
>> Hi everyone,
>>
>> thanks for the great feedback we have received for the draft of FLIP-54. The 
>> discussion seems to have reached an agreement. Of course this doesn't mean 
>> that we can't propose further improvements on ConfigOption's and Flink 
>> configuration in general in the future. It is just one step towards having a 
>> better unified configuration for the project.
>>
>> Please vote for the following design document:
>>
>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#
>>
>> I will convert it to a Wiki page afterwards.
>>
>> Thanks,
>> Timo
>>





Re: [VOTE] FLIP-54: Evolve ConfigOption and Configuration

2019-08-27 Thread Dawid Wysakowicz
Actually, I wanted to propose a slight change to the proposal. Therefore
I want to change my vote to -1 for now.

I suggest changing the Configurable interface to a ConfigurableFactory:

public interface ConfigurableFactory<T> {

    /**
     * Creates an instance from the given configuration.
     */
    T fromConfiguration(ConfigurationReader configuration);

    /**
     * Writes this instance to the given configuration.
     */
    void toConfiguration(T value, ConfigurationWriter configuration);
}

And the corresponding method in the builder to:

<T> TypedConfigOptionBuilder<T> configurableType(Class<ConfigurableFactory<T>> clazz) {
    return new TypedConfigOptionBuilder<>(key, clazz);
}

This way we can keep the "configurable" objects immutable.
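For illustration, an implementation of this interface could look roughly like the sketch below (HostPort is a made-up immutable value class, and the accessor methods on the reader/writer are assumptions, since their exact API is not part of this mail):

```java
// Sketch only: HostPort and the getString/getInteger/setString/setInteger accessors
// are assumptions for illustration; only ConfigurableFactory comes from the proposal.
public final class HostPortFactory implements ConfigurableFactory<HostPort> {

    @Override
    public HostPort fromConfiguration(ConfigurationReader configuration) {
        // read flat, human-readable properties and build the immutable value object
        return new HostPort(
                configuration.getString("host"),
                configuration.getInteger("port"));
    }

    @Override
    public void toConfiguration(HostPort value, ConfigurationWriter configuration) {
        // write the value object back as flat properties
        configuration.setString("host", value.getHost());
        configuration.setInteger("port", value.getPort());
    }
}
```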

Best,

Dawid

On 27/08/2019 13:28, Timo Walther wrote:
> Hi everyone,
>
> thanks for the great feedback we have received for the draft of
> FLIP-54. The discussion seems to have reached an agreement. Of course
> this doesn't mean that we can't propose further improvements on
> ConfigOption's and Flink configuration in general in the future. It is
> just one step towards having a better unified configuration for the
> project.
>
> Please vote for the following design document:
>
> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#
>
>
> I will convert it to a Wiki page afterwards.
>
> Thanks,
> Timo
>
>




[DISCUSS] FLIP-59: Enable execution configuration from Configuration object

2019-08-29 Thread Dawid Wysakowicz
Hi,

I wanted to propose a new, additional way of configuring execution
parameters that can currently be set only on objects such as
ExecutionConfig, CheckpointConfig and StreamExecutionEnvironment. This
poses problems such as:

  * there is no easy way to configure those parameters from a file
  * there is no easy way to pass a configuration from layers built on
top of StreamExecutionEnvironment (e.g. when we want to configure
those options from TableEnvironment)
  * they are not automatically documented
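As a small illustration of the status quo (the concrete values are arbitrary; the point is that these setter calls are currently the only way to set such parameters):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ExecutionParametersToday {

    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Execution parameters are currently only settable via these objects,
        // which makes it hard to drive them from a file or from layers built on top.
        env.getConfig().setAutoWatermarkInterval(200);
        env.enableCheckpointing(60_000);
        env.getCheckpointConfig().setCheckpointTimeout(120_000);
    }
}
```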

Note that there are a few concepts from FLIP-54[1] that this FLIP is
based on.

I would be really grateful to know whether you think this would be a valuable
addition, and for any other feedback.

Best,

Dawid

Wiki page:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object

Google doc:
https://docs.google.com/document/d/1l8jW2NjhwHH1mVPbLvFolnL2vNvf4buUMDZWMfN_hFM/edit?usp=sharing


[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration






Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-08-30 Thread Dawid Wysakowicz
o Walther:
>>>>>>>> Hi Stephan,
>>>>>>>>
>>>>>>>> thanks for your suggestions. Let me give you some background about
>>>>>>>> the decisions made in this FLIP:
>>>>>>>>
>>>>>>>> 1. Goal: The FLIP is labelled "evolve" not "rework" because we did
>>>>>>>> not want to change the entire configuration infrastructure.
>>>>>>>> Both for
>>>>>>>> backwards-compatibility reasons and the amount of work that
>>>>>>>> would be
>>>>>>>> required to update all options. If our goal is to rework the
>>>>>>>> configuration option entirely, I might suggest to switch to JSON
>>>>>>>> format with JSON schema and JSON validator. However, setting
>>>>>>>> properties in a CLI or web interface becomes more tricky the more
>>>>>>>> nested structures are allowed.
>>>>>>>>
>>>>>>>> 2. Class-based Options: The current ConfigOption class is
>>>>>>>> centered
>>>>>>>> around Java classes where T is determined by the default value.
>>>>>>>> The
>>>>>>>> FLIP just makes this more explicit by offering an explicit
>>>>>>>> `intType()` method etc. The current design of validators centered
>>>>>>>> around Java classes makes it possible to have typical domain
>>>>>>>> validators baked by generics as you suggested. If we introduce
>>>>>>>> types
>>>>>>>> such as "quantity with measure and unit" we still need to get a
>>>>>>>> class
>>>>>>>> out of this option at the end, so why changing a proven concept?
>>>>>>>>
>>>>>>>> 3. List Options: The `isList` prevents having arbitrary
>>>>>>>> nesting. As
>>>>>>>> Dawid mentioned, we kept human readability in mind. For every
>>>>>>>> atomic
>>>>>>>> option like "key=12" can be represented by a list "keys=12;13".
>>>>>>>> But
>>>>>>>> we don't want to go further; esp. no nesting. A dedicated list
>>>>>>>> option
>>>>>>>> would start making this more complicated such as
>>>>>>>> "ListOption(ObjectOption(ListOption(IntOption, ...),
>>>>>>>> StringOption(...)))", do we want that?
>>>>>>>>
>>>>>>>> 4. Correlation: The correlation part is one of the suggestions
>>>>>>>> that I
>>>>>>>> like least in the document. We can also discuss removing it
>>>>>>>> entirely,
>>>>>>>> but I think it solves the use case of relating options with each
>>>>>>>> other in a flexible way right next to the actual option.
>>>>>>>> Instead of
>>>>>>>> being hidden in some component initialization, we should put it
>>>>>>>> close
>>>>>>>> to the option to also perform validation eagerly instead of
>>>>>>>> failing
>>>>>>>> at runtime when the option is accessed the first time.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>>
>>>>>>>> Am 18.08.19 um 23:32 schrieb Stephan Ewen:
>>>>>>>>> A "List Type" sounds like a good direction to me.
>>>>>>>>>
>>>>>>>>> The comment on the type system was a bit brief, I agree. The
>>>>>>>>> idea is
>>>>>>>>> to see
>>>>>>>>> if something like that can ease validation. Especially the
>>>>> correlation
>>>>>>>>> correlation
>>>>>>>>> initialization).
>>>>>>>>>
>>>>>>>>> For example, let's assume we don't think primarily about "java
>>>>>>>>> types" but
>>>>>>>>> would define types as one of the following (just exa

Re: [DISCUSS] FLIP-59: Enable execution configuration from Configuration object

2019-08-30 Thread Dawid Wysakowicz
Hi Gyula,

Thank you for the support on those changes.

I am not sure I understood your idea for the "reconfiguration" logic.

The configure method on those objects would take a ConfigurationReader. So
a user can provide a thin wrapper around Configuration that e.g. filters
certain options, changes values based on other parameters, etc. Is that
what you had in mind?

Best,

Dawid

On 29/08/2019 19:21, Gyula Fóra wrote:
> Hi!
>
> Huuuge +1 from me, this has been an operational pain for years.
> This would also introduce a nice and simple way to extend it in the future
> if we need.
>
> Ship it!
>
> Gyula
>
> On Thu, Aug 29, 2019 at 5:05 PM Dawid Wysakowicz 
> wrote:
>
>> Hi,
>>
>> I wanted to propose a new, additional way of configuring execution
>> parameters that can currently be set only on such objects like
>> ExecutionConfig, CheckpointConfig and StreamExecutionEnvironment. This
>> poses problems such as:
>>
>>- no easy way to configure those from a file
>>- there is no easy way to pass a configuration from layers built on
>>top of StreamExecutionEnvironment. (e.g. when we want to configure those
>>options from TableEnvironment)
>>- they are not automatically documented
>>
>> Note that there are a few concepts from FLIP-54[1] that this FLIP is based
>> on.
>>
>> Would be really grateful to know if you think this would be a valuable
>> addition and any other feedback.
>>
>> Best,
>>
>> Dawid
>>
>> Wiki page:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
>>
>> Google doc:
>> https://docs.google.com/document/d/1l8jW2NjhwHH1mVPbLvFolnL2vNvf4buUMDZWMfN_hFM/edit?usp=sharing
>>
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>>
>>
>>





Re: [VOTE] FLIP-54: Evolve ConfigOption and Configuration

2019-08-30 Thread Dawid Wysakowicz
+1 to the design

On 29/08/2019 15:53, Timo Walther wrote:
> I converted the mentioned Google doc into a wiki page:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>
>
> The core semantics have not changed.
>
> Happy voting,
> Timo
>
> On 29.08.19 04:30, Zili Chen wrote:
>> The design looks good to me.
>>
>> +1 go ahead!
>>
>> Best,
>> tison.
>>
>>
>> Jark Wu wrote on Wed, 28 Aug 2019 at 18:08:
>>
>>> Hi Timo,
>>>
>>> The new changes looks good to me.
>>>
>>> +1 to the FLIP.
>>>
>>>
>>> Cheers,
>>> Jark
>>>
>>> On Wed, 28 Aug 2019 at 16:02, Timo Walther  wrote:
>>>
 Hi everyone,

 after some last minute changes yesterday, I would like to start a new
 vote on FLIP-54. The discussion seems to have reached an agreement. Of
 course this doesn't mean that we can't propose further improvements on
 ConfigOption's and Flink configuration in general in the future. It is
 just one step towards having a better unified configuration for the
 project.

 Please vote for the following design document:



>>> https://docs.google.com/document/d/1IQ7nwXqmhCy900t2vQLEL3N2HIdMg-JO8vTzo1BtyKU/edit#
>>>
 The discussion can be found at:



>>> https://lists.apache.org/thread.html/a56c6b52e5f828d4a737602b031e36b5dd6eaa97557306696a8063a9@%3Cdev.flink.apache.org%3E
>>>
 This voting will be open for at least 72 hours. I'll try to close
 it on
 2019-09-02 8:00 UTC, unless there is an objection or not enough votes.

 I will convert it to a Wiki page afterwards.

 Thanks,

 Timo


>





Re: [DISCUSS] FLIP-60: Restructure the Table API & SQL documentation

2019-08-30 Thread Dawid Wysakowicz
+1 to the idea of restructuring the docs.

My only suggestion to consider: how about moving the
User-Defined-Extensions subpages to the corresponding broader topics?

Sources & Sinks >> Connect to external systems

Catalogs >> Connect to external systems

and then have a Functions section with subsections:

functions

    |- built in functions

    |- user defined functions


Best,

Dawid

On 30/08/2019 10:59, Timo Walther wrote:
> Hi everyone,
>
> the Table API & SQL documentation was already in a very good shape in
> Flink 1.8. However, in the past it was mostly presented as an addition
> to DataStream API. As the Table and SQL world is growing quickly,
> stabilizes in its concepts, and is considered as another top-level API
> and closed ecosystem, it is time to restructure the docs a little bit
> to represent the vision of FLIP-32.
>
> Current state:
> https://ci.apache.org/projects/flink/flink-docs-master/dev/table/
>
> We would like to propose the following FLIP-60 for a new structure:
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=127405685
>
>
> Looking forward to feedback.
>
> Thanks,
>
> Timo
>
>
>





Re: [DISCUSS] Simplify Flink's cluster level RestartStrategy configuration

2019-08-30 Thread Dawid Wysakowicz
Also +1 in general.

I have a few questions though:

- does it only apply to the logic in
org.apache.flink.runtime.executiongraph.restart.RestartStrategyFactory#createRestartStrategyFactory,
which is only the cluster side configuration? Or do you want to change
the logic also on the job side in ExecutionConfig?

- if the latter, does that mean deprecated methods in ExecutionConfig
like: setNumberOfExecutionRetries, setExecutionRetryDelay will have no
effect? I think this would be a good idea, but I would suggest removing
the corresponding fields and methods. This is not that simple though. I
tried to do that for other parameters that already have no effect, like
codeAnalysisMode & failTaskOnCheckpointError. There are two problems:

    1) setNumberOfExecutionRetries is effectively marked with @Public
annotation (the codeAnalysisMode & failTaskOnCheckpointError don't have
this problem). Therefore this would be a binary incompatible change.

    2) ExecutionConfig is stored in state as part of PojoSerializer in
pre flink 1.7. It should not be a problem for numberOfExecutionRetries &
executionRetryDelays as they are of primitive types. It is a problem for
codeAnalysisMode (we cannot remove the class, as this breaks
serialization). I wanted to mention that anyway, just to be aware of that.

Best,

Dawid

On 30/08/2019 14:48, Stephan Ewen wrote:
> +1 in general
>
> What is the default in batch, though? No restarts? I always found that
> somewhat uncommon.
> Should we also change that part, if we are changing the default anyways?
>
>
> On Fri, Aug 30, 2019 at 2:35 PM Till Rohrmann  wrote:
>
>> Hi everyone,
>>
>> I wanted to discuss how to simplify Flink's cluster level RestartStrategy
>> configuration [1]. Currently, Flink's behaviour with respect to configuring
>> the {{RestartStrategies}} is quite complicated and convoluted. The reason
>> for this is that we evolved the way it has been configured and wanted to
>> keep it backwards compatible. Due to this, we have currently the following
>> behaviour:
>>
>> * If the config option `restart-strategy` is configured, then Flink uses
>> this `RestartStrategy` (so far so simple)
>> * If the config option `restart-strategy` is not configured, then
>> ** If `restart-strategy.fixed-delay.attempts` or
>> `restart-strategy.fixed-delay.delay` are defined, then instantiate
>> `FixedDelayRestartStrategy(restart-strategy.fixed-delay.attempts,
>> restart-strategy.fixed-delay.delay)`
>> ** If `restart-strategy.fixed-delay.attempts` and
>> `restart-strategy.fixed-delay.delay` are not defined, then
>> *** If checkpointing is disabled, then choose `NoRestartStrategy`
>> *** If checkpointing is enabled, then choose
>> `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
>>
>> I would like to simplify the configuration by removing the "If
>> `restart-strategy.fixed-delay.attempts` or
>> `restart-strategy.fixed-delay.delay`, then" condition. That way, the logic
>> would be the following:
>>
>> * If the config option `restart-strategy` is configured, then Flink uses
>> this `RestartStrategy`
>> * If the config option `restart-strategy` is not configured, then
>> ** If checkpointing is disabled, then choose `NoRestartStrategy`
>> ** If checkpointing is enabled, then choose
>> `FixedDelayRestartStrategy(Integer.MAX_VALUE, "0 s")`
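
For illustration, a rough Java sketch of the simplified resolution (the
method signature is an assumption and the delay is read as plain
milliseconds to keep the example short; this is not the actual
implementation):

    import org.apache.flink.api.common.restartstrategy.RestartStrategies;
    import org.apache.flink.api.common.restartstrategy.RestartStrategies.RestartStrategyConfiguration;
    import org.apache.flink.api.common.time.Time;
    import org.apache.flink.configuration.Configuration;

    public class RestartStrategyResolutionSketch {

        static RestartStrategyConfiguration resolve(
                Configuration conf, boolean checkpointingEnabled) {
            String strategy = conf.getString("restart-strategy", null);
            if (strategy != null) {
                // `restart-strategy` is set: fixed-delay options are only read here
                if ("fixed-delay".equals(strategy)) {
                    int attempts =
                            conf.getInteger("restart-strategy.fixed-delay.attempts", 1);
                    long delayMs =
                            conf.getLong("restart-strategy.fixed-delay.delay", 0L);
                    return RestartStrategies.fixedDelayRestart(
                            attempts, Time.milliseconds(delayMs));
                } else if ("none".equals(strategy)) {
                    return RestartStrategies.noRestart();
                }
                // other strategies (e.g. failure-rate) omitted for brevity
            }
            // `restart-strategy` not set: decide based on checkpointing only
            return checkpointingEnabled
                    ? RestartStrategies.fixedDelayRestart(
                            Integer.MAX_VALUE, Time.milliseconds(0))
                    : RestartStrategies.noRestart();
        }
    }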
>>
>> That way we retain the user friendliness that jobs restart if the user
>> enabled checkpointing and we make it clear that any `
>> restart-strategy.fixed-delay.xyz` setting will only be respected if
>> `restart-strategy` has been set to `fixed-delay`.
>>
>> This simplification would, however, change Flink's behaviour and might
>> break existing setups. Since we introduced `RestartStrategies` with Flink
>> 1.0.0 and deprecated the prior configuration mechanism which enables
>> restarting if either the `attempts` or the `delay` has been set, I think
>> that the number of broken jobs should be minimal if not non-existent.
>>
>> I'm sure that one can simplify the way RestartStrategies are
>> programmatically configured as well but for the sake of simplicity/scoping
>> I'd like to not touch it right away.
>>
>> What do you think about this behaviour change?
>>
>> [1] https://issues.apache.org/jira/browse/FLINK-13921
>>
>> Cheers,
>> Till
>>





Re: [DISCUSS] FLIP-59: Enable execution configuration from Configuration object

2019-09-02 Thread Dawid Wysakowicz
Hi Gyula,

Yes you are right, we were also considering the external configurer. The
reason we suggest the built in method is that it is more tightly coupled
with the place the options are actually set. Therefore our hope is that,
whenever somebody e.g. adds new fields to the ExecutionConfig he/she
also updates the configure method. I am not entirely against your
suggestion though, if this is the preferred way in the community.

Does anyone have any comments regarding the option keys?

Best,

Dawid

On 30/08/2019 14:57, Gyula Fóra wrote:
> Hi Dawid,
>
> Sorry I misread one of the interfaces a little (Configuration instead of
> ConfigurationReader), you are right.
> I was referring to:
>
>
>-
>
>void StreamExecutionEnvironment.configure(ConfigurationReader)
>
>
> This might be slightly orthogonal to the changes that you made here but
> what I meant is that instead of adding methods to the
> StreamExecutionEnvironment we could make this an external interface:
>
> EnvironmentConfigurer {
>   void configure(StreamExecutionEnvironment, ConfigurationReader)
> }
>
> We could then have a default implementation of the EnvironmentConfigurer
> that would understand built in options.  We could also allow users to pass
> custom implementations of this, which could configure the
> StreamExecutionEnvironment based on user defined config options. This is
> just a rough idea for extensibility and probably out of scope at first.
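
A minimal Java sketch of that rough idea (the option keys are made up and
plain Configuration stands in for the proposed ConfigurationReader; this is
an illustration only, not a concrete proposal):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    interface EnvironmentConfigurer {
        void configure(StreamExecutionEnvironment env, Configuration conf);
    }

    class DefaultEnvironmentConfigurer implements EnvironmentConfigurer {
        @Override
        public void configure(StreamExecutionEnvironment env, Configuration conf) {
            // default implementation that only understands built-in options;
            // users could plug in their own configurer for custom options
            if (conf.containsKey("execution.max-parallelism")) {
                env.setMaxParallelism(conf.getInteger("execution.max-parallelism", 128));
            }
            if (conf.containsKey("execution.buffer-timeout")) {
                env.setBufferTimeout(conf.getLong("execution.buffer-timeout", 100L));
            }
        }
    }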
>
> Cheers,
> Gyula
>
> On Fri, Aug 30, 2019 at 12:13 PM Dawid Wysakowicz 
> wrote:
>
>> Hi Gyula,
>>
>> Thank you for the support on those changes.
>>
>> I am not sure if I understood your idea for the "reconfiguration" logic.
>>
>> The configure method on those objects would take ConfigurationReader. So
>> user can provide a thin wrapper around Configuration for e.g. filtering
>> certain logic, changing values based on other parameters etc. Is that
>> what you had in mind?
>>
>> Best,
>>
>> Dawid
>>
>> On 29/08/2019 19:21, Gyula Fóra wrote:
>>> Hi!
>>>
>>> Huuuge +1 from me, this has been an operational pain for years.
>>> This would also introduce a nice and simple way to extend it in the
>> future
>>> if we need.
>>>
>>> Ship it!
>>>
>>> Gyula
>>>
>>> On Thu, Aug 29, 2019 at 5:05 PM Dawid Wysakowicz >>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I wanted to propose a new, additional way of configuring execution
>>>> parameters that can currently be set only on such objects like
>>>> ExecutionConfig, CheckpointConfig and StreamExecutionEnvironment. This
>>>> poses problems such as:
>>>>
>>>>- no easy way to configure those from a file
>>>>- there is no easy way to pass a configuration from layers built on
>>>>top of StreamExecutionEnvironment. (e.g. when we want to configure
>> those
>>>>options from TableEnvironment)
>>>>- they are not automatically documented
>>>>
>>>> Note that there are a few concepts from FLIP-54[1] that this FLIP is
>> based
>>>> on.
>>>>
>>>> Would be really grateful to know if you think this would be a valuable
>>>> addition and any other feedback.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> Wiki page:
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object
>>>> Google doc:
>>>>
>> https://docs.google.com/document/d/1l8jW2NjhwHH1mVPbLvFolnL2vNvf4buUMDZWMfN_hFM/edit?usp=sharing
>>>>
>>>> [1]
>>>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-54%3A+Evolve+ConfigOption+and+Configuration
>>>>
>>>>
>>





Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-09-02 Thread Dawid Wysakowicz
shipping to the cluster and accessing options
>>> multiple times. The same for configurable objects. We put the pure
>>> objects into the map without any serialization/deserialization. The
>>> provided factory allows to convert the Object into a Configuration and
>>> we know how to serialize/deserialize a configuration because it is
>>> just a key/value map.
>>>
>>> 2. Yes, this is what we had in mind. It should still be the same
>>> configuration option. We would like to avoid specialized option keys
>>> across components (exec.max-para and table.exec.max-para) if they are
>>> describing basically the same thing. But adding some more description
>>> like "TableOptions.MAX_PARALLELISM with description_1 + description_2"
>>> does not hurt.
>>>
>>> 3. They should restore the original object given that the
>>> toConfiguration/fromConfiguration methods have been implemented
>>> correctly. I will extend the example to make the logic clearer while
>>> fixing the bug.
>>>
>>> Thanks for the healthy discussion,
>>> Timo
>>>
>>>
>>> On 30.08.19 15:29, Becket Qin wrote:
>>>> Hi Timo,
>>>>
>>>> Thanks again for the clarification. Please see a few more questions
>>> below.
>>>> Re: 1
>>>>
>>>>> Please also keep in mind that Configuration must not consist of only
>>>>> strings, it manages a Map for efficient access. Every
>>>>> map entry can have a string representation for persistence, but in
>>>>> most
>>>>> cases consists of unserialized objects.
>>>> I'd like to understand this a bit more. The reason we have a
>>>> Map<String, Object> in Configuration was because that Object could be
>>>> either a String, a List or a Map, right? But they eventually always
>>>> boil down to Strings, or maybe one of the predefined types that we
>>>> know how to serialize. In the current design, can the Object also be
>>>> Configurable?
>>>> If the value in the config Map can be Configurable
>>> objects,
>>>> how do we serialize them? Calling toConfiguration() seems not quite
>>> working
>>>> because there might be some other internal fields that are not part of
>>> the
>>>> configuration. The modification to those fields will be lost if we
>>>> simply
>>>> use toConfiguration(). So the impact of modifying those Configurable
>>>> objects seems a little undefined. And it would be difficult to prevent
>>>> users from doing that.
>>>> If the value in the config Map cannot be Configurable
>>>> objects, then it seems a little weird to have all the other ConfigType
>>>> stored in the ConfigMap in their own native type and accessed via
>>>> getInteger() / getBoolean(), etc, while having ConfigurableType to be
>>>> different from others because one has to use ConfigurableFactory
>>>> to get
>>>> the configurations.
>>>>
>>>> Re: 2
>>>>
>>>>> I think about the withExtendedDescription as a helper getter in a
>>>>> different place, so that the option is easier to find in a different
>>>>> module from it was defined.
>>>>> The MAX_PARALLELISM option in TableOptions would conceptually be
>>>>> equal
>>> to:
>>>>> public ConfigOption getMaxParallelismOption() {
>>>>>   return CoreOptions.MAX_PARALLELISM;
>>>>> }
>>>> Just to make sure I understand it correctly, does that mean users will
>>> see
>>>> something like the following?
>>>>    - CoreOptions.MAX_PARALLELISM with description_1;
>>>>    - TableOptions.MAX_PARALLELISM with description_1 + description_2.
>>>>    - DataStreamOptions.MAX_PARALLELISM with description_1 +
>>>> description_3.
>>>> And users will only configure exactly one MAX_PARALLELISM cross the
>>> board.
>>>> So they won't be confused by setting two MAX_PARALLELISM config for
>>>> two
>>>> different modules, while they are actually the same config. If that is
>>> the
>>>> case, I don't have further concern.
>>>>
>>>> Re: 3
>>>> Maybe I am thinking too much. I thought toBytes() / fromBytes()
>>>> actually
>>>> restore the original Object. But fromConfiguration() and

Re: Please add me as contributor

2019-09-03 Thread Dawid Wysakowicz
Hi Jan,

Recently the community changed the contribution process a bit and there
are no longer contributor privileges. The jira issues are supposed to be
assigned by committers that are willing to help you with getting the
contribution in. Please look at the contribution guidelines[1]. Do you
have some particular jira ticket in mind that you are interested in
working on?

Best,

Dawid


[1] https://flink.apache.org/contributing/contribute-code.html

On 03/09/2019 10:18, Jan Lukavský wrote:
> Hi,
>
> I'd like to be able to assign JIRAs to myself, can I be added as
> contributor, please? My JIRA ID is 'janl'.
>
> Thanks,
>
>  Jan
>





Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-03 Thread Dawid Wysakowicz
Hi all,

Just an opinion on the built-in <> temporary functions resolution and
NAMING issue. I think we should not allow overriding the built-in
functions, as this may pose serious issues and to be honest is rather
not feasible and would require major rework. What happens if a user
wants to override CAST? Calls to that function are generated at
different layers of the stack that unfortunately do not always go
through the Catalog API (at least yet). Moreover from what I've checked
no other systems allow overriding the built-in functions. All the
systems I've checked so far register temporary functions in a
database/schema (either special database for temporary functions, or
just current database). What I would suggest is to always register
temporary functions with a 3 part identifier. The same way as tables,
views etc. This effectively means you cannot override built-in
functions. With such approach it is natural that the temporary functions
end up a step lower in the resolution order:

1. built-in functions (1 part, maybe 2? - this is still under discussion)

2. temporary functions (always 3 part path)

3. catalog functions (always 3 part path)
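
To make the intended lookup concrete, a rough Java sketch of that order (the
generic F placeholder and the lookup helpers are stand-ins, not real Flink
catalog APIs):

    import java.util.Optional;

    abstract class FunctionResolutionSketch<F> {

        Optional<F> resolve(String[] identifier) {
            if (identifier.length == 1) {
                Optional<F> builtIn = lookupBuiltIn(identifier[0]);
                if (builtIn.isPresent()) {
                    return builtIn;                  // 1. built-in functions
                }
                // expand the 1-part name with the current catalog & database
                identifier = qualify(identifier[0]);
            }
            Optional<F> temp = lookupTemporary(identifier);
            if (temp.isPresent()) {
                return temp;                         // 2. temporary functions (3-part)
            }
            return lookupCatalog(identifier);        // 3. catalog functions (3-part)
        }

        abstract Optional<F> lookupBuiltIn(String name);
        abstract String[] qualify(String name);
        abstract Optional<F> lookupTemporary(String[] path);
        abstract Optional<F> lookupCatalog(String[] path);
    }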

Let me know what you think.

Best,

Dawid

On 04/09/2019 06:13, Bowen Li wrote:
> Hi,
>
> I agree with Xuefu that the main controversial points are mainly the two
> places. My thoughts on them:
>
> 1) Determinism of referencing Hive built-in functions. We can either remove
> Hive built-in functions from ambiguous function resolution and require
> users to use special syntax for their qualified names, or add a config flag
> to catalog constructor/yaml for turning on and off Hive built-in functions
> with the flag set to 'false' by default and proper doc added to help users
> make their decisions.
>
> 2) Flink temp functions v.s. Flink built-in functions in ambiguous function
> resolution order. We believe Flink temp functions should precede Flink
> built-in functions, and I have presented my reasons. Just in case if we
> cannot reach an agreement, I propose forbidding users from registering temp
> functions with the same name as a built-in function, like MySQL's approach,
> for the moment. It won't have any performance concern, since built-in
> functions are all in memory and thus cost of a name check will be really
> trivial.
>
>
> On Tue, Sep 3, 2019 at 8:01 PM Xuefu Z  wrote:
>
>> From what I have seen, there are a couple of focal disagreements:
>>
>> 1. Resolution order: temp function --> flink built-in function --> catalog
>> function vs flink built-in function --> temp function -> catalog function.
>> 2. "External" built-in functions: how to treat built-in functions in
>> external system and how users reference them
>>
>> For #1, I agree with Bowen that temp function needs to be at the highest
>> priority because that's how a user might overwrite a built-in function
>> without referencing a persistent, overwriting catalog function with a fully
>> qualified name. Putting built-in functions at the highest priority
>> eliminates that usage.
>>
>> For #2, I saw a general agreement on referencing "external" built-in
>> functions such as those in Hive needs to be explicit and deterministic even
>> though different approaches are proposed. To limit the scope and simplify the
>> usage, it seems to make sense to me to introduce special syntax for users to
>> explicitly reference an external built-in function such as hive1::sqrt or
>> hive1._built_in.sqrt. This is a DML syntax matching nicely Catalog API call
>> hive1.getFunction(ObjectPath functionName) where the database name is
>> absent for bulit-in functions available in that catalog hive1. I understand
>> that Bowen's original proposal was trying to avoid this, but this could
>> turn out to be a clean and simple solution.
>>
>> (Timo's modular approach is great way to "expand" Flink's built-in function
>> set, which seems orthogonal and complementary to this, which could be
>> tackled in further future work.)
>>
>> I'd be happy to hear further thoughts on the two points.
>>
>> Thanks,
>> Xuefu
>>
>> On Tue, Sep 3, 2019 at 7:11 PM Kurt Young  wrote:
>>
>>> Thanks Timo & Bowen for the feedback. Bowen was right, my proposal is the
>>> same
>>> as Bowen's. But after thinking about it, I'm currently lean to Timo's
>>> suggestion.
>>>
>>> The reason is backward compatibility. If we follow Bowen's approach,
>> let's
>>> say we
>>> first find function in Flink's built-in functions, and then hive's
>>> built-in. For example, `foo`
>>> is not supported by Flink, but hive has such built-in function. So user
>>> will have hive's
>>> behavior for function `foo`. And in next release, Flink realize this is a
>>> very popular function
>>> and add it into Flink's built-in functions, but with different behavior
>> as
>>> hive's. So in next
>>> release, the behavior changes.
>>>
>>> With Timo's approach, IIUC user have to tell the framework explicitly
>> what
>>> kind of
>>> built-in functions he would like to use. He can just tell framework

Re: [DISCUSS] FLIP-54: Evolve ConfigOption and Configuration

2019-09-04 Thread Dawid Wysakowicz
Hi Becket,

You are right, that what we had in mind for
ExecutionConfig/CheckpointConfig etc. is the option b) from your email.
In the context of the FLIP-54, those objects are not Configurable. What
we understood as a Configurable by the FLIP-54 are a simple pojos, that
are stored under a single key. Such as the examples either from the ML
thread (Host) or from the design doc (CacheFile). So when configuring
the host user can provide a host like this:

connector.host: address:localhost, port:1234

rather than

connector.host.address: localhost

connector.host.port: 1234

This is important especially if one wants to configure lists of such
objects:

connector.hosts: address:localhost,port:1234;address:localhost,port:4567
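
To illustrate the kind of object we mean, a small Java sketch (the Host
class and the exact string format are only examples from this discussion,
not a FLIP-54 API):

    import java.util.ArrayList;
    import java.util.List;

    class Host {
        final String address;
        final int port;

        Host(String address, int port) {
            this.address = address;
            this.port = port;
        }

        /** Parses "address:localhost,port:1234" into a Host. */
        static Host fromString(String value) {
            String address = null;
            int port = -1;
            for (String field : value.split(",")) {
                String[] kv = field.trim().split(":", 2);
                if ("address".equals(kv[0])) {
                    address = kv[1];
                } else if ("port".equals(kv[0])) {
                    port = Integer.parseInt(kv[1]);
                }
            }
            return new Host(address, port);
        }

        /** Parses "address:a,port:1;address:b,port:2" into a list of hosts. */
        static List<Host> listFromString(String value) {
            List<Host> hosts = new ArrayList<>();
            for (String host : value.split(";")) {
                hosts.add(fromString(host));
            }
            return hosts;
        }
    }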

The intention was definitely not to store whole complex objects, such as
ExecutionConfig, CheckpointConfig etc. that contain multiple different
options. Maybe it makes sense to call it ConfigObject as Aljoscha
suggested? What do you think? Would that make it more understandable?

For the initialization/configuration of objects such as ExecutionConfig,
CheckpointConfig you may have a look at FLIP-59[1] where we suggest to
add a configure method to those classes and we pretty much describe the
process you outline in the last message.
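
As a rough sketch of what that could look like from the user's side (the
configure method is what FLIP-59 proposes rather than existing API, and the
option keys below are made-up examples):

    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

    public class Flip59Sketch {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // hypothetical keys, for illustration only
            conf.setString("pipeline.max-parallelism", "128");
            conf.setBoolean("execution.checkpointing.snapshot-compression", true);

            StreamExecutionEnvironment env =
                    StreamExecutionEnvironment.getExecutionEnvironment();
            // proposed method: applies the options above to the environment
            // and to its ExecutionConfig / CheckpointConfig
            env.configure(conf);
        }
    }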

Best,

Dawid

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-59%3A+Enable+execution+configuration+from+Configuration+object

On 04/09/2019 03:37, Becket Qin wrote:
> Hi Timo, Dawid and Aljoscha,
>
> Thanks for clarifying the goals. It is very helpful to understand the
> motivation here. It would be great to add them to the FLIP wiki.
>
> I agree that the current FLIP design achieves the two goals it wants to
> achieve. But I am trying to see is if the current approach is the most
> reasonable approach.
>
> Please let me check if I understand this correctly. From end users'
> perspective, they will do the following when they want to configure their
> Flink Jobs.
> 1. Create a Configuration instance, and call setters of Configuration with
> the ConfigOptions defined in different components.
> 2. The Configuration created in step 1 will be passed around, and each
> component will just extract its own options from it.
> 3. ExecutionConfig, CheckpointConfig (and other Config classes) will become
> a Configurable, which is responsible for extracting the configuration
> values from the Configuration set by users in step 1.
>
> The confusion I had was that in step 1, how users are going to set the
> configs for the ExecutionConfig / CheckpointConfig? There may be two ways:
> a) Users will call setConfigurable(ExectionConfigConfigurableOption,
> "config1:v1,config2:v2,config3:v3"), i.e. the entire ExecutionConfig is
> exposed as a Configurable to the users.
> b) Users will call setInteger(MAX_PARALLELISM, 1),
> setInteger(LATENCY_TRACKING_INTERVAL, 1000), etc.. This means users will
> set individual ConfigOptions for the ExecutionConfig. And they do not see
> ExecutionConfig as a Configurable.
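
A short Java sketch of what option b) could look like (the option keys and
defaults are invented for illustration; only the pattern of setting
individual ConfigOptions matters here):

    import org.apache.flink.configuration.ConfigOption;
    import org.apache.flink.configuration.ConfigOptions;
    import org.apache.flink.configuration.Configuration;

    public class OptionBSketch {
        static final ConfigOption<Integer> MAX_PARALLELISM =
                ConfigOptions.key("execution.max-parallelism").defaultValue(128);
        static final ConfigOption<Long> LATENCY_TRACKING_INTERVAL =
                ConfigOptions.key("metrics.latency.interval").defaultValue(0L);

        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.setInteger(MAX_PARALLELISM, 1);
            conf.setLong(LATENCY_TRACKING_INTERVAL, 1000L);
            // the Configuration is then passed around and each component
            // (e.g. ExecutionConfig) extracts only the options it knows about
        }
    }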
>
> I assume we are following b), then do we need to expose Configurable to the
> users in this FLIP? My concern is that the Configurable may be related to
> other mechanism such as plugin which we have not really thought through in
> this FLIP.
>
> I know Becket at least has some thoughts about immutability and loading
>> objects via the configuration but maybe they could be put into a follow-up
>> FLIP if they are needed.
> I am perfectly fine to leave something out of the scope of this FLIP to
> later FLIPs. But I think it is important to avoid introducing something in
> this FLIP that will be shortly changed by the follow-up FLIPs.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
> On Tue, Sep 3, 2019 at 8:47 PM Aljoscha Krettek  wrote:
>
>> Hi,
>>
>> I think it’s important to keep in mind the original goals of this FLIP and
>> not let the scope grow indefinitely. As I recall it, the goals are:
>>
>>  - Extend the ConfigOption system enough to allow the Table API to
>> configure options that are right now only available on
>> CheckpointingOptions, ExecutionConfig, and StreamExecutionEnvironment. We
>> also want to do this without manually having to “forward” all the available
>> configuration options by introducing equivalent setters in the Table API
>>
>>  - Do the above while keeping in mind that eventually we want to allow
>> users to configure everything from either the flink-conf.yaml, vie command
>> line parameters, or via a Configuration.
>>
>> I think the FLIP achieves this, with the added side goals of making
>> validation a part of ConfigOptions, making them type safe, and making the
>> validation constraints documentable (via automatic doc generation.) All
>> this without breaking backwards compatibility, if I’m not mistaken.
>>
>> I think we should first agree what the basic goals are so that we can
>> quickly converge to consensus on this FLIP because it blocks other
>> people/work. Among other things FLIP-59 depends on this. What are other
>> opinions that peop

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-04 Thread Dawid Wysakowicz
I don't think hive functions deserve to be a function module
>
> Our goal is not to create a Hive clone. We need to think forward and
> Hive is just one of many systems that we can support. Not every
> built-in function behaves and will behave exactly like Hive.
>
> > regarding temporary functions, there are few systems that support it
>
> IMHO Spark and Hive are not always the best examples for consistent
> design. Systems like Postgres, Presto, or SQL Server should be used as
> a reference. I don't think that a user can overwrite a built-in
> function there.
>
> Regards,
> Timo
>
> [1] https://www.postgresql.org/docs/10/extend-extensions.html
> [2] https://prestodb.github.io/docs/current/develop/functions.html
>
>
> On 04.09.19 13:44, Jark Wu wrote:
>> Hi all,
>>
>> Regarding #1 temp function <> built-in function and naming.
>> I'm fine with temp functions should precede built-in function and can
>> override built-in functions (we already support to override built-in
>> function in 1.9).
>> If we don't allow the same name as a built-in function, I'm afraid we
>> will
>> have compatibility issues in the future.
>> Say users register a user defined function named "explode" in 1.9,
>> and we
>> support a built-in "explode" function in 1.10.
>> Then the user's jobs which call the registered "explode" function in 1.9
>> will all fail in 1.10 because of naming conflict.
>>
>> Regarding #2 "External" built-in functions.
>> I think if we store external built-in functions in catalog, then
>> "hive1::sqrt" is a good way to go.
>> However, I would prefer to support a discovery mechanism (e.g. SPI) for
>> built-in functions as Timo suggested above.
>> This gives us the flexibility to add Hive or MySQL or Geo or whatever
>> function set as built-in functions in an easy way.
>>
>> Best,
>> Jark
>>
>> On Wed, 4 Sep 2019 at 17:47, Xuefu Z  wrote:
>>
>>> Hi David,
>>>
>>> Thank you for sharing your findings. It seems to me that there is no
>>> SQL
>>> standard regarding temporary functions. There are few systems that
>>> support
>>> it. Here are what I have found:
>>>
>>> 1. Hive: no DB qualifier allowed. Can overwrite built-in.
>>> 2. Spark: basically follows Hive (
>>>
>>> https://docs.databricks.com/spark/latest/spark-sql/language-manual/create-function.html
>>>
>>> )
>>> 3. SAP SQL Anywhere Server: can have owner (db?). Not sure of
>>> overwriting
>>> behavior. (
>>> http://dcx.sap.com/sqla170/en/html/816bdf316ce210148d3acbebf6d39b18.html)
>>>
>>>
>>> Because of lack of standard, it's perfectly fine for Flink to define
>>> whatever it sees appropriate. Thus, your proposal (no overwriting
>>> and must
>>> have DB as holder) is one option. The advantage is simplicity, The
>>> downside
>>> is the deviation from Hive, which is popular and de facto standard
>>> in big
>>> data world.
>>>
>>> However, I don't think we have to follow Hive. More importantly, we
>>> need a
>>> consensus. I have no objection if your proposal is generally agreed
>>> upon.
>>>
>>> Thanks,
>>> Xuefu
>>>
>>> On Tue, Sep 3, 2019 at 11:58 PM Dawid Wysakowicz
>>> 
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Just an opinion on the built-in <> temporary functions resolution and
>>>> NAMING issue. I think we should not allow overriding the built-in
>>>> functions, as this may pose serious issues and to be honest is rather
>>>> not feasible and would require major rework. What happens if a user
>>>> wants to override CAST? Calls to that function are generated at
>>>> different layers of the stack that unfortunately does not always go
>>>> through the Catalog API (at least yet). Moreover from what I've
>>>> checked
>>>> no other systems allow overriding the built-in functions. All the
>>>> systems I've checked so far register temporary functions in a
>>>> database/schema (either special database for temporary functions, or
>>>> just current database). What I would suggest is to always register
>>>> temporary functions with a 3 part identifier. The same way as tables,
>>>> views etc. This effectively means you cannot override built-in
>>>> functions. With such approach it 

[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-04 Thread Dawid Wysakowicz
Hi all,

As part of FLIP-30 a Catalog API was introduced that enables storing
table meta objects
permanently. At the same time the majority of current APIs create
temporary objects that cannot be serialized. We should clarify the
creation of meta objects (tables, views, functions) in a unified way.

Another current problem in the API is that all the temporary objects are
stored in a special built-in catalog, which is not very intuitive for
many users, as they must be aware of that catalog to reference temporary
objects.

Lastly, different APIs have different ways of providing object paths:

  * String path…, 
  * String path, String pathContinued…
  * String name

We should choose one approach and unify it across all APIs.

I suggest a FLIP to address the above issues.

Looking forward to your opinions.

FLIP link:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module





Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-04 Thread Dawid Wysakowicz
Hi Xuefu,

Just wanted to summarize my opinion on the one topic (temporary functions).

My preference would be to make temporary functions always 3-part
qualified (as a result that would prohibit overriding built-in
functions). Having said that if the community decides that it's better
to allow overriding built-in functions I am fine with it and can commit
to that decision.

I wanted to ask if you could clarify a few points for me around that option.

 1. Would you enforce temporary functions to be always just a single
name (without db & cat) as hive does, or would you allow also 3 or
even 2 part identifiers?
 2. Assuming 2/3-part paths. How would you register a function from the
following statement: CREATE TEMPORARY FUNCTION db.func? Would that
shadow all functions named 'func' in all databases named 'db' in all
catalogs? Or would you shadow only function 'func' in database 'db'
in current catalog?
 3. This point is still under discussion, but was mentioned a few times,
that maybe we want to enable syntax cat.func for "external built-in
functions". How would that affect statement from previous point?
Would 'db.func' shadow "external built-in function" in 'db' catalog
or user functions as in point 2? Or maybe both?
 4. Lastly in fact to summarize the previous points. Assuming 2/3-part
paths. Would the function resolution be actually as follows?:
 1. temporary functions (1-part path)
 2. built-in functions
 3. temporary functions (2-part path)
 4. 2-part catalog functions a.k.a. "external built-in functions"
(cat + func) - this is still under discussion, if we want that
in the other focal point
 5. temporary functions (3-part path)
 6. 3-part catalog functions a.k.a. user functions

I would be really grateful if you could clarify those questions for me, thanks.

BTW, Thank you all for a healthy discussion.

Best,

Dawid

On 04/09/2019 23:25, Xuefu Z wrote:
> Thank all for the sharing thoughts. I think we have gathered some useful
> initial feedback from this long discussion with a couple of focal points
> sticking out.
>
>  We will go back to do more research and adapt our proposal. Once it's
> ready, we will ask for a new round of review. If there is any disagreement,
> we will start a new discussion thread on each rather than having a mega
> discussion like this.
>
> Thanks to everyone for participating.
>
> Regards,
> Xuefu
>
>
> On Thu, Sep 5, 2019 at 2:52 AM Bowen Li  wrote:
>
>> Let me try to summarize and conclude the long thread so far:
>>
>> 1. For order of temp function v.s. built-in function:
>>
>> I think Dawid's point that temp function should be of fully qualified path
>> is a better reasoning to back the newly proposed order, and i agree we
>> don't need to follow Hive/Spark.
>>
>> However, I'd rather not change fundamentals of temporary functions in this
>> FLIP. It belongs to a bigger story of how temporary objects should be
>> redefined and be handled uniformly - currently temporary tables and views
>> (those registered from TableEnv#registerTable()) behave different than what
>> Dawid propose for temp functions, and we need a FLIP to just unify their
>> APIs and behaviors.
>>
>> I agree that backward compatibility is not an issue w.r.t Jark's points.
>>
>> ***Seems we do have consensus that it's acceptable to prevent users
>> registering a temp function in the same name as a built-in function. To
>> help us move forward, I'd like to propose setting such a restraint on temp
>> functions in this FLIP to simplify the design and avoid disputes.*** It
>> will also leave rooms for improvements in the future.
>>
>>
>> 2. For Hive built-in function:
>>
>> Thanks Timo for providing the Presto and Postgres examples. I feel modular
>> built-in functions can be a good fit for the geo and ml example as a native
>> Flink extension, but not sure if it fits well with external integrations.
>> Anyway, I think modular built-in functions is a bigger story and can be on
>> its own thread too, and our proposal doesn't prevent Flink from doing that
>> in the future.
>>
>> ***Seems we have consensus that users should be able to use built-in
>> functions of Hive or other external systems in SQL explicitly and
>> deterministically regardless of Flink built-in functions and the potential
>> modular built-in functions, via some new syntax like "mycat::func"? If so,
>> I'd like to propose removing Hive built-in functions from ambiguous
>> function resolution order, and empower users with such a syntax. Thi

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-06 Thread Dawid Wysakowicz
Hi Xuefu,

Thank you for your answers.

Let me summarize my understanding. In principle we differ only in
regards to the fact if a temporary function can be only 1-part or only
3-part identified. I can reconfirm that if the community decides it
prefers the 1-part approach I will commit to that, with the assumption
that we will force ONLY 1-part function names. (We will parse the identifier
and throw an exception if a user tries to register e.g. db.temp_func).

My preference is though the 3-part approach:

  * there are some functions that it makes no sense to override, e.g.
CAST; moreover I'm afraid that allowing overriding such functions will lead
to high inconsistency, similar to what I mentioned Spark has
  * you cannot shadow a fully-qualified function. (If a user fully
qualifies his/her objects in a SQL query, which is often considered
a good practice)
  * it does not differentiate between functions & temporary functions.
Temporary functions just differ with regards to their life-cycle.
The registration & usage is exactly the same.

As it can be seen, the proposed concept regarding temp function and
function resolution is quite simple.

Both approaches are equally simple. I would even say the 3-part approach
is slightly simpler as it does not have to care about some special
built-in functions such as CAST.

I don't want to express my opinion on the differentiation between
built-in functions and "external" built-in functions in this thread as
it is rather orthogonal, but I also like the modular approach and I
definitely don't like the special syntax "cat::function". I think it's
better to stick to a standard or at least to proven solutions from
other systems.

Best,

Dawid

On 05/09/2019 10:12, Xuefu Z wrote:
> Hi David,
>
> Thanks for sharing your thoughts and  request for clarifications. I believe
> that I fully understood your proposal, which does has its merit. However,
> it's different from ours. Here are the answers to your questions:
>
> Re #1: yes, the temp functions in the proposal are global and have just
> one-part names, similar to built-in functions. Two- or three-part names are
> not allowed.
>
> Re #2: not applicable as two- or three-part names are disallowed.
>
> Re #3: same as above. Referencing external built-in functions is achieved
> either implicitly (only the built-in functions in the current catalogs are
> considered) or via special syntax such as cat::function. However, we are
> looking into the modular approach that Time suggested with other feedback
> received from the community.
>
> Re #4: the resolution order goes like the following in our proposal:
>
> 1. temporary functions
> 2. bulit-in functions (including those augmented by add-on modules)
> 3. built-in functions in current catalog (this will not be needed if the
> special syntax "cat::function" is required)
> 4. functions in current catalog and db.
>
> If we go with the modular approach and make external built-in functions as
> an add-on module, the 2 and 3 above will be combined. In essence, the
> resolution order is equivalent in the two approaches.
>
> By the way, resolution order matters only for simple name reference. For
> names such as db.function (interpreted as current_cat/db/function) or
> cat.db.function, the reference is unambiguous, so on resolution is needed.
>
> As it can be seen, the proposed concept regarding temp function and
> function resolution is quite simple. Additionally, the proposed resolution
> order allows temp function to shadow a built-in function, which is
> important (though not decisive) in our opinion.
>
> I started liking the modular approach as the resolution order will only
> include 1, 2, and 4, which is simpler and more generic. That's why I
> suggested we look more into this direction.
>
> Please let me know if there are further questions.
>
> Thanks,
> Xuefu
>
>
>
>
> On Thu, Sep 5, 2019 at 2:42 PM Dawid Wysakowicz 
> wrote:
>
>> Hi Xuefu,
>>
>> Just wanted to summarize my opinion on the one topic (temporary functions).
>>
>> My preference would be to make temporary functions always 3-part qualified
>> (as a result that would prohibit overriding built-in functions). Having
>> said that if the community decides that it's better to allow overriding
>> built-in functions I am fine with it and can commit to that decision.
>>
>> I wanted to ask if you could clarify a few points for me around that
>> option.
>>
>>1. Would you enforce temporary functions to be always just a single
>>name (without db & cat) as hive does, or would you allow also 3 or even 2
>>part identifiers?
>>2. Assuming 2/3-part paths. How would you regi

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-06 Thread Dawid Wysakowicz
Hi all,

@Jingsong Could you elaborate a bit more on what you mean by

"some Connectors are difficult to convert all states to properties"?

All the Flink provided connectors will definitely be expressible with
properties (in the end you should be able to use them from DDL). I think
if a TableSource is complex enough that it handles filter push down,
partition support etc., it should rather be made available both from DDL &
java/scala code. I'm happy to reconsider adding
registerTemporaryTable(String path, TableSource source) if you have some
concrete examples in mind.


@Xuefu: We also considered the ObjectIdentifier (or actually introducing
a new identifier representation to differentiate between resolved and
unresolved identifiers) with the same concerns. We decided to suggest
the string & parsing logic because of usability.

/    tEnv.from("cat.db.table")/

is shorter and easier to write than

    tEnv.from(Identifier.for("cat", "db", "name"))

It also implicitly solves the problem of what happens if a user (e.g. one
used to other systems) uses that API in the following manner:

    tEnv.from(Identifier.for("db.name"))

I'm happy to revisit it if the general consensus is that it's better to
use the OO approach.
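
For completeness, a rough Java sketch of the string-parsing side (quoting /
escaping of dots in identifiers is ignored here; this is an illustration,
not the proposed implementation):

    import java.util.Arrays;
    import java.util.List;

    class PathParsingSketch {
        /** Completes "cat.db.table", "db.table" or "table" with the current
         *  catalog / database. */
        static List<String> parse(String path, String currentCatalog, String currentDatabase) {
            String[] parts = path.split("\\.");
            switch (parts.length) {
                case 1:
                    return Arrays.asList(currentCatalog, currentDatabase, parts[0]);
                case 2:
                    return Arrays.asList(currentCatalog, parts[0], parts[1]);
                case 3:
                    return Arrays.asList(parts);
                default:
                    throw new IllegalArgumentException("Invalid path: " + path);
            }
        }
    }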

Best,

Dawid

On 06/09/2019 10:00, Xuefu Z wrote:
> Thanks to Dawid for starting the discussion and writeup. It looks pretty
> good to me except that I'm a little concerned about the object reference
> and string parsing in the code, which seems to an anti-pattern to OOP. Have
> we considered using ObjectIdenitifier with optional catalog and db parts,
> esp. if we are worried about arguments of variable length or method
> overloading? It's quite likely that the result of string parsing is an
> ObjectIdentifier instance any way.
>
> Having string parsing logic in the code is a little dangerous as it
> duplicates part of the DDL/DML parsing, and they can easily get out of sync.
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 1:57 PM JingsongLee 
> wrote:
>
>> Thanks dawid, +1 for this approach.
>>
>> One concern is the removal of registerTableSink & registerTableSource
>>  in TableEnvironment. It has two alternatives:
>> 1.the properties approach (DDL, descriptor).
>> 2.from/toDataStream.
>>
>> #1 can only be properties, not java states, and some Connectors
>>  are difficult to convert all states to properties.
>> #2 can contain java state. But can't use TableSource-related features,
>> like project & filter push down, partition support, etc..
>>
>> Any idea about this?
>>
>> Best,
>> Jingsong Lee
>>
>>
>> --
>> From:Dawid Wysakowicz 
>> Send Time:2019年9月4日(星期三) 22:20
>> To:dev 
>> Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module
>>
>> Hi all,
>> As part of FLIP-30 a Catalog API was introduced that enables storing table
>> meta objects permanently. At the same time the majority of current APIs
>> create temporary objects that cannot be serialized. We should clarify the
>> creation of meta objects (tables, views, functions) in a unified way.
>> Another current problem in the API is that all the temporary objects are
>> stored in a special built-in catalog, which is not very intuitive for many
>> users, as they must be aware of that catalog to reference temporary objects.
>> Lastly, different APIs have different ways of providing object paths:
>>
>> String path…,
>> String path, String pathContinued…
>> String name
>> We should choose one approach and unify it across all APIs.
>> I suggest a FLIP to address the above issues.
>> Looking forward to your opinions.
>> FLIP link:
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module
>>
>>




Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-06 Thread Dawid Wysakowicz
I agree the consequences of the decision are substantial. Let's see what
others think.

-- Catalog functions are defined by users, and we suppose they can
drop/alter it in any way they want. Thus, overwriting a catalog function
doesn't seem to be a strong use case that we should be concerned about.
Rather, there are known use cases for overwriting built-in functions.

Users are not always in full control of the catalog functions. There is
also the case where different teams manage the catalog & use the
catalog. As for overriding built-in functions: with the 3-part approach the
user can always use an equally named function from a catalog. E.g. to override

    SELECT explode(arr) FROM ...

user can always write:

    SELECT db.explode(arr) FROM ...

Best,

Dawid

On 06/09/2019 10:54, Xuefu Z wrote:
> Hi Dawid,
>
> Thank you for your summary. While the only difference in the two proposals
> is one- or three-part in naming, the consequence would be substantial.
>
> To me, there are two major use cases of temporary functions compared to
> persistent ones:
> 1. Temporary in nature and auto managed by the session. More often than
> not, admin doesn't even allow user to create persistent functions.
> 2. Provide an opportunity to overwriting system built-in functions.
>
> Since built-in functions has one-part name, requiring three-part name for
> temporary functions eliminates the overwriting opportunity.
>
> One-part naming essentially puts all temp functions under a single
> namespace and simplifies function resolution, such as we don't need to
> consider the case of a temp function and a persistent function with the
> same name under the same database.
>
> I agree having three-parts does have its merits, such as consistency with
> other temporary objects (table) and minor difference between temp vs
> catalog functions. However, there is a slight difference between tables and
> function in that there is no built-in table in SQL so there is no need to
> overwrite it.
>
> I'm not sure if I fully agree the benefits you listed as the advantages of
> the three-part naming of temp functions.
>   -- Allowing overwriting built-in functions is a benefit and the solution
> for disallowing certain overwriting shouldn't be totally banning it.
>   -- Catalog functions are defined by users, and we suppose they can
> drop/alter it in any way they want. Thus, overwriting a catalog function
> doesn't seem to be a strong use case that we should be concerned about.
> Rather, there are known use cases for overwriting built-in functions.
>
> Thus, personally I would prefer one-part name for temporary functions. In
> lack of SQL standard on this, I certainly like to get opinions from others
> to see if a consensus can be eventually reached.
>
> (To your point on modular approach to support external built-in functions,
> we saw the value and are actively looking into it. Thanks for sharing your
> opinion on that.)
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz 
> wrote:
>
>> Hi Xuefu,
>>
>> Thank you for your answers.
>>
>> Let me summarize my understanding. In principle we differ only in regards
>> to the fact if a temporary function can be only 1-part or only 3-part
>> identified. I can reconfirm that if the community decides it prefers the
>> 1-part approach I will commit to that, with the assumption that we will
>> force ONLY 1-part function names. (We will parse identifier and throw
>> exception if a user tries to register e.g. db.temp_func).
>>
>> My preference is though the 3-part approach:
>>
>>- there are some functions that it makes no sense to override, e.g.
>>CAST, moreover I'm afraid that allowing overriding such will lead to high
>>inconsistency, similar to those that I mentioned spark has
>>- you cannot shadow a fully-qualified function. (If a user fully
>>qualifies his/her objects in a SQL query, which is often considered a good
>>practice)
>>- it does not differentiate between functions & temporary functions.
>>Temporary functions just differ with regards to their life-cycle. The
>>registration & usage is exactly the same.
>>
>> As it can be seen, the proposed concept regarding temp function and
>> function resolution is quite simple.
>>
>> Both approaches are equally simple. I would even say the 3-part approach
>> is slightly simpler as it does not have to care about some special built-in
>> functions such as CAST.
>>
>> I don't want to express my opinion on the differentiation between built-in
>> functions and "external" built-in functions in this thread as it is 

Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-06 Thread Dawid Wysakowicz
Congratulations Klou!

Best,

Dawid

On 06/09/2019 14:55, Fabian Hueske wrote:
> Hi everyone,
>
> I'm very happy to announce that Kostas Kloudas is joining the Flink PMC.
> Kostas is contributing to Flink for many years and puts lots of effort in
> helping our users and growing the Flink community.
>
> Please join me in congratulating Kostas!
>
> Cheers,
> Fabian
>





Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-11 Thread Dawid Wysakowicz
Hi Fabian,
Thank you for your response.
Regarding the temporary function, just wanted to clarify one thing: the
3-part identifier does not mean the user always has to provide the catalog
& database explicitly. The same way user does not have to provide them in
e.g. when creating permanent table, view etc. It means though functions are
always stored within a database. The same way as all the permanent objects
and other temporary objects(tables, views). If not given explicitly the
current catalog & database would be used, both in the create statement or
when using the function.

Point taken, though, that your preference would be to support overriding
built-in functions.

Best,
Dawid

On Wed, 11 Sep 2019, 21:14 Fabian Hueske,  wrote:

> Hi all,
>
> I'd like to add my opinion on this topic as well ;-)
>
> In general, I think overriding built-in function with temp functions has a
> couple of benefits but also a few challenges:
>
> * Users can reimplement the behavior of a built-in functions of a different
> system, e.g., for backward compatibility after a migration.
> * I don't think that "accidental" overrides and surprising semantics are an
> issue or dangerous. The user registered the temp function in the same
> session and should therefore be aware of the changed semantics.
> * I see that not all built-in functions can be overridden, like the CAST
> example that Dawid gave. However, I think these should be a small fraction
> and such functions could be blacklisted. Sure, that's not super consistent,
> but should (IMO) not be a big issue in practice.
> * Temp functions should be easy to use. Requiring a 3-part addressing makes
> them a lot less user friendly, IMO. Users need to think about what catalog
> and db to choose when registering them. Also using a temp function in a
> query becomes less convenient. Moreover, I agree with Bowen's concerns that
> a 3-part addressing scheme reduces the temporal appearance of the function.
>
> From the three possible solutions, my preference order is
> 1) 1-part address with override of built-in
> 2) 1-part address without override of built-in
> 3) 3-part address
>
> Regarding the issue of external built-in functions, I don't think that
> Timo's proposal of modules is fully orthogonal to this discussion.
> A Hive function module could be an alternative to offering Hive functions
> as part of Hive's catalog.
> From a user's point of view, I think that modules would be a "cleaner"
> integration ("Why do I need a Hive catalog if all I want to do is apply a
> Hive function on a Kafka table?").
> However, the module approach clearly has the problem of dealing with
> same-named functions in different modules (e.g., a Hive function and a
> Flink built-in function).
> The catalog approach as the benefit that functions can be addressed like
> hiveCat::func (or a similar path).
>
> I'm not sure what's the best solution here.
>
> Cheers,
> Fabian
>
>
> Am Mo., 9. Sept. 2019 um 06:30 Uhr schrieb Bowen Li :
>
> > Hi,
> >
> > W.r.t temp functions, I feel both options have their benefits and can
> > theoretically achieve similar functionalities one way or another. In the
> > end, it's more about use cases, users habits, and trade-offs.
> >
> > Re> Not always users are in full control of the catalog functions. There
> is
> > also the case where different teams manage the catalog & use the catalog.
> >
> > Temp functions live within a session, and not within a catalog. Having
> > 3-part paths may implies temp functions are tied to a catalog in two
> > aspects.
> > 1) it may indicate each catalog manages their temp functions, which is
> not
> > true as we seem all agree they should reside at a central place, either
> in
> > FunctionCatalog or CatalogManager
> > 2) it may indicate there's some access control. When users are forbidden
> to
> > manipulate some objects in the catalog that's managed by other teams, but
> > are allowed to manipulate some other objects (temp functions in this
> case)
> > belonging to the catalog in namespaces, users may think we introduced
> extra
> > complexity and confusion with some kind of access control into the
> problem.
> > It doesn't feel intuitive enough for end users.
> >
> > Thus, I'd be in favor of 1-part path for temporary functions, and other
> > temp objects.
> >
> > Thanks,
> > Bowen
> >
> >
> >
> > On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz 
> > wrote:
> >
> > > I agree the consequences of the decision are substantial. Let's see
> what
> > &

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-16 Thread Dawid Wysakowicz
Hi,
Another idea to consider on top of Timo's suggestion. How about we have a
special namespace (catalog + database) for built-in objects? This catalog
would be invisible for users as Xuefu was suggesting.

Then users could still override built-in functions, if they fully qualify
the object with the built-in namespace, but by default the common logic of
the current db & cat would be used.

CREATE TEMPORARY FUNCTION func ...
registers a temporary function in the current cat & db

CREATE TEMPORARY FUNCTION cat.db.func ...
registers a temporary function in catalog cat & database db

CREATE TEMPORARY FUNCTION system.system.func ...
Overrides built-in function with temporary function

The built-in/system namespace would not be writable for permanent objects.
WDYT?

This way I think we can have benefits of both solutions.
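
A small Java sketch of how such a lookup could treat the reserved namespace
(the namespace name, the generic F placeholder and the maps are illustrative
only, not a concrete design):

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Optional;

    class BuiltInNamespaceSketch<F> {
        static final String BUILT_IN_CATALOG = "system";
        static final String BUILT_IN_DATABASE = "system";

        final Map<String, F> builtIn = new HashMap<>();
        final Map<String, F> temporaryOverrides = new HashMap<>();

        Optional<F> lookup(String catalog, String database, String name) {
            if (BUILT_IN_CATALOG.equals(catalog) && BUILT_IN_DATABASE.equals(database)) {
                // temporary functions registered under system.system shadow the
                // built-in function of the same name; permanent objects cannot
                // be registered in this namespace
                F override = temporaryOverrides.get(name);
                return Optional.ofNullable(override != null ? override : builtIn.get(name));
            }
            // regular catalogs & databases are resolved elsewhere
            return Optional.empty();
        }
    }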

Best,
Dawid


On Tue, 17 Sep 2019, 07:24 Timo Walther,  wrote:

> Hi Bowen,
>
> I understand the potential benefit of overriding certain built-in
> functions. I'm open to such a feature if many people agree. However, it
> would be great to still support overriding catalog functions with
> temporary functions in order to prototype a query even though a
> catalog/database might not be available currently or should not be
> modified yet. How about we support both cases?
>
> CREATE TEMPORARY FUNCTION abs
> -> creates/overrides a built-in function and never consideres current
> catalog and database; inconsistent with other DDL but acceptable for
> functions I guess.
> CREATE TEMPORARY FUNCTION cat.db.fun
> -> creates/overrides a catalog function
>
> Regarding "Flink don't have any other built-in objects (tables, views)
> except functions", this might change in the near future. Take
> https://issues.apache.org/jira/browse/FLINK-13900 as an example.
>
> Thanks,
> Timo
>
> On 14.09.19 01:40, Bowen Li wrote:
> > Hi Fabian,
> >
> > Yes, I agree 1-part/no-override is the least favorable thus I didn't
> > include that as a voting option, and the discussion is mainly between
> > 1-part/override builtin and 3-part/not override builtin.
> >
> > Re > However, it means that temp functions are differently treated than
> > other db objects.
> > IMO, the treatment difference results from the fact that functions are a
> > bit different from other objects - Flink don't have any other built-in
> > objects (tables, views) except functions.
> >
> > Cheers,
> > Bowen
> >
>
>


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Dawid Wysakowicz
42 AM Aljoscha Krettek 
> wrote:
>
> > Hi,
> >
> > I think this discussion and the one for FLIP-64 are very connected. To
> > resolve the differences, think we have to think about the basic
> principles
> > and find consensus there. The basic questions I see are:
> >
> >  - Do we want to support overriding builtin functions?
> >  - Do we want to support overriding catalog functions?
> >  - And then later: should temporary functions be tied to a
> > catalog/database?
> >
> > I don’t have much to say about these, except that we should somewhat
> stick
> > to what the industry does. But I also understand that the industry is
> > already very divided on this.
> >
> > Best,
> > Aljoscha
> >
> > > On 18. Sep 2019, at 11:41, Jark Wu  wrote:
> > >
> > > Hi,
> > >
> > > +1 to strive for reaching consensus on the remaining topics. We are
> > close to the truth. It will waste a lot of time if we resume the topic
> some
> > time later.
> > >
> > > +1 to “1-part/override” and I’m also fine with Timo’s “cat.db.fun” way
> > to override a catalog function.
> > >
> > > I’m not sure about “system.system.fun”, it introduces a nonexistent cat
> > & db? And we still need to do special treatment for the dedicated
> > system.system cat & db?
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >> 在 2019年9月18日,06:54,Timo Walther  写道:
> > >>
> > >> Hi everyone,
> > >>
> > >> @Xuefu: I would like to avoid adding too many things incrementally.
> > Users should be able to override all catalog objects consistently
> according
> > to FLIP-64 (Support for Temporary Objects in Table module). If functions
> > are treated completely different, we need more code and special cases.
> From
> > an implementation perspective, this topic only affects the lookup logic
> > which is rather low implementation effort which is why I would like to
> > clarify the remaining items. As you said, we have a slight consensus on
> > overriding built-in functions; we should also strive for reaching
> consensus
> > on the remaining topics.
> > >>
> > >> @Dawid: I like your idea as it ensures registering catalog objects
> > consistent and the overriding of built-in functions more explicit.
> > >>
> > >> Thanks,
> > >> Timo
> > >>
> > >>
> > >> On 17.09.19 11:59, kai wang wrote:
> > >>> hi, everyone
> > >>> I think this flip is very meaningful. it supports functions that can
> be
> > >>> shared by different catalogs and dbs, reducing the duplication of
> > functions.
> > >>>
> > >>> Based on flink's sql parser module, our group implemented a create
> > >>> function feature that stores the parsed function metadata and schema
> > >>> into mysql. We also customized the catalog and sql-client to support
> > >>> loading custom schemas and functions. However, the function is
> > >>> currently global and is not subdivided according to catalog and db.
> > >>>
> > >>> In addition, I very much hope to participate in the development of
> this
> > >>> flip, I have been paying attention to the community, but found it is
> > more
> > >>> difficult to join.
> > >>> thank you.
> > >>>
> > >>> Xuefu Z  于2019年9月17日周二 上午11:19写道:
> > >>>
> > >>>> Thanks to Timo and Dawid for sharing thoughts.
> > >>>>
> > >>>> It seems to me that there is a general consensus on having temp
> > functions
> > >>>> that have no namespaces and overwrite built-in functions. (As a side
> > note
> > >>>> for comparability, the current user defined functions are all
> > temporary and
> > >>>> have no namespaces.)
> > >>>>
> > >>>> Nevertheless, I can also see the merit of having namespaced temp
> > functions
> > >>>> that can overwrite functions defined in a specific cat/db. However,
> > this
> > >>>> idea appears orthogonal to the former and can be added
> incrementally.
> > >>>>
> > >>>> How about we first implement non-namespaced temp functions now and
> > leave
> > >>>> the door open for namespaced ones for later releases as the
> > requirement
> &

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Dawid Wysakowicz
Last additional comment on Option 2. The reason why I prefer option 3 is
that in option 3 all objects internally are identified with 3 parts. This
makes it easier to handle at different locations e.g. while persisting
views, as all objects have a uniform representation.
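
To illustrate the uniform 3-part representation, a rough sketch (for illustration only; it uses Flink's ObjectIdentifier and a made-up reserved system.system namespace, which is not part of the proposal text itself):

import org.apache.flink.table.catalog.ObjectIdentifier;

public class UniformIdentifierSketch {
    public static void main(String[] args) {
        // A regular catalog function: cat.db.fun
        ObjectIdentifier catalogFun = ObjectIdentifier.of("cat", "db", "fun");

        // Under option 3 a built-in (or built-in-shadowing temporary) function gets
        // the same 3-part shape, just with a reserved catalog/database.
        ObjectIdentifier builtinFun = ObjectIdentifier.of("system", "system", "fun");

        // Both can now be handled uniformly, e.g. when persisting views.
        System.out.println(catalogFun.asSerializableString());
        System.out.println(builtinFun.asSerializableString());
    }
}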

On Thu, 19 Sep 2019, 07:31 Dawid Wysakowicz, 
wrote:

> Hi,
> I think it makes sense to start voting at this point.
>
> Option 1: Only 1-part identifiers
> PROS:
> - allows shadowing built-in functions
> CONS:
> - inconsistent with all the other objects, both permanent & temporary
> - does not allow shadowing catalog functions
>
> Option 2: Special keyword for built-in function
> I think this is quite similar to the special catalog/db. The thing I am
> strongly against in this proposal is the GLOBAL keyword. This keyword has a
> meaning in RDBMS systems and means a function that is present for the
> lifetime of the session in which it was created, but available in all other
> sessions. Therefore I really don't want to use this keyword in a different
> context.
>
> Option 3: Special catalog/db
>
> PROS:
> - allows shadowing built-in functions
> - allows shadowing catalog functions
> - consistent with other objects
> CONS:
> - we introduce a special namespace for built-in functions
>
> I don't see a problem with introducing the special namespace. In the end
> it is very similar to the keyword approach. In this case the catalog/db
> combination would be the "keyword"
>
> Therefore my votes:
> Option 1: -0
> Option 2: -1 (I might change to +0 if we can come up with a better keyword)
> Option 3: +1
>
> Best,
> Dawid
>
>
> On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
>
>> Hi Aljoscha,
>>
>> Thanks for the summary and these are great questions to be answered. The
>> answer to your first question is clear: there is a general agreement to
>> override built-in functions with temp functions.
>>
>> However, your second and third questions are sort of related, as a
>> function
>> reference can be either just function name (like "func") or in the form of
>> "cat.db.func". When a reference is just function name, it can mean either
>> a
>> built-in function or a function defined in the current cat/db. If we
>> support overriding a built-in function with a temp function, such
>> overriding can also cover a function in the current cat/db.
>>
>> I think what Timo referred as "overriding a catalog function" means a temp
>> function defined as "cat.db.func" overrides a catalog function "func" in
>> cat/db even if cat/db is not current. To support this, temp function has
>> to
>> be tied to a cat/db. That's why I said above that the 2nd and 3rd
>> questions
>> are related. The problem with such support is the ambiguity when user
>> defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
>> Here "func" can means a global temp function, or a temp function in
>> current
>> cat/db. If we can assume the former, this creates an inconsistency because
>> "CREATE FUNCTION func" actually means a function in current cat/db. If we
>> assume the latter, then there is no way for user to create a global temp
>> function.
>>
>> Giving a special namespace for built-in functions may solve the ambiguity
>> problem above, but it also introduces artificial catalog/database that
>> needs special treatment and pollutes the cleanliness of the code. I would
>> rather introduce a syntax in DDL to solve the problem, like "CREATE
>> [GLOBAL] TEMPORARY FUNCTION func".
>>
>> Thus, I'd like to summarize a few candidate proposals for voting purposes:
>>
>> 1. Support only global, temporary functions without namespace. Such temp
>> functions overrides built-in functions and catalog functions in current
>> cat/db. The resolution order is: temp functions -> built-in functions ->
>> catalog functions. (Partially or fully qualified functions has no
>> ambiguity!)
>>
>> 2. In addition to #1, support creating and referencing temporary functions
>> associated with a cat/db with "GLOBAL" qualifier in DDL for global temp
>> functions. The resolution order is: global temp functions -> built-in
>> functions -> temp functions in current cat/db -> catalog function.
>> (Resolution for partially or fully qualified function reference is: temp
>> functions -> persistent functions.)
>>
>> 3. In addition to #1, support creating and referencing temporary functions
>> associated with a cat/db with a special namespace for

Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-18 Thread Dawid Wysakowicz
@Bowen I am not suggesting introducing an additional catalog. I think we need
to get rid of the current built-in catalog.

@Xuefu in option #3 we also don't need to reference the special
catalog anywhere else besides in the CREATE statement. The resolution
behaviour is exactly the same in both options.
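
For illustration, a sketch of that resolution order for a simple 1-part function reference (the class and method names below are made up for illustration and are not Flink API):

import java.util.Map;
import java.util.Optional;

// Sketch only: illustrates "temp functions -> built-in functions ->
// functions in the current catalog/database" for an unqualified reference.
class FunctionResolutionSketch {

    interface FunctionDefinition {}

    private final Map<String, FunctionDefinition> temporaryFunctions;
    private final Map<String, FunctionDefinition> builtinFunctions;
    private final Map<String, FunctionDefinition> currentCatalogFunctions;

    FunctionResolutionSketch(
            Map<String, FunctionDefinition> temporaryFunctions,
            Map<String, FunctionDefinition> builtinFunctions,
            Map<String, FunctionDefinition> currentCatalogFunctions) {
        this.temporaryFunctions = temporaryFunctions;
        this.builtinFunctions = builtinFunctions;
        this.currentCatalogFunctions = currentCatalogFunctions;
    }

    Optional<FunctionDefinition> resolve(String name) {
        // 1. temporary functions may shadow everything below
        if (temporaryFunctions.containsKey(name)) {
            return Optional.of(temporaryFunctions.get(name));
        }
        // 2. built-in functions
        if (builtinFunctions.containsKey(name)) {
            return Optional.of(builtinFunctions.get(name));
        }
        // 3. functions in the current catalog/database
        return Optional.ofNullable(currentCatalogFunctions.get(name));
    }
}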

On Thu, 19 Sep 2019, 08:17 Xuefu Z,  wrote:

> Hi Dawid,
>
> "GLOBAL" is a temporary keyword that was given to the approach. It can be
> changed to something else for better.
>
> The difference between this and the #3 approach is that we only need the
> keyword for this create DDL. For other places (such as function
> referencing), no keyword or special namespace is needed.
>
> Thanks,
> Xuefu
>
> On Wed, Sep 18, 2019 at 4:32 PM Dawid Wysakowicz <
> wysakowicz.da...@gmail.com>
> wrote:
>
> > Hi,
> > I think it makes sense to start voting at this point.
> >
> > Option 1: Only 1-part identifiers
> > PROS:
> > - allows shadowing built-in functions
> > CONS:
> > - inconsistent with all the other objects, both permanent & temporary
> > - does not allow shadowing catalog functions
> >
> > Option 2: Special keyword for built-in function
> > I think this is quite similar to the special catalog/db. The thing I am
> > strongly against in this proposal is the GLOBAL keyword. This keyword
> has a
> > meaning in RDBMS systems and means a function that is present for the
> > lifetime of the session in which it was created, but available in all other
> > sessions. Therefore I really don't want to use this keyword in a
> different
> > context.
> >
> > Option 3: Special catalog/db
> >
> > PROS:
> > - allows shadowing built-in functions
> > - allows shadowing catalog functions
> > - consistent with other objects
> > CONS:
> > - we introduce a special namespace for built-in functions
> >
> > I don't see a problem with introducing the special namespace. In the end
> it
> > is very similar to the keyword approach. In this case the catalog/db
> > combination would be the "keyword"
> >
> > Therefore my votes:
> > Option 1: -0
> > Option 2: -1 (I might change to +0 if we can come up with a better
> keyword)
> > Option 3: +1
> >
> > Best,
> > Dawid
> >
> >
> > On Thu, 19 Sep 2019, 05:12 Xuefu Z,  wrote:
> >
> > > Hi Aljoscha,
> > >
> > > Thanks for the summary and these are great questions to be answered.
> The
> > > answer to your first question is clear: there is a general agreement to
> > > override built-in functions with temp functions.
> > >
> > > However, your second and third questions are sort of related, as a
> > function
> > > reference can be either just function name (like "func") or in the form
> > of
> > > "cat.db.func". When a reference is just function name, it can mean
> > either a
> > > built-in function or a function defined in the current cat/db. If we
> > > support overriding a built-in function with a temp function, such
> > > overriding can also cover a function in the current cat/db.
> > >
> > > I think what Timo referred as "overriding a catalog function" means a
> > temp
> > > function defined as "cat.db.func" overrides a catalog function "func"
> in
> > > cat/db even if cat/db is not current. To support this, temp function
> has
> > to
> > > be tied to a cat/db. That's why I said above that the 2nd and 3rd
> > questions
> > > are related. The problem with such support is the ambiguity when user
> > > defines a function w/o namespace, "CREATE TEMPORARY FUNCTION func ...".
> > > Here "func" can means a global temp function, or a temp function in
> > current
> > > cat/db. If we can assume the former, this creates an inconsistency
> > because
> > > "CREATE FUNCTION func" actually means a function in current cat/db. If
> we
> > > assume the latter, then there is no way for user to create a global
> temp
> > > function.
> > >
> > > Giving a special namespace for built-in functions may solve the
> ambiguity
> > > problem above, but it also introduces artificial catalog/database that
> > > needs special treatment and pollutes the cleanliness of the code. I
> would
> > > rather introduce a syntax in DDL to solve the problem, like "CREATE
> > > [GLOBAL] TEMPORARY FUNCTION func".
> > >
> > > Thus, I'd like 

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-18 Thread Dawid Wysakowicz
Hi JingsongLee,
From my understanding they can. Underneath they will be CatalogTables. The
difference is the lifetime of the tables. Plus some of the user facing
interfaces cannot be persisted, e.g. a datastream. Therefore we must have
separate methods for that. In the end the temporary tables are held in
memory as CatalogTables.
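
For example, a minimal sketch of such a temporary, in-memory table (written against current Flink APIs just for illustration; the exact method names are part of what this FLIP discusses):

import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class TemporaryTableSketch {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env);

        // A DataStream-backed table cannot be expressed as pure properties,
        // so it can only exist as a temporary object held in memory as a CatalogTable.
        DataStream<String> words = env.fromElements("alice", "bob");
        tEnv.createTemporaryView("TmpWords", words);

        // For its lifetime (the session) it can be queried like any other table.
        Table result = tEnv.sqlQuery("SELECT * FROM TmpWords");
        result.execute().print();
    }
}
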
Best,
Dawid

On Thu, 19 Sep 2019, 10:08 JingsongLee, 
wrote:

> Hi dawid:
> Can temporary tables achieve the same capabilities as catalog table?
> like statistics: CatalogTableStatistics, CatalogColumnStatistics,
> PartitionStatistics
> like partition support: we have added some catalog equivalent interfaces
> on TableSource/TableSink: getPartitions, getPartitionFieldNames
> Maybe it's not a good idea to add these interfaces to
> TableSource/TableSink. What do you think?
>
> Best,
> Jingsong Lee
>
>
> --
> From:Kurt Young 
> Send Time:2019年9月18日(星期三) 17:54
> To:dev 
> Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> module
>
> Hi all,
>
> Sorry to join this party late. Big +1 to this flip, especially for the
> dropping
> "registerTableSink & registerTableSource" part. These are indeed legacy
> and we should try to unify them through CatalogTable after we introduce
> the concept of Catalog.
>
> From my understanding, what we can registered should all be metadata,
> TableSource/TableSink should only be the one who is responsible to do
> the real work, i.e. reading and writing data according to the schema and
> other information like computed column, partition, .e.g.
>
> Best,
> Kurt
>
>
> On Wed, Sep 18, 2019 at 5:14 PM JingsongLee  .invalid>
> wrote:
>
> > After some development and thinking, I have a general understanding.
> > +1 to the point that registering a source/sink does not fit into the SQL
> > world. I am OK with having a deprecated registerTemporarySource/Sink to stay
> > compatible with the old ways.
> >
> > Best,
> > Jingsong Lee
> >
> >
> > --
> > From:Timo Walther 
> > Send Time:2019年9月17日(星期二) 08:00
> > To:dev 
> > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> >
> > Hi Dawid,
> >
> > thanks for the design document. It fixes big concept gaps due to
> > historical reasons with proper support for serializability and catalog
> > support in mind.
> >
> > I would not mind a registerTemporarySource/Sink, but the problem that I
> > see is that many people think that this is the recommended way of
> > registering a table source/sink which is not true. We should guide users
> > to either use connect() or DDL API which can be validated and stored in
> > catalog.
> >
> > Also from a concept perspective, registering a source/sink does not fit
> > into the SQL world. SQL does not know about source/sinks but only about
> > tables. If the responsibility of a TableSource/TableSink is just a pure
> > physical data consumer/producer that is not connected to the actual
> > logical table schema, we would need a possibility of defining time
> > attributes and interpreting/converting a changelog. This should be done
> > by the framework with information from the DDL/connect() and not be
> > defined in every table source.
> >
> > Regards,
> > Timo
> >
> >
> > On 09.09.19 14:16, JingsongLee wrote:
> > > Hi dawid:
> > >
> > > It is difficult to describe specific examples.
> > > Sometimes users will generate some Java converters through some
> > >   Java code, or generate some Java classes through third-party
> > >   libraries. Of course, these could also be done through properties,
> > > but this requires additional work from users. My suggestion is to
> > >   keep this user-friendly way of passing Java instances.
> > >
> > > Best,
> > > Jingsong Lee
> > >
> > >
> > > --
> > > From:Dawid Wysakowicz 
> > > Send Time:2019年9月6日(星期五) 16:21
> > > To:dev 
> > > Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table
> > module
> > >
> > > Hi all,
> > > @Jingsong Could you elaborate a bit more on what you mean by
> > > "some Connectors are difficult to convert all states to properties"?
> > > All the Flink provided connectors will definitely be expressible with
> > properties (in the end you should be able to use them from DDL). I think if
> > a TableSource is complex enough that it handles filter push down, partition
> > support etc., it should rather be made available both from DDL & java/scala
> > code. I'm happy to reconsider adding registerTemporaryTable(String path,
> > TableSource source) if you have some concrete examples in mind.
> > >
> > >
> > > @Xuefu: We also considered the ObjectIdentifier (or actually
> introducing
> > a new identifier representation to differentiate between resolved and
> > unresolved identifiers) with the same concerns. We decided to suggest the
> > string & parsing logic because of usability.
> > >  tEnv.from("cat.db.table")
> >

Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-10-03 Thread Dawid Wysakowicz

Hey all,
My understanding was that from the first message we were discussing 
throwing an exception. Oracle was only shown as an example of a system 
that has a flag for hint behaviour.


Let's get back to the discussion and agree on the behavior. My 
suggestion is to introduce an enum instead of a boolean flag since 
apparently there are different requirements. My take is that it is worth 
having an option to throw an exception if hints are disabled but are 
provided in the SQL query. This is the same behavior as disabling the 
OPTIONS hint, which we already have [1].


Since you both @Jing and @Sergey would rather like to have an option to 
ignore them we can introduce


table.optimizer.query-options = ENABLED/DISABLED/IGNORED

ENABLED: hints just work

DISABLED: throw an exception

IGNORED: ignore hints

Are you two fine with that option @Jing @Sergey?

Since this thread has had a few misunderstandings already, I'd suggest 
converting it to a FLIP and following up with a VOTE shortly. @Bonnie would you 
like to help with that?


Best,

Dawid

[1] 
https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-dynamic-table-options-enabled
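
For illustration, a sketch of how the proposed option could be used (the option key and enum values are taken from the proposal above and do not exist in Flink today; the rest uses the current Table API):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class QueryHintOptionSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());
        tEnv.executeSql(
                "CREATE TABLE Orders (id INT, cid INT) "
                        + "WITH ('connector' = 'datagen', 'number-of-rows' = '10')");
        tEnv.executeSql(
                "CREATE TABLE Customers (id INT, name STRING) "
                        + "WITH ('connector' = 'datagen', 'number-of-rows' = '10')");

        // Hypothetical option from the proposal above; not an existing Flink option.
        tEnv.getConfig().getConfiguration()
                .setString("table.optimizer.query-options", "DISABLED");

        // ENABLED: the hint below is applied as today.
        // DISABLED: this statement would fail with an exception because it contains a hint.
        // IGNORED: it would be planned as if the hint were not there.
        tEnv.executeSql(
                "SELECT /*+ BROADCAST(Orders) */ o.id, c.name "
                        + "FROM Orders o JOIN Customers c ON o.cid = c.id")
                .print();
    }
}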


On 02/10/2023 18:18, Jing Ge wrote:

Hi,

I have the same concern as Sergey and would like to make sure the purpose
of this discussion is to globally ignore hints without changing any other
behaviours, if I am not mistaken. Thanks!

Best regards,
Jing

On Mon, Oct 2, 2023 at 3:40 PM Sergey Nuyanzin  wrote:


Hi Bonnie,

I think changing it to something like .enabled
could lead to misunderstanding; for instance, when we set this flag to false,
what should it mean?
1. Just ignore hints and execute the query as if it were the same query with
the hints removed
2. Fail on validation because hints are disabled
3. Something else

I tend to think that we are talking about just ignoring hints, so option 1
(and Oracle follows option 1 as well, as mentioned at [1]).
So I would suggest keeping "ignore" in the property name to make it clearer.

Please let me know if I miss anything

[1]

https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/OPTIMIZER_IGNORE_HINTS.html#GUID-D62CA6D8-D0D8-4A20-93EA-EEB4B3144347

On Fri, Sep 29, 2023 at 7:20 PM Bonnie Arogyam Varghese
 wrote:


Hi Jark,
  A minor suggestion. Would it be ok if we changed the config name to `
table.optimizer.query-options.enabled`?
This is inline with other existing configurations such as:

table.dynamic-table-options.enabled
table.optimizer.distinct-agg.split.enabled
table.optimizer.dynamic-filtering.enabled


On Wed, Sep 27, 2023 at 9:57 AM Bonnie Arogyam Varghese <
bvargh...@confluent.io> wrote:


Hi Martjin,
Yes, the suggestion for the configuration name made by Jark sounds

good.

Also, thanks to everyone who participated in this discussion.

On Tue, Sep 19, 2023 at 2:40 PM Martijn Visser <

martijnvis...@apache.org

wrote:


Hey Jark,

Sounds fine to me.
@Bonnie does that also work for you?

Best regards,

Martijn

On Fri, Sep 15, 2023 at 1:28 PM Jark Wu  wrote:

Hi Martijn,

Thanks for the investigation. I found the blog [1] shows a use case
where "OPTIMIZER_IGNORE_HINTS" is used to check if embedded
hints can be removed in legacy code. I think this is a useful tool
to improve queries without complex hints strewn throughout the code.
Therefore, I'm fine with supporting this now.

Maybe we can follow Oracle and name the config
"table.optimizer.ignore-query-hints=false"?

Best,
Jark

[1]: https://dbaora.com/optimizer_ignore_hints-oracle-database-18c/

On Fri, 15 Sept 2023 at 17:57, Martijn Visser <

martijnvis...@apache.org

wrote:


Hi Jark,

Oracle has/had a generic "OPTIMIZER_IGNORE_HINTS" [1]. It looks like
MSSQL has configuration options to disable specific hints [2] instead
of a generic solution.

[1]
https://docs.oracle.com/en/database/oracle/oracle-database/23/refrn/OPTIMIZER_IGNORE_HINTS.html#GUID-D62CA6D8-D0D8-4A20-93EA-EEB4B3144347

[2]
https://www.mssqltips.com/sqlservertip/4175/disabling-sql-server-optimizer-rules-with-queryruleoff/

Best regards,

Martijn

On Fri, Sep 15, 2023 at 10:53 AM Jark Wu 

wrote:

Hi Martijn,

I can understand that.
Is there any database/system that supports disabling/enabling query
hints with a configuration? This might help us to understand the use
case, and follow the approach.

Best,
Jark

On Fri, 15 Sept 2023 at 15:34, Martijn Visser <

martijnvis...@apache.org>

wrote:


Hi all,

I think Jark has a valid point with:


> Does this mean that in the future we might add an option to disable
> each feature?

I don't think that's a reasonable outcome indeed, but we are currently
in a situation where we don't have clear guidelines on when to add a
configuration option, and when not to add one. From a platform
perspective, there might not be an imminent or obvious security
implication, but you want to minimize a potential attack surface.

> We should try to remove the unnecessary configuration from the list
> in Flink 2.0.

I agree with that too.

With t

Re: [DISCUSS] [FLINK-32873] Add a config to allow disabling Query hints

2023-10-04 Thread Dawid Wysakowicz

Hey Jing,

If you went through the discussion, you would see it has never been 
shifted towards "ignore". The only concern in the discussion was we'd 
have too many options and that lookup joins require them. It was never 
questioned that we should throw an exception, which was suggested in the 
first message:


   Description: Enable or disable the QUERY hint, if disabled, an
   exception would be thrown if any QUERY hints are specified
   Note: The default value will be set to true.

until you commented on the PR, which confused me. Oracle's 
OPTIMIZER_IGNORE_HINTS was shown as an example of a system that does let 
you disable hints. It was never said that we should support the same 
behaviour; just that, yeah, I am fine now with adding the option. Let's 
name it the same way.


   On one
   hand, there is no clear reason why we should disable(throwing exception) it
   globally, and on the other hand, some functionalities, e.g. lookup join
   pointed out by Ron, are depending on it.

What's the difference between ignore and throw in this case? They 
wouldn't work in either case.


   Would you like to elaborate the
   must-have requirement for the "disabled" scenario?

As a platform provider, we'd like to take as much control of the query as 
possible. Query hints expose internals which we would like to take 
control of. Moreover, we don't suggest disabling them by default; we 
just propose to have that possibility if you need it. If you don't want 
to, you don't need to use it. I also don't get the argument that we'd have 
too many options. We already have many detailed options, and on the other 
hand query hints themselves make the system more complicated to operate.


I think my proposal to support both THROW and IGNORE semantics should 
satisfy both mine requirements and your concerns, so to be honest I am 
disappointed that you don't want to reach a compromise, but you just 
block the effort.


Best,

Dawid


On 05/10/2023 05:08, Jing Ge wrote:

Hi Dawid,

Thanks for the clarification. If you could go through the discussion, you
would be aware that the focus has been moved from "disable" to "ignore".
There was an alignment only on "ignore hints". Your suggestion bypassed the
alignment and mixed everything together. That confused me a bit. On one
hand, there is no clear reason why we should disable(throwing exception) it
globally, and on the other hand, some functionalities, e.g. lookup join
pointed out by Ron, are depending on it. Would you like to elaborate the
must-have requirement for the "disabled" scenario? Thanks!

Best regards,
Jing

On Thu, Oct 5, 2023 at 12:23 AM Sergey Nuyanzin  wrote:


Hi Dawid,

Thanks for bringing this.
I would agree with enum approach
ignored option would allow to follow Oracle's behavior as well


table.optimizer.query-options = ENABLED/DISABLED/IGNORED

nit: Can we have "hint" in config option name
e.g. table.optimizer.query-options-hints ?


On Tue, Oct 3, 2023 at 5:58 PM Dawid Wysakowicz
wrote:


Hey all,
My understanding was that from the first message we were discussing
throwing an exception. Oracle was only shown as an example of a system
that has a flag for hint behaviour.

Let's get back to the discussion and agree on the behavior. My
suggestion is to introduce an enum instead of a boolean flag since
apparently there are different requirements. My take is that it is worth
to have an option to throw an exception if hints are disabled and are
provided in the SQL query. This is the same behavior as disabling
OPTIONS hint we already have[1]

Since you both @Jing and @Sergey would rather like to have an option to
ignore them we can introduce

table.optimizer.query-options = ENABLED/DISABLED/IGNORED

ENABLED: hints just work

DISABLED: throw an exception

IGNORED: ignore hints

Are you two fine with that option @Jing @Sergey?

Since this thread had a few misunderstandings already, I'd suggest to
convert it to a FLIP and follow with a VOTE shortly. @Bonnie would you
like to help with that?

Best,

Dawid

[1]



https://nightlies.apache.org/flink/flink-docs-master/docs/dev/table/config/#table-dynamic-table-options-enabled

On 02/10/2023 18:18, Jing Ge wrote:

Hi,

I have the same concern as Sergey and would like to make sure the purpose
of this discussion is to globally ignore hints without changing any other
behaviours, if I am not mistaken. Thanks!

Best regards,
Jing

On Mon, Oct 2, 2023 at 3:40 PM Sergey Nuyanzin

wrote:

Hi Bonnie,

I think changing it to something like .enabled
could lead to misunderstanding; for instance, when we set this flag to false,
what should it mean?
1. Just ignore hints and execute the query as if it were the same query with
the hints removed
2. Fail on validation because hints are disabled
3. Something else

I tend to think that we are talking about just ignoring hints, so option 1

(and Oracle 

Re: [VOTE] FLIP-376: Add DISTRIBUTED BY clause

2023-11-06 Thread Dawid Wysakowicz
+1
Best,
Dawid

On Mon, 6 Nov 2023 at 12:38, Timo Walther  wrote:

> Hi everyone,
>
> I'd like to start a vote on FLIP-376: Add DISTRIBUTED BY clause[1] which
> has been discussed in this thread [2].
>
> The vote will be open for at least 72 hours unless there is an objection
> or not enough votes.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-376%3A+Add+DISTRIBUTED+BY+clause
> [2] https://lists.apache.org/thread/z1p4f38x9ohv29y4nlq8g1wdxwfjwxd1
>
> Cheers,
> Timo
>


[DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-15 Thread Dawid Wysakowicz
Hi,
I would like to propose a follow-up improvement to some of the work that
has been done over the years on the Table API. I posted the proposed
changes in FLIP-393 [1]. I'd like to know what others think of
choosing SQL as the serialization format for QueryOperations.
Regards,
Dawid Wysakowicz

[1] https://cwiki.apache.org/confluence/x/vQ2ZE


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-16 Thread Dawid Wysakowicz
I think the FLIP covers all public contracts that are necessary to be
discussed at that level.

If you meant you could not find a method that would be called to trigger
the translation then it is already there. It's just not implemented yet:
QueryOperation#asSerializableString[1]. As I mentioned this is mostly a
follow up to previous work.

Regards,
Dawid

[1]
https://github.com/apache/flink/blob/d18a4bfe596fc580f8280750fa3bfa22007671d9/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/QueryOperation.java#L46

On Wed, 15 Nov 2023 at 16:36, Benchao Li  wrote:

> +1 for the idea of choosing SQL as the serialization format for
> QueryOperation, thanks for Dawid for driving this FLIP.
>
> Regarding the implementation, I didn't see the proposal for how to
> translate QueryOperation to SQL yet, am I missing something? Or the
> FLIP is still in preliminary state, you just want to gather ideas
> about whether to use SQL or something else as the serialization format
> for QueryOperation?
>
> Dawid Wysakowicz  于2023年11月15日周三 19:34写道:
> >
> > Hi,
> > I would like to propose a follow-up improvement to some of the work that
> > has been done over the years to the Table API. I posted the proposed
> > changes in the FLIP-393. I'd like to get to know what others think of
> > choosing SQL as the serialization format for QueryOperations.
> > Regards,
> > Dawid Wysakowicz
> >
> > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-16 Thread Dawid Wysakowicz
Yes, the idea is to convert the QueryOperation tree into a
proper/compilable query. To be honest I didn't think it could be done
differently, sorry if I wasn't clear enough. Yes, it is very much like
SqlNode#toSqlString you mentioned. I'll add an example of a single
QueryOperation tree to the FLIP.
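
As an illustration of the intended round trip (written against the current Table API; the exact SQL text produced is an implementation detail of the FLIP):

import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

public class QueryOperationSqlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inStreamingMode());
        tEnv.executeSql("CREATE TABLE T (x INT, y STRING, z INT) WITH ('connector' = 'datagen')");

        // QueryOperation tree: Project(x, y) <- Filter(z > 10) <- scan of T
        Table table = tEnv.from("T")
                .filter($("z").isGreater(10))
                .select($("x"), $("y"));

        // After this FLIP the call below returns a compilable statement,
        // roughly: SELECT x, y FROM T WHERE z > 10
        System.out.println(table.getQueryOperation().asSerializableString());
    }
}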

I tried to focus only on the public contracts, not on the implementation
details. I mentioned Expressions, because this requires changing
semi-public interfaces in BuiltinFunctionDefinitions.

Hope this makes it clearer.

Regards,
Dawid

On Thu, 16 Nov 2023 at 12:12, Benchao Li  wrote:

> Sorry that I wasn't expressing it clearly.
>
> Since the FLIP talks about two things: ResolvedExpression and
> QueryOperation, and you have illustrated how to serialize
> ResolvedExpression into SQL string. I'm wondering how you'll gonna to
> convert QueryOperation into SQL string.
>
> I was thinking that you proposed to convert the QueryOperation tree
> into a "complete runnable SQL statement", e.g.
>
> ProjectQueryOperation(x,y)->FilterQueryOperation(z>10)->TableSourceQueryOperation(T),
> we'll get "SELECT x, y FROM T WHERE z > 10".
> Maybe I misread it, maybe you just meant to convert each
> QueryOperation into a row-level SQL string instead of the whole tree into
> a complete SQL statement.
>
> The idea of translating whole QueryOperation tree into SQL statement
> may come from my experience of Apache Calcite, there is a
> SqlImplementor[1] which converts a RelNode tree into SqlNode, and
> further we can use  SqlNode#toSqlString to unparse it into SQL string.
> I would assume that most of our QueryOperations are much like the
> abstraction of Calcite's RelNode, with some exceptions such as
> PlannerQueryOperation.
>
> [1]
> https://github.com/apache/calcite/blob/153796f8994831ad015af4b9036aa01ebf78/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L141
>
> Dawid Wysakowicz  于2023年11月16日周四 16:24写道:
> >
> > I think the FLIP covers all public contracts that are necessary to be
> > discussed at that level.
> >
> > If you meant you could not find a method that would be called to trigger
> > the translation then it is already there. It's just not implemented yet:
> > QueryOperation#asSerializableString[1]. As I mentioned this is mostly a
> > follow up to previous work.
> >
> > Regards,
> > Dawid
> >
> > [1]
> >
> https://github.com/apache/flink/blob/d18a4bfe596fc580f8280750fa3bfa22007671d9/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/QueryOperation.java#L46
> >
> > On Wed, 15 Nov 2023 at 16:36, Benchao Li  wrote:
> >
> > > +1 for the idea of choosing SQL as the serialization format for
> > > QueryOperation, thanks for Dawid for driving this FLIP.
> > >
> > > Regarding the implementation, I didn't see the proposal for how to
> > > translate QueryOperation to SQL yet, am I missing something? Or the
> > > FLIP is still in preliminary state, you just want to gather ideas
> > > about whether to use SQL or something else as the serialization format
> > > for QueryOperation?
> > >
> > > Dawid Wysakowicz  于2023年11月15日周三 19:34写道:
> > > >
> > > > Hi,
> > > > I would like to propose a follow-up improvement to some of the work
> that
> > > > has been done over the years to the Table API. I posted the proposed
> > > > changes in the FLIP-393. I'd like to get to know what others think of
> > > > choosing SQL as the serialization format for QueryOperations.
> > > > Regards,
> > > > Dawid Wysakowicz
> > > >
> > > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> > >
> > >
> > >
> > > --
> > >
> > > Best,
> > > Benchao Li
> > >
>
>
>
> --
>
> Best,
> Benchao Li
>


Re: [DISCUSS] FLIP-393: Make QueryOperations SQL serializable

2023-11-20 Thread Dawid Wysakowicz
@Benchao I added an example to the page.

If there are no further comments, I'll start a vote on the FLIP tomorrow or
the next day.

Best,
Dawid

On Fri, 17 Nov 2023 at 12:20, xiangyu feng  wrote:

> >After this FLIP is done, FLINK-25015() can utilize this ability to set
> > job name for queries.
>
> +1 for this. Currently, when users submit sql jobs through table api, we
> can't see the complete SQL string on flink ui. It would be easy for us to
> finish this feature if we can get serialized sql from QueryOperation
> directly.
>
> So +1 for the overall proposal.
>
> Regards,
> Xiangyu
>
> Benchao Li  于2023年11月17日周五 19:07写道:
>
> > That sounds good to me, I'm looking forward to it!
> >
> > After this FLIP is done, FLINK-25015 can utilize this ability to set
> > job name for queries.
> >
> > Dawid Wysakowicz  于2023年11月16日周四 21:16写道:
> > >
> > > Yes, the idea is to convert the QueryOperation tree into a
> > > proper/compilable query. To be honest I didn't think it could be done
> > > differently, sorry if I wasn't clear enough. Yes, it is very much like
> > > SqlNode#toSqlString you mentioned. I'll add an example of a single
> > > QueryOperation tree to the FLIP.
> > >
> > > I tried to focus only on the public contracts, not on the
> implementation
> > > details. I mentioned Expressions, because this requires changing
> > > semi-public interfaces in BuiltinFunctionDefinitions.
> > >
> > > Hope this makes it clearer.
> > >
> > > Regards,
> > > Dawid
> > >
> > > On Thu, 16 Nov 2023 at 12:12, Benchao Li  wrote:
> > >
> > > > Sorry that I wasn't expressing it clearly.
> > > >
> > > > Since the FLIP talks about two things: ResolvedExpression and
> > > > QueryOperation, and you have illustrated how to serialize
> > > > ResolvedExpression into SQL string. I'm wondering how you're going to
> > > > convert QueryOperation into SQL string.
> > > >
> > > > I was thinking that you proposed to convert the QueryOperation tree
> > > > into a "complete runnable SQL statement", e.g.
> > > >
> > > >
> >
> ProjectQueryOperation(x,y)->FilterQueryOperation(z>10)->TableSourceQueryOperation(T),
> > > > we'll get "SELECT x, y FROM T WHERE z > 10".
> > > > Maybe I misread it, maybe you just meant to convert each
> > > > QueryOperation into a row-level SQL string instead of the whole tree
> into
> > > > a complete SQL statement.
> > > >
> > > > The idea of translating whole QueryOperation tree into SQL statement
> > > > may come from my experience of Apache Calcite, there is a
> > > > SqlImplementor[1] which converts a RelNode tree into SqlNode, and
> > > > further we can use  SqlNode#toSqlString to unparse it into SQL
> string.
> > > > I would assume that most of our QueryOperations are much like the
> > > > abstraction of Calcite's RelNode, with some exceptions such as
> > > > PlannerQueryOperation.
> > > >
> > > > [1]
> > > >
> >
> https://github.com/apache/calcite/blob/153796f8994831ad015af4b9036aa01ebf78/core/src/main/java/org/apache/calcite/rel/rel2sql/SqlImplementor.java#L141
> > > >
> > > > Dawid Wysakowicz  于2023年11月16日周四
> 16:24写道:
> > > > >
> > > > > I think the FLIP covers all public contracts that are necessary to
> be
> > > > > discussed at that level.
> > > > >
> > > > > If you meant you could not find a method that would be called to
> > trigger
> > > > > the translation then it is already there. It's just not implemented
> > yet:
> > > > > QueryOperation#asSerializableString[1]. As I mentioned this is
> > mostly a
> > > > > follow up to previous work.
> > > > >
> > > > > Regards,
> > > > > Dawid
> > > > >
> > > > > [1]
> > > > >
> > > >
> >
> https://github.com/apache/flink/blob/d18a4bfe596fc580f8280750fa3bfa22007671d9/flink-table/flink-table-api-java/src/main/java/org/apache/flink/table/operations/QueryOperation.java#L46
> > > > >
> > > > > On Wed, 15 Nov 2023 at 16:36, Benchao Li 
> > wrote:
> > > > >
> > > > > > +1 for the idea of choosing SQL as the serialization fo

[VOTE] FLIP-393: Make QueryOperations SQL serializable

2023-11-21 Thread Dawid Wysakowicz
Hi everyone,

Thank you to everyone for the feedback on FLIP-393: Make QueryOperations
SQL serializable[1]
which has been discussed in this thread [2].

I would like to start a vote for it. The vote will be open for at least 72
hours unless there is an objection or not enough votes.

[1] https://cwiki.apache.org/confluence/x/vQ2ZE
[2] https://lists.apache.org/thread/ztyk68brsbmwwo66o1nvk3f6fqqhdxgk


Re: [VOTE] FLIP-393: Make QueryOperations SQL serializable

2023-11-24 Thread Dawid Wysakowicz
+1(binding)

And closing this vote thread, results will be announced in a separate email.

Best,
Dawid

On Wed, 22 Nov 2023 at 14:17, Martijn Visser 
wrote:

> +1 (binding)
>
> On Wed, Nov 22, 2023 at 2:05 AM Lincoln Lee 
> wrote:
> >
> > +1 (binding)
> >
> > Best,
> > Lincoln Lee
> >
> >
> > Sergey Nuyanzin  于2023年11月22日周三 07:55写道:
> >
> > > +1 (binding)
> > >
> > > On Tue, Nov 21, 2023 at 3:17 PM Jim Hughes
> 
> > > wrote:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks, Dawid!
> > > >
> > > > Jim
> > > >
> > > > On Tue, Nov 21, 2023 at 7:20 AM Gyula Fóra 
> wrote:
> > > >
> > > > > +1 (binding)
> > > > >
> > > > > Gyula
> > > > >
> > > > > On Tue, 21 Nov 2023 at 13:11, xiangyu feng 
> > > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks for driving this.
> > > > > >
> > > > > > Best,
> > > > > > Xiangyu Feng
> > > > > >
> > > > > >
> > > > > > Ferenc Csaky  于2023年11月21日周二
> 20:07写道:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > > Looking forward to this!
> > > > > > >
> > > > > > > Best,
> > > > > > > Ferenc
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Tuesday, November 21st, 2023 at 12:21, Martijn Visser <
> > > > > > > martijnvis...@apache.org> wrote:
> > > > > > >
> > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > +1 (binding)
> > > > > > > >
> > > > > > > > Thanks for driving this.
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > >
> > > > > > > > Martijn
> > > > > > > >
> > > > > > > > On Tue, Nov 21, 2023 at 12:18 PM Benchao Li
> libenc...@apache.org
> > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (binding)
> > > > > > > > >
> > > > > > > > > Dawid Wysakowicz wysakowicz.da...@gmail.com 于2023年11月21日周二
> > > > > 18:56写道:
> > > > > > > > >
> > > > > > > > > > Hi everyone,
> > > > > > > > > >
> > > > > > > > > > Thank you to everyone for the feedback on FLIP-393: Make
> > > > > > > QueryOperations
> > > > > > > > > > SQL serializable[1]
> > > > > > > > > > which has been discussed in this thread [2].
> > > > > > > > > >
> > > > > > > > > > I would like to start a vote for it. The vote will be
> open
> > > for
> > > > at
> > > > > > > least 72
> > > > > > > > > > hours unless there is an objection or not enough votes.
> > > > > > > > > >
> > > > > > > > > > [1] https://cwiki.apache.org/confluence/x/vQ2ZE
> > > > > > > > > > [2]
> > > > > > https://lists.apache.org/thread/ztyk68brsbmwwo66o1nvk3f6fqqhdxgk
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Benchao Li
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best regards,
> > > Sergey
> > >
>


[RESULT][VOTE] FLIP-393: Make QueryOperations SQL serializable

2023-11-24 Thread Dawid Wysakowicz
FLIP-393 [1] has been accepted and voted through this thread [2].

The proposal received 10 approving votes, 7 of which are binding, and
there is no disapproval.

Benchao Li (binding)
Martijn Visser (binding)
Gyula Fora (binding)
Timo Walther (binding)
Sergey Nuyanzin (binding)
Lincoln Lee (binding)
Dawid Wysakowicz (binding)
Ferenc Csaky (non-binding)
Xiangyu Feng (non-binding)
Jim Hughes (non-binding)

Thanks to all involved.

[1] https://cwiki.apache.org/confluence/x/vQ2ZE
[2] https://lists.apache.org/thread/5txkmqx9wfj0lzg02vnrw99cj63b5zvj

Best,
Dawid


Re: [VOTE] Release flink-shaded 18.0, release candidate #1

2024-01-08 Thread Dawid Wysakowicz
+1 (binding)

- Validated hashes
- Verified signature
- Verified that no binaries exist in the source archive

Best,
Dawid

On Tue, 28 Nov 2023 at 23:10, Sergey Nuyanzin  wrote:

> Hi everyone,
> Please review and vote on the release candidate #1 for the version 18.0, as
> follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release to be deployed to dist.apache.org
> [2],
> which are signed with the key with fingerprint 1596BBF0726835D8 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-18.0-rc1" [5],
> * website pull request listing the new release [6].
>
> The vote will be open for at least 72 hours. It is adopted by majority
> approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Sergey
>
> [1] https://issues.apache.org/jira/projects/FLINK/versions/12353081
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-shaded-18.0-rc1
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4]
> https://repository.apache.org/content/repositories/orgapacheflink-1676/
> [5] https://github.com/apache/flink-shaded/releases/tag/release-18.0-rc1
> [6] https://github.com/apache/flink-web/pull/701
>


Re: [DISCUSS][FLINK-31830] Align the Nullability Handling of ROW between SQL and TableAPI

2024-01-09 Thread Dawid Wysakowicz
Hey all,
First of all, sorry I have not read the entire thread. I just wanted to
make sure you take this one case into consideration.

As far as I know, we map Java classes to SQL ROWs, e.g. it is possible to
have a POJO as a parameter to a UDF:
class MyUDF {
  eval(MyPojo a)
}

class MyPojo {
  int f0;
  Integer f1;
}
I remember the problem with how nullability of a ROW's fields is handled
was mostly because in Calcite its only goal is to support the IS NULL function.

If we try to map a ROW to a POJO with the proposed strategy, that all
fields are nullable if the outer ROW is nullable, we cannot represent the
above POJO.

The POJO's f0 field should be NOT NULL, which tells us it is represented
as a primitive int. With the newly proposed logic, all fields of a POJO
would need to be boxed, because all would be NULLABLE.
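
To make the consequence concrete, a sketch of the two mappings using DataTypes (the second shape is what the proposed rule would force; whether we want that is exactly the question here):

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class RowNullabilitySketch {
    public static void main(String[] args) {
        // Today: a nullable ROW whose f0 keeps NOT NULL (maps to a primitive int)
        DataType today = DataTypes.ROW(
                        DataTypes.FIELD("f0", DataTypes.INT().notNull()),
                        DataTypes.FIELD("f1", DataTypes.INT()))
                .nullable();

        // Proposed: a nullable ROW would imply nullable fields, so f0 loses NOT NULL
        // and could only be mapped to a boxed Integer on the Java side.
        DataType proposed = DataTypes.ROW(
                        DataTypes.FIELD("f0", DataTypes.INT()),
                        DataTypes.FIELD("f1", DataTypes.INT()))
                .nullable();

        System.out.println(today);
        System.out.println(proposed);
    }
}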

Hope you can still incorporate this example into your design.
Best,
Dawid

On Thu, 4 Jan 2024 at 03:15, Jane Chan  wrote:

> Hi Timo,
>
> Thanks for your valuable feedback.
>
> How about we work together on this topic and create a FLIP for this? We
> > need more examples in a unified document. Currently, the proposal is
> split
> > across multiple Flink and Calcite JIRA issues and a ML discussion.
>
>
> That sounds like a great idea. A FLIP would provide a more precise and
> better-organized document, and let's fix it together.
>
> Towards several points you mentioned, here are my cents
>
> RelDataType is similarly just a type declaration. Please correct me if
> > I'm wrong but RelDataType itself also allows `ROW NOT
> > NULL`.
> >
>
> In the context of RelDataType, `ROW NOT NULL` is a
> legitimate type specification. However, I presume that the type you intend
> to refer to is `ROW`.
>
> Theoretically speaking, the answer is no and yes.
> **NO** It would not be able to create a type like `RecordType(INTEGER NOT
> NULL f0by using Calcite fluent API[1]. If the record nullability is true,
> then the inner field's nullability is set to true implicitly.
> **YES** It's conceptually viable to create a type like `RecordType(INTEGER
> NOT NULL f0)` by directly calling the constructor of RelRecordType.
> Nevertheless, the JavaDoc constructor summary[2] emphasizes that
> ctor#(StructKind, List, boolean) should only be called
> from a factory method.
>
> The following code snippet demonstrates the difference at the API level.
>
> @Test
> void testRelRecordType() {
>   // create an inner type INT NOT NULL
>   RelDataType innerFieldType =
>   new BasicSqlType(RelDataTypeSystem.DEFAULT, SqlTypeName.INTEGER);
>   RelDataTypeField f0 = new RelDataTypeFieldImpl("f0", 0, innerFieldType);
>
>   // Directly call RelRecordType ctor to specify the record level
> nullability
>
> // However, in practice, users are not recommended to do so
>   RelDataType r1 =
>   new RelRecordType(StructKind.FULLY_QUALIFIED,
> Collections.singletonList(f0), true);
>   // This will print r1: RecordType(INTEGER NOT NULL f0)
>   System.out.println("r1: " + r1.getFullTypeString());
>
>   // Use Calcite fluent API
>   RelDataTypeFactory calciteTypeFactory = new
> SqlTypeFactoryImpl(RelDataTypeSystem.DEFAULT);
>
>   RelDataType r2 =
>   new RelDataTypeFactory.Builder(calciteTypeFactory)
>   .add(f0)
>   .nullable(false) // field nullability will be overridden
> by record nullability
>   .nullableRecord(true)
>   .build();
>   // This will print r2: RecordType(INTEGER f0)
>   System.out.println("r2: " + r2.getFullTypeString());
>
>   // NOTE: replace type factory with flinkTypeFactory also get
> RecordType(INTEGER f0)
>   FlinkTypeFactory flinkTypeFactory =
>   ((PlannerBase) (((TableEnvironmentImpl)
> tableEnv).getPlanner())).getTypeFactory();
> }
>
>
> It's the factory or optimizer that performs necessary changes.
> > - It's up to the framework (i.e. planner or Table API) to decide what to
> > do with these declarations.
> >
>
> You're correct; theoretically, the responsibility for type declaration
> resides with the optimizer and framework. However, Flink allows users to
> create types like `ROW` through the public API, like
> `DataTypes.ROW(DataTypes.FIELD("f0", DataTypes.INT.notNull())).nullable()`.
> In contrast, Calcite restricts such actions(suppose they follow the best
> practice and use fluent API). Perhaps we can take a reference from
> Calcite's RelDataTypeFactory.Builder to align the behavior of table API to
> SQL.
>
>
> > MyPojo can be nullable, but i cannot. This is the reason why we decided
> > to introduce the current behavior. Complex structs are usually generated
> > from Table API or from the catalog (e.g. when mapping to schema registry
> > or some other external system). It could lead to other downstream
> > inconsistencies if we change the method above.
> >
>
> Correct me if I'm mistaken, but if `MyPojo myPojo = null`, we cannot
> conclude that `myPojo.i` is not null because an NPE will be thrown. And we
> can only safely say `myPo

Re: Re: [VOTE] Accept Flink CDC into Apache Flink

2024-01-10 Thread Dawid Wysakowicz
+1 (binding)
Best,
Dawid

On Wed, 10 Jan 2024 at 11:54, Piotr Nowojski  wrote:

> +1 (binding)
>
> śr., 10 sty 2024 o 11:25 Martijn Visser 
> napisał(a):
>
> > +1 (binding)
> >
> > On Wed, Jan 10, 2024 at 4:43 AM Xingbo Huang  wrote:
> > >
> > > +1 (binding)
> > >
> > > Best,
> > > Xingbo
> > >
> > > Dian Fu  于2024年1月10日周三 11:35写道:
> > >
> > > > +1 (binding)
> > > >
> > > > Regards,
> > > > Dian
> > > >
> > > > On Wed, Jan 10, 2024 at 5:09 AM Sharath 
> wrote:
> > > > >
> > > > > +1 (non-binding)
> > > > >
> > > > > Best,
> > > > > Sharath
> > > > >
> > > > > On Tue, Jan 9, 2024 at 1:02 PM Venkata Sanath Muppalla <
> > > > sanath...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > +1 (non-binding)
> > > > > >
> > > > > > Thanks,
> > > > > > Sanath
> > > > > >
> > > > > > On Tue, Jan 9, 2024 at 11:16 AM Peter Huang <
> > > > huangzhenqiu0...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > +1 (non-binding)
> > > > > > >
> > > > > > >
> > > > > > > Best Regards
> > > > > > > Peter Huang
> > > > > > >
> > > > > > >
> > > > > > > On Tue, Jan 9, 2024 at 5:26 AM Jane Chan <
> qingyue@gmail.com>
> > > > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Jane
> > > > > > > >
> > > > > > > > On Tue, Jan 9, 2024 at 8:41 PM Lijie Wang <
> > > > wangdachui9...@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Lijie
> > > > > > > > >
> > > > > > > > > Jiabao Sun  于2024年1月9日周二
> > > > 19:28写道:
> > > > > > > > >
> > > > > > > > > > +1 (non-binding)
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Jiabao
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On 2024/01/09 09:58:04 xiangyu feng wrote:
> > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > >
> > > > > > > > > > > Regards,
> > > > > > > > > > > Xiangyu Feng
> > > > > > > > > > >
> > > > > > > > > > > Danny Cranmer  于2024年1月9日周二 17:50写道:
> > > > > > > > > > >
> > > > > > > > > > > > +1 (binding)
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Danny
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Jan 9, 2024 at 9:31 AM Feng Jin <
> > ji...@gmail.com>
> > > > > > wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > >
> > > > > > > > > > > > > Best,
> > > > > > > > > > > > > Feng Jin
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, Jan 9, 2024 at 5:29 PM Yuxin Tan <
> > > > ta...@gmail.com>
> > > > > > > > wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > Yuxin
> > > > > > > > > > > > > >
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Márton Balassi  于2024年1月9日周二
> > 17:25写道:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > +1 (binding)
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > On Tue, Jan 9, 2024 at 10:15 AM Leonard Xu <
> > > > > > > xb...@gmail.com>
> > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > +1(binding)
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > Leonard
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > 2024年1月9日 下午5:08,Yangze Guo <
> ka...@gmail.com
> > >
> > > > 写道:
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > +1 (non-binding)
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > Best,
> > > > > > > > > > > > > > > > > Yangze Guo
> > > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > > On Tue, Jan 9, 2024 at 5:06 PM Robert
> > Metzger <
> > > > > > > > > > > > rmetz...@apache.org
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> +1 (binding)
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >> On Tue, Jan 9, 2024 at 9:54 AM Guowei Ma <
> > > > > > > > gu...@gmail.com
> > > > > > > > > >
> > > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >>
> > > > > > > > > > > > > > > > >>> +1 (binding)
> > > > > > > > > > > > > > > > >>> Best,
> > > > > > > > > > > > > > > > >>> Guowei
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > > >>> On Tue, Jan 9, 2024 at 4:49 PM Rui Fan <
> > > > > > > > 19...@gmail.com>
> > > > > > > > > > > > > > wrote:
> > > > > > > > > > > > > > > > >>>
> > > > > > > > > > > > > > > >  +1 (non-binding)
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > >  Best,
> > > > > > > > > > > > > > > >  Rui
> > > > > > > > > > > > > > > > 
> > > > > > > > > > > > > > > >  On Tue, Jan 9, 2024 at 4:41 PM Hang
> Ruan <
> > > > > > > > > > > > > ruanhang1...@gmail.com>
> > > > > > > > > > > > > > > > wrote:
> > > > 

[VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-09 Thread Dawid Wysakowicz
Hi all,

I wanted to resurrect the thread about introducing a Java Expression
DSL. Please see the updated flip page[1]. Most of the flip was concluded
in previous discussion thread. The major changes since then are:

* accepting java.lang.Object in the Java DSL

* adding $ interpolation for a column in the Scala DSL

I think it's important to move those changes forward as it makes it
easier to transition to the new type system (Java parser supports only
the old type system stack for now) that we have been working on over the
past releases.

Because the previous discussion thread was rather conclusive I want to
start already with a vote. If you think we need another round of
discussion, feel free to say so.


The vote will last for at least 72 hours, following the consensus voting
process.

FLIP wiki:

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL


Discussion thread:

https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E








[DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-10 Thread Dawid Wysakowicz
Hi all,

As described in this https://issues.apache.org/jira/browse/FLINK-11720
ticket our elasticsearch 5.x connector does not work out of the box on
some systems and requires a version bump. This also happens for our e2e.
We cannot bump the version in es 5.x connector, because 5.x connector
shares a common class with 2.x that uses an API that was replaced in 5.2.

Both versions have long been EOL: https://www.elastic.co/support/eol

I suggest dropping both the 5.x and 2.x connectors. If it is too much to drop
both of them, I would strongly suggest dropping at least the 2.x connector
and updating the 5.x line to a working ES client module.

What do you think? Should we drop both versions? Drop only the 2.x
connector? Or keep them both?

Best,

Dawid






Re: [VOTE] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-11 Thread Dawid Wysakowicz
Hi,

To answer some of the questions:

@Jingsong We use Objects in the java API to make it possible to use raw
Objects without the need to wrap them in literals. If an expression is
passed, it is used as is. If anything else is used, it is assumed to be
a literal and is wrapped into one. This way we can e.g. write
$("f0").plus(1).

@Jark I think it makes sense to shorten them; I will do it and hope people
that have already voted don't mind.

@Dian That's a valid concern. I would not discard the '$' as a column
expression for java and scala. I think once we introduce the expression
DSL for python we can add another alias to java/scala. Personally I'd be
in favor of col.

On 11/02/2020 10:41, Dian Fu wrote:
> Hi Dawid,
>
> Thanks for driving this feature. The design looks very well for me overall.
>
> I have only one concern: $ is not allowed to be used in the identifier of 
> Python and so we have to come out with another symbol when aligning this 
> feature in the Python Table API. I noticed that there are also other options 
> proposed in the discussion thread, e.g. ref, col, etc. I think it would be 
> great if the proposed symbol could be supported in both the Java/Scala and 
> Python Table API. What's your thoughts?
>
> Regards,
> Dian
>
>> 在 2020年2月11日,上午11:13,Jark Wu  写道:
>>
>> +1 for this.
>>
>> I have some minor comments:
>> - I'm +1 to use $ in both Java and Scala API.
>> - I'm +1 to use lit(), Spark also provides lit() function to create a
>> literal value.
>> - Is it possible to have `isGreater` instead of `isGreaterThan` and
>> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in BaseExpressions?
>>
>> Best,
>> Jark
>>
>> On Tue, 11 Feb 2020 at 10:21, Jingsong Li  wrote:
>>
>>> Hi Dawid,
>>>
>>> Thanks for driving.
>>>
>>> - adding $ in scala api looks good to me.
>>> - Just a question, what should be expected to java.lang.Object? literal
>>> object or expression? So the Object is the grammatical sugar of literal?
>>>
>>> Best,
>>> Jingsong Lee
>>>
>>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther  wrote:
>>>
>>>> +1 for this.
>>>>
>>>> It will also help in making a TableEnvironment.fromElements() possible
>>>> and reduces technical debt. One entry point of TypeInformation less in
>>>> the API.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>>
>>>> On 10.02.20 08:31, Dawid Wysakowicz wrote:
>>>>> Hi all,
>>>>>
>>>>> I wanted to resurrect the thread about introducing a Java Expression
>>>>> DSL. Please see the updated flip page[1]. Most of the flip was
>>> concluded
>>>>> in previous discussion thread. The major changes since then are:
>>>>>
>>>>> * accepting java.lang.Object in the Java DSL
>>>>>
>>>>> * adding $ interpolation for a column in the Scala DSL
>>>>>
>>>>> I think it's important to move those changes forward as it makes it
>>>>> easier to transition to the new type system (Java parser supports only
>>>>> the old type system stack for now) that we are working on for the past
>>>>> releases.
>>>>>
>>>>> Because the previous discussion thread was rather conclusive I want to
>>>>> start already with a vote. If you think we need another round of
>>>>> discussion, feel free to say so.
>>>>>
>>>>>
>>>>> The vote will last for at least 72 hours, following the consensus
>>> voting
>>>>> process.
>>>>>
>>>>> FLIP wiki:
>>>>>
>>>>> [1]
>>>>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-55%3A+Introduction+of+a+Table+API+Java+Expression+DSL
>>>>>
>>>>> Discussion thread:
>>>>>
>>>>>
>>> https://lists.apache.org/thread.html/eb5e7b0579e5f1da1e9bf1ab4e4b86dba737946f0261d94d8c30521e@%3Cdev.flink.apache.org%3E
>>>>>
>>>>>
>>>>>
>>>>
>>> --
>>> Best, Jingsong Lee
>>>





[VOTE][RESULT] FLIP-55: Introduction of a Table API Java Expression DSL

2020-02-13 Thread Dawid Wysakowicz
Thanks all for the votes.
So far, we have

   - 3 binding +1 votes (Timo, Jark, Aljoscha)
   - 2 non-binding +1 votes (Jingsong, Dian)
   - No -1 votes

The voting time has passed and there are enough +1 votes to consider FLIP-55
approved.
Thank you all.
Best,
Dawid

On 12/02/2020 06:21, Dian Fu wrote:
> Hi Dawid,
>
> Thanks for your reply. I'm also in favor of "col" as a column expression in 
> the Python Table API. Regarding to use "$" in the Java/Scala Table API, I'm 
> fine with it. So +1 from my side.
>
> Thanks,
> Dian
>
>> On Feb 11, 2020, at 9:48 PM, Aljoscha Krettek wrote:
>>
>> +1
>>
>> Best,
>> Aljoscha
>>
>> On 11.02.20 11:17, Jingsong Li wrote:
>>> Thanks Dawid for your explanation,
>>> +1 for vote.
>>> So I am big +1 to accepting java.lang.Object in the Java DSL, without
>>> scala implicit conversion, a lot of "lit" look unfriendly to users.
>>> Best,
>>> Jingsong Lee
>>> On Tue, Feb 11, 2020 at 6:07 PM Dawid Wysakowicz 
>>> wrote:
>>>> Hi,
>>>>
>>>> To answer some of the questions:
>>>>
>>>> @Jingsong We use Objects in the java API to make it possible to use raw
>>>> Objects without the need to wrap them in literals. If an expression is
>>>> passed it is used as is. If anything else is used, it is assumed to be
>>>> an literal and is wrapped into a literal. This way we can e.g. write
>>>> $("f0").plus(1).
>>>>
>>>> @Jark I think it makes sense to shorten them, I will do it I hope people
>>>> that already voted don't mind.
>>>>
>>>> @Dian That's a valid concern. I would not discard the '$' as a column
>>>> expression for java and scala. I think once we introduce the expression
>>>> DSL for python we can add another alias to java/scala. Personally I'd be
>>>> in favor of col.
>>>>
>>>> On 11/02/2020 10:41, Dian Fu wrote:
>>>>> Hi Dawid,
>>>>>
>>>>> Thanks for driving this feature. The design looks very well for me
>>>> overall.
>>>>> I have only one concern: $ is not allowed to be used in the identifier
>>>> of Python and so we have to come out with another symbol when aligning this
>>>> feature in the Python Table API. I noticed that there are also other
>>>> options proposed in the discussion thread, e.g. ref, col, etc. I think it
>>>> would be great if the proposed symbol could be supported in both the
>>>> Java/Scala and Python Table API. What's your thoughts?
>>>>> Regards,
>>>>> Dian
>>>>>
>>>>>> On Feb 11, 2020, at 11:13 AM, Jark Wu wrote:
>>>>>>
>>>>>> +1 for this.
>>>>>>
>>>>>> I have some minor comments:
>>>>>> - I'm +1 to use $ in both Java and Scala API.
>>>>>> - I'm +1 to use lit(), Spark also provides lit() function to create a
>>>>>> literal value.
>>>>>> - Is it possible to have `isGreater` instead of `isGreaterThan` and
>>>>>> `isGreaterOrEqual` instead of `isGreaterThanOrEqualTo` in
>>>> BaseExpressions?
>>>>>> Best,
>>>>>> Jark
>>>>>>
>>>>>> On Tue, 11 Feb 2020 at 10:21, Jingsong Li 
>>>> wrote:
>>>>>>> Hi Dawid,
>>>>>>>
>>>>>>> Thanks for driving.
>>>>>>>
>>>>>>> - adding $ in scala api looks good to me.
>>>>>>> - Just a question, what should be expected to java.lang.Object? literal
>>>>>>> object or expression? So the Object is the grammatical sugar of
>>>> literal?
>>>>>>> Best,
>>>>>>> Jingsong Lee
>>>>>>>
>>>>>>> On Mon, Feb 10, 2020 at 9:40 PM Timo Walther 
>>>> wrote:
>>>>>>>> +1 for this.
>>>>>>>>
>>>>>>>> It will also help in making a TableEnvironment.fromElements() possible
>>>>>>>> and reduces technical debt. One entry point of TypeInformation less in
>>>>>>>> the API.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Timo
>>>>>>>>
>>>>>>>>
>>>>>>>&

[DISCUSS][TABLE] Issue with package structure in the Table API

2020-02-13 Thread Dawid Wysakowicz
Hi devs,

I wanted to bring up a problem that we have in our package structure.

As a result of https://issues.apache.org/jira/browse/FLINK-13045 we
started advertising importing two packages in the scala API:
import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._

The intention was that the first package (org.apache.flink.table.api)
would contain all API classes that are required to work with the unified
TableEnvironment, such as TableEnvironment, Table, Session, Slide and
the expression DSL. The second package (org.apache.flink.table.api.scala._)
would be an optional package that contains bridging conversions
between the Table and DataStream/DataSet APIs, including the more specific
StreamTableEnvironment and BatchTableEnvironment.

The part missing in the original plan was to move all expression
implicit conversions to the org.apache.flink.table.api package. Without
that step, users of pure Table programs (that do not use the
table-api-scala-bridge module) cannot use the Expression DSL. Therefore
we should try to move those expressions as soon as possible.

The problem with this approach is that it clashes with common imports of
classes from java.* and scala.* packages. Users are forced to write:

import org.apache.flink.table.api._
import org.apache.flink.table.api.scala._
import _root_.scala.collection.mutable.ArrayBuffer
import _root_.java.lang.Integer

Besides being cumbersome, it also messes up the macro based type
extraction (org.apache.flink.api.scala#createTypeInformation) for all
classes from scala.* packages. I don't fully understand the reasons for
it, but the createTypeInformation somehow drops the _root_ for
WeakTypeTags. So e.g. for a call:
createTypeInformation[_root_.scala.collection.mutable.ArrayBuffer] it
actually tries to construct a TypeInformation for
org.apache.flink.table.api.scala.collection.mutable.ArrayBuffer, which
obviously fails.



What I would suggest for a target solution is to have:

1. for users of unified Table API with Scala ExpressionDSL

import org.apache.flink.table.api._ (for TableEnvironment, Tumble etc.
and expressions)

2. for users of Table API with scala's bridging conversions

import org.apache.flink.table.api._ (for Tumble etc. and expressions)
import org.apache.flink.table.api.bridge.scala._ (for bridging
conversions and StreamTableEnvironment)

3. for users of unified Table API with Java ExpressionDSL

import org.apache.flink.table.api.* (for TableEnvironment, Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)

4. for users of Table API with java's bridging conversions

import org.apache.flink.table.api.* (for Tumble etc.)
import org.apache.flink.table.api.Expressions.* (for Expression dsl)
import org.apache.flink.table.api.bridge.java.*
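
As a sketch of scenario 4 (assuming the proposed org.apache.flink.table.api.bridge.java package and a registered table "orders"; names are only illustrative):

import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
import org.apache.flink.types.Row;

public class BridgeScenarioSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(
                env, EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // The pure Table API part only needs org.apache.flink.table.api.*
        Table filtered = tEnv.from("orders").select($("user"), $("amount"));

        // The bridging conversion comes from the proposed bridge.java package.
        tEnv.toAppendStream(filtered, Row.class).print();
        env.execute("bridge scenario sketch");
    }
}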

To have that working we need to:
* move the scala expression DSL to org.apache.flink.table.api package in
table-api-scala module
* move all classes from org.apache.flink.table.api.scala and
org.apache.flink.table.api.java packages to
org.apache.flink.table.api.bridge.scala and
org.apache.flink.table.api.bridge.java accordingly and drop the former
packages

The biggest question I have is how we want to perform that
transition. If we do it in one go we will break existing user programs
that use classes from org.apache.flink.table.api.java/scala. Those
packages have been present from day one of the Table API. Nevertheless this would
be my preferred way to move forward, as we annoy users only once, even if
one big time :(

A different option would be to make that transition gradually over 3 releases:
 1. In the first we introduce
org.apache.flink.table.api.bridge.java/scala, and we have
StreamTableEnvironment etc. as well as the expression DSL in both. We ask
users to migrate to the new packages.
 2. We drop org.apache.flink.table.api.java/scala and ask users to
additionally import org.apache.flink.table.api.* for expressions (this
is the same as we did in 1.9.0, the thing is though that it was extremely hard
to do it back then)
 3. We finally move the expression DSL from
org.apache.flink.table.api.bridge.scala to org.apache.flink.table.api
 
What do you think about it?

Best,

Dawid






Re: [DISCUSS] Drop connectors for Elasticsearch 2.x and 5.x

2020-02-13 Thread Dawid Wysakowicz
Sorry for the late reply,

@all I think there is a general consensus that we want to drop ES 2.x
support. I created https://issues.apache.org/jira/browse/FLINK-16046 to
track it.


@Stephan @Chesnay @Itamar In our connectors we use the Java High Level REST
Client. ES promises to maintain its compatibility with any newer minor
version of ES. So if we have a 6.1 client we can use it with any 6.2, 6.3, etc.

ES also provides a low-level REST client which does not include any
direct ES dependencies and can work with any version of ES. It does not
provide any marshalling/unmarshalling or higher-level features, as
Chesnay said.
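
For reference, a minimal sketch of the low-level client (assuming a local ES node on localhost:9200; the endpoint path and JSON body are illustrative and must match the concrete ES version):

import org.apache.http.HttpHost;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class LowLevelRestClientSketch {
    public static void main(String[] args) throws Exception {
        // The low-level client only speaks HTTP, so it has no dependency on a specific ES version.
        try (RestClient client = RestClient.builder(new HttpHost("localhost", 9200, "http")).build()) {
            // The request body and endpoint have to be assembled manually; there is no marshalling help.
            Request request = new Request("POST", "/my-index/_doc");
            request.setJsonEntity("{\"user\":\"alice\",\"ts\":\"2020-02-13T10:15:30\"}");
            Response response = client.performRequest(request);
            System.out.println(response.getStatusLine());
        }
    }
}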

Correct me if I am wrong @Itamar, but your HTTP client is a simplified
version of ES's high-level REST client with a subset of its
features. I think it will still have the same problems as ES's High
Level REST Client, because ES does not guarantee that newer message
formats will be compatible with older versions of ES, or that message
formats are compatible across major versions at all.


@Stephan @Danny As for the 5.x connector: any ideas how we can get
users' feedback about it? I cross-posted on the user mailing list with
no luck so far. Personally I would be in favor of dropping the
connector. Worst case, users still have the possibility of
building the connector themselves from source by just bumping the
Flink versions. As far as I can tell there have been no changes to the code
base for quite some time.

Best,

Dawid

On 11/02/2020 10:46, Chesnay Schepler wrote:
> I suppose the downside in an HTTP ES sink is that you don't get _any_
> form of high-level API from ES, and we'd have to manually build an
> HTTP request that matches the ES format. Of course you also lose any
> client-side verification that the clients did, if there is any (but I
> guess the API itself prevented certain errors).
>
> On 11/02/2020 09:32, Stephan Ewen wrote:
>> +1 to drop ES 2.x - unsure about 5.x (makes sense to get more user input
>> for that one).
>>
>> @Itamar - if you would be interested in contributing a "universal" or
>> "cross version" ES connector, that could be very interesting. Do you
>> know
>> if there are known performance issues or feature restrictions with that
>> approach?
>> @dawid what do you think about that?
>>
>>
>> On Tue, Feb 11, 2020 at 6:28 AM Danny Chan  wrote:
>>
>>> 5.x seems to have a lot of users, is the 6.x completely compatible with
>>> 5.x ~
>>>
>>> Best,
>>> Danny Chan
>>> On Feb 10, 2020, 9:45 PM +0800, Dawid Wysakowicz wrote:
>>>> Hi all,
>>>>
>>>> As described in this https://issues.apache.org/jira/browse/FLINK-11720
>>>> ticket our elasticsearch 5.x connector does not work out of the box on
>>>> some systems and requires a version bump. This also happens for our
>>>> e2e.
>>>> We cannot bump the version in es 5.x connector, because 5.x connector
>>>> shares a common class with 2.x that uses an API that was replaced
>>>> in 5.2.
>>>>
>>>> Both versions are already long eol: https://www.elastic.co/support/eol
>>>>
>>>> I suggest to drop both connectors 5.x and 2.x. If it is too much to
>>>> drop
>>>> both of them, I would strongly suggest dropping at least 2.x connector
>>>> and update the 5.x line to a working es client module.
>>>>
>>>> What do you think? Should we drop both versions? Drop only the 2.x
>>>> connector? Or keep them both?
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>>
>





Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer

2020-02-23 Thread Dawid Wysakowicz
Congratulations Jingsong!

Best,

Dawid

On 24/02/2020 08:12, zhenya Sun wrote:
> Congratulations!!!
> zhenya Sun
> toke...@126.com
>
>
> On 02/24/2020 14:35,Yu Li wrote:
> Congratulations Jingsong! Well deserved.
>
> Best Regards,
> Yu
>
>
> On Mon, 24 Feb 2020 at 14:10, Congxian Qiu  wrote:
>
> Congratulations Jingsong!
>
> Best,
> Congxian
>
>
> On Mon, Feb 24, 2020 at 1:38 PM, jincheng sun wrote:
>
> Congratulations Jingsong!
>
> Best,
> Jincheng
>
>
> On Mon, Feb 24, 2020 at 11:55 AM, Zhu Zhu wrote:
>
> Congratulations Jingsong!
>
> Thanks,
> Zhu Zhu
>
> On Sat, Feb 22, 2020 at 1:30 AM, Fabian Hueske wrote:
>
> Congrats Jingsong!
>
> Cheers, Fabian
>
> On Fri, Feb 21, 2020 at 17:49, Rong Rong <
> walter...@gmail.com>:
>
> Congratulations Jingsong!!
>
> Cheers,
> Rong
>
> On Fri, Feb 21, 2020 at 8:45 AM Bowen Li  wrote:
>
> Congrats, Jingsong!
>
> On Fri, Feb 21, 2020 at 7:28 AM Till Rohrmann 
> wrote:
>
> Congratulations Jingsong!
>
> Cheers,
> Till
>
> On Fri, Feb 21, 2020 at 4:03 PM Yun Gao 
> wrote:
>
> Congratulations Jingsong!
>
> Best,
> Yun
>
> --
> From:Jingsong Li 
> Send Time:2020 Feb. 21 (Fri.) 21:42
> To:Hequn Cheng 
> Cc:Yang Wang ; Zhijiang <
> wangzhijiang...@aliyun.com>; Zhenghua Gao ;
> godfrey
> he ; dev ; user <
> u...@flink.apache.org>
> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>
> Thanks everyone~
>
> It's my pleasure to be part of the community. I hope I can make a
> better
> contribution in future.
>
> Best,
> Jingsong Lee
>
> On Fri, Feb 21, 2020 at 2:48 PM Hequn Cheng 
> wrote:
> Congratulations Jingsong! Well deserved.
>
> Best,
> Hequn
>
> On Fri, Feb 21, 2020 at 2:42 PM Yang Wang 
> wrote:
> Congratulations!Jingsong. Well deserved.
>
>
> Best,
> Yang
>
> On Fri, Feb 21, 2020 at 1:18 PM, Zhijiang wrote:
> Congrats Jingsong! Welcome on board!
>
> Best,
> Zhijiang
>
> --
> From:Zhenghua Gao 
> Send Time:2020 Feb. 21 (Fri.) 12:49
> To:godfrey he 
> Cc:dev ; user 
> Subject:Re: [ANNOUNCE] Jingsong Lee becomes a Flink committer
>
> Congrats Jingsong!
>
>
> *Best Regards,*
> *Zhenghua Gao*
>
>
> On Fri, Feb 21, 2020 at 11:59 AM godfrey he 
> wrote:
> Congrats Jingsong! Well deserved.
>
> Best,
> godfrey
>
> On Fri, Feb 21, 2020 at 11:49 AM, Jeff Zhang wrote:
> Congratulations!Jingsong. You deserve it
>
> On Fri, Feb 21, 2020 at 11:43 AM, wenlong.lwl wrote:
> Congrats Jingsong!
>
> On Fri, 21 Feb 2020 at 11:41, Dian Fu 
> wrote:
>
> Congrats Jingsong!
>
> On Feb 21, 2020, at 11:39 AM, Jark Wu wrote:
>
> Congratulations Jingsong! Well deserved.
>
> Best,
> Jark
>
> On Fri, 21 Feb 2020 at 11:32, zoudan  wrote:
>
> Congratulations! Jingsong
>
>
> Best,
> Dan Zou
>
>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>
>
>
> --
> Best, Jingsong Lee
>
>
>
>
>
>





Re: [DISCUSS] Support notFollowedBy with interval as the last part of a Pattern

2020-02-24 Thread Dawid Wysakowicz
Hi Shuai,

First of all let me apologize for a late reply. Unfortunately I don't
have enough capacity to properly review and help with the proposal at
this time. If there is another committer in the community willing to
shepherd the proposal feel free to proceed with it.

Another suggestion I may have is that you could try hosting a fork of
the library on flink-packages.org to see if it attracts the interest of
users.

Best,

Dawid

On 11/02/2020 10:10, Shuai Xu wrote:
> Hi all,
> CEP is broadly used in more and more applications now. In many cases, users
> need to use the pattern CEP.begin().notFollowedBy(). For example, they may
> want to get the users who created an order but didn't pay within 10 minutes, and
> so on.
>
> However, CEP doesn't support notFollowedBy() as the last part of a pattern
> now. So I propose to enable it as follows:
>
> If the pattern ends with notFollowedBy() with a time interval within(t),
> we take it as a valid pattern. This pattern will be triggered after time t
> from the begin stage if the previous pattern is matched and the
> notFollowedBy() pattern doesn't appear during the interval.
>
> For example, Pattern.begin("start").where(event.getId() ==
> 1).notFollowedBy("not").where(event.getId() == 2).within(Time.minutes(10)) is
> a valid pattern now.
> If the ids of the input events are 1, 3, 3..., then after 10 minutes from
> getting event with id 1, it will get a match with 1.
>
> This change will not add any new public interface, it only makes some
> patterns not to be invalid any more.
>
> The detail implement design is in:
> https://docs.google.com/document/d/1swUSHcVxbkWm7EPdOfOQXWj-A4gGDA8Y8R1DOUjokds/edit#
>
> Similar requirements from users can be found in:
> https://issues.apache.org/jira/browse/FLINK-9431?filter=12347662
>
> Please let me know if you have any questions or suggestions to improve this
> proposal.
>
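
For reference, a sketch of the kind of pattern the proposal would make valid; CEP currently rejects such a pattern when it is applied to a stream (the event type and conditions are made up):

import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.windowing.time.Time;

public class NotFollowedByPatternSketch {

    /** Minimal event type used only for this sketch. */
    public static class Event {
        public final int id;
        public Event(int id) { this.id = id; }
        public int getId() { return id; }
    }

    public static void main(String[] args) {
        // notFollowedBy() as the last part, bounded by within():
        Pattern<Event, ?> pattern = Pattern.<Event>begin("start")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return event.getId() == 1;
                    }
                })
                .notFollowedBy("not")
                .where(new SimpleCondition<Event>() {
                    @Override
                    public boolean filter(Event event) {
                        return event.getId() == 2;
                    }
                })
                .within(Time.minutes(10));
        System.out.println(pattern.getName());
    }
}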





Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API

2020-02-26 Thread Dawid Wysakowicz
Hi all,

@NiYanchun Thank you for reporting this. Yes I think we could improve
the behaviour of the JSON format.

@Jark First of all I do agree we could/should improve the
"user-friendliness" of the JSON format (and unify the behavior across
text-based formats). I am not sure though if it is as simple as just
ignoring the time zone here.

My suggestion would rather be to apply the logic of parsing a SQL
timestamp literal (if the expected type is of
LogicalTypeFamily.TIMESTAMP), which would actually also derive the
"stored" type of the timestamp (either WITHOUT TIMEZONE or WITH
TIMEZONE) and then apply a proper SQL conversion:

    parsed type      | requested type      | behaviour
    -----------------+---------------------+-------------------------------------------------------------
    WITHOUT TIMEZONE | WITH TIMEZONE       | store the local timezone with the data
    WITHOUT TIMEZONE | WITH LOCAL TIMEZONE | do nothing with the data, interpret the time in the local timezone
    WITH TIMEZONE    | WITH LOCAL TIMEZONE | convert the timestamp to the local timezone and drop the time zone information
    WITH TIMEZONE    | WITHOUT TIMEZONE    | drop the time zone information

It might just boil down to what you said, "being more lenient with
regards to parsing the time zone". Nevertheless I think this way the
behaviour is a bit better defined, especially as it also defines the behaviour
when converting between representations with and without time zone.
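
A rough, purely illustrative sketch of those conversions in plain java.time terms (this is not how the format code would be structured, and the SQL-literal pattern below is simplified):

import java.time.LocalDateTime;
import java.time.OffsetDateTime;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public final class LenientTimestampParsingSketch {
    private static final DateTimeFormatter SQL_LITERAL =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]");

    /** Parses either an RFC 3339 style value (with offset) or a SQL literal, producing TIMESTAMP WITHOUT TIME ZONE. */
    static LocalDateTime parseAsTimestampWithoutTimeZone(String text) {
        try {
            // Input carries an offset: convert to the local time zone and drop the offset.
            return OffsetDateTime.parse(text)
                    .atZoneSameInstant(ZoneId.systemDefault())
                    .toLocalDateTime();
        } catch (DateTimeParseException e) {
            // No offset present: treat it as a SQL timestamp literal WITHOUT time zone.
            return LocalDateTime.parse(text, SQL_LITERAL);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseAsTimestampWithoutTimeZone("2019-07-09T02:02:00.040Z"));
        System.out.println(parseAsTimestampWithoutTimeZone("2019-07-09 02:02:00.040"));
    }
}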

An implementation note. I think we should aim to base the implementation
on the DataTypes already rather than going back to the TypeInformation.

I would still try to leave the RFC 3339 compatibility mode, but maybe
for that mode it would make sense to not support any types WITHOUT
TIMEZONE? This would be enabled with a switch (disabled by default). As
I understand the RFC, making the time zone mandatory is actually a big
part of the standard as it makes time types unambiguous.

What do you think?

PS: I cross-posted this on the dev ML.

Best,

Dawid


On 26/02/2020 03:45, Jark Wu wrote:
> Yes, I'm also in favor of loosen the datetime format constraint. 
> I guess most of the users don't know there is a JSON standard which
> follows RFC 3339.
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 10:06, NiYanchun <niyanc...@outlook.com> wrote:
>
> Yes, these Types definition are general. As a user/developer, I
> would support “loosen it for usability”. If not, may add
> some explanation about JSON.
>
>
>
>  Original Message 
> Sender: Jark Wu <imj...@gmail.com>
> Recipient: Outlook <niyanc...@outlook.com>; Dawid Wysakowicz <dwysakow...@apache.org>
> Cc: godfrey he <godfre...@gmail.com>; Leonard Xu <xbjt...@gmail.com>; user <u...@flink.apache.org>
> Date: Wednesday, Feb 26, 2020 09:55
> Subject: Re: TIME/TIMESTAMP parse in Flink TABLE/SQL API
>
> Hi Outlook,
>
> The explanation in DataTypes is correct, it is compliant to SQL
> standard. The problem is that JsonRowDeserializationSchema only
> support  RFC-3339. 
> On the other hand, CsvRowDeserializationSchema supports to parse
> "2019-07-09 02:02:00.040".
>
> So the question is shall we insist on the RFC-3339 "standard"?
> Shall we loosen it for usability? 
> What do you think @Dawid Wysakowicz?
>
> Best,
> Jark
>
> On Wed, 26 Feb 2020 at 09:29, Outlook <niyanc...@outlook.com> wrote:
>
> Thanks Godfrey and Leonard, I tried your answers, result is OK. 
>
>
> BTW, I think if only accept such format for a long time, the
>  TIME and TIMESTAMP methods' doc in
> `org.apache.flink.table.api.DataTypes` may be better to update,
>
> because the document now is not what the method really
> support. For example, 
>
>
> ```
> /**
>  * Data type of a time WITHOUT time zone {@code TIME} with no fractional seconds by default.
>  *
>  * An instance consists of {@code hour:minute:second} with up to second precision
>  * and values ranging from {@code 00:00:00} to {@code 23:59:59}.
>  *
>  * Compared to the SQL standard, leap seconds (23:59:60 and 23:59:61) are not supported as the
>  * semantics are closer to {@link java.time.LocalTime}. A time WITH time zone is not provided.
>  *
>  * @see #TIME(int)
>  * @see TimeType
>  */
> public static DataType TIME() {
>     return new AtomicDataType(new TimeType());
> }
> ```
>
>
>

[DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-03-01 Thread Dawid Wysakowicz
Hi,

I would like to propose an improvement that would enable reading table
columns from different parts of source records. Besides the main payload,
the majority (if not all) of the sources expose additional information. It
can be simply read-only metadata such as offset or ingestion time, or
read/write parts of the record that contain data but additionally
serve different purposes (partitioning, compaction etc.), e.g. the key or
timestamp in Kafka.

We should make it possible to read and write data from all of those
locations. In this proposal I discuss reading partitioning data; for
completeness, this proposal also discusses partitioning when writing
data out.

I am looking forward to your comments.

You can access the FLIP here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Reading+table+columns+from+different+parts+of+source+records?src=contextnavpagetreemode

Best,

Dawid






Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-03-02 Thread Dawid Wysakowicz
Hi Jark,

Ad. 2 I added a section to discuss relation to FLIP-63

Ad. 3 Yes, I also tried to somewhat keep hierarchy of properties.
Therefore you have the key.format.type.

I also considered exactly what you are suggesting (prefixing with
connector or kafka). I should've put that into the Options/Rejected
Alternatives section.

I agree timestamp, key.*, value.* are connector properties. Why I wanted
to suggest not adding that prefix in the first version is that actually
all the properties in the WITH section are connector properties. Even
format is in the end a connector property as some of the sources might
not have a format, imo. The benefit of not adding the prefix is that it
makes the keys a bit shorter. Imagine prefixing all the properties with
connector (or if we go with FLINK-12557: elasticsearch):

elasticsearch.key.format.type: csv

elasticsearch.key.format.field: 

elasticsearch.key.format.delimiter: 

elasticsearch.key.format.*: 

I am fine with doing it though if this is a preferred approach in the
community.

Ad in-line comments:

I forgot to update the `value.fields.include` property. It should be
`value.fields-include`, which I think you also suggested in the comment,
right?

As for CAST vs declaring the output type of a computed column: I think
it's better not to use CAST, but to declare the type of the expression and
later on infer the output type of SYSTEM_METADATA. The reason is I think
this way it will be easier to implement e.g. filter push-downs when
working with the native types of the source; e.g. in the case of Kafka's
offset, I think it's better to push down a long rather than a string. This
could let us push down expressions like e.g. offset > 12345 && offset < 59382.
Otherwise we would have to push down cast(offset, long) > 12345 &&
cast(offset, long) < 59382. Moreover I think we need to introduce the
type for computed columns anyway to support functions that infer the output
type based on the expected return type.

As for the computed column push down. Yes, SYSTEM_METADATA would have to
be pushed down to the source. If it is not possible the planner should
fail. As far as I know computed columns push down will be part of source
rework, won't it? ;)

As for the persisted computed column. I think it is completely
orthogonal. In my current proposal you can also partition by a computed
column. The difference between using a udf in partitioned by vs
partitioned by a computed column is that when you partition by a
computed column this column must be also computed when reading the
table. If you use a udf in the partitioned by, the expression is
computed only when inserting into the table.

Hope this answers some of your questions. Looking forward for further
suggestions.

Best,

Dawid



On 02/03/2020 05:18, Jark Wu wrote:
> Hi,
>
> Thanks Dawid for starting such a great discussion. Reaing metadata and
> key-part information from source is an important feature for streaming
> users.
>
> In general, I agree with the proposal of the FLIP.
> I will leave my thoughts and comments here:
>
> 1) +1 to use connector properties instead of introducing HEADER keyword as
> the reason you mentioned in the FLIP.
> 2) we already introduced PARTITIONED BY in FLIP-63. Maybe we should add a
> section to explain what's the relationship between them.
> Do their concepts conflict? Could INSERT PARTITION be used on the
> PARTITIONED table in this FLIP?
> 3) Currently, properties are hierarchical in Flink SQL. Shall we make the
> new introduced properties more hierarchical?
> For example, "timestamp" => "connector.timestamp"? (actually, I prefer
> "kafka.timestamp" which is another improvement for properties FLINK-12557)
> A single "timestamp" in properties may mislead users that the field is
> a rowtime attribute.
>
> I also left some minor comments in the FLIP.
>
> Thanks,
> Jark
>
>
>
> On Sun, 1 Mar 2020 at 22:30, Dawid Wysakowicz 
> wrote:
>
>> Hi,
>>
>> I would like to propose an improvement that would enable reading table
>> columns from different parts of source records. Besides the main payload
>> majority (if not all of the sources) expose additional information. It
>> can be simply a read-only metadata such as offset, ingestion time or a
>> read and write  parts of the record that contain data but additionally
>> serve different purposes (partitioning, compaction etc.), e.g. key or
>> timestamp in Kafka.
>>
>> We should make it possible to read and write data from all of those
>> locations. In this proposal I discuss reading partitioning data, for
>> completeness this proposal discusses also the partitioning when writing
>> data out.
>>
>> I am looking forward to your comments.
>>
>> You can access the FLIP here:
>>
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-107%3A+Reading+table+columns+from+different+parts+of+source+records?src=contextnavpagetreemode
>>
>> Best,
>>
>> Dawid
>>
>>
>>




Re: [DISCUSS] FLIP-107: Reading table columns from different parts of source records

2020-03-03 Thread Dawid Wysakowicz
Hi,

1. I thought a bit more on how the source would emit the columns and I
now see its not exactly the same as regular columns. I see a need to
elaborate a bit more on that in the FLIP as you asked, Jark.

I do agree mostly with Danny on how we should do that. One additional
things I would introduce is an

interface SupportsMetadata {

   boolean supportsMetadata(Set<String> metadataFields);

   TableSource generateMetadataFields(Set<String> metadataFields);

}

This way the source would have to declare/emit only the requested
metadata fields. In order not to clash with user-defined fields, when
emitting a metadata field I would prepend the column name with
__system_{property_name}. Therefore, when SYSTEM_METADATA("partition") is
requested, the source would append a field __system_partition to the
schema. This would never be visible to the user as it would be used only
for the subsequent computed columns. If that makes sense to you, I will
update the FLIP with this description.
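
A trivial sketch of that naming scheme (the prefix is the one from the paragraph above, everything else is made up):

import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.stream.Collectors;

public class MetadataFieldNamesSketch {
    private static final String PREFIX = "__system_";

    // SYSTEM_METADATA("partition") -> "__system_partition", kept out of the user's column namespace
    static Set<String> toInternalNames(Set<String> requestedMetadata) {
        return requestedMetadata.stream()
                .map(name -> PREFIX + name)
                .collect(Collectors.toCollection(LinkedHashSet::new));
    }

    public static void main(String[] args) {
        System.out.println(toInternalNames(new LinkedHashSet<>(Arrays.asList("partition", "offset"))));
        // prints [__system_partition, __system_offset]
    }
}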

2. CAST vs explicit type in computed columns

Here I agree with Danny. It is also the current state of the proposal.

3. Partitioning on computed column vs function

Here I also agree with Danny. I also think those are orthogonal. I would
leave out the STORED computed columns out of the discussion. I don't see
how do they relate to the partitioning. I already put both of those
cases in the document. We can either partition on a computed column or
use a udf in a partioned by clause. I am fine with leaving out the
partitioning by udf in the first version if you still have some concerns.

As for your question, Danny: it depends on which partitioning strategy you use.

For the HASH partitioning strategy I thought it would work as you
explained. It would be N = MOD(expr, num). I am not sure though if we
should introduce the PARTITIONS clause. Usually Flink does not own the
data and the partitions are already an intrinsic property of the
underlying source, e.g. for Kafka we do not create topics, but just
describe a pre-existing, pre-partitioned topic.
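
In code, that strategy is just (a toy sketch; floorMod is used so that negative expression values still map to a valid partition):

public class HashPartitioningSketch {
    // N = MOD(expr, num): the partition a record would be written to under the HASH strategy.
    static int partitionIndex(int exprValue, int numPartitions) {
        return Math.floorMod(exprValue, numPartitions);
    }

    public static void main(String[] args) {
        System.out.println(partitionIndex(12345, 8)); // prints 1
    }
}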

4. timestamp vs timestamp.field vs connector.field vs ...

I am fine with changing it to timestamp.field to be consistent with
other value.fields and key.fields. Actually that was also my initial
proposal in a first draft I prepared. I changed it afterwards to shorten
the key.

Best,

Dawid

On 03/03/2020 09:00, Danny Chan wrote:
> Thanks Dawid for bringing up this discussion, I think it is a useful feature ~
>
> About how the metadata outputs from source
>
> I think it is completely orthogonal, computed column push down is another 
> topic, this should not be a blocker but a promotion, if we do not have any 
> filters on the computed column, there is no need to do any pushings; the 
> source node just emit the complete record with full metadata with the 
> declared physical schema, then when generating the virtual columns, we would 
> extract the metadata info and output as full columns(with full schema).
>
> About the type of metadata column
>
> Personally I prefer an explicit type instead of CAST; they are semantically
> equivalent though, an explicit type is more straightforward and we can declare
> the nullable attribute there.
>
> About option A: partitioning based on a computed column VS option B:
> partitioning with just a function
>
> From the FLIP, it seems that B's partitioning is just a strategy when writing 
> data, the partition column is not included in the table schema, so it's just
> useless when reading from that.
>
> - Compared to A, we do not need to generate the partition column when 
> selecting from the table(but insert into)
> - For A we can also mark the column as STORED when we want to persist that
>
> So in my opition they are orthogonal, we can support both, i saw that 
> MySQL/Oracle[1][2] would suggest to also define the PARTITIONS num, and the 
> partitions are managed under a "tablenamespace", the partition in which the 
> record is stored is partition number N, where N = MOD(expr, num), for your 
> design, which partition would the record persist in?
>
> [1] https://dev.mysql.com/doc/refman/5.7/en/partitioning-hash.html
> [2] 
> https://docs.oracle.com/database/121/VLDBG/GUID-F023D3ED-262F-4B19-950A-D3C8F8CDB4F4.htm#VLDBG1270
>
> Best,
> Danny Chan
> On Mar 2, 2020, 6:16 PM +0800, Dawid Wysakowicz wrote:
>> Hi Jark,
>> Ad. 2 I added a section to discuss relation to FLIP-63
>> Ad. 3 Yes, I also tried to somewhat keep hierarchy of properties. Therefore 
>> you have the key.format.type.
>> I also considered exactly what you are suggesting (prefixing with connector 
>> or kafka). I should've put that into an Option/Rejected alternatives.
>> I agree timestamp, key.*, value.* are connector properties. Why I wanted to 
>> suggest not adding that prefix in the first version is that actually all t

[DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-03 Thread Dawid Wysakowicz
Hi devs,

I wanted to bring another improvement proposal up for a discussion.
Often users need to adjust existing tables slightly. This is especially
useful when users need to enhance a table created from an external tool
(e.g. Hive) with Flink-specific information such as e.g. watermarks. It
can also be a useful tool for ETL processes, e.g. merging two tables
into a single one with a different connector. My suggestion would be to
support the optional Feature T171, “LIKE clause in table definition”, of
SQL standard 2008.

You can see the description of the proposal here:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
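
To make it concrete, a sketch of how such a statement could look from the Java API (the LIKE placement and the OVERWRITING OPTIONS keyword follow the current FLIP draft and may still change; the catalog, table and column names are made up):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LikeClauseSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Derive a temporary table from an existing catalog table, only adding a watermark
        // and overriding the connector options.
        tEnv.sqlUpdate(
                "CREATE TEMPORARY TABLE enriched_orders ("
                        + "  WATERMARK FOR order_time AS order_time - INTERVAL '5' SECOND"
                        + ") WITH ("
                        + "  'connector.type' = 'kafka'"
                        + ") LIKE hive_catalog.db.orders (OVERWRITING OPTIONS)");
    }
}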

Looking forward to your comments.

Best,

Dawid





Re: [DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-03 Thread Dawid Wysakowicz
Hi Jark,
I did investigate the INHERITS clause, but it has a semantic that in my
opinion we definitely don't want to support. INHERITS creates a new table
with a "link" to the original table. Therefore if you e.g. change the schema
of the original table it's also reflected in the child table. It's also
possible, for tables like A INHERITS B, to query them like SELECT * FROM ONLY A;
by default it returns results from both tables. I am pretty sure it's not
what we're looking for.

PostgreSQL implements both the LIKE clause and INHERITS. I am open for
discussion if we should support multiple LIKE statements or not. Standard
also allows declaring the clause after the schema part. We can also do it.
Nevertheless I think including multiple tables might be useful, e.g. when
you want to union two tables and output to the same Kafka cluster and just
change the target topic. I know it's not a very common use case but it's
not a big effort to support it.

Let me know what you think.

Best,
Dawid

On Wed, 4 Mar 2020, 04:55 Jark Wu,  wrote:

> Hi Dawid,
>
> Thanks for starting this discussion. I like the idea.
> Once we support more intergrated catalogs,
> e.g. ConfluentSchemaRegistryCatalog, this problem will be more urgent.
> Because it's very common to adjust existing tables in catalog slightly.
>
> My initial thought was introducing INHERITS keyword, which is also
> supported in PostgreSQL [1].
> This is also similar to the functionality of Hive CREATE TABLE LIKE [2].
>
> CREATE TEMPORARY TABLE MyTable (WATERMARK FOR ts) INHERITS
> cat.db.KafkoTopic
> CREATE TEMPORARY TABLE MyTable (WATERMARK FOR ts) INHERITS
> cat.db.KafkoTopic WITH ('k' = 'v')
>
> The INHERITS can inherit an existing table with all columns, watermark, and
> properties, but the properties and watermark can be overwritten explicitly.
>
> The reason I prefer INHERITS rather than LIKE is the keyword position. We
> are copying an existing table definition including the properties.
> However, LIKE appears in the schema part, it sounds like copying properties
> into schema part of DDL.
>
> Besides of that, I'm not sure whether the use case stands "merging two
> tables into a single one with a different connector".
> From my understanding, most use cases are just slightly adjusting on an
> existing catalog table with new properties or watermarks.
> Do we really need to merge two table definitions into a single one? For
> example, is it possible to merge a Kafka table definition and
> a Filesystem table definition into a new Kafka table, and the new Kafka
> table exactly matches the underlying physical data format?
>
> Best,
> Jark
>
> [1]: https://www.postgresql.org/docs/9.5/sql-createtable.html
> [2]:
>
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-CreateTableLike
>
>
> On Tue, 3 Mar 2020 at 21:12, Dawid Wysakowicz 
> wrote:
>
> > Hi devs,
> >
> > I wanted to bring another improvement proposal up for a discussion. Often
> > users need to adjust existing tables slightly. This is especially useful
> > when users need to enhance a table created from an external tool (e.g.
> > HIVE) with Flink's specific information such as e.g watermarks. It can
> also
> > be a useful tool for ETL processes, e.g. merging two tables into a single
> > one with a different connector.  My suggestion would be to support an
> > optional *Feature T171, “LIKE clause in table definition” *of SQL
> > standard 2008.
> >
> > You can see the description of the proposal here:
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
> >
> > Looking forward for your comments.
> >
> > Best,
> >
> > Dawid
> >
>


Re: [DISCUSS] FLIP-106: Support Python UDF in SQL Function DDL

2020-03-04 Thread Dawid Wysakowicz
Hi all,
I had a really quick look and from my perspective the proposal looks fine.
I share Jark's opinion that the instantiation could be done at a later
stage. I agree with Wei that it requires some changes in the internal
implementation of the FunctionCatalog, to store temporary functions as
catalog functions instead of FunctionDefinitions, but we have that on our
agenda anyway. I would suggest investigating whether we could do that as part of
this FLIP already. Nevertheless, this can in theory also be done later.
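
For context, the DDL shape proposed in the FLIP looks roughly like this when issued from the Java API (a sketch; the Python module and function names are made up):

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class PythonUdfDdlSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().useBlinkPlanner().inStreamingMode().build());

        // Registers a temporary system function backed by a Python callable,
        // using the LANGUAGE PYTHON clause proposed in the FLIP.
        tEnv.sqlUpdate(
                "CREATE TEMPORARY SYSTEM FUNCTION py_upper AS 'my_module.my_upper' LANGUAGE PYTHON");
    }
}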

Best,
Dawid

On Mon, 2 Mar 2020, 14:58 Jark Wu,  wrote:

> Thanks for the explanation, Wei!
>
> On Mon, 2 Mar 2020 at 20:59, Wei Zhong  wrote:
>
> > Hi Jark,
> >
> > Thanks for your suggestion.
> >
> > Actually, the timing of starting a Python process depends on the UDF
> type,
> > because the Python process is used to provide the necessary information
> to
> > instantiate the FunctionDefinition object of the Python UDF. For catalog
> > function, the FunctionDefinition will be instantiated when compiling the
> > job, which means the Python process is required during the compilation
> > instead of the registeration. For temporary system function and temporary
> > catalog function, the FunctionDefinition will be instantiated during the
> > UDF registeration, so the Python process need to be started at that time.
> >
> > But this FLIP will only support registering the temporary system function
> > and temporary catalog function in SQL DDL because registering Python UDF
> to
> > catalog is not supported yet. We plan to support the registration of
> > Python catalog function (via Table API and SQL DDL) in a separate FLIP.
> > I'll add a non-goal section to the FLIP page to illustrate this.
> >
> > Best,
> > Wei
> >
> >
> > > On Mar 2, 2020, at 15:11, Jark Wu wrote:
> > >
> > > Hi Weizhong,
> > >
> > > Thanks for proposing this feature. In geneal, I'm +1 from the table's
> > view.
> > >
> > > I have one suggestion: I think the register python function into
> catalog
> > > doesn't need to startup python process (the "High Level Sequence
> Diagram"
> > > in your FLIP).
> > > Because only meta-information is persisted into catalog, we don't need
> to
> > > store "return type", "input types" into catalog.
> > > I guess the python process is required when compiling a SQL job.
> > >
> > > Best,
> > > Jark
> > >
> > >
> > >
> > > On Fri, 28 Feb 2020 at 19:04, Benchao Li  wrote:
> > >
> > >> Big +1 for this feature.
> > >>
> > >> We built our SQL platform on Java Table API, and most common UDF are
> > >> implemented in Java. However some python developers are not familiar
> > with
> > >> Java/Scala, and it's very inconvenient for these users to use UDF in
> > SQL.
> > >>
> > >> On Fri, Feb 28, 2020 at 6:58 PM, Wei Zhong wrote:
> > >>
> > >>> Thank for your reply Dan!
> > >>>
> > >>> By the way, this FLIP is closely related to the SQL API.  @Jark Wu <
> > >>> imj...@gmail.com> @Timo  could you please take a
> > >>> look?
> > >>>
> > >>> Thanks,
> > >>> Wei
> > >>>
> >  On Feb 25, 2020, at 16:25, zoudan wrote:
> > 
> >  +1 for supporting Python UDF in Java/Scala Table API.
> >  This is a great feature and would be helpful for python users!
> > 
> >  Best,
> >  Dan Zou
> > 
> > 
> > >>>
> > >>>
> > >>
> > >> --
> > >>
> > >> Benchao Li
> > >> School of Electronics Engineering and Computer Science, Peking
> > University
> > >> Tel:+86-15650713730
> > >> Email: libenc...@gmail.com; libenc...@pku.edu.cn
> > >>
> > >>
> >
> >
>


Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-23 Thread Dawid Wysakowicz
Hi Timo,

Thank you for the proposal. I think it is an important improvement that
will benefit many parts of the Table API. The proposal looks really good
to me and personally I would be comfortable with voting on the current
state.

Best,

Dawid

On 23/03/2020 18:53, Timo Walther wrote:
> Hi everyone,
>
> I received some questions around how the new interfaces play together
> with formats and their factories.
>
> Furthermore, for MySQL or Postgres CDC logs, the format should be able
> to return a `ChangelogMode`.
>
> Also, I incorporated the feedback around the factory design in general.
>
> I added a new section `Factory Interfaces` to the design document.
> This should be helpful to understand the big picture and connecting
> the concepts.
>
> Please let me know what you think?
>
> Thanks,
> Timo
>
>
> On 18.03.20 13:43, Timo Walther wrote:
>> Hi Benchao,
>>
>> this is a very good question. I will update the FLIP about this.
>>
>> The legacy planner will not support the new interfaces. It will only
>> support the old interfaces. With the next release, I think the Blink
>> planner is stable enough to be the default one as well.
>>
>> Regards,
>> Timo
>>
>> On 18.03.20 08:45, Benchao Li wrote:
>>> Hi Timo,
>>>
>>> Thank you and others for the efforts to prepare this FLIP.
>>>
>>> The FLIP LGTM generally.
>>>
>>> +1 for moving blink data structures to table-common, it's useful to
>>> udf too
>>> in the future.
>>> A little question is, do we plan to support the new interfaces and data
>>> types in legacy planner?
>>> Or we only plan to support these new interfaces in blink planner.
>>>
>>> And using primary keys from DDL instead of derived key information from
>>> each query is also a good idea,
>>> we met some use cases where this does not works very well before.
>>>
>>> This FLIP also makes the dependencies of table modules more clear, I
>>> like
>>> it very much.
>>>
>>> On Tue, Mar 17, 2020 at 1:36 AM, Timo Walther wrote:
>>>
 Hi everyone,

 I'm happy to present the results of long discussions that we had
 internally. Jark, Dawid, Aljoscha, Kurt, Jingsong, me, and many more
 have contributed to this design document.

 We would like to propose new long-term table source and table sink
 interfaces:


 https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces


 This is a requirement for FLIP-105 and finalizing FLIP-32.

 The goals of this FLIP are:

 - Simplify the current interface architecture:
   - Merge upsert, retract, and append sinks.
   - Unify batch and streaming sources.
   - Unify batch and streaming sinks.

 - Allow sources to produce a changelog:
   - UpsertTableSources have been requested a lot by users. Now
 is the
 time to open the internal planner capabilities via the new interfaces.
   - According to FLIP-105, we would like to support changelogs for
 processing formats such as Debezium.

 - Don't rely on DataStream API for source and sinks:
   - According to FLIP-32, the Table API and SQL should be
 independent
 of the DataStream API which is why the `table-common` module has no
 dependencies on `flink-streaming-java`.
   - Source and sink implementations should only depend on the
 `table-common` module after FLIP-27.
   - Until FLIP-27 is ready, we still put most of the interfaces in
 `table-common` and strictly separate interfaces that communicate
 with a
 planner and actual runtime reader/writers.

 - Implement efficient sources and sinks without planner dependencies:
   - Make Blink's internal data structures available to connectors.
   - Introduce stable interfaces for data structures that can be
 marked as `@PublicEvolving`.
   - Only require dependencies on `flink-table-common` in the
 future

 It finalizes the concept of dynamic tables and consideres how all
 source/sink related classes play together.

 We look forward to your feedback.

 Regards,
 Timo

>>>
>>>
>





Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Dawid Wysakowicz
Hi Becket,

Answering your question, we have the same intention not to duplicate
connectors between the DataStream and Table APIs. The interfaces proposed in
the FLIP are a way to describe the relational properties of a source. The
intention is, as you described, to translate all of those, expressed as
Expressions or other Table-specific structures, into a DataStream source.
In other words, I think what we are doing here is in line with what you
described.
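
As a toy, self-contained rendering of the FilterableSource idea from the mail quoted below (all names are illustrative and none of this is the actual FLIP API):

import java.util.function.Function;

public class FilterPushDownSketch {

    /** One filter contract that a DataStream-level source could expose directly. */
    interface FilterableSource<P> {
        void applyFilter(P predicate);
    }

    /** A fake source whose native predicate type is a plain String expression. */
    static class StringFilterSource implements FilterableSource<String> {
        private String pushedFilter = "true";

        @Override
        public void applyFilter(String predicate) {
            this.pushedFilter = predicate;
        }

        String describeScan() {
            return "scan with filter: " + pushedFilter;
        }
    }

    public static void main(String[] args) {
        StringFilterSource source = new StringFilterSource();
        // On the Table side a converter would translate a planner Expression into the
        // source's native predicate type; here that conversion is faked as identity.
        Function<String, String> expressionToPredicate = expr -> expr;
        source.applyFilter(expressionToPredicate.apply("f0 > 12345"));
        System.out.println(source.describeScan());
    }
}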

Best,

Dawid

On 24/03/2020 02:23, Becket Qin wrote:
> Hi Timo,
>
> Thanks for the proposal. I completely agree that the current Table
> connectors could be simplified quite a bit. I haven't finished reading
> everything, but here are some quick thoughts.
>
> Actually to me the biggest question is why should there be two different
> connector systems for DataStream and Table? What is the fundamental reason
> that is preventing us from merging them to one?
>
> The basic functionality of a connector is to provide capabilities to do IO
> and Serde. Conceptually, Table connectors should just be DataStream
> connectors that are dealing with Rows. It seems that quite a few of the
> special connector requirements are just a specific way to do IO / Serde.
> Taking SupportsFilterPushDown as an example, imagine we have the following
> interface:
>
> interface FilterableSource {
> void applyFilterable(Supplier predicate);
> }
>
> And if a ParquetSource would like to support filterable, it will become:
>
> class ParquetSource implements Source, FilterableSource<FilterPredicate> {
> ...
> }
>
> For Table, one just needs to provide a predicate supplier that converts an
> Expression to the specified predicate type. This has a few benefits:
> 1. Same unified API for filterable for sources, regardless of DataStream or
> Table.
> 2. The  DataStream users now can also use the ExpressionToPredicate
> supplier if they want to.
>
> To summarize, my main point is that I am wondering if it is possible to
> have a single set of connector interface for both Table and DataStream,
> rather than having two hierarchies. I am not 100% sure if this would work,
> but if it works, this would be a huge win from both code maintenance and
> user experience perspective.
>
> Thanks,
>
> Jiangjie (Becket) Qin
>
>
>
> On Tue, Mar 24, 2020 at 2:03 AM Dawid Wysakowicz 
> wrote:
>
>> Hi Timo,
>>
>> Thank you for the proposal. I think it is an important improvement that
>> will benefit many parts of the Table API. The proposal looks really good
>> to me and personally I would be comfortable with voting on the current
>> state.
>>
>> Best,
>>
>> Dawid
>>
>> On 23/03/2020 18:53, Timo Walther wrote:
>>> Hi everyone,
>>>
>>> I received some questions around how the new interfaces play together
>>> with formats and their factories.
>>>
>>> Furthermore, for MySQL or Postgres CDC logs, the format should be able
>>> to return a `ChangelogMode`.
>>>
>>> Also, I incorporated the feedback around the factory design in general.
>>>
>>> I added a new section `Factory Interfaces` to the design document.
>>> This should be helpful to understand the big picture and connecting
>>> the concepts.
>>>
>>> Please let me know what you think?
>>>
>>> Thanks,
>>> Timo
>>>
>>>
>>> On 18.03.20 13:43, Timo Walther wrote:
>>>> Hi Benchao,
>>>>
>>>> this is a very good question. I will update the FLIP about this.
>>>>
>>>> The legacy planner will not support the new interfaces. It will only
>>>> support the old interfaces. With the next release, I think the Blink
>>>> planner is stable enough to be the default one as well.
>>>>
>>>> Regards,
>>>> Timo
>>>>
>>>> On 18.03.20 08:45, Benchao Li wrote:
>>>>> Hi Timo,
>>>>>
>>>>> Thank you and others for the efforts to prepare this FLIP.
>>>>>
>>>>> The FLIP LGTM generally.
>>>>>
>>>>> +1 for moving blink data structures to table-common, it's useful to
>>>>> udf too
>>>>> in the future.
>>>>> A little question is, do we plan to support the new interfaces and data
>>>>> types in legacy planner?
>>>>> Or we only plan to support these new interfaces in blink planner.
>>>>>
>>>>> And using primary keys from DDL instead of derived key information from
>>>>> each query is also a good idea,
>>>&

Re: [DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-24 Thread Dawid Wysakowicz
Sorry for a late reply, but I was on vacation.

As for putting the LIKE after the schema part: you're right, the SQL
standard allows it only in the schema part. I was misled by examples
for DB2 and MySQL, which differ from the standard in that respect. My
bad, sorry.

Nevertheless I'd still be in favour of using the LIKE clause for that
purpose rather than INHERITS. I'm fine with putting it after the schema
part. The argument that it applies to the options part makes sense to me.

I must admit I am not a fan of the INHERITS clause. @Jark I'd not
redefine the semantics of the INHERITS clause entirely. I am sure it
will cause unnecessary confusion if it differs significantly from what
was implemented by, let's be honest, more popular vendors such as
PostgreSQL. My biggest concern is that the INHERITS clause in PostgreSQL
allows constructs such as SELECT * FROM ONLY B (where e.g. A INHERITS
B). My understanding of the purpose of the INHERITS clause is that it
really emulates inheritance that lets you create "nested" data sets. I
think what we are more interested in is a way to adjust only the
metadata of an already existing table.

Moreover I prefer the LIKE clause as it is more widespread. In some way
it is supported by PostgreSQL, DB2, SnowflakeDB, MySQL.

Lastly @Jingsong, I am not sure about the "link" part. I know that at first
glance having a link and reflecting changes might seem appealing, but I
am afraid it would cause more problems than it would give benefits. First
of all, it would make the LIKE/INHERITS clause unusable for creating e.g.
Hive tables or JDBC tables that could be used from other systems, as the
link would not be understandable by those systems.

Best,

Dawid



On 05/03/2020 07:46, Jark Wu wrote:
> Hi Dawid,
>
>> INHERITS creates a new table with a "link" to the original table.
> Yes, INHERITS is a "link" to the original table in PostgreSQL.
> But INHERITS is not SQL standard, I think it's fine for vendors to define
> theire semantics.
>
>> Standard also allows declaring the clause after the schema part. We can
> also do it.
> Is that true? I didn't find it in SQL standard. If this is true, I prefer
> to put LIKE after the schema part.
>
> 
>
> Hi Jingsong,
>
> The concern you mentioned in (2) is exactly my concern too. That's why I
> suggested INHERITS, or put LIKE after schema part.
>
> Best,
> Jark
>
> On Thu, 5 Mar 2020 at 12:05, Jingsong Li  wrote:
>
>> Thanks Dawid for starting this discussion.
>>
>> I like the "LIKE".
>>
>> 1.For "INHERITS", I think this is a good feature too, yes, ALTER TABLE will
>> propagate any changes in column data definitions and check constraints down
>> the inheritance hierarchy. A inherits B, A and B share every things, they
>> have the same kafka topic. If modify schema of B, this means underlying
>> kafka topic schema changed, so I think it is good to modify A too. If this
>> for "ConfluentSchemaRegistryCatalog" mention by Jark, I think sometimes
>> this is just we want.
>> But "LIKE" also very useful for many cases.
>>
>> 2.For LIKE statement in schema, I know two kinds of like syntax, one is
>> MySQL/hive/sqlserver, the other is PostgreSQL. I prefer former:
>> - In the FLIP, there is "OVERWRITING OPTIONS", this will overwrite
>> properties in "with"? This looks weird, because "LIKE" is in schema, but it
>> can affect outside properties.
>>
>> Best,
>> Jingsong Lee
>>
>> On Wed, Mar 4, 2020 at 2:05 PM Dawid Wysakowicz 
>> wrote:
>>
>>> Hi Jark,
>>> I did investigate the INHERITS clause, but it has a semantic that in my
>>> opinion we definitely don't want to support. INHERITS creates a new table
>>> with a "link" to the original table. Therefore if you e.g change the
>> schema
>>> of the original table it's also reflected in the child table. It's also
>>> possible for tables like A inherits B query them like Select * from only
>> A,
>>> by default it returns results from both tables. I am pretty sure it's not
>>> what we're looking for.
>>>
>>> PostgreSQL implements both the LIKE clause and INHERITS. I am open for
>>> discussion if we should support multiple LIKE statements or not. Standard
>>> also allows declaring the clause after the schema part. We can also do
>> it.
>>> Nevertheless I think including multiple tables might be useful, e.g. when
>>> you want to union two tables and output to the same Kafka cluster and
>> just
>>> change the target top

Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-24 Thread Dawid Wysakowicz
>>>> It's really great that we have the same goal. I am actually wondering
>> if
>>> we
>>>> can go one step further to avoid some of the interfaces in Table as
>> well.
>>>> For example, if we have the FilterableSource, do we still need the
>>>> FilterableTableSource? Should DynamicTableSource just become a
>>>> Source<*Row*,
>>>> SourceSplitT, EnumChkT>?
>>>>
>>>> Can you help me understand a bit more about the reason we need the
>>>> following relational representation / wrapper interfaces v.s. the
>>>> interfaces that we could put to the Source in FLIP-27?
>>>>
>>>> DynamicTableSource v.s. Source
>>>> SupportsFilterablePushDown v.s. FilterableSource
>>>> SupportsProjectablePushDown v.s. ProjectableSource
>>>> SupportsWatermarkPushDown v.s. WithWatermarkAssigner
>>>> SupportsComputedColumnPushDown v.s. ComputedColumnDeserializer
>>>> ScanTableSource v.s. ChangeLogDeserializer.
>>>> LookUpTableSource v.s. LookUpSource
>>>>
>>>> Assuming we have all the interfaces on the right side, do we still need
>>> the
>>>> interfaces on the left side? Note that the interfaces on the right can
>> be
>>>> used by both DataStream and Table. If we do this, there will only be
>> one
>>>> set of Source interfaces Table and DataStream, the only difference is
>>> that
>>>> the Source for table will have some specific plugins and
>> configurations.
>>> An
>>>> omnipotent Source can implement all the the above interfaces and take a
>>>> Deserializer that implements both ComputedColumnDeserializer and
>>>> ChangeLogDeserializer.
>>>>
>>>> Would the SQL planner work with that?
>>>>
>>>> Thanks,
>>>>
>>>> Jiangjie (Becket) Qin
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> On Tue, Mar 24, 2020 at 5:03 PM Jingsong Li 
>>>> wrote:
>>>>
>>>>> +1. Thanks Timo for the design doc.
>>>>>
>>>>> We can also consider @Experimental too. But I am +1 to
>> @PublicEvolving,
>>>> we
>>>>> should be confident in the current change.
>>>>>
>>>>> Best,
>>>>> Jingsong Lee
>>>>>
>>>>> On Tue, Mar 24, 2020 at 4:30 PM Timo Walther 
>>> wrote:
>>>>>> @Becket: We totally agree that we don't need table specific
>>> connectors
>>>>>> during runtime. As Dawid said, the interfaces proposed here are
>> just
>>>> for
>>>>>> communication with the planner. Once the properties (watermarks,
>>>>>> computed column, filters, projecttion etc.) are negotiated, we can
>>>>>> configure a regular Flink connector.
>>>>>>
>>>>>> E.g. setting the watermark assigner and deserialization schema of a
>>>>>> Kafka connector.
>>>>>>
>>>>>> For better separation of concerns, Flink connectors should not
>>> include
>>>>>> relational interfaces and depend on flink-table. This is the
>>>>>> responsibility of table source/sink.
>>>>>>
>>>>>> @Kurt: I would like to mark them @PublicEvolving already because we
>>>> need
>>>>>> to deprecate the old interfaces as early as possible. We cannot
>>>> redirect
>>>>>> to @Internal interfaces. They are not marked @Public, so we can
>> still
>>>>>> evolve them. But a core design shift should not happen again, it
>>> would
>>>>>> leave a bad impression if we are redesign over and over again.
>>> Instead
>>>>>> we should be confident in the current change.
>>>>>>
>>>>>> Regards,
>>>>>> Timo
>>>>>>
>>>>>>
>>>>>> On 24.03.20 09:20, Dawid Wysakowicz wrote:
>>>>>>> Hi Becket,
>>>>>>>
>>>>>>> Answering your question, we have the same intention not to
>>> duplicate
>>>>>>> connectors between datastream and table apis. The interfaces
>>> proposed
>>>>> in
>>>>>>> the FLIP are a way to describe relational properties of a source.
>>> The
>>>>>>> in

Re: [DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-25 Thread Dawid Wysakowicz
Thank you for your opinions. I updated the FLIP with the results of the
discussion. Let me know if you have further concerns.

Best,

Dawid

On 05/03/2020 07:46, Jark Wu wrote:
> Hi Dawid,
>
>> INHERITS creates a new table with a "link" to the original table.
> Yes, INHERITS is a "link" to the original table in PostgreSQL.
> But INHERITS is not SQL standard, I think it's fine for vendors to define
> theire semantics.
>
>> Standard also allows declaring the clause after the schema part. We can
> also do it.
> Is that true? I didn't find it in SQL standard. If this is true, I prefer
> to put LIKE after the schema part.
>
> 
>
> Hi Jingsong,
>
> The concern you mentioned in (2) is exactly my concern too. That's why I
> suggested INHERITS, or put LIKE after schema part.
>
> Best,
> Jark
>
> On Thu, 5 Mar 2020 at 12:05, Jingsong Li  wrote:
>
>> Thanks Dawid for starting this discussion.
>>
>> I like the "LIKE".
>>
>> 1.For "INHERITS", I think this is a good feature too, yes, ALTER TABLE will
>> propagate any changes in column data definitions and check constraints down
>> the inheritance hierarchy. A inherits B, A and B share everything; they
>> have the same Kafka topic. If you modify the schema of B, this means the underlying
>> Kafka topic schema changed, so I think it is good to modify A too. If this is
>> for "ConfluentSchemaRegistryCatalog" mentioned by Jark, I think sometimes
>> this is just what we want.
>> But "LIKE" also very useful for many cases.
>>
>> 2. For the LIKE statement in the schema, I know two kinds of LIKE syntax: one is
>> MySQL/Hive/SQL Server, the other is PostgreSQL. I prefer the former:
>> - In the FLIP, there is "OVERWRITING OPTIONS"; will this overwrite
>> properties in "WITH"? This looks weird, because "LIKE" is in the schema, but it
>> can affect outside properties.
>>
>> Best,
>> Jingsong Lee
>>
>> On Wed, Mar 4, 2020 at 2:05 PM Dawid Wysakowicz 
>> wrote:
>>
>>> Hi Jark,
>>> I did investigate the INHERITS clause, but it has a semantic that in my
>>> opinion we definitely don't want to support. INHERITS creates a new table
>>> with a "link" to the original table. Therefore if you e.g change the
>> schema
>>> of the original table it's also reflected in the child table. It's also
>>> possible for tables like A inherits B query them like Select * from only
>> A,
>>> by default it returns results from both tables. I am pretty sure it's not
>>> what we're looking for.
>>>
>>> PostgreSQL implements both the LIKE clause and INHERITS. I am open for
>>> discussion if we should support multiple LIKE statements or not. Standard
>>> also allows declaring the clause after the schema part. We can also do
>> it.
>>> Nevertheless I think including multiple tables might be useful, e.g. when
>>> you want to union two tables and output to the same Kafka cluster and
>> just
>>> change the target topic. I know it's not a very common use case but it's
>>> not a big effort to support it.
>>>
>>> Let me know what you think.
>>>
>>> Best,
>>> Dawid
>>>
>>> On Wed, 4 Mar 2020, 04:55 Jark Wu,  wrote:
>>>
>>>> Hi Dawid,
>>>>
>>>> Thanks for starting this discussion. I like the idea.
>>>> Once we support more integrated catalogs,
>>>> e.g. ConfluentSchemaRegistryCatalog, this problem will be more urgent.
>>>> Because it's very common to adjust existing tables in catalog slightly.
>>>>
>>>> My initial thought was introducing INHERITS keyword, which is also
>>>> supported in PostgreSQL [1].
>>>> This is also similar to the functionality of Hive CREATE TABLE LIKE
>> [2].
>>>> CREATE TEMPORARY TABLE MyTable (WATERMARK FOR ts) INHERITS
>>>> cat.db.KafkoTopic
>>>> CREATE TEMPORARY TABLE MyTable (WATERMARK FOR ts) INHERITS
>>>> cat.db.KafkoTopic WITH ('k' = 'v')
>>>>
>>>> The INHERITS can inherit an existing table with all columns, watermark,
>>> and
>>>> properties, but the properties and watermark can be overwritten
>>> explicitly.
>>>> The reason I prefer INHERITS rather than LIKE is the keyword position.
>> We
>>>> are copying an existing table definition including the properties.
>>>> However, LIKE appears in the 

Re: Flink CEP greedy match of single pattern

2020-03-25 Thread Dawid Wysakowicz
Hi Dominik,

I think you are hitting a bug. The greedy quantifier does not work well
if applied to the last element of a pattern. There is a JIRA issue to
improve support for the greedy quantifier [1].

You could work around it by adding an additional state at the end, e.g.:

Pattern.begin[AccelVector](EventPatternName,
AfterMatchSkipStrategy.skipPastLastEvent())
  .where(_.data() > Threshold)
  .oneOrMore
  .greedy
  .consecutive()
  .next("b")
  .where(BooleanConditions.true())
  .within(Time.minutes(1))

Best,

Dawid

[1] https://issues.apache.org/jira/browse/FLINK-10587



On 25/03/2020 10:07, Dominik Wosiński wrote:
> P.S.
> So now my pattern looks like this:
>
> Pattern.begin[AccelVector](EventPatternName,
> AfterMatchSkipStrategy.skipPastLastEvent())
>   .where(_.data() > Threshold)
>   .oneOrMore
>   .greedy
>   .consecutive()
>   .within(Time.minutes(1))
>
>
> On Wed, 25 Mar 2020 at 10:03, Dominik Wosiński wrote:
>
>> Hey, thanks for the answer.
>>
>> But if I add the *AfterMatchSkipStrategy* it simply seems to emit event
>> by event so in the case described above it does emit: [400], [500]
>> Shouldn't the *greedy* quantifier guarantee that this will be matched as
>> many times as possible thus creating [400, 500] ??
>>
>> Thanks again.
>> Best Regards,
>> Dom.
>>





Re: [DISCUSS] FLIP-95: New TableSource and TableSink interfaces

2020-03-26 Thread Dawid Wysakowicz
oming more important.  It helps Flink SQL go
>> smoothly
>>>> in
>>>>>> the future, and also
>>>>>> make it easier for new contributors. But I would admit this is not
>> that
>>>>>> obvious for others who don't work
>>>>>> with SQL frequently.
>>>>>>
>>>>>> Best,
>>>>>> Kurt
>>>>>>
>>>>>>
>>>>>> On Wed, Mar 25, 2020 at 11:07 AM Becket Qin 
>>>> wrote:
>>>>>>> Hi Jark,
>>>>>>>
>>>>>>> It is good to know that we do not expect the end users to touch those
>>>>>>> interfaces.
>>>>>>>
>>>>>>> Then the question boils down to whether the connector developers
>> should
>>>>>> be
>>>>>>> aware of the interfaces that are only used by the SQL optimizer. It
>>>>>> seems a
>>>>>>> win if we can avoid that.
>>>>>>>
>>>>>>> Two potential solutions off the top of my head are:
>>>>>>> 1. An internal helper class doing the instanceOf based on DataStream
>>>>>> source
>>>>>>> interface and create pluggables for that DataStream source.
>>>>>>> 2. codegen the set of TableSource interfaces given a DataStream
>> Source
>>>>>> and
>>>>>>> its corresponding TablePluggablesFactory.
>>>>>>>
>>>>>>> Thanks,
>>>>>>>
>>>>>>> Jiangjie (Becket) Qin
>>>>>>>
>>>>>>> On Wed, Mar 25, 2020 at 10:07 AM Jark Wu  wrote:
>>>>>>>
>>>>>>>> Hi Becket,
>>>>>>>>
>>>>>>>> Regarding to Flavor1 and Flavor2, I want to clarify that user will
>>>>>> never
>>>>>>>> use table source like this:
>>>>>>>>
>>>>>>>> {
>>>>>>>>  MyTableSource myTableSource = MyTableSourceFactory.create();
>>>>>>>>  myTableSource.setSchema(mySchema);
>>>>>>>>  myTableSource.applyFilterPredicate(expression);
>>>>>>>>  ...
>>>>>>>> }
>>>>>>>>
>>>>>>>> TableFactory and TableSource are not directly exposed to end users,
>>>> all
>>>>>>> the
>>>>>>>> methods are called by planner, not users.
>>>>>>>> Users always use DDL or descriptor to register a table, and planner
>>>>>> will
>>>>>>>> find the factory and create sources according to the properties.
>>>>>>>> All the optimization are applied automatically, e.g.
>> filter/projection
>>>>>>>> pushdown, users don't need to call `applyFilterPredicate`
>> explicitly.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, 25 Mar 2020 at 09:25, Becket Qin 
>>>> wrote:
>>>>>>>>> Hi Timo and Dawid,
>>>>>>>>>
>>>>>>>>> Thanks for the clarification. They really help. You are right that
>> we
>>>>>>> are
>>>>>>>>> on the same page regarding the hierarchy. I think the only
>> difference
>>>>>>>>> between our view is the flavor of the interfaces. There are two
>>>>>> flavors
>>>>>>>> of
>>>>>>>>> the source interface for DataStream and Table source.
>>>>>>>>>
>>>>>>>>> *Flavor 1. Table Sources are some wrapper interfaces around
>>>>>> DataStream
>>>>>>>>> source.*
>>>>>>>>> Following this way, we will reach the design of the current
>> proposal,
>>>>>>>> i.e.
>>>>>>>>> each pluggable exposed in the DataStream source will have a
>>>>>>> corresponding
>>>>>>>>> TableSource interface counterpart, which are at the Factory level.
>>>>>>> Users
>>>>>>>>> will write code like this:
>>>>>>>>>
>>>>>>>>> {
>>>>>>>>>   MyTableSource myTableSource = MyTa

Re: [Discuss] FLINK-16039 Add API method to get last element in session window

2020-03-26 Thread Dawid Wysakowicz
Hi Manas,

First of all I think your understanding of how the session windows work
is correct.

I tend to slightly disagree that the end for a session window is wrong.
It is my personal opinion though. I see it this way that a TimeWindow in
case of a session window is the session itself. The session always ends
after a period of inactivity. Take a user session on a webpage. Such a
session does not end/isn't brought down at the time of the last event. It
is closed after a period of inactivity. In such a scenario I think the
behavior of the session window is correct.

Moreover you can achieve what you are describing with an aggregate[1]
function. You can easily maintain the biggest timestamp seen in a window,
e.g. as sketched below.
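
A rough sketch of such an aggregate (the Tuple2<Long, String> event type
below just mirrors your "[1, event]" example, and the class name is made up):

import org.apache.flink.api.common.functions.AggregateFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Sketch only: keeps the biggest event timestamp seen in the window, which for
// a session window is the timestamp of the last element of the session.
public class LastTimestampAggregate
        implements AggregateFunction<Tuple2<Long, String>, Long, Long> {

    @Override
    public Long createAccumulator() {
        return Long.MIN_VALUE;
    }

    @Override
    public Long add(Tuple2<Long, String> event, Long currentMax) {
        // event.f0 is the event timestamp from your example
        return Math.max(currentMax, event.f0);
    }

    @Override
    public Long getResult(Long accumulator) {
        return accumulator;
    }

    @Override
    public Long merge(Long a, Long b) {
        return Math.max(a, b);
    }
}

It can then be plugged in via e.g.
.window(EventTimeSessionWindows.withGap(Time.seconds(2))).aggregate(new
LastTimestampAggregate()) to get the timestamp of the last element of
each session.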

Lastly, I think the overall feeling in the community is that we are very
skeptical towards extending the Windows API. From what I've heard and
experienced the ProcessFunction[2] is a much better principle to build
custom solutions upon, as in fact its easier to control and even
understand. That said I am rather against introducing that change.

Best,

Dawid

[1]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/windows.html#aggregatefunction

[2]
https://ci.apache.org/projects/flink/flink-docs-release-1.10/dev/stream/operators/process_function.html#process-function-low-level-operations

On 13/03/2020 09:46, Manas Kale wrote:
> Hi all,
> I would like to start a discussion on this feature request (JIRA link).
> 
>
> Consider the events :
>
> [1, event], [2, event]
>
> where first element is event timestamp in seconds and second element is
> event code/name.
>
> Also consider that an Event time session window with inactivityGap = 2
> seconds is acting on above stream.
>
> When the first event arrives, a session window should be created that is
> [1,1].
>
> When the second event arrives, a new session window should be created that
> is [2,2]. Since this falls within firstWindowTimestamp+inactivityGap, it
> should be merged into session window [1,2] and  [2,2] should be deleted.
>
>
> *This is my understanding of how session windows are created. Please
> correct me if wrong.*
> However, Flink does not follow such a definition of windows semantically.
> If I call the  getEnd() method of the TimeWindow() class, I get back
> timestamp + inactivityGap.
>
> For the above example, after processing the first element, I would get 1 +
> 2 = 3 seconds as the window "end".
>
> The actual window end should be the timestamp 1, which is the last event in
> the session window.
>
> A solution would be to change the "end" definition of all windows, but I
> suppose this would be breaking and would need some debate.
>
> Therefore, I propose an intermediate solution : add a new API method that
> keeps track of the last element added in the session window.
>
> If there is agreement on this, I would like to start drafting a change
> document and implement this.
>





Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-30 Thread Dawid Wysakowicz
Hi all,

I like the overall design of the FLIP.

As for the withstanding concerns. I kind of like the approach to put the
version into the factory identifier. I think it's the cleanest way to
say that this version actually applies to the connector itself and not
to the system it connects to. BTW, I think the outcome of this
discussion will affect interfaces described in FLIP-95. If we put the
version into the factory identifier, do we need Factory#factoryVersion?

I also think it does make sense to have a versioning for the properties
as well. Are we able to read all the current properties with the new
factories? I think we could use the "connector.property-version" to
alternate between different Factory interfaces to support the old set of
properties. Otherwise the new factories need to understand both sets of
properties, don't they?
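
To make the identifier discussion a bit more concrete, a small sketch (the
DDL below uses the keys and the "kafka-0.11" identifier proposed in this
thread, so treat everything as tentative; tableEnv is just an existing
TableEnvironment):

// Sketch only: new flat property keys with the version folded into the
// factory identifier; the exact keys may still change with the FLIP.
tableEnv.sqlUpdate(
    "CREATE TABLE user_behavior (" +
    "  user_id BIGINT," +
    "  item_id BIGINT" +
    ") WITH (" +
    "  'connector' = 'kafka-0.11'," +
    "  'topic' = 'user_behavior'," +
    "  'format' = 'json'" +
    ")");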

Best,

Dawid

On 30/03/2020 17:07, Timo Walther wrote:
> Hi Jark,
>
> thanks for the FLIP. I don't have a strong opinion on
> DynamicTableFactory#factoryVersion() for distiguishing connector
> versions. We can also include it in the factory identifier itself. For
> Kafka, I would then just use `kafka` because the naming "universal"
> was just a helper solution at that time.
>
> What we need is a good guide for how to design the options. We should
> include that in the JavaDoc of factory interface itself, once it is
> in-place. I like the difference between source and sink in properties.
>
> Regarding "connector.property-version":
>
> It was meant for backwards compatibility along the dimension of
> "property design" compared to "connector version". However, since the
> connector is now responsible for parsing all properties, it is not as
> necessary as it was before.
>
> However, a change of the property design as it is done in FLIP-107
> becomes more difficult without a property version and versioning is
> always a good idea in API world.
>
> Regards,
> Timo
>
>
> On 30.03.20 16:38, Benchao Li wrote:
>> Hi Jark,
>>
>> Thanks for starting the discussion. The FLIP LGTM generally.
>>
>> Regarding connector version VS put version into connector's name,
>> I favor the latter personally, using one property to locate a
>> connector can make the error log more precise.
>>  From the users' side, using one property to match a connector will
>> be easier. Especially we have many connectors,
>> and some of the need version property required, and some of them not.
>>
>> Regarding Jingsong's suggestion,
>> IMO, it's a very good complement to the FLIP. Distinguishing
>> properties for source and sink can be very useful, and
>> also this will make the connector property more precise.
>> We are also sick of this for now, we cannot know whether a DDL is a
>> source or sink unless we look through all queries where
>> the table is used.
>> Even more, some of the required properties are only required for
>> source, but we cannot leave them blank for the sink, and vice versa.
>> I think we can also add a type for dimension tables except source and
>> sink.
>>
>> On Mon, 30 Mar 2020 at 20:16, Kurt Young wrote:
>>
>>  > It's not possible for the framework to throw such exception.
>>     Because the
>>     framework doesn't know what versions the connector supports.
>>
>>     Not really, we can list all valid connectors the framework could
>> find. E.g. if the
>>     user mistyped 'kafka-0.x', the error message will look like:
>>
>>     we don't have any connector named "kafka-0.x", but we have:
>>     FileSystem
>>     Kafka-0.10
>>     Kafka-0.11
>>     ElasticSearch6
>>     ElasticSearch7
>>
>>     Best,
>>     Kurt
>>
>>
>>     On Mon, Mar 30, 2020 at 5:11 PM Jark Wu wrote:
>>
>>  > Hi Kurt,
>>  >
>>  > > 2) Lists all available connectors seems also quite
>>     straightforward, e.g
>>  > user provided a wrong "kafka-0.8", we tell user we have
>> candidates of
>>  > "kakfa-0.11", "kafka-universal"
>>  > It's not possible for the framework to throw such exception.
>>     Because the
>>  > framework doesn't know what versions the connector supports.
>>     All the
>>  > version information is a blackbox in the identifier. But with
>>  > `Factory#factoryVersion()` interface, we can know all the
>> supported
>>  > versions.
>>  >
>>  > > 3) I don't think so. We can still treat it as the same
>>     connector but with
>>  > different versions.
>>  > That's true but that's weird. Because from the plain DDL
>>     definition, they
>>  > look like different connectors with different "connector"
>> value, e.g.
>>  > 'connector=kafka-0.8', 'connector=kafka-0.10'.
>>  >
>>  > > If users don't set any version, we will use "kafka-universal"
>>     instead.
>>  > The behavior is inconsistent IMO.
>>  > That is a long-term vision when there are no Kafka clusters
>> with <0.11
>>  > version.
>>  > At that point, "universal" is the only supported version in Flink
>>     and the
>>    

Re: [DISCUSS] FLIP-122: New Connector Property Keys for New Factory

2020-03-31 Thread Dawid Wysakowicz
nes there the such
>>> key, such as “connector.properties.abc” “connector.properties.def”, or
>>> should we inline them, such as “some-key-prefix” = “k1=v1, k2=v2 ..."
>>> • Should the ConfigOption support the wildcard ? (If we plan to support
>>> the current multi-line style)
>>>
>>>
>>> Best,
>>> Danny Chan
>>> On 31 Mar 2020 at 00:37 (+0800), Jark Wu wrote:
>>>> Hi all,
>>>>
>>>> Thanks for the feedbacks.
>>>>
>>>> It seems that we have a conclusion to put the version into the factory
>>>> identifier. I'm also fine with this.
>>>> If we have this outcome, the interface of Factory#factoryVersion is not
>>>> needed anymore, this can simplify the learning cost of new factory.
>>>> We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
>>>> 
>>>>
>>>> kafka => kafka for 0.11+ versions, we don't suffix "-universal", because
>>>> the meaning of "universal" is not easy to understand.
>>>> kafka-0.11 => kafka for 0.11 version
>>>> kafka-0.10 => kafka for 0.10 version
>>>> elasticsearch-6 => elasticsearch for 6.x versions
>>>> elasticsearch-7 => elasticsearch for 7.x versions
>>>> hbase-1.4 => hbase for 1.4.x versions
>>>> jdbc
>>>> filesystem
>>>>
>>>> We use "-" as the version delimiter to make them to be more consistent.
>>>> This is not forces, users can also use other delimiters or without
>>>> delimiter.
>>>> But this can be a guilde in the Javadoc of Factory, to make the
>>> connector
>>>> ecosystem to be more consistent.
>>>>
>>>> What do you think?
>>>>
>>>> 
>>>>
>>>> Regarding "connector.property-version":
>>>>
>>>> Hi @Dawid Wysakowicz  , the new factories are
>>>> designed to not support reading the current properties.
>>>> All the current properties are routed to the old factories if they are
>>>> using "connector.type". Otherwise, properties are routed to new
>>> factories.
>>>> If I understand correctly, the "connector.property-version" is attached
>>>> implicitly by the system, not manually set by users.
>>>> For example, the framework should add "connector.property-version=1" to
>>>> properties when processing DDL statement.
>>>> I'm fine to add a "connector.property-version=1" when processing DDL
>>>> statement, but I think it's also fine if we don't,
>>>> because this can be done in the future if need and the default version
>>> can
>>>> be 1.
>>>>
>>>> Best,
>>>> Jark
>>>>
>>>> On Tue, 31 Mar 2020 at 00:36, Jark Wu  wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Thanks for the feedbacks.
>>>>>
>>>>> It seems that we have a conclusion to put the version into the factory
>>>>> identifier. I'm also fine with this.
>>>>> If we have this outcome, the interface of Factory#factoryVersion is
>>> not
>>>>> needed anymore, this can simplify the learning cost of new factory.
>>>>> We may need to update FLIP-95 and re-vote for it? cc @Timo Walther
>>>>> 
>>>>>
>>>>> Btw, I would like to use "_" instead of "-" as the version delimiter,
>>>>> because "-" looks like minus and may confuse users, e.g.
>>> "elasticsearch-6".
>>>>> This is not forced, but should be a guide in the Javadoc of Factory.
>>>>> I propose to use the following identifiers for existing connectors,
>>>>>
>>>>> kafka => kafka for 0.11+ versions, we don't suffix "-universal",
>>> because
>>>>> the meaning of "universal" is not easy to understand.
>>>>> kafka-0.11 => kafka for 0.11 version
>>>>> kafka-0.10 => kafka for 0.10 version
>>>>> elasticsearch-6 => elasticsearch for 6.x versions
>>>>> elasticsearch-7 => elasticsearch for 7.x versions
>>>>> hbase-1.4 => hbase for 1.4.x versions
>>>>> jdbc
>>>>> filesystem
>>>>>
>>>>> We use "-" as the version delimiter to make them to be more
>>

Re: [DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-31 Thread Dawid Wysakowicz
Hi Timo,

I think your suggestion makes sense. I updated the document.

As there are no more comments I will start a vote for it.

Best,

Dawid

On 30/03/2020 16:40, Timo Walther wrote:
> Hi Dawid,
>
> thanks for updating the FLIP. One minor comment from my side, should
> we move the LIKE clause to the very end?
>
> CREATE TABLE X () WITH () LIKE ...
>
> Otherwise, the LIKE clause looks a bit lost if there are options
> afterwards. Otherwise, +1 for start a vote from my side.
>
> Regards,
> Timo
>
>
> On 25.03.20 15:30, Dawid Wysakowicz wrote:
>> Thank you for your opinions. I updated the FLIP with results of the
>> discussion. Let me know if you have further concerns.
>>
>> Best,
>>
>> Dawid
>>
>> On 05/03/2020 07:46, Jark Wu wrote:
>>> Hi Dawid,
>>>
>>>> INHERITS creates a new table with a "link" to the original table.
>>> Yes, INHERITS is a "link" to the original table in PostgreSQL.
>>> But INHERITS is not SQL standard, I think it's fine for vendors to
>>> define
>>> their semantics.
>>>
>>>> Standard also allows declaring the clause after the schema part. We
>>>> can
>>> also do it.
>>> Is that true? I didn't find it in SQL standard. If this is true, I
>>> prefer
>>> to put LIKE after the schema part.
>>>
>>> 
>>>
>>> Hi Jingsong,
>>>
>>> The concern you mentioned in (2) is exactly my concern too. That's
>>> why I
>>> suggested INHERITS, or put LIKE after schema part.
>>>
>>> Best,
>>> Jark
>>>
>>> On Thu, 5 Mar 2020 at 12:05, Jingsong Li 
>>> wrote:
>>>
>>>> Thanks Dawid for starting this discussion.
>>>>
>>>> I like the "LIKE".
>>>>
>>>> 1.For "INHERITS", I think this is a good feature too, yes, ALTER
>>>> TABLE will
>>>> propagate any changes in column data definitions and check
>>>> constraints down
>>>> the inheritance hierarchy. A inherits B, A and B share everything;
>>>> they
>>>> have the same Kafka topic. If you modify the schema of B, this means
>>>> the underlying
>>>> Kafka topic schema changed, so I think it is good to modify A too.
>>>> If this is
>>>> for "ConfluentSchemaRegistryCatalog" mentioned by Jark, I think
>>>> sometimes
>>>> this is just what we want.
>>>> But "LIKE" also very useful for many cases.
>>>>
>>>> 2. For the LIKE statement in the schema, I know two kinds of LIKE syntax:
>>>> one is
>>>> MySQL/Hive/SQL Server, the other is PostgreSQL. I prefer the former:
>>>> - In the FLIP, there is "OVERWRITING OPTIONS", this will overwrite
>>>> properties in "with"? This looks weird, because "LIKE" is in
>>>> schema, but it
>>>> can affect outside properties.
>>>>
>>>> Best,
>>>> Jingsong Lee
>>>>
>>>> On Wed, Mar 4, 2020 at 2:05 PM Dawid Wysakowicz
>>>> 
>>>> wrote:
>>>>
>>>>> Hi Jark,
>>>>> I did investigate the INHERITS clause, but it has a semantic that
>>>>> in my
>>>>> opinion we definitely don't want to support. INHERITS creates a
>>>>> new table
>>>>> with a "link" to the original table. Therefore if you e.g change the
>>>> schema
>>>>> of the original table it's also reflected in the child table. It's
>>>>> also
>>>>> possible for tables like A inherits B query them like Select *
>>>>> from only
>>>> A,
>>>>> by default it returns results from both tables. I am pretty sure
>>>>> it's not
>>>>> what we're looking for.
>>>>>
>>>>> PostgreSQL implements both the LIKE clause and INHERITS. I am open
>>>>> for
>>>>> discussion if we should support multiple LIKE statements or not.
>>>>> Standard
>>>>> also allows declaring the clause after the schema part. We can
>>>>> also do
>>>> it.
>>>>> Nevertheless I think including multiple tables might be useful,
>>>>> e.g. when
>>>>> you want to union two tables and output to the same Kafka cluster and
>>>> just
>>>>> change the tar

[VOTE] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-31 Thread Dawid Wysakowicz
Hi all,

I would like to start the vote for FLIP-110 [1], which is discussed and
reached a consensus in the discussion thread [2].

The vote will be open until April 3rd (72h), unless there is an
objection or not enough votes.

Best,

Dawid

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-110-Support-LIKE-clause-in-CREATE-TABLE-td38378.html/





Re: [DISCUSS] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-31 Thread Dawid Wysakowicz
Hi Jingsong,

I added a short description for the options:

  * CONSTRAINTS: primary keys, unique key, does not include NOT NULL
constraint (in Flink it's part of the type)
  * GENERATED: computed columns
  * OPTIONS: connector properties in WITH (...) clause

I think partitions are a valid point. I think they are not included in
any of the options. It makes sense to include them as well. I would
suggest adding

INCLUDING | EXCLUDING PARTITIONS as another alternative.
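
For illustration, with the LIKE clause moved after the schema part as
discussed, a derived table could then look roughly like this (only a
sketch; the option keywords follow the current FLIP draft, including the
PARTITIONS one suggested above, and may still change; tableEnv is just an
existing TableEnvironment):

// Sketch only: illustrates where the INCLUDING/EXCLUDING/OVERWRITING
// like-options, including the new PARTITIONS alternative, would appear.
tableEnv.sqlUpdate(
    "CREATE TABLE derived_table ()" +
    " WITH ('topic' = 'other_topic')" +
    " LIKE base_table (" +
    "   INCLUDING ALL" +
    "   OVERWRITING OPTIONS" +
    "   EXCLUDING PARTITIONS" +
    " )");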

I will not cancel the vote for now, as the comment came soon after
starting the vote. If anyone thinks I should give more time to discuss
the partitions topic, feel free to comment in this thread.

Best,

Dawid

On 31/03/2020 10:05, Jingsong Li wrote:
> Hi Dawid,
>
> Just two small questions:
> - Can you explain more about "CONSTRAINTS, GENERATED, OPTIONS" in the FLIP?
> I can imagine the meaning of "CONSTRAINTS, OPTIONS" in the example, but it is
> hard to guess "GENERATED".
> - Which category does partition keys belong to?
>
> (I am sorry if I've disturbed the vote thread, because in my Gmail view,
> they're the same thread.)
>
> Best,
> Jingsong Lee
>
> On Tue, Mar 31, 2020 at 3:30 PM Dawid Wysakowicz 
> wrote:
>
>> Hi Timo,
>>
>> I think your suggestion makes sense. I updated the document.
>>
>> As there are no more comments I will start a vote for it.
>>
>> Best,
>>
>> Dawid
>>
>> On 30/03/2020 16:40, Timo Walther wrote:
>>> Hi Dawid,
>>>
>>> thanks for updating the FLIP. One minor comment from my side, should
>>> we move the LIKE clause to the very end?
>>>
>>> CREATE TABLE X () WITH () LIKE ...
>>>
>>> Otherwise, the LIKE clause looks a bit lost if there are options
>>> afterwards. Otherwise, +1 for start a vote from my side.
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 25.03.20 15:30, Dawid Wysakowicz wrote:
>>>> Thank you for your opinions. I updated the FLIP with results of the
>>>> discussion. Let me know if you have further concerns.
>>>>
>>>> Best,
>>>>
>>>> Dawid
>>>>
>>>> On 05/03/2020 07:46, Jark Wu wrote:
>>>>> Hi Dawid,
>>>>>
>>>>>> INHERITS creates a new table with a "link" to the original table.
>>>>> Yes, INHERITS is a "link" to the original table in PostgreSQL.
>>>>> But INHERITS is not SQL standard, I think it's fine for vendors to
>>>>> define
>>>>> theire semantics.
>>>>>
>>>>>> Standard also allows declaring the clause after the schema part. We
>>>>>> can
>>>>> also do it.
>>>>> Is that true? I didn't find it in SQL standard. If this is true, I
>>>>> prefer
>>>>> to put LIKE after the schema part.
>>>>>
>>>>> 
>>>>>
>>>>> Hi Jingsong,
>>>>>
>>>>> The concern you mentioned in (2) is exactly my concern too. That's
>>>>> why I
>>>>> suggested INHERITS, or put LIKE after schema part.
>>>>>
>>>>> Best,
>>>>> Jark
>>>>>
>>>>> On Thu, 5 Mar 2020 at 12:05, Jingsong Li 
>>>>> wrote:
>>>>>
>>>>>> Thanks Dawid for starting this discussion.
>>>>>>
>>>>>> I like the "LIKE".
>>>>>>
>>>>>> 1.For "INHERITS", I think this is a good feature too, yes, ALTER
>>>>>> TABLE will
>>>>>> propagate any changes in column data definitions and check
>>>>>> constraints down
>>>>>> the inheritance hierarchy. A inherits B, A and B share everything;
>>>>>> they
>>>>>> have the same Kafka topic. If you modify the schema of B, this means
>>>>>> the underlying
>>>>>> Kafka topic schema changed, so I think it is good to modify A too.
>>>>>> If this is
>>>>>> for "ConfluentSchemaRegistryCatalog" mentioned by Jark, I think
>>>>>> sometimes
>>>>>> this is just what we want.
>>>>>> But "LIKE" also very useful for many cases.
>>>>>>
>>>>>> 2. For the LIKE statement in the schema, I know two kinds of LIKE syntax,
>>>>>> one is
>>>>>> MySQL/hive/sqlserver, the othe

Re: [VOTE] FLIP-110: Support LIKE clause in CREATE TABLE

2020-03-31 Thread Dawid Wysakowicz
Hi,

Just wanted to notify the voters that after a comment from Jingsong I
introduced a new like-option in the FLIP. Because it happened very shortly
after the vote started I will not cancel the vote (only Timo voted
before the change).

Feel free to change your votes if you disagree. Sorry for the inconvenience.

Best,

Dawid

On 31/03/2020 09:43, Timo Walther wrote:
> +1 this will reduce manual schema work a lot!
>
> Thanks,
> Timo
>
> On 31.03.20 09:33, Dawid Wysakowicz wrote:
>> Hi all,
>>
>> I would like to start the vote for FLIP-110 [1], which is discussed and
>> reached a consensus in the discussion thread [2].
>>
>> The vote will be open until April 3rd (72h), unless there is an
>> objection or not enough votes.
>>
>> Best,
>>
>> Dawid
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-110%3A+Support+LIKE+clause+in+CREATE+TABLE
>>
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-110-Support-LIKE-clause-in-CREATE-TABLE-td38378.html/
>> /
>>
>





Re: [DISCUSS] FLIP-84 Feedback Summary

2020-03-31 Thread Dawid Wysakowicz
Thank you Timo for the great summary! It covers (almost) all the topics.
Even though in the end we are not suggesting much changes to the current
state of FLIP I think it is important to lay out all possible use cases
so that we do not change the execution model every release.

There is one additional thing we discussed. Could we change the result
type of TableResult#collect to Iterator<Row>? Even though those
interfaces do not differ much, I think Iterator better describes that
the results might not be materialized on the client side, but can be
retrieved on a per-record basis. The contract of Iterable#iterator
is that it returns a new iterator each time, which effectively means we
can iterate the results multiple times. Iterating the results multiple
times is not possible when we don't retrieve all the results from the
cluster at once.

I think we should also use an Iterator for
TableEnvironment#executeMultilineSql(String statements):
Iterator<TableResult>.
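
To sketch how a caller such as the SQL Client could drive that method
(everything below uses the names proposed in FLIP-84 and in this thread,
so it is pseudo-API rather than existing interfaces):

// Sketch only: executeMultilineSql, TableResult, ResultKind and collect()
// are proposed FLIP-84 names, not existing API.
Iterator<TableResult> statementResults = tableEnv.executeMultilineSql(statements);
while (statementResults.hasNext()) {
    // next() triggers the (async) submission of the next statement
    TableResult result = statementResults.next();
    if (result.getResultKind() == ResultKind.SUCCESS_WITH_CONTENT) {
        // DML/DQL: pull rows one by one, they are not materialized on the client
        Iterator<Row> rows = result.collect();
        while (rows.hasNext()) {
            System.out.println(rows.next());
        }
    }
    // plain SUCCESS (DDL/DCL) needs no further handling here
}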

Best,

Dawid

On 31/03/2020 19:27, Timo Walther wrote:
> Hi Godfrey,
>
> Aljoscha, Dawid, Klou, and I had another discussion around FLIP-84. In
> particular, we discussed how the current status of the FLIP and the
> future requirements around multiline statements, async/sync, collect()
> fit together.
>
> We also updated the FLIP-84 Feedback Summary document [1] with some
> use cases.
>
> We believe that we found a good solution that also fits to what is in
> the current FLIP. So no bigger changes necessary, which is great!
>
> Our findings were:
>
> 1. Async vs sync submission of Flink jobs:
>
> Having a blocking `execute()` in DataStream API was rather a mistake.
> Instead all submissions should be async because this allows supporting
> both modes if necessary. Thus, submitting all queries async sounds
> good to us. If users want to run a job sync, they can use the
> JobClient and wait for completion (or collect() in case of batch jobs).
>
> 2. Multi-statement execution:
>
> For the multi-statement execution, we don't see a contradiction with
> the async execution behavior. We imagine a method like:
>
> TableEnvironment#executeMultilineSql(String statements):
> Iterable<TableResult>
>
> Where the `Iterator#next()` method would trigger the next statement
> submission. This allows a caller to decide synchronously when to
> submit statements async to the cluster. Thus, a service such as the
> SQL Client can handle the result of each statement individually and
> process statement by statement sequentially.
>
> 3. The role of TableResult and result retrieval in general
>
> `TableResult` is similar to `JobClient`. Instead of returning a
> `CompletableFuture` of something, it is a concrete util class where
> some methods have the behavior of completable future (e.g. collect(),
> print()) and some are already completed (getTableSchema(),
> getResultKind()).
>
> `StatementSet#execute()` returns a single `TableResult` because the
> order is undefined in a set and all statements have the same schema.
> Its `collect()` will return a row for each executed `INSERT INTO` in
> the order of statement definition.
>
> For simple `SELECT * FROM ...`, the query execution might block until
> `collect()` is called to pull buffered rows from the job (from
> socket/REST API what ever we will use in the future). We can say that
> a statement finished successfully, when the `collect#Iterator#hasNext`
> has returned false.
>
> I hope this summarizes our discussion @Dawid/Aljoscha/Klou?
>
> It would be great if we can add these findings to the FLIP before we
> start voting.
>
> One minor thing: some `execute()` methods still throw a checked
> exception; can we remove that from the FLIP? Also the above mentioned
> `Iterator#next()` would trigger an execution without throwing a
> checked exception.
>
> Thanks,
> Timo
>
> [1]
> https://docs.google.com/document/d/1ueLjQWRPdLTFB_TReAyhseAX-1N3j4WYWD0F02Uau0E/edit#
>
> On 31.03.20 06:28, godfrey he wrote:
>> Hi, Timo & Jark
>>
>> Thanks for your explanation.
>> Agree with you that async execution should always be async,
>> and sync execution scenario can be covered  by async execution.
>> It helps provide an unified entry point for batch and streaming.
>> I think we can also use sync execution for some testing.
>> So, I agree with you that we provide the `executeSql` method and that it's an async
>> method.
>> If we want sync method in the future, we can add method named
>> `executeSqlSync`.
>>
>> I think we've reached an agreement. I will update the document, and
>> start
>> voting process.
>>
>> Best,
>> Godfrey
>>
>>
>> On Tue, 31 Mar 2020 at 00:46, Jark Wu wrote:
>>
>>> Hi,
>>>
>>> I didn't follow the full discussion.
>>> But I share the same concern with Timo that streaming queries should
>>> always
>>> be async.
>>> Otherwise, I can imagine it will cause a lot of confusion and problems if
>>> users don't deeply keep the "sync" in mind (e.g. client hangs).
>>> Besides, the streaming mode is still the majority of use cases of Flink
>>> and
>>> Flink SQL. We should put the usability at a high priorit

Re: [DISCUSS]FLIP-113: Support SQL and planner hints

2020-04-01 Thread Dawid Wysakowicz
Hi,

Few comments from my side:

1. Regarding the motivation:

I think the example with changing the update-mode is not a good one. In
the long term this should be done with EMIT CHANGELOG (discussed in
FLIP-105).

Nitpicking: I would mention that it is rather a debugging/ad-hoc
solution. I think this should not be a recommended way for production
use cases as it bypasses the Catalog, which should be the source of truth.

2. I could not understand how the additional options will be passed to
the TableSourceFactory. Could you elaborate a bit more on that? I see
there is a Context interface that gives the options. But I cannot find a
way to get the context itself in the factory. Moreover I think it would
make more sense to have a push-based approach here, something
like an applyOptions(ReadableConfig) method.

3. As for the concerns Jingsong raised in the voting thread. I think it
is not a big problem, but I agree this should be also described. I
disagree with "Connector don't know format information in TableFactory
before obtains real properties, so it can not list any format
`supportedHintOptions`".

When a factory is instantiated it has access to the CatalogTable,
therefore it has access to all the original properties. In turn it knows
the original format and can call FormatFactory#supportedHintOptions().

The only case when this would not work would be if we allow changing the
format of the Table (e.g. from avro to parquet), which does not sound
like a good idea to me. I think this feature should not end up as a way
to declare a whole table inline in a SQL query, but should rather be a
simple way for debugging queries. We should not end up with an extreme
example where we do:

SELECT * FROM myTable /* OPTIONS('connector.type'='kafka', ...,
'format.type' = 'json', ) */

4. SQL Hints syntax.

I think the k and v in the hint_item should be QUOTED_STRING (not sure
if it is equivalent to string_literal). I think we should not use
simple_identifier because this implies that we cannot use e.g. any SQL
keywords. Anyway it has nothing to do with identifiers. If I am not
mistaken it is also how the options in the CREATE statement are implemented.

What is the purpose of the remaining hint_item: hint_name(hint_opt [
,hint_opt ]*)? It is not discussed in the FLIP. Moreover I got a feeling
it also suggests supporting the whole Apache Calcite hint system
without specifying that explicitly. Is the intention of the FLIP to
support choosing e.g. JOIN strategies through hints already? If it is so
it should be mentioned in the FLIP, imo.

5. I think something does not work around the supportedHintOptions and
wildcards. How do you want to represent a wildcard key as a
ConfigOption? I am not sure about that, just a thought; maybe it makes more
sense to have Set<String> supportedHintOptionKeys() instead?
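
Something along these lines is what I have in mind (purely hypothetical,
not existing API; it only mirrors the naming above):

import java.util.Set;

// Hypothetical alternative to supportedHintOptions(): list plain option keys,
// where a trailing ".*" could express a wildcard prefix such as
// "connector.properties.*", which is hard to model as a ConfigOption.
public interface SupportsHintOptionKeys {
    Set<String> supportedHintOptionKeys();
}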

Best,

Dawid




Re: [DISCUSS] FLIP-84 Feedback Summary

2020-04-01 Thread Dawid Wysakowicz
When considering the multi-line support I think it is helpful to start
with a use case in mind. In my opinion consumers of this method will be:

 1. sql-client
 2. third-party SQL-based platforms

@Godfrey As for the quit/source/... commands. I think those belong to
the responsibility of the aforementioned tools. I think they should not be
understandable by the TableEnvironment. What would quit on a
TableEnvironment do? Moreover I think such commands should be prefixed
appropriately. I think it's a common practice to e.g. prefix those with
! or : to say they are meta commands of the tool rather than a query.

I also don't necessarily understand why platform users need to know the
kind of the query to use the proposed method. They should get the type
from the TableResult#ResultKind. If the ResultKind is SUCCESS, it was a
DCL/DDL. If SUCCESS_WITH_CONTENT it was a DML/DQL. If that's not enough
we can enrich the TableResult with more explicit kind of query, but so
far I don't see such a need.

@Kurt In those cases I would assume the developers want to present
results of the queries anyway. Moreover I think it is safe to assume
they can adhere to such a contract that the results must be iterated.

For direct users of TableEnvironment/Table API this method does not make
much sense anyway, in my opinion. I think we can rather safely assume in
this scenario they do not want to submit multiple queries at a single time.

Best,

Dawid


On 01/04/2020 15:07, Kurt Young wrote:
> One comment to `executeMultilineSql`, I'm afraid sometimes user might
> forget to
> iterate the returned iterators, e.g. a user submits a bunch of DDLs and
> expects the
> framework will execute them one by one. But it doesn't.
>
> Best,
> Kurt
>
>
> On Wed, Apr 1, 2020 at 5:10 PM Aljoscha Krettek  wrote:
>
>> Agreed to what Dawid and Timo said.
>>
>> To answer your question about multi line SQL: no, we don't think we need
>> this in Flink 1.11, we only wanted to make sure that the interfaces that
>> we now put in place will potentially allow this in the future.
>>
>> Best,
>> Aljoscha
>>
>> On 01.04.20 09:31, godfrey he wrote:
>>> Hi, Timo & Dawid,
>>>
>>> Thanks so much for the effort of `multiline statements supporting`,
>>> I have a few questions about this method:
>>>
>>> 1. users can well control the execution logic through the proposed method
>>>   if they know what the statements are (a statement is a DDL, a DML or
>>> others).
>>> but if a statement is from a file, that means users do not know what the
>>> statements are,
>>> the execution behavior is unclear.
>>> As a platform user, I think this method is hard to use, unless the
>> platform
>>> defines
>>> a set of rules about the statement order, such as: no select in the
>> middle,
>>> dml must be at the tail of the sql file (which may be the most common case in production
>>> env).
>>> Otherwise the platform must parse the sql first, then know what the
>>> statements are.
>>> If do like that, the platform can handle all cases through `executeSql`
>> and
>>> `StatementSet`.
>>>
>>> 2. SQL client also can't use `executeMultilineSql` to support multiline
>>> statements,
>>>   because there are some special commands introduced in SQL client,
>>> such as `quit`, `source`, `load jar` (not exist now, but maybe we need
>> this
>>> command
>>>   to support dynamic table source and udf).
>>> Does TableEnvironment also support those commands?
>>>
>>> 3. btw, must we have this feature in release-1.11? I find there are a few
>>> use cases
>>>   in the feedback document whose behavior is unclear now.
>>>
>>> regarding to "change the return value from `Iterable>> `Iterator>> I couldn't agree more with this change. Just as Dawid mentioned
>>> "The contract of the Iterable#iterator is that it returns a new iterator
>>> each time,
>>>   which effectively means we can iterate the results multiple times.",
>>> we do not support iterating the results multiple times.
>>> If we want to do that, the client must buffer all results, but it's
>>> impossible for a streaming job.
>>>
>>> Best,
>>> Godfrey
>>>
>>> On Wed, 1 Apr 2020 at 03:14, Dawid Wysakowicz wrote:
>>>
>>>> Thank you Timo for the great summary! It covers (almost) all the topics.
>>>> Even though in the end we are not suggesting much changes to the current
>>>> state of FLIP I think it is important to lay out all possible use cases
>>&

Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-02 Thread Dawid Wysakowicz
Generally +1

One slight concern I have is about SupportsProjectionPushDown. I
don't necessarily understand how we can express projections with
TableSchema. It's unclear to me what happens when a type of a field
changes, fields are in a different order, or types do not match. How
do we express projection of a nested field with TableSchema?

I don't think this changes the core design presented in the FLIP,
therefore I'm fine with accepting the FLIP. I wanted to mention my
concerns, so that maybe we can adjust the passed around structures slightly.

Best,

Dawid

On 30/03/2020 14:42, Leonard Xu wrote:
> +1(non-binding)
>
> Best,
> Leonard Xu
>
>> On 30 Mar 2020 at 16:43, Jingsong Li wrote:
>>
>> +1
>>
>> Best,
>> Jingsong Lee
>>
>> On Mon, Mar 30, 2020 at 4:41 PM Kurt Young  wrote:
>>
>>> +1
>>>
>>> Best,
>>> Kurt
>>>
>>>
>>> On Mon, Mar 30, 2020 at 4:08 PM Benchao Li  wrote:
>>>
 +1 (non-binding)

> On Mon, 30 Mar 2020 at 15:57, Jark Wu wrote:

> +1 from my side.
>
> Thanks Timo for driving this.
>
> Best,
> Jark
>
> On Mon, 30 Mar 2020 at 15:36, Timo Walther  wrote:
>
>> Hi all,
>>
>> I would like to start the vote for FLIP-95 [1], which is discussed
>>> and
>> reached a consensus in the discussion thread [2].
>>
>> The vote will be open until April 2nd (72h), unless there is an
>> objection or not enough votes.
>>
>> Thanks,
>> Timo
>>
>> [1]
>>
>>
>>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-95%3A+New+TableSource+and+TableSink+interfaces
>> [2]
>>
>>
>>> https://lists.apache.org/thread.html/r03cbce8996fd06c9b0406c9ddc0d271bd456f943f313b9261fa061f9%40%3Cdev.flink.apache.org%3E

 --

 Benchao Li
 School of Electronics Engineering and Computer Science, Peking University
 Tel:+86-15650713730
 Email: libenc...@gmail.com; libenc...@pku.edu.cn

>>
>> -- 
>> Best, Jingsong Lee




Re: Question about the ReadableConfigToConfigurationAdapter

2020-04-02 Thread Dawid Wysakowicz
Hi,

Yes this is a bug that is tracked here:
https://issues.apache.org/jira/browse/FLINK-16913. I am working on it
right now.

You should expect a fix very soon.

Best,

Dawid

On 02/04/2020 17:07, Till Rohrmann wrote:
> Hi Canbin,
>
> this looks indeed like a bug to me. I'm pulling in Dawid who worked on
> this part and might be able to tell us more about it.
>
> If he agrees, then I would suggest to open a JIRA issue and to fix it.
>
> Cheers,
> Till
>
> On Wed, Apr 1, 2020 at 11:26 AM Canbin Zheng wrote:
>
> Hi everyone,
>
> Recently I failed to run a Flink job when enabling
> RocksDBStateBackend on
> the branch master and 1.10.
>
> The exception stack trace is:
>
> The program finished with the following
> exception: org.apache.flink.client.program.ProgramInvocationException:
> The main method caused an error: The adapter does not support this
> method
>         at
> 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
>         at
> 
> org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
>         at
> org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:143)
>         at
> 
> org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:659)
>         at
> org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
>         at
> 
> org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:890)
>         at
> 
> org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:963)
>         at
> 
> org.apache.flink.runtime.security.contexts.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
>         at
> org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:963)
> Caused by: java.lang.UnsupportedOperationException: The adapter does
> not support this method
>         at
> 
> org.apache.flink.configuration.ReadableConfigToConfigurationAdapter.getEnum(ReadableConfigToConfigurationAdapter.java:258)
>         at
> 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.<init>(RocksDBStateBackend.java:336)
>         at
> 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackend.configure(RocksDBStateBackend.java:394)
>         at
> 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory.createFromConfig(RocksDBStateBackendFactory.java:47)
>         at
> 
> org.apache.flink.contrib.streaming.state.RocksDBStateBackendFactory.createFromConfig(RocksDBStateBackendFactory.java:32)
>         at
> 
> org.apache.flink.runtime.state.StateBackendLoader.loadStateBackendFromConfig(StateBackendLoader.java:154)
>         at
> 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.loadStateBackend(StreamExecutionEnvironment.java:792)
>         at
> 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.configure(StreamExecutionEnvironment.java:761)
>         at
> 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.<init>(StreamExecutionEnvironment.java:217)
>         at
> 
> org.apache.flink.client.program.StreamContextEnvironment.<init>(StreamContextEnvironment.java:53)
>         at
> 
> org.apache.flink.client.program.StreamContextEnvironment.lambda$setAsContext$2(StreamContextEnvironment.java:103)
>         at java.util.Optional.map(Optional.java:215)
>         at
> 
> org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.getExecutionEnvironment(StreamExecutionEnvironment.java:1882)
>         at
> 
> org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:62)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at
> 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at
> 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at
> 
> org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
>         ... 8 more
>
> It seems that this is a BUG. Does anyone encounter such a problem?
>
> I am wondering why we introduce ReadableConfigToConfigurationAdapter
> to wrap the Configuration but leave many of the methods in it to throw
> UnsupportedOperationException that causes problems.
>
> Regards,
>
> Canbin Zheng
>




Re: [VOTE] FLIP-122: New Connector Property Keys for New Factory

2020-04-02 Thread Dawid Wysakowicz
+1

Best,

Dawid

On 02/04/2020 18:28, Timo Walther wrote:
> +1
>
> Thanks,
> Timo
>
> On 02.04.20 17:22, Jark Wu wrote:
>> Hi all,
>>
>> I would like to start the vote for FLIP-122 [1], which is discussed and
>> reached a consensus in the discussion thread [2].
>>
>> The vote will be open for at least 72h, unless there is an objection
>> or not
>> enough votes.
>>
>> Thanks,
>> Timo
>>
>> [1]
>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-122%3A+New+Connector+Property+Keys+for+New+Factory
>>
>> [2]
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-122-New-Connector-Property-Keys-for-New-Factory-td39462.html
>>
>>
>





Re: [VOTE] FLIP-95: New TableSource and TableSink interfaces

2020-04-03 Thread Dawid Wysakowicz
Hi all,

Just to make it clear. I don't want to block the whole effort. I'm big
+1 on the whole document and -0 on using the TableSchema for a
projection pushdown.

My personal opinion was that TableSchema is misleading for Projection
pushdown and I still think that way. That's why I wanted to bring it up
to see if I am the only one. If that's the case, let's just proceed with
the TableSchema.

Best,

Dawid

On 02/04/2020 17:19, Jark Wu wrote:
> Hi Timo,
>
> I don't think source should work with `CatalogTableSchema`. So far, a table
> source doesn't need to know the logic information of computed column and
> watermark.
> IMO, we should provide a method to convert from `CatalogTableSchema` into
> `TableSchema` without computed columns in source factory,
> and a source should just hold the `TableSchema`.
>
> I agree doing the intersection/diff logic is trivial, but maybe we can
> provide utilities to do that? So that we can keep the interface clean.
>
>
> Best,
> Jark
>
>
> On Thu, 2 Apr 2020 at 20:17, Timo Walther  wrote:
>
>> Hi Jark,
>>
>> if catalogs use `CatalogTableSchema` in the future. The source would
>> internally also work with `CatalogTableSchema`. I'm fine with cleaning
>> up the `TableSchema` class but should a source deal with two different
>> schema classes then?
>>
>> Another problem that I see is that connectors usually need to perform
>> some index arithmetics. Dealing with TableSchema and additionally within
>> a field with DataType might be a bit inconvenient. A dedicated class
>> with utilities might be helpful such that not every source needs to
>> implement the same intersection/diff logic again.
>>
>> Regards,
>> Timo
>>
>>
>> On 02.04.20 14:06, Jark Wu wrote:
>>> Hi Dawid,
>>>
>>>> How to express projections with TableSchema?
>>> The TableSource holds the original TableSchema (i.e. from DDL) and the
>>> pushed TableSchema represents the schema after projection.
>>> Thus the table source can compare them to figure out changed field orders
>>> or not matched types.
>>> For most sources that map physical storage by field names (e.g. jdbc,
>>> hbase, json), they can just simply apply the pushed TableSchema.
>>> But sources that map by field indexes (e.g. csv) need to figure out
>>> the projected indexes by comparing the original and projected schema.
>>> For example, the original schema is [a: String, b: Int, c: Timestamp],
>> and
>>> b is pruned, then the pushed schema is [a: String, c: Timestamp]. So the
>>> source can figure out index=1 is pruned.
>>>
>>>> How do we express projection of a nested field with TableSchema?
>>> This is the same to the above one. For example, the original schema is
>> [rk:
>>> String, f1 Row].
>>> If `f1.q1` is pruned, the pushed schema will be [rk: String, f1 Row>> Double>].
>>>
>>>> TableSchema might be used at too many different places for different
>>> responsibilities.
>>> Agree. We have recognized that a structure and builder for pure table
>>> schema is required in many places. But we mixed many concepts of catalog
>>> table schema in TableSchema.
>>> IIRC, in an offline discussion of FLIP-84, we want to introduce a new
>>> `CatalogTableSchema` to represent the schema part of a DDL,
>>> and remove all the watermark, computed column information from
>> TableSchema?
>>> Then `TableSchema` can continue to serve as a pure table schema and it
>>> stays in a good package.
>>>
>>> Best,
>>> Jark
>>>
>>>
>>>
>>>
>>> On Thu, 2 Apr 2020 at 19:39, Timo Walther  wrote:
>>>
>>>> Hi Dawid,
>>>>
>>>> thanks for your feedback. I agree with your concerns. I also observed
>>>> that TableSchema might be used at too many different places for
>>>> different responsibilities.
>>>>
>>>> How about we introduce a helper class for `SupportsProjectionPushDown`
>>>> and also `LookupTableSource#Context#getKeys()` to represent nested
>>>> structure of names. Data types, constraints, or computed columns are not
>>>> necessary at those locations.
>>>>
>>>> We can also add utility methods for connectors to this helper class
>>>> there to quickly figuring out differences between the original table
>>>> schema and the new one.
>>>>
>>>> SelectedFields {
>>>>
>>>>  private 

[DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Dawid Wysakowicz
Hi devs,

When working on improving the Table API/SQL connectors we faced a few
shortcomings of the DeserializationSchema and SerializationSchema
interfaces. Similar features were also mentioned by other users in the
past. The shortcomings I would like to address with the FLIP include:

  * Emitting 0 to m records from the deserialization schema with per
partition watermarks
  o https://github.com/apache/flink/pull/3314#issuecomment-376237266
  o differentiate null value from no value
  o support for Debezium CDC format

(https://cwiki.apache.org/confluence/display/FLINK/FLIP-105%3A+Support+to+Interpret+and+Emit+Changelog+in+Flink+SQL)

  * A way to initialize the schema
  o establish external connections
  o generate code on startup
  o no need for lazy initialization

  * Access to metrics

[http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Custom-Metrics-outside-RichFunctions-td32282.html#a32329]

One important aspect I would like to hear your opinion on is how to
support the Collector interface in the Kafka source, assuming of course
that we agree to add the Collector to the DeserializationSchema.
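
To make it easier to picture, here is a rough sketch of a schema emitting
0..m records per message, written against the interfaces as currently
drafted in the FLIP (the open(...) context and the Collector-based
deserialize are proposed, not existing, API, and the name of the context
type is part of what is up for discussion):

import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.flink.api.common.serialization.DeserializationSchema;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.util.Collector;

// Sketch only: splits one serialized message into several records.
public class LineSplittingSchema implements DeserializationSchema<String> {

    @Override
    public void open(InitializationContext context) throws Exception {
        // e.g. register metrics, connect to a schema registry, or generate
        // (de)serialization code once, instead of doing it lazily per record
    }

    @Override
    public String deserialize(byte[] message) throws IOException {
        // old single-record contract, not used when the Collector variant is implemented
        throw new UnsupportedOperationException("Use deserialize(byte[], Collector)");
    }

    @Override
    public void deserialize(byte[] message, Collector<String> out) throws IOException {
        // zero, one, or many records for a single serialized message
        for (String line : new String(message, StandardCharsets.UTF_8).split("\n")) {
            if (!line.isEmpty()) {
                out.collect(line);
            }
        }
    }

    @Override
    public boolean isEndOfStream(String nextElement) {
        return false;
    }

    @Override
    public TypeInformation<String> getProducedType() {
        return Types.STRING;
    }
}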

The FLIP can be found here:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988&src=contextnavpagetreemode

Looking forward to your feedback.

Best,

Dawid





[RESULT][VOTE] FLIP-110: Support LIKE clause in CREATE TABLE

2020-04-06 Thread Dawid Wysakowicz
Hi all,

The voting time for FLIP-110 has passed. I'm closing the vote now.

There were 5 +1 votes, 4 of which are binding:

- Timo (binding)

- Jingsong (binding)

- Danny (non-binding)

- Jark (binding)

- Aljoscha (binding)


There were no disapproving votes.

Thus, FLIP-110 has been accepted.

Thanks everyone for joining the discussion and giving feedback!

Best,
Dawid






Re: [VOTE] FLIP-84: Improve & Refactor API of TableEnvironment & Table

2020-04-06 Thread Dawid Wysakowicz
+1

Best,

Dawid

On 07/04/2020 07:44, godfrey he wrote:
> Hi, Kurt
>
> Yes. `TableEnvironment#executeSql` could also execute a `SELECT` statement,
> which is similar to `Table#execute`.
> I add this to the document.
>
> Best,
> Godfrey
>
>> On Tue, 7 Apr 2020 at 11:52, Kurt Young wrote:
>
>> +1 (binding)
>>
>> The latest doc looks good to me. One minor comment is with the latest
>> changes, it seems also very easy
>> to support running a SELECT query in the TableEnvironment#executeSql method.
>> Will this also be supported?
>>
>> Best,
>> Kurt
>>
>>
>> On Mon, Apr 6, 2020 at 10:49 PM Timo Walther  wrote:
>>
>>> Thanks, for the update.
>>>
>>> +1 (binding) for this FLIP
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>> On 06.04.20 16:47, godfrey he wrote:
 Hi Timo,

 Sorry for late reply, and thanks for your correction. I have fixed the
>>> typo
 and updated the document.

 Best,
 Godfrey

> On Mon, 6 Apr 2020 at 18:05, Timo Walther wrote:

> Hi Godfrey,
>
> did you see my remaining feedback in the discussion thread? We could
> finish this FLIP if this gets resolved.
>
> Thanks,
> Timo
>
> On 03.04.20 15:12, Terry Wang wrote:
>> +1 (non-binding)
>> Looks great to me, Thanks for driving on this.
>>
>> Best,
>> Terry Wang
>>
>>
>>
>>> On 3 Apr 2020 at 21:07, godfrey he wrote:
>>>
>>> Hi everyone,
>>>
>>> I'd like to start the vote of FLIP-84[1] again, which is discussed
>> and
>>> reached consensus in the discussion thread[2].
>>>
>>> The vote will be open for at least 72 hours. Unless there is an
> objection,
>>> I will try to close it by Apr 6, 2020 13:10 UTC if we have received
>>> sufficient votes.
>>>
>>>
>>> [1]
>>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>>> [2]
>>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
>>>
>>> Bests,
>>> Godfrey
>>>
>>> godfrey he wrote on Tue, Mar 31, 2020 at 8:42 PM:
>>>
 Hi, Timo

 So sorry about that, I was in a bit of a hurry. Let's wait for 24h.

 Best,
 Godfrey

 Timo Walther wrote on Tue, Mar 31, 2020 at 5:26 PM:

> -1
>
> The current discussion has not concluded yet. The last comments were sent
> less than 24h ago.
>
> Let's wait a bit longer to collect feedback from all stakeholders.
>
> Thanks,
> Timo
>
> On 31.03.20 08:31, godfrey he wrote:
>> Hi everyone,
>>
>> I'd like to start the vote on FLIP-84 [1] again, because we have received
>> some feedback. The feedback is all about newly introduced methods; here is
>> the discussion thread [2].
>>
>> The vote will be open for at least 72 hours. Unless there is an objection,
>> I will try to close it by Apr 3, 2020 06:30 UTC if we have received
>> sufficient votes.
>>
>>
>> [1]
>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=134745878
>> [2]
>>
>> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-84-Feedback-Summary-td39261.html
>>
>> Bests,
>> Godfrey
>>
>
>
>>>





Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-06 Thread Dawid Wysakowicz
Hi all,

@Timo I'm fine with OpenContext.

@Timo @Seth Sure we can combine all the parameters in a single object.
Will update the FLIP
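
For illustration, a combined context for the Kinesis example quoted below
could look roughly like the following; the interface name and getters are
placeholders mirroring the individual parameters, not the final FLIP API:

    // Hypothetical shape of a combined per-record context.
    public interface KinesisDeserializationContext {
        String getPartitionKey();
        String getSequenceNumber();
        long getApproximateArrivalTimestamp();
        String getStream();
        String getShardId();
    }

    // The per-record method would then shrink to:
    // void deserialize(byte[] recordValue, KinesisDeserializationContext ctx, Collector<T> out);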

@Jark I was aware of the implementation of SinkFunction, but it was a
conscious choice to not do it that way.

Personally I am against giving a default implementation to both the new
and old methods. This results in an interface that by default does
nothing, or notifies the user only at runtime that he/she has not
implemented a method of the interface, which does not sound like good
practice to me. Moreover, I believe the method without a Collector will
still be the preferred method for many users. Plus, it communicates
explicitly what the minimal functionality required by the interface is.
Nevertheless, I am happy to hear other opinions.
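
To make the concern concrete, consider a hypothetical variant (not what the
FLIP proposes) in which both deserialize methods get default implementations:

    // If both variants had defaults, a schema that overrides neither would
    // still compile (setting aside the interface's other abstract methods)
    // and the mistake would only surface at runtime.
    public class EmptySchema implements DeserializationSchema<String> {
        // no deserialize variant overridden at all
    }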

@all I also prefer the buffering approach. Let's wait a day or two more
to see if others think differently.
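
As a rough sketch of the buffering approach, a fragment from the middle of a
Kafka-style fetch loop (illustrative only; `record`, `sourceContext`,
`checkpointLock` and the updateOffset helper are placeholders, not the actual
connector code):

    // Deserialize outside the checkpoint lock into a temporary buffer...
    final List<T> buffer = new ArrayList<>();
    deserializationSchema.deserialize(record.value(), new Collector<T>() {
        @Override
        public void collect(T element) {
            buffer.add(element);
        }

        @Override
        public void close() {}
    });

    // ...then emit the buffered records and advance the offset atomically
    // while holding the checkpoint lock.
    synchronized (checkpointLock) {
        for (T element : buffer) {
            sourceContext.collect(element);
        }
        updateOffset(record.partition(), record.offset()); // placeholder helper
    }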

Best,

Dawid

On 07/04/2020 06:11, Jark Wu wrote:
> Hi Dawid,
>
> Thanks for driving this. This is a blocker to support Debezium CDC format
> (FLIP-105). So big +1 from my side.
>
> Regarding emitting multiple records and checkpointing, I'm also in favor
> of option #1: buffer all the records outside of the checkpoint lock.
> I think in most use cases the buffered records will not be larger than
> the byte[] they were deserialized from.
>
> I have a minor suggestion on DeserializationSchema: could we have a default
> implementation (maybe throwing an exception) for `T deserialize(byte[] message)`?
> I think this would not break compatibility, and users wouldn't have to
> implement the deprecated method if they want to use the new
> collector interface.
> I think SinkFunction did this in the same way: introduce a new invoke
> method with a Context parameter, and give the old invoke method an
> empty implementation.
>
> Best,
> Jark
>
> On Mon, 6 Apr 2020 at 23:51, Seth Wiesman  wrote:
>
>> I would be in favor of buffering data outside of the checkpoint lock. In my
>> experience, serialization is always the biggest performance killer in user
>> code, and I have a hard time believing that in practice anyone is going to
>> buffer so many records that it causes real memory concerns.
>>
>> To add to Timo's point,
>>
>> Statefun actually did that on its Kinesis ser/de interfaces[1,2].
>>
>> Seth
>>
>> [1]
>>
>> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/ingress/KinesisIngressDeserializer.java
>> [2]
>>
>> https://github.com/apache/flink-statefun/blob/master/statefun-kinesis-io/src/main/java/org/apache/flink/statefun/sdk/kinesis/egress/KinesisEgressSerializer.java
>>
>>
>> On Mon, Apr 6, 2020 at 4:49 AM Timo Walther  wrote:
>>
>>> Hi Dawid,
>>>
>>> thanks for this FLIP. This solves a lot of issues with the current
>>> design for both the Flink contributors and users. +1 for this.
>>>
>>> Some minor suggestions from my side:
>>> - How about finding something shorter for `InitializationContext`? Maybe
>>> just `OpenContext`?
>>> - While introducing default methods for existing interfaces, shall we
>>> also create contexts for those methods? I see the following method in
>>> your FLIP and wonder if we can reduce the number of parameters while
>>> introducing a new method:
>>>
>>> deserialize(
>>>  byte[] recordValue,
>>>  String partitionKey,
>>>  String seqNum,
>>>  long approxArrivalTimestamp,
>>>  String stream,
>>>  String shardId,
>>>  Collector out)
>>>
>>> to:
>>>
>>> deserialize(
>>>  byte[] recordValue,
>>>  Context c,
>>>  Collector out)
>>>
>>> What do you think?
>>>
>>> Regards,
>>> Timo
>>>
>>>
>>>
>>> On 06.04.20 11:08, Dawid Wysakowicz wrote:
>>>> Hi devs,
>>>>
>>>> When working on improving the Table API/SQL connectors we faced a few
>>>> shortcomings of the DeserializationSchema and SerializationSchema
>>>> interfaces. Similar features were also mentioned by other users in the
>>>> past. The shortcomings I would like to address with the FLIP include:
>>>>
>>>>   * Emitting 0 to m records from the deserialization schema with per
>>>> partition watermarks
>>>>   o
>> https://github.com/apache/flink/pull/3314#issuecomment-376237266
>>>>   o differentiate null value from no value

Re: [ANNOUNCE] New Flink committer: Seth Wiesman

2020-04-06 Thread Dawid Wysakowicz
Congratulations Seth. Happy to have you in the community!

Best,

Dawid

On 07/04/2020 08:43, Dian Fu wrote:
> Congratulations!
>
>> On Apr 7, 2020, at 2:35 PM, Konstantin Knauf wrote:
>>
>> Congratulations, Seth! Well deserved :)
>>
>> On Tue, Apr 7, 2020 at 8:33 AM Tzu-Li (Gordon) Tai 
>> wrote:
>>
>>> Hi everyone!
>>>
>>> On behalf of the PMC, I’m very happy to announce Seth Wiesman as a new
>>> Flink committer.
>>>
>>> Seth started contributing to the project in March 2017. You may know him
>>> from several contributions in the past.
>>> He has helped a lot with the Flink documentation and contributed the State
>>> Processor API. Over the past few months, he has also helped tremendously by
>>> writing the majority of the Stateful Functions documentation.
>>>
>>> Please join me in congratulating Seth for becoming a Flink committer!
>>>
>>> Thanks,
>>> Gordon
>>>
>>





Re: [ANNOUNCE] New Committers and PMC member

2020-04-06 Thread Dawid Wysakowicz
Thank you all for the support!

Best,

Dawid

On 02/04/2020 04:33, godfrey he wrote:
> Congratulations to all of you~
>
> Best,
> Godfrey
>
> Ismaël Mejía wrote on Thu, Apr 2, 2020 at 6:42 AM:
>
>> Congrats everyone!
>>
>> On Thu, Apr 2, 2020 at 12:16 AM Rong Rong  wrote:
>>> Congratulations to all!!!
>>>
>>> --
>>> Rong
>>>
>>> On Wed, Apr 1, 2020 at 2:27 PM Thomas Weise  wrote:
>>>
 Congratulations!


 On Wed, Apr 1, 2020 at 9:31 AM Fabian Hueske 
>> wrote:
> Congrats everyone!
>
> Cheers, Fabian
>
> On Wed, Apr 1, 2020 at 18:26, Yun Tang wrote:
>> Congratulations to all of you!
>>
>> Best
>> Yun Tang
>> 
>> From: Yang Wang 
>> Sent: Wednesday, April 1, 2020 22:28
>> To: dev 
>> Subject: Re: [ANNOUNCE] New Committers and PMC member
>>
>> Congratulations all.
>>
>> Best,
>> Yang
>>
>> Leonard Xu wrote on Wed, Apr 1, 2020 at 10:15 PM:
>>
>>> Congratulations Konstantin, Dawid and Zhijiang!  Well deserved!
>>>
>>> Best,
>>> Leonard Xu
 On Apr 1, 2020, at 21:22, Jark Wu wrote:

 Congratulations to you all!

 Best,
 Jark

 On Wed, 1 Apr 2020 at 20:33, Kurt Young 
>> wrote:
> Congratulations to you all!
>
> Best,
> Kurt
>
>
> On Wed, Apr 1, 2020 at 7:41 PM Danny Chan <
>> yuzhao@gmail.com>
>> wrote:
>> Congratulations!
>>
>> Best,
>> Danny Chan
>> On Apr 1, 2020 at 7:36 PM +0800, dev@flink.apache.org wrote:
>>> Congratulations!
>>>





Re: [VOTE] FLIP-113: Supports Dynamic Table Options for Flink SQL

2020-04-11 Thread Dawid Wysakowicz
+1,


Best,

Dawid

On 10/04/2020 12:07, Kurt Young wrote:
> +1
>
> Best,
> Kurt
>
>
> On Fri, Apr 10, 2020 at 6:01 PM Jark Wu  wrote:
>
>> +1 from my side (binding)
>>
>> Best,
>> Jark
>>
>> On Fri, 10 Apr 2020 at 17:03, Timo Walther  wrote:
>>
>>> +1 (binding)
>>>
>>> Thanks for the healthy discussion. I think this feature can be useful
>>> during the development of a pipeline.
>>>
>>> Regards,
>>> Timo
>>>
>>> On 10.04.20 03:34, Danny Chan wrote:
 Hi all,

 I would like to start the vote for FLIP-113 [1], which has been discussed and
 reached a consensus in the discussion thread [2].

 The vote will be open until April 13th (72h), unless there is an
 objection or not enough votes.

 Best,
 Danny Chan

 [1]

>> https://cwiki.apache.org/confluence/display/FLINK/FLIP-113%3A+Supports+Dynamic+Table+Options+for+Flink+SQL
 [2]

>> https://lists.apache.org/thread.html/r94af5d3d97e76e7dd9df68cb0becc7ba74d15591a8fab84c72fa%40%3Cdev.flink.apache.org%3E
>>>





Re: [DISCUSS] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-14 Thread Dawid Wysakowicz
Hi Xiaogang,

I very much agree with Jark's and Aljoscha's responses.


On 10/04/2020 17:35, Jark Wu wrote:
> Hi Xiaogang,
>
> I think this proposal doesn't conflict with your use case; you can still
> chain a ProcessFunction after a source which emits raw data.
> But I'm not in favor of chaining ProcessFunction after source, and we
> should avoid that, because:
>
> 1) For correctness, it is necessary to perform the watermark generation as
> early as possible in order to be close to the actual data
>  generation within a source's data partition. This is also the purpose of
> per-partition watermark and event-time alignment.
> Many ongoing FLIPs (e.g. FLIP-27, FLIP-95) put a lot of effort into this.
> Deserializing records and generating watermarks in a chained
> ProcessFunction makes it difficult to do per-partition watermarks in the
> future.
> 2) In Flink SQL, a source should emit the deserialized row instead of raw
> data. Otherwise, users have to define raw byte[] as the
>  single column of the defined table, and parse them in queries, which is
> very inconvenient.
>
> Best,
> Jark
>
> On Fri, 10 Apr 2020 at 09:18, SHI Xiaogang  wrote:
>
>> Hi,
>>
>> I don't think the proposal is a good solution to the problems. I am in
>> favour of using a ProcessFunction chained to the source/sink function to
>> serialize/deserialize the records, instead of embedding (de)serialization
>> schema in source/sink function.
>>
>> Message packing is heavily used in our production environment to allow
>> compression and improve throughput. As buffered messages have to be
>> delivered when the time exceeds the limit, timers are also required in our
>> cases. I think it's also a common need for other users.
>>
>> In this proposal, with more components added to the context, in the
>> end we will find that the serialization/deserialization schema is just
>> another wrapper around ProcessFunction.
>>
>> Regards,
>> Xiaogang
>>
>> Aljoscha Krettek wrote on Tue, Apr 7, 2020 at 6:34 PM:
>>
>>> On 07.04.20 08:45, Dawid Wysakowicz wrote:
>>>
>>>> @Jark I was aware of the implementation of SinkFunction, but it was a
>>>> conscious choice to not do it that way.
>>>>
>>>> Personally I am against giving a default implementation to both the new
>>>> and old methods. This results in an interface that by default does
>>>> nothing, or notifies the user only at runtime that he/she has not
>>>> implemented a method of the interface, which does not sound like good
>>>> practice to me. Moreover, I believe the method without a Collector will
>>>> still be the preferred method for many users. Plus, it communicates
>>>> explicitly what the minimal functionality required by the interface is.
>>>> Nevertheless, I am happy to hear other opinions.
>>> Dawid and I discussed this before. I did the extension of the
>>> SinkFunction but by now I think it's better to do it this way, because
>>> otherwise you can implement the interface without implementing any
>> methods.
>>>> @all I also prefer the buffering approach. Let's wait a day or two more
>>>> to see if others think differently.
>>> I'm also in favour of buffering outside the lock.
>>>
>>> Also, +1 to this FLIP.
>>>
>>> Best,
>>> Aljoscha
>>>





[VOTE] FLIP-124: Add open/close and Collector to (De)SerializationSchema

2020-04-16 Thread Dawid Wysakowicz
Hi all,

I would like to start the vote for FLIP-124 [1], which has been discussed and
reached a consensus in the discussion thread [2].

The vote will be open until April 20th, unless there is an objection or
not enough votes.

Best,

Dawid

[1]
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148645988

[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-124-Add-open-close-and-Collector-to-De-SerializationSchema-td39864.html




