date:20210201

Re: [ANNOUNCE] Apache Flink 1.10.3 released

2021-02-01 Thread Matthias Pohl

Yes, thanks for taking over the release!

Best,
Matthias

On Mon, Feb 1, 2021 at 5:04 AM Zhu Zhu  wrote:

> Thanks Xintong for being the release manager and everyone who helped with
> the release!
>
> Cheers,
> Zhu
>
> On Fri, Jan 29, 2021 at 5:56 PM Dian Fu wrote:
>
>> Thanks Xintong for driving this release!
>>
>> Regards,
>> Dian
>>
>> On Jan 29, 2021, at 5:24 PM, Till Rohrmann wrote:
>>
>> Thanks Xintong for being our release manager. Well done!
>>
>> Cheers,
>> Till
>>
>> On Fri, Jan 29, 2021 at 9:50 AM Yang Wang  wrote:
>>
>>> Thanks Xintong for driving this release.
>>>
>>> Best,
>>> Yang
>>>
>>> On Fri, Jan 29, 2021 at 3:52 PM Yu Li wrote:
>>>
>>>> Thanks Xintong for being our release manager and everyone else who made
>>>> the release possible!
>>>>
>>>> Best Regards,
>>>> Yu
>>>>
>>>>
>>>> On Fri, 29 Jan 2021 at 15:05, Xintong Song wrote:
>>>>
>>>>> The Apache Flink community is very happy to announce the release of
>>>>> Apache Flink 1.10.3, which is the third bugfix release for the Apache
>>>>> Flink 1.10 series.
>>>>>
>>>>> Apache Flink® is an open-source stream processing framework for
>>>>> distributed, high-performing, always-available, and accurate data
>>>>> streaming applications.
>>>>>
>>>>> The release is available for download at:
>>>>> https://flink.apache.org/downloads.html
>>>>>
>>>>> Please check out the release blog post for an overview of the
>>>>> improvements for this bugfix release:
>>>>> https://flink.apache.org/news/2021/01/29/release-1.10.3.html
>>>>>
>>>>> The full release notes are available in Jira:
>>>>> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12348668
>>>>>
>>>>> We would like to thank all contributors of the Apache Flink community
>>>>> who made this release possible!
>>>>>
>>>>> Regards,
>>>>> Xintong Song

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Rui Li

Thanks Jane for starting the discussion.

Regarding #1, I also prefer `USE MODULES` syntax. It can be interpreted as
"setting the current order of modules", which is similar to "setting the
current catalog" for `USE CATALOG`.

Regarding #3, I'm fine with mapping modules purely by name because I think it
satisfies all the use cases we have at hand. But I guess we need to make
sure we're backward compatible, i.e. users don't need to change their YAML
files to configure the modules.

On Mon, Feb 1, 2021 at 3:10 PM Jark Wu  wrote:

> Thanks Jane for the summary and starting the discussion in the mailing
> list.
>
> Here are my thoughts:
>
> 1) syntax to reorder modules
> I agree with Rui Li that it would be quite useful to have some syntax to
> reorder modules.
> I slightly prefer `USE MODULES x, y, z` over `RELOAD MODULES x, y, z`,
> because USE conveys more of a sense of taking effect and specifying an
> order than RELOAD does.
> To me, RELOAD just means that we unregister and register the x, y, z
> modules again; it sounds like the other registered modules are still in
> use and keep their order.
>
> 3) mapping modules purely by name
> This can definitely improve the usability of loading modules, because
> the 'type=' property looks really redundant. We can think of this as
> syntactic sugar where the default type value is the module name.
> And we can support specifying the 'type=' property in the future to allow
> multiple modules for one module type.
>
> Besides, I would like to mention one more change, that the module name
> proposed in FLIP-68 is a string literal.
> But I think we are all on the same page to change it into a simple
> (non-compound) identifier.
>
> LOAD/UNLOAD MODULE 'core'
> ==>
> LOAD/UNLOAD MODULE core
>
>
> Best,
> Jark
>
>
> On Sat, 30 Jan 2021 at 04:00, Jane Chan  wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion on FLINK-21045 [1] about supporting
> > `LOAD MODULE` and `UNLOAD MODULE` SQL syntax. It was first proposed by
> > FLIP-68 [2] as follows.
> >
> > -- load a module with the given name and append it to the end of the
> > -- module list
> > LOAD MODULE 'name' [WITH ('type'='xxx', 'prop'='myProp', ...)]
> >
> > -- unload a module by name from the module list; other modules remain
> > -- in the same relative positions
> > UNLOAD MODULE 'name'
> >
> > After a round of discussion on the Jira ticket, it seems some unanswered
> > questions need more opinions and suggestions.
> >
> > 1. The way to redefine resolution order easily
> >
> > Rui Li suggested introducing `USE MODULES` and adding similar
> > functionality to the API because
> >
> > >  1) It's very tedious to unload old modules just to reorder them.
> > >  2) Users may not even know how to "re-load" an old module if it was not
> > >     initially loaded by the user, e.g. don't know which type to use.
> >
> > Jane Chan noted that a module, unlike a catalog, has no namespace concept
> > to specify, and that `USE` sounds like a mutually exclusive concept.
> > Maybe `RELOAD MODULES` can express upgrading the priority of the loaded
> > module(s).
> >
> >
> > 2. `LOAD/UNLOAD MODULE` vs. `CREATE/DROP MODULE` syntax
> > Jark Wu and Nicholas Jiang proposed to use `CREATE/DROP MODULE` instead
> > of `LOAD/UNLOAD MODULE` because
> >
> > >  1) From a pure SQL user's perspective, maybe `CREATE MODULE + USE MODULE`
> > >     is easier to use than `LOAD/UNLOAD`.
> > >  2) This will be very similar to what the catalog uses now.
> >
> > Timo Walther would rather stick to the agreed design because
> > loading/unloading modules is a concept known from kernels etc.
> >
> > 3. Simplify the module design by mapping modules purely by name
> >
> > LOAD MODULE geo_utils
> > -- no dedicated 'type='/'module=' but allow only 1 module to be loaded
> > -- parameterized
> > LOAD MODULE hive WITH ('version'='2.1')
> > UNLOAD hive
> > USE MODULES hive, core
> >
> >
> > Please find more details in the reference link. Looking forward to your
> > feedback.
> >
> > [1] https://issues.apache.org/jira/browse/FLINK-21045#
> > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-68%3A+Extend+Core+Table+System+with+Pluggable+Modules
> >
> > Best,
> > Jane
> >
>


-- 
Best regards!
Rui Li

[jira] [Created] (FLINK-21225) OverConvertRule does not consider distinct

2021-02-01 Thread Timo Walther (Jira)

Timo Walther created FLINK-21225:


 Summary: OverConvertRule does not consider distinct
 Key: FLINK-21225
 URL: https://issues.apache.org/jira/browse/FLINK-21225
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / Planner
Reporter: Timo Walther


We don't support OVER window distinct aggregates in the Table API, even though 
this is explicitly documented:

https://ci.apache.org/projects/flink/flink-docs-release-1.12/dev/table/tableApi.html#aggregations

{code}
// Distinct aggregation on over window
Table result = orders
    .window(Over
        .partitionBy($("a"))
        .orderBy($("rowtime"))
        .preceding(UNBOUNDED_RANGE)
        .as("w"))
    .select(
        $("a"), $("b").avg().distinct().over($("w")),
        $("b").max().over($("w")),
        $("b").min().over($("w"))
    );
{code}

The distinct flag is set to false in {{OverConvertRule}}.
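
To make the effect of a dropped distinct flag concrete, here is a minimal plain-Java sketch. It is not the planner code; the class `OverDistinctSketch` and its method are made up for illustration. It computes a running AVG over an unbounded-preceding window for a single partition, once over all values and once over distinct values only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: models AVG(b) vs. AVG(DISTINCT b) over an unbounded-preceding window. */
final class OverDistinctSketch {

    /** Returns the running average after each row; with distinct=true, repeated values count once. */
    static List<Double> runningAvg(List<Integer> values, boolean distinct) {
        List<Double> out = new ArrayList<>();
        Set<Integer> seen = new HashSet<>();
        long sum = 0;
        long count = 0;
        for (int v : values) {
            boolean firstTime = seen.add(v);
            if (!distinct || firstTime) {
                sum += v;
                count++;
            }
            // count is at least 1 here because the first value is always accumulated
            out.add((double) sum / count);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> b = Arrays.asList(10, 10, 40);
        System.out.println(runningAvg(b, false)); // distinct flag ignored
        System.out.println(runningAvg(b, true));  // distinct flag honored
    }
}
```

For the rows 10, 10, 40 the last window result is 20.0 without the flag and 25.0 with it, so silently treating the flag as false changes query results rather than just failing.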

See also
http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Unknown-call-expression-avg-amount-when-use-distinct-in-Flink-Thanks-td40905.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [VOTE] FLIP-159: Reactive Mode

2021-02-01 Thread Matthias Pohl

Thanks Robert and congratulations on your first FLIP.
+1 (non-binding)

Matthias

On Mon, Feb 1, 2021 at 4:22 AM Zhu Zhu  wrote:

> +1 (binding)
>
> Thanks,
> Zhu
>
> On Fri, Jan 29, 2021 at 10:23 PM Till Rohrmann wrote:
>
> > LGTM. Thanks for the work Robert!
> >
> > +1 (binding)
> >
> > Cheers,
> > Till
> >
> > On Thu, Jan 28, 2021 at 11:27 AM Yang Wang wrote:
> >
> > > Thanks Robert for your great work on this FLIP. This is really a big
> > > step to make Flink auto scalable.
> > >
> > > +1 (non-binding)
> > >
> > >
> > > Best,
> > > Yang
> > >
> > > On Thu, Jan 28, 2021 at 4:32 PM Robert Metzger wrote:
> > >
> > > > @Yangze: That's something I overlooked. I should have waited. If
> > > > FLIP-160 is rejected or undergoes fundamental changes, I'll cancel
> > > > this vote and rewrite FLIP-159.
> > > > But I have the impression that there were no major concerns regarding
> > > > FLIP-160 so far.
> > > >
> > > > On Thu, Jan 28, 2021 at 8:46 AM Yangze Guo wrote:
> > > >
> > > > > Thanks for driving this, Robert! LGTM.
> > > > >
> > > > > +1
> > > > >
> > > > > minor: Just a little confused about the program. It seems this
> > > > > proposal relies on the FLIP-160, which is still under discussion.
> > > > > Should we always vote for the prerequisite first?
> > > > >
> > > > > Best,
> > > > > Yangze Guo
> > > > >
> > > > >
> > > > > On Thu, Jan 28, 2021 at 3:27 PM Xintong Song <tonysong...@gmail.com>
> > > > > wrote:
> > > > > >
> > > > > > Thanks Robert. LGTM.
> > > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, Jan 28, 2021 at 2:50 PM Robert Metzger <rmetz...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > since the discussion [1] about FLIP-159 [2] seems to have reached
> > > > > > > a consensus, I'd like to start a formal vote for the FLIP.
> > > > > > >
> > > > > > > Please vote +1 to approve the FLIP, or -1 with a comment. The vote
> > > > > > > will be open at least until Tuesday, Feb 2nd.
> > > > > > >
> > > > > > > Best,
> > > > > > > Robert
> > > > > > >
> > > > > > > [1] https://lists.apache.org/thread.html/ra688faf9dca036500f0445c55671e70ba96c70f942afe650e9db8374%40%3Cdev.flink.apache.org%3E
> > > > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-159%3A+Reactive+Mode

Re: [VOTE] FLIP-160: Declarative scheduler

2021-02-01 Thread Matthias Pohl

+1 (non-binding)

Thanks,
Matthias

On Mon, Feb 1, 2021 at 4:22 AM Zhu Zhu  wrote:

> +1 (binding)
>
> Thanks,
> Zhu
>
> On Mon, Feb 1, 2021 at 11:04 AM Yang Wang wrote:
>
> > +1 (non-binding)
> >
> > Best,
> > Yang
> >
> > On Mon, Feb 1, 2021 at 9:50 AM Yangze Guo wrote:
> >
> > > +1 (non-binding)
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Sat, Jan 30, 2021 at 8:40 AM Xintong Song wrote:
> > > >
> > > > +1 (binding)
> > > >
> > > > Thank you~
> > > >
> > > > Xintong Song
> > > >
> > > >
> > > >
> > > > On Fri, Jan 29, 2021 at 10:41 PM Robert Metzger wrote:
> > > >
> > > > > ... and thanks a lot for your work :) I'm really excited about
> > > > > finally adding this feature to Flink!
> > > > >
> > > > >
> > > > > On Fri, Jan 29, 2021 at 3:40 PM Robert Metzger <rmetz...@apache.org>
> > > > > wrote:
> > > > >
> > > > > > +1 (binding)
> > > > > >
> > > > > > On Fri, Jan 29, 2021 at 3:23 PM Till Rohrmann <trohrm...@apache.org>
> > > > > > wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> since the discussion [1] about FLIP-160 [2] seems to have reached
> > > > > >> a consensus, I'd like to start a formal vote for the FLIP.
> > > > > >>
> > > > > >> Please vote +1 to approve the FLIP, or -1 with a comment. The vote
> > > > > >> will be open at least until Wednesday, Feb 3rd.
> > > > > >>
> > > > > >> Cheers,
> > > > > >> Till
> > > > > >>
> > > > > >> [1] https://lists.apache.org/thread.html/r604a01f739639e2a5f093fbe7894c172125530332747ecf6990a6ce4%40%3Cdev.flink.apache.org%3E
> > > > > >> [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-160%3A+Declarative+Scheduler

[jira] [Created] (FLINK-21226) Reintroduce TableColumn.of for backwards compatibility

2021-02-01 Thread Timo Walther (Jira)

Timo Walther created FLINK-21226:


 Summary: Reintroduce TableColumn.of for backwards compatibility
 Key: FLINK-21226
 URL: https://issues.apache.org/jira/browse/FLINK-21226
 Project: Flink
  Issue Type: Bug
  Components: Table SQL / API
Reporter: Timo Walther
Assignee: Timo Walther


FLINK-19341 accidentally dropped the {{TableColumn.of}} method that might be 
used frequently by downstream projects. We should reintroduce it for 1-2 
releases.
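
A common way to bring back a dropped factory method for a limited time is a deprecated shim that delegates to its replacement. The sketch below shows that pattern on a stand-in class; `ColumnSketch`, `physical` and the string-typed data type are placeholders for illustration, not the actual `TableColumn` API.

```java
import java.util.Objects;

/** Illustrative stand-in for a column class; not Flink's TableColumn. */
final class ColumnSketch {
    private final String name;
    private final String dataType;

    private ColumnSketch(String name, String dataType) {
        this.name = Objects.requireNonNull(name, "name");
        this.dataType = Objects.requireNonNull(dataType, "dataType");
    }

    /** The newer factory method that callers are expected to migrate to. */
    static ColumnSketch physical(String name, String dataType) {
        return new ColumnSketch(name, dataType);
    }

    /**
     * Reintroduced so that existing downstream code keeps compiling.
     *
     * @deprecated use {@link #physical(String, String)}; planned for removal after a few releases.
     */
    @Deprecated
    static ColumnSketch of(String name, String dataType) {
        return physical(name, dataType);
    }

    String getName() {
        return name;
    }

    String getDataType() {
        return dataType;
    }

    public static void main(String[] args) {
        ColumnSketch legacy = ColumnSketch.of("a", "INT"); // old call site still works
        System.out.println(legacy.getName() + " " + legacy.getDataType());
    }
}
```

Because the shim only delegates, both entry points stay behaviorally identical, and the `@Deprecated` marker gives downstream projects a compile-time hint before the method is removed again.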



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Timo Walther

Thanks for starting the discussion Jane.

I'm fine with using `USE` for reordering the modules.

I agree with Jark that the module name should be an identifier rather than
a string literal.

However, to simplify the design I would completely remove the `type=`
property because having multiple ways of defining the same thing might
be confusing without providing additional benefits. I also think that
users should not be able to load the same module multiple times.

Regarding Rui's comment, the YAML file should not be affected by this
change and we can leave this part of the API untouched. We need to
update the `ModuleFactory` anyways because it still uses the deprecated
`TableFactory` class.

Regards,
Timo


Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-01 Thread Timo Walther

Parts of the FLIP can already be implemented before the vote has completed, 
e.g. there is no doubt that we should support TIME(9).


However, I don't see a benefit in reworking the time functions now only to 
rework them again later. If we lock the time at query start, the implementation 
of the previously mentioned functions will be completely different.
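
To illustrate why the two semantics lead to different implementations, here is a small plain-Java sketch built on `java.time.Clock`. It is not Flink code; `TimeFunctionSketch`, `perRecord`, `lockedAtQueryStart` and the ticking test clock are names invented for this example.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.function.Supplier;

/** Illustrative only: contrasts per-record evaluation with a value locked at query start. */
final class TimeFunctionSketch {

    /** Per-record semantics: the clock is read on every evaluation. */
    static Supplier<Instant> perRecord(Clock clock) {
        return clock::instant;
    }

    /** Query-start semantics: the clock is read once, and that value is reused for every record. */
    static Supplier<Instant> lockedAtQueryStart(Clock clock) {
        final Instant queryStart = clock.instant();
        return () -> queryStart;
    }

    /** Deterministic test clock that advances by one second on each read. */
    static Clock tickingClock(final Instant start) {
        return new Clock() {
            private long reads = 0;

            @Override
            public ZoneId getZone() {
                return ZoneOffset.UTC;
            }

            @Override
            public Clock withZone(ZoneId zone) {
                return this; // the zone is irrelevant for this sketch
            }

            @Override
            public Instant instant() {
                return start.plusSeconds(reads++);
            }
        };
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2021-02-01T00:00:00Z");
        Supplier<Instant> live = perRecord(tickingClock(t0));
        Supplier<Instant> locked = lockedAtQueryStart(tickingClock(t0));
        for (int record = 0; record < 3; record++) {
            System.out.println(live.get() + " vs " + locked.get());
        }
    }
}
```

With the locked variant, a value produced by one operator and a filter applied by another always see the same timestamp; with the per-record variant they may not.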


Regards,
Timo


On 01.02.21 02:37, Kurt Young wrote:

I also prefer not to expand this FLIP further, but we could open a
discussion thread right after this FLIP is accepted and start coding and
reviewing. Pipelining the technical discussion and the coding will improve
efficiency.

Best,
Kurt


On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu  wrote:


Hi, Timo


I do think that this topic must be part of the FLIP as well. Esp. if the
FLIP has the title "time function behavior" and this is clearly a
behavioral aspect. We are performing a heavy refactoring of the SQL query
semantics in Flink here which will affect a lot of users. We cannot rework
the time functions a third time after this.

I checked a couple of other vendors. It seems that they all lock the
timestamp when the query is started. And as you said, in this case both
mature (Oracle) and less mature systems (Hive, MySQL) have the same
behavior.

FLIP-162> "These problems come from the fact that lots of time-related
functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and
CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone."
The motivation of FLIP-162 is to correct the wrong time-related function
values caused by the time zone. In our earlier discussion we found that
this is also related to the function return types compared to the SQL
standard and other vendors, and thus we proposed to make the function
return types consistent as well.
This is the exact meaning of the FLIP title and what the FLIP plans to do.

We have not yet considered the function materialization mechanism as
part of our plan, because we need to fix the time zone and function type
issues whether or not we modify the function materialization mechanism in
the future.
So I think it does not belong to the scope of this FLIP.

It will already be great work if we can fix the current FLIP's 7 proposals
well; we don't want to expand the scope again, esp. as it's not part of our
plan.

What do you think? @Timo

And what’s others' thoughts?  @Jark @Kurt

Best,
Leonard





Flink should not differ. I fear that we have to adopt this behavior as
well to call us standard compliant. Otherwise it will also not be possible
to have Hive compatibility with proper semantics. It could lead to
unintended behavior.


I see two options for this topic:

1) Clearly distinguish between query-start and processing time

MySQL offers NOW() and SYSDATE() to distinguish the two semantics. We
could run all the previously discussed functions that have a meaning in
other systems in query-start time and use a different name for processing
time. `SYS_TIMESTAMP`, `SYS_DATE`, `SYS_TIME`, `SYS_LOCALTIMESTAMP`,
`SYS_LOCALDATE`, `SYS_LOCALTIME`?


2) Introduce a config option

We are non-compliant by default and allow typical batch behavior if
needed via a config option. But batch/stream unification should not mean
that we disable certain unification aspects by default.


What do you think?

Regards,
Timo

On 28.01.21 16:51, Leonard Xu wrote:

Hi, Timo

I'm sorry that I need to open another discussion thread before voting
but I think we should also discuss this in this FLIP before it pops up at a
later stage.


How do we want our time functions to behave in long running queries?

It's okay to open this thread. Although I don't want to include
function value materialization in the scope of this FLIP, I can try to
explain a few things.

See also:


https://stackoverflow.com/questions/5522656/sql-now-in-long-running-query


I think this was never discussed thoroughly. Actually
CURRENT_TIMESTAMP/NOW/LOCALTIMESTAMP should have slightly different
semantics than PROCTIME(). What is our current behavior? Are we
materializing those time values during planning?

Currently CURRENT_TIMESTAMP/NOW/LOCALTIMESTAMP keep the same behavior in
both the Batch and Stream worlds: the function value is materialized per
record, not at query start (plan phase).

PROCTIME() also keeps the same behavior in both the Batch and Stream
worlds; in fact we only added support for PROCTIME() in Batch last week [1].

In a word, we keep the same semantics/behavior for Batch and Stream.

Esp. long running batch queries might suffer from inconsistencies
here, when a timestamp is produced by one operator using CURRENT_TIMESTAMP
and a different one filters relative to CURRENT_TIMESTAMP.

It's a good question, and I've found that some users have asked similar
questions on the user/user-zh mailing lists. Many Batch systems
like Hive/Presto use the value at query start, but that is not suitable for
a Stream engine; for example, users will use CURRENT_TIMESTAMP to define
event time.

As a unified Batch/Stream SQL engine, keep sa

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jane Chan

Hi Jark and Rui,

Thanks for the discussions.

Regarding #1, I'm fine with `USE MODULES` syntax, and

> It can be interpreted as "setting the current order of modules", which is
> similar to "setting the current catalog" for `USE CATALOG`.
>
I would like to confirm: do the unmentioned modules remain in the same
relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then
`USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.

Regarding #3, I'm fine with mapping modules purely by name, and I think
Jark raised a good point on making the module name a simple identifier
instead of a string literal. For backward compatibility, since we haven't
supported this syntax yet, the affected users are those who defined modules
in the YAML configuration file. Maybe we can eliminate the 'type' from the
'requiredContext' to make it optional. The proposed mapping mechanism
could then use the module name to look up the suitable factory, and in the
meantime we can update the documentation to encourage users to simplify
their YAML configuration. And in the long run, we can deprecate the 'type'.
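
As a rough sketch of the lookup described above (an illustration under assumptions, not Flink's factory discovery): if an explicit 'type' property is present it selects the factory, otherwise the module name does. The class `ModuleLookupSketch`, its methods and the string-valued registry are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only: resolves a module factory by optional 'type', falling back to the module name. */
final class ModuleLookupSketch {

    /** The lookup key is the explicit 'type' property if present, otherwise the module name. */
    static String factoryKey(String moduleName, Map<String, String> properties) {
        String type = properties.get("type");
        return type != null ? type : moduleName;
    }

    /** Finds a factory in a registry keyed by factory key and fails loudly if none matches. */
    static <F> F findFactory(String moduleName, Map<String, String> properties, Map<String, F> registry) {
        String key = factoryKey(moduleName, properties);
        F factory = registry.get(key);
        if (factory == null) {
            throw new IllegalArgumentException("No module factory registered for '" + key + "'");
        }
        return factory;
    }

    public static void main(String[] args) {
        Map<String, String> registry = new HashMap<>();
        registry.put("hive", "placeholder for a Hive module factory");

        // New style: no 'type'; the module name alone selects the factory.
        System.out.println(findFactory("hive", Collections.singletonMap("version", "2.1"), registry));
        // Old style: an explicit 'type' still works, so existing YAML files need no change.
        System.out.println(findFactory("myhive", Collections.singletonMap("type", "hive"), registry));
    }
}
```

Keeping 'type' as an optional override is what preserves compatibility for existing YAML files; dropping it entirely would reduce the lookup to the module name alone.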

Best,
Jane


Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Timo Walther

IMHO I would rather unload the modules that are not mentioned. The statement 
says `USE`, which implicitly implies that the other modules are "not 
used". What do others think?
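
The two readings of `USE MODULES` under discussion can be compared with a small plain-Java sketch. This only models the semantics; it is not the ModuleManager implementation, and `UseModulesSketch` with its method names is invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: models two possible meanings of USE MODULES on an ordered module list. */
final class UseModulesSketch {

    /** Reading 1: listed modules move to the front; unmentioned ones keep their relative order. */
    static List<String> reorderKeepingRest(List<String> loaded, List<String> used) {
        requireLoaded(loaded, used);
        Set<String> result = new LinkedHashSet<>(used); // listed modules first, in the given order
        result.addAll(loaded);                          // then the rest, in their previous order
        return new ArrayList<>(result);
    }

    /** Reading 2: only the listed modules stay in use; all others are dropped from resolution. */
    static List<String> useOnlyListed(List<String> loaded, List<String> used) {
        requireLoaded(loaded, used);
        return new ArrayList<>(new LinkedHashSet<>(used));
    }

    private static void requireLoaded(List<String> loaded, List<String> used) {
        for (String name : used) {
            if (!loaded.contains(name)) {
                throw new IllegalArgumentException("Module not loaded: " + name);
            }
        }
    }

    public static void main(String[] args) {
        List<String> loaded = Arrays.asList("X", "Y", "Z");
        List<String> used = Arrays.asList("Y", "Z");
        System.out.println(reorderKeepingRest(loaded, used)); // [Y, Z, X]
        System.out.println(useOnlyListed(loaded, used));      // [Y, Z]
    }
}
```

For loaded modules `X`, `Y`, `Z` and `USE MODULES Y, Z`, the first reading yields `Y`, `Z`, `X`, while the second leaves only `Y`, `Z` in the resolution order.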


Regards,
Timo


On 01.02.21 11:28, Jane Chan wrote:

Hi Jark and Rui,

Thanks for the discussions.

Regarding #1, I'm fine with `USE MODULES` syntax, and


It can be interpreted as "setting the current order of modules", which is
similar to "setting the current catalog" for `USE CATALOG`.


I would like to confirm that the unmentioned modules remain in the same
relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then
`USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.

Regarding #3, I'm fine with mapping modules purely by name, and I think
Jark raised a good point on making the module name a simple identifier
instead of a string literal. For backward compatibility, since we haven't
supported this syntax yet, the affected users are those who defined modules
in the YAML configuration file. Maybe we can eliminate the 'type' from the
'requiredContext' to make it optional. Thus the proposed mapping mechanism
could use the module name to lookup the suitable factory,  and in the
meanwhile updating documentation to encourage users to simplify their YAML
configuration. And in the long run, we can deprecate the 'type'.

Best,
Jane

On Mon, Feb 1, 2021 at 4:19 PM Rui Li  wrote:


Thanks Jane for starting the discussion.

Regarding #1, I also prefer `USE MODULES` syntax. It can be interpreted as
"setting the current order of modules", which is similar to "setting the
current catalog" for `USE CATALOG`.

Regarding #3, I'm fine to map modules purely by name because I think it
satisfies all the use cases we have at hand. But I guess we need to make
sure we're backward compatible, i.e. users don't need to change their yaml
files to configure the modules.

On Mon, Feb 1, 2021 at 3:10 PM Jark Wu  wrote:


Thanks Jane for the summary and starting the discussion in the mailing
list.

Here are my thoughts:

1) syntax to reorder modules
I agree with Rui Li it would be quite useful if we can have some syntax

to

reorder modules.
I slightly prefer `USE MODULES x, y, z` than `RELOAD MODULES x, y, z`,
because USE has a more sense of effective and specifying ordering, than
RELOAD.
 From my feeling, RELOAD just means we unregister and register x,y,z

modules

again,
it sounds like other registered modules are still in use and in the

order.


3) mapping modules purely by name
This can definitely improve the usability of loading modules, because
the 'type=' property
looks really redundant. We can think of this as a syntax sugar that the
default type value is the module name.
And we can support to specify 'type=' property in the future to allow
multiple modules for one module type.

Besides, I would like to mention one more change, that the module name
proposed in FLIP-68 is a string literal.
But I think we are all on the same page to change it into a simple
(non-compound) identifier.

LOAD/UNLOAD MODULE 'core'
==>
LOAD/UNLOAD MODULE core


Best,
Jark


On Sat, 30 Jan 2021 at 04:00, Jane Chan  wrote:


Hi everyone,

I would like to start a discussion on FLINK-21045 [1] about supporting
`LOAD MODULE` and `UNLOAD MODULE` SQL syntax. It's first proposed by
FLIP-68 [2] as following.

-- load a module with the given name and append it to the end of the

module

list
LOAD MODULE 'name' [WITH ('type'='xxx', 'prop'='myProp', ...)]

--unload a module by name from the module list and other modules remain

in

the same relative positions
UNLOAD MODULE 'name'

After a round of discussion on the Jira ticket, it seems some

unanswered

questions need more opinions and suggestions.

1. The way to redefine resolution order easily

 Rui Li suggested introducing `USE MODULES` and adding similar
functionality to the API because


  1) It's very tedious to unload old modules just to reorder them.


  2) Users may not even know how to "re-load" an old module if it was

not

initially loaded by the user, e.g. don't know which type to use.



 Jane Chan wondered that module is not like the catalog which has a
concept of namespace could specify, and `USE` sounds like a
mutual-exclusive concept.
 Maybe `RELOAD MODULES` can express upgrading the priority of the

loaded

module(s).


2. `LOAD/UNLOAD MODULE` vs. `CREATE/DROP MODULE` syntax
 Jark Wu and Nicholas Jiang proposed to use `CREATE/DROP MODULE` instead
of `LOAD/UNLOAD MODULE` because

  1) From a pure SQL user's perspective, maybe `CREATE MODULE + USE MODULE`
is easier to use than `LOAD/UNLOAD`.
  2) This will be very similar to how catalogs are used now.

   Timo Walther would rather stick to the agreed design because
loading/unloading modules is a concept known from kernels etc.

3. Simplify the module design by mapping modules purely by name

LOAD MODULE geo_utils
LOAD MODULE hive WITH ('version'='2.1')  -- no dedicated 'type='/'module='
but allow only 1 module to be lo

[jira] [Created] (FLINK-21227) Fixed: Upgrade Version com.google.protobuf:protoc:3.5.1:exe to 3.7.0 for (power)ppc64le support

2021-02-01 Thread Bivas (Jira)

Bivas created FLINK-21227:
-

 Summary: Fixed: Upgrade Version com.google.protobuf:protoc:3.5.1:exe to 3.7.0 for (power)ppc64le support
 Key: FLINK-21227
 URL: https://issues.apache.org/jira/browse/FLINK-21227
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile)
Reporter: Bivas


com.google.protobuf:*protoc:3.5.1:exe* is not supported on Power. Later
versions added multi-arch support, including Power (ppc64le). With
*protoc:3.7.0:exe*, the build succeeds and the E2E tests pass.

https://github.com/bivasda1/flink/blob/master/flink-formats/flink-parquet/pom.xml#L253



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (FLINK-21228) [Kinesis][Producer] Deadlock in KinesisProducer

2021-02-01 Thread Danny Cranmer (Jira)

Danny Cranmer created FLINK-21228:
-

 Summary: [Kinesis][Producer] Deadlock in KinesisProducer
 Key: FLINK-21228
 URL: https://issues.apache.org/jira/browse/FLINK-21228
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kinesis
Affects Versions: 1.12.1
Reporter: Danny Cranmer


*Background*
The application sink failed, which resulted in:
- Indefinite backpressure being applied
- No exception ever being thrown to fail the job

Application running with:

{code:java}
flinkKinesisProducer.setQueueLimit(1);
flinkKinesisProducer.setFailOnError(true); 
{code}

- {{KinesisProducer}} is waiting for the queue to empty before sending the next
record
([code|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L303])
- KPL ran out of memory, which raised an error; however, this error is processed asynchronously
([code|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L275])
- {{KinesisProducer}} would have rethrown the error and restarted the job;
however, the operator is stuck in an infinite loop enforcing the queue limit (which
never clears)
([code|https://github.com/apache/flink/blob/master/flink-connectors/flink-connector-kinesis/src/main/java/org/apache/flink/streaming/connectors/kinesis/FlinkKinesisProducer.java#L306])

*Proposal*
- Call {{checkAndPropagateAsyncError()}} while enforcing the queue limit in
{{enforceQueueLimit()}} to break the deadlock
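
The proposal can be illustrated with a small standalone sketch. It is not the connector's code: `QueueLimitSketch`, its fields and `reportAsyncError` are hypothetical stand-ins, and only the two method names `checkAndPropagateAsyncError` and `enforceQueueLimit` are taken from the ticket. The point is that the wait loop checks for a recorded asynchronous failure on every iteration, so a queue that never drains can no longer block forever.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a sink with a bounded outstanding-record queue. */
class QueueLimitSketch {
    private final int queueLimit;
    private volatile int outstandingRecords;
    // Failure reported by the asynchronous callback thread, if any.
    private final AtomicReference<Throwable> asyncError = new AtomicReference<>();

    QueueLimitSketch(int queueLimit, int outstandingRecords) {
        this.queueLimit = queueLimit;
        this.outstandingRecords = outstandingRecords;
    }

    /** Records the first asynchronous failure (assumed to be called from a callback thread). */
    void reportAsyncError(Throwable error) {
        asyncError.compareAndSet(null, error);
    }

    /** Rethrows a recorded asynchronous failure on the calling thread. */
    void checkAndPropagateAsyncError() {
        Throwable error = asyncError.get();
        if (error != null) {
            throw new IllegalStateException("Asynchronous producer error", error);
        }
    }

    /** Waits for the queue to drop below the limit, failing fast if an async error was recorded. */
    void enforceQueueLimit() {
        while (outstandingRecords >= queueLimit) {
            checkAndPropagateAsyncError(); // the proposed check inside the loop
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for the queue to drain", e);
            }
        }
    }

    public static void main(String[] args) {
        QueueLimitSketch sink = new QueueLimitSketch(1, 1); // queue is full and never drains
        sink.reportAsyncError(new RuntimeException("simulated KPL failure"));
        try {
            sink.enforceQueueLimit();
        } catch (IllegalStateException e) {
            System.out.println("Job would fail with: " + e.getCause().getMessage());
        }
    }
}
```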




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jane Chan

Hi Timo, thanks for the discussion.

It seems we have reached an agreement on #3: <1> The module name should
be a simple identifier rather than a string literal. <2> The property
`type` is redundant and should be removed; mapping will rely on the
module name, because loading a module multiple times under different
module names doesn't make much sense. <3> We should migrate to the newer API
rather than the deprecated `TableFactory` class.

Regarding #1, I think the question is whether changing the resolution
order explicitly implies an `unload` operation (i.e., one that users could
notice). What do others think?

Best,
Jane

On Mon, Feb 1, 2021 at 6:41 PM Timo Walther  wrote:

> IMHO I would rather unload the not mentioned modules. The statement
> expresses `USE`, which implicitly implies that the other modules are "not
> used". What do others think?
>
> Regards,
> Timo
>
>
> On 01.02.21 11:28, Jane Chan wrote:
> > Hi Jark and Rui,
> >
> > Thanks for the discussions.
> >
> > Regarding #1, I'm fine with `USE MODULES` syntax, and
> >
> >> It can be interpreted as "setting the current order of modules", which is
> >> similar to "setting the current catalog" for `USE CATALOG`.
> >>
> > I would like to confirm that the unmentioned modules remain in the same
> > relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then
> > `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.
> >
> > Regarding #3, I'm fine with mapping modules purely by name, and I think
> > Jark raised a good point on making the module name a simple identifier
> > instead of a string literal. For backward compatibility, since we haven't
> > supported this syntax yet, the affected users are those who defined modules
> > in the YAML configuration file. Maybe we can eliminate the 'type' from the
> > 'requiredContext' to make it optional. Thus the proposed mapping mechanism
> > could use the module name to look up the suitable factory, and in the
> > meantime we can update the documentation to encourage users to simplify their YAML
> > configuration. And in the long run, we can deprecate the 'type'.
> >
> > Best,
> > Jane
> >
> > On Mon, Feb 1, 2021 at 4:19 PM Rui Li  wrote:
> >
> >> Thanks Jane for starting the discussion.
> >>
> >> Regarding #1, I also prefer `USE MODULES` syntax. It can be interpreted as
> >> "setting the current order of modules", which is similar to "setting the
> >> current catalog" for `USE CATALOG`.
> >>
> >> Regarding #3, I'm fine to map modules purely by name because I think it
> >> satisfies all the use cases we have at hand. But I guess we need to make
> >> sure we're backward compatible, i.e. users don't need to change their yaml
> >> files to configure the modules.
> >>
> >> On Mon, Feb 1, 2021 at 3:10 PM Jark Wu  wrote:
> >>
> >>> Thanks Jane for the summary and starting the discussion in the mailing
> >>> list.
> >>>
> >>> Here are my thoughts:
> >>>
> >>> 1) syntax to reorder modules
> >>> I agree with Rui Li it would be quite useful if we can have some syntax to
> >>> reorder modules.
> >>> I slightly prefer `USE MODULES x, y, z` over `RELOAD MODULES x, y, z`,
> >>> because USE conveys more of a sense of taking effect and specifying an
> >>> order than RELOAD.
> >>> From my feeling, RELOAD just means we unregister and register x,y,z modules
> >>> again,
> >>> it sounds like other registered modules are still in use and in the order.
> >>>
> >>> 3) mapping modules purely by name
> >>> This can definitely improve the usability of loading modules, because
> >>> the 'type=' property
> >>> looks really redundant. We can think of this as a syntax sugar that the
> >>> default type value is the module name.
> >>> And we can support to specify 'type=' property in the future to allow
> >>> multiple modules for one module type.
> >>>
> >>> Besides, I would like to mention one more change, that the module name
> >>> proposed in FLIP-68 is a string literal.
> >>> But I think we are all on the same page to change it into a simple
> >>> (non-compound) identifier.
> >>>
> >>> LOAD/UNLOAD MODULE 'core'
> >>> ==>
> >>> LOAD/UNLOAD MODULE core
> >>>
> >>>
> >>> Best,
> >>> Jark
> >>>
> >>>

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jark Wu

I agree with Timo that USE implies the specified modules are in use in
the specified order and others are not used.
This makes it easier to know the resulting list and order after the USE
statement.
That means: if the current modules in order are x, y, z, then `USE MODULES z, y`
means the current modules in order are z, y.

But I would prefer not to unload the unmentioned modules in the USE
statement, because it seems strange that USE
would implicitly remove modules. In the above example, if the user typed the
wrong module list in USE by mistake
and wants to declare the list again, the user would have to create the
module again with some properties they may not know. Therefore, I propose that
the USE statement just specifies the current module list and doesn't
unload modules.
Besides that, we may need a new syntax to list all the modules, including
those loaded but not used.
We can introduce SHOW FULL MODULES for this purpose with an additional
`used` column.

For example:

Flink SQL> show modules;
-----------
| modules |
-----------
| x       |
| y       |
| z       |
-----------
Flink SQL> USE MODULES z, y;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
-----------
Flink SQL> show FULL modules;
-------------------
| modules | used  |
-------------------
| z       | true  |
| y       | true  |
| x       | false |
-------------------
Flink SQL> USE MODULES z, y, x;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
| x       |
-----------
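
The semantics in this example can be sketched as a small registry that tracks loaded modules separately from used ones. This is only a sketch under assumptions, not Flink's module manager: `ModuleRegistrySketch` and its methods are hypothetical names, `load` is assumed to also mark the module as used, and `useModules` is assumed to reject names that were never loaded. Neither assumption is settled in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of "loaded" vs "used" modules; not Flink's actual implementation. */
class ModuleRegistrySketch {
    private final Set<String> loaded = new LinkedHashSet<>(); // every loaded module, in load order
    private final List<String> used = new ArrayList<>();      // resolution order of used modules

    /** LOAD MODULE name. Assumption: a newly loaded module is also appended to the used list. */
    void load(String name) {
        if (!loaded.add(name)) {
            throw new IllegalStateException("Module already loaded: " + name);
        }
        used.add(name);
    }

    /** USE MODULES a, b, ...: sets the used list and its order without unloading anything. */
    void useModules(String... names) {
        List<String> next = new ArrayList<>();
        for (String name : names) {
            if (!loaded.contains(name)) {
                throw new IllegalArgumentException("Module not loaded: " + name);
            }
            if (next.contains(name)) {
                throw new IllegalArgumentException("Module listed twice: " + name);
            }
            next.add(name);
        }
        used.clear();
        used.addAll(next);
    }

    /** SHOW MODULES: only the used modules, in resolution order. */
    List<String> showModules() {
        return new ArrayList<>(used);
    }

    /** SHOW FULL MODULES: used modules first, then loaded-but-unused ones, as "name | used" rows. */
    List<String> showFullModules() {
        List<String> rows = new ArrayList<>();
        for (String name : used) {
            rows.add(name + " | true");
        }
        for (String name : loaded) {
            if (!used.contains(name)) {
                rows.add(name + " | false");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        ModuleRegistrySketch registry = new ModuleRegistrySketch();
        registry.load("x");
        registry.load("y");
        registry.load("z");
        registry.useModules("z", "y");
        System.out.println(registry.showModules());
        System.out.println(registry.showFullModules());
    }
}
```

Because `useModules` never touches the loaded set, a mistyped list can be corrected with another `USE MODULES` without re-creating any module.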

What do you think?

Best,
Jark


Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Rui Li

If `USE MODULES` implies unloading modules that are not listed, does it
also imply loading modules that are not previously loaded, especially since
we're mapping modules by name now?


Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Timo Walther


+1 to Jark's proposal

I like the difference between just loading and actually enabling these 
modules.


@Rui: I would use the same behavior as catalogs here. You cannot `USE` a 
catalog without creating it before.


Another question is whether a LOAD operation also adds the module to the 
enabled list by default?


Regards,
Timo


Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-01 Thread Leonard Xu

Hi, all

I’ve discussed the time function evaluation further with @Timo and @Jark. We
reached a consensus that we’d better address the time function
evaluation (function value materialization) in this FLIP as well.

We’re fine with introducing an option table.exec.time-function-evaluation to
control the point in time at which a time function value is materialized. The
time functions include:
LOCALTIME
LOCALTIMESTAMP
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
NOW()
The default value of table.exec.time-function-evaluation is 'per-record', which
means Flink evaluates the function value per record; we recommend users configure
this option value for their streaming pipelines.
Another valid option value is 'query-start', which means Flink evaluates the
function value at the query start; we recommend users configure this option value
for their batch pipelines.
In the future, more evaluation option values may be supported
if there are new requirements, e.g. an 'auto' option which evaluates the time
function value per record in streaming mode and
at query start in batch mode.
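
The difference between the two option values can be sketched in a few lines of Java. This is only an illustration and not Flink code: `TimeFunctionEvaluationSketch`, its `Mode` enum and the injected clock are hypothetical, and only the two behaviors ('per-record' and 'query-start') come from the proposal above.

```java
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical sketch of the two proposed evaluation modes for time functions. */
class TimeFunctionEvaluationSketch {
    enum Mode { PER_RECORD, QUERY_START }

    private final Mode mode;
    private final Supplier<Instant> clock;
    private final Instant queryStart; // only materialized in QUERY_START mode

    TimeFunctionEvaluationSketch(Mode mode, Supplier<Instant> clock) {
        this.mode = mode;
        this.clock = clock;
        // 'query-start': read the clock once, when the query starts.
        this.queryStart = mode == Mode.QUERY_START ? clock.get() : null;
    }

    /** Value that a function such as CURRENT_TIMESTAMP would return for the current record. */
    Instant currentTimestamp() {
        // 'per-record': read the clock again for every record.
        return mode == Mode.QUERY_START ? queryStart : clock.get();
    }

    public static void main(String[] args) throws InterruptedException {
        TimeFunctionEvaluationSketch batch =
                new TimeFunctionEvaluationSketch(Mode.QUERY_START, Instant::now);
        Instant first = batch.currentTimestamp();
        Thread.sleep(5);
        // In QUERY_START mode every record sees the same timestamp.
        System.out.println(first.equals(batch.currentTimestamp()));
    }
}
```

With a fixed query-start value, re-running a streaming query as a batch backfill yields one consistent timestamp per run, which is the behavior the option is meant to let users choose explicitly.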

Alternative 1:
Introduce functions like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW which
evaluate the function value at query start. This may confuse users a bit, since
we would provide two similar functions with different return values.

Alternative 2:
   Do not introduce any configuration/function; control the function
evaluation by pipeline execution mode. This may produce different results when
users run their streaming pipeline SQL as a batch pipeline (e.g.
backfilling), and users also
cannot control the behavior of these functions.

What do you think?

Thanks,
Leonard

> 在 2021年2月1日，18:23，Timo Walther  写道：
> 
> Parts of the FLIP can already be implemented without a completed voting, e.g. 
> there is no doubt that we should support TIME(9).
> 
> However, I don't see a benefit of reworking the time functions to rework them 
> again later. If we lock the time on query-start the implementation of the 
> previously mentioned functions will be completely different.
> 
> Regards,
> Timo
> 
> 
> On 01.02.21 02:37, Kurt Young wrote:
>> I also prefer to not expand this FLIP further, but we could open a
>> discussion thread right after this FLIP is accepted and start coding &
>> reviewing. Making the technical discussion and coding more pipelined will
>> improve efficiency.
>> Best,
>> Kurt
>> On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu  wrote:
>>> Hi, Timo
>>> 
>>>> I do think that this topic must be part of the FLIP as well. Esp. if the
>>>> FLIP has the title "time function behavior" and this is clearly a
>>>> behavioral aspect. We are performing a heavy refactoring of the SQL query
>>>> semantics in Flink here which will affect a lot of users. We cannot rework
>>>> the time functions a third time after this.
>>>> I checked a couple of other vendors. It seems that they all lock the
>>>> timestamp when the query is started. And as you said, in this case both
>>>> mature (Oracle) and less mature systems (Hive, MySQL) have the same
>>>> behavior.
>>> 
>>> FLIP-162> “These problems come from the fact that lots of time-related
>>> functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and
>>> CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone."
>>> The motivation of FLIP-162 is to correct the wrong time-related function
>>> values caused by the time zone. After our earlier discussion, we found
>>> it's also related to the function return types compared to the SQL standard
>>> and other vendors, and thus we proposed making the function return types
>>> consistent as well.
>>> This is the exact meaning of the FLIP title and what the FLIP plans to do.
>>> 
>>> But we haven't considered the function materialization mechanism as
>>> a part of our plan, because we need to fix the timezone and function type
>>> issues no matter whether we modify the function materialization mechanism
>>> in the future or not.
>>> So I think it doesn't belong to this FLIP's scope.
>>> 
>>> It will already be great work if we can fix the current FLIP's 7 proposals
>>> well; we don't want to expand the scope again, esp. since it's not part of
>>> our plan.
>>> 
>>> What do you think? @Timo
>>> 
>>> And what’s others' thoughts?  @Jark @Kurt
>>> 
>>> Best,
>>> Leonard
>>> 
>>> 
>>> 
>>> 
>>>> Flink should not differ. I fear that we have to adopt this behavior as
>>>> well to call us standard compliant. Otherwise it will also not be possible
>>>> to have Hive compatibility with proper semantics. It could lead to
>>>> unintended behavior.
>>>>
>>>> I see two options for this topic:
>>>>
>>>> 1) Clearly distinguish between query-start and processing time
>>>>
>>>> MySQL offers NOW() and SYSDATE() to distinguish the two semantics. We
>>>> could run all the previously discussed functions that have a meaning in
>>>> other systems in query-start time and use a different name for processing
>>>> time. `SYS_TIMESTAMP`, `SYS_DATE`, `SYS_TIME`, `SYS_LOC

[jira] [Created] (FLINK-21229) Support ssl connection with schema registry format

2021-02-01 Thread Dawid Wysakowicz (Jira)

Dawid Wysakowicz created FLINK-21229:


 Summary: Support ssl connection with schema registry format
 Key: FLINK-21229
 URL: https://issues.apache.org/jira/browse/FLINK-21229
 Project: Flink
  Issue Type: Improvement
  Components: Formats (JSON, Avro, Parquet, ORC, SequenceFile), Table SQL / Ecosystem
Reporter: Dawid Wysakowicz


There is currently no way to pass an SSL configuration to the Confluent schema
registry format. We should be able to pass:

{code}
- schema.registry.ssl.truststore.location
- schema.registry.ssl.truststore.password
- schema.registry.ssl.keystore.location
- schema.registry.ssl.keystore.password
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (FLINK-21230) Add protobuf wrapper types for the StateFun SDK types.

2021-02-01 Thread Igal Shilman (Jira)

Igal Shilman created FLINK-21230:


 Summary: Add protobuf wrapper types for the StateFun SDK types.
 Key: FLINK-21230
 URL: https://issues.apache.org/jira/browse/FLINK-21230
 Project: Flink
  Issue Type: Task
  Components: Stateful Functions
Reporter: Igal Shilman


Add primitive wrapper types to be used for messaging and state as part of the 
new type system.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-01 Thread Timo Walther


Hi Leonard,

thanks for considering this issue as well. +1 for the proposed config 
option. Let's start a voting thread once the FLIP document has been 
updated if there are no other concerns?


Thanks,
Timo


On 01.02.21 15:07, Leonard Xu wrote:

Hi, all

I’ve discussed with @Timo @Jark about the time function evaluation further. We 
reach a consensus that we’d better address the time function 
evaluation(function value materialization) in this FLIP as well.

We’re fine with introducing an option table.exec.time-function-evaluation to 
control the materialize time point of time function value. The time function 
includes
LOCALTIME
LOCALTIMESTAMP
CURRENT_DATE
CURRENT_TIME
CURRENT_TIMESTAMP
NOW()
The default value of table.exec.time-function-evaluation is 'per-record', which 
means Flink evaluates the function value per record, we recommend users config 
this option value for their streaming pipe lines.
Another valid option value is ’query-start’, which means Flink evaluates the 
function value at the query start, we recommend users config this option value 
for their batch pipelines.
In the future, more valid evaluation option value like ‘auto' may be supported 
if there’re new requirements, e.g： support ‘auto’ option which evaluates time 
function value per-record in streaming mode and evaluates
time function value at query start in batch mode.

Alternative1:
Introduce function like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW which 
evaluates function value at query start. This may confuse users a bit that we 
provide two similar functions but with different return value. 

Alternative2:
Do not introduce any configuration/function, control the function 
evaluation by pipeline execution mode. This may produce different result when 
user use their  streaming pipeline sql to run a batch pipeline(e.g 
backfilling), and user also
can not control these function behavior.


How do you think ?

Thanks,
Leonard
  


在 2021年2月1日，18:23，Timo Walther  写道：

Parts of the FLIP can already be implemented without a completed voting, e.g. 
there is no doubt that we should support TIME(9).

However, I don't see a benefit of reworking the time functions to rework them 
again later. If we lock the time on query-start the implementation of the 
previsouly mentioned functions will be completely different.

Regards,
Timo


On 01.02.21 02:37, Kurt Young wrote:

I also prefer not to expand this FLIP further, but we could open a
discussion thread right after this FLIP is accepted and start coding &
reviewing. Making the technical discussion and coding more pipelined will
improve efficiency.
Best,
Kurt
On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu  wrote:

Hi, Timo


> I do think that this topic must be part of the FLIP as well. Esp. if the
> FLIP has the title "time function behavior" and this is clearly a
> behavioral aspect. We are performing a heavy refactoring of the SQL query
> semantics in Flink here which will affect a lot of users. We cannot rework
> the time functions a third time after this.
>
> I checked a couple of other vendors. It seems that they all lock the
> timestamp when the query is started. And as you said, in this case both
> mature (Oracle) and less mature systems (Hive, MySQL) have the same
> behavior.

FLIP-162> "These problems come from the fact that lots of time-related
functions like PROCTIME(), NOW(), CURRENT_DATE, CURRENT_TIME and
CURRENT_TIMESTAMP are returning time values based on UTC+0 time zone."
The motivation of FLIP-162 is to correct the wrong time-related function
values caused by the time zone. After our earlier discussion, we found that
this is also related to the function return types compared with the SQL
standard and other vendors, and thus we proposed making the function return
types consistent as well. This is the exact meaning of the FLIP title and
what the FLIP plans to do.

But we have not yet considered the function materialization mechanism as
part of our plan, because we need to fix the time zone and function type
issues regardless of whether we modify the function materialization
mechanism in the future.
So I think it does not belong to the scope of this FLIP.

It will already be great work if we can fix the current FLIP's 7 proposals
well. We don't want to expand the scope again, especially since it's not
part of our plan.

What do you think? @Timo

And what are others' thoughts? @Jark @Kurt

Best,
Leonard





> Flink should not differ. I fear that we have to adopt this behavior as
> well to call us standard compliant. Otherwise it will also not be possible
> to have Hive compatibility with proper semantics. It could lead to
> unintended behavior.
>
> I see two options for this topic:
>
> 1) Clearly distinguish between query-start and processing time
>
> MySQL offers NOW() and SYSDATE() to distinguish the two semantics. We
> could run all the previously discussed functions that have a meaning in
> other systems in query-start time and use a different name for processing
> time. `SYS_TIMESTAMP`, `SYS_DATE`, `SYS_TIME`, `SYS_LOCALTIMESTAMP`,
> `SYS_LOCAL

[jira] [Created] (FLINK-21231) add "SHOW VIEWS" to SQL client

2021-02-01 Thread tim yu (Jira)

tim yu created FLINK-21231:
--

 Summary: add "SHOW VIEWS" to SQL client
 Key: FLINK-21231
 URL: https://issues.apache.org/jira/browse/FLINK-21231
 Project: Flink
  Issue Type: New Feature
Reporter: tim yu


The SQL client currently cannot run the "SHOW VIEWS" statement. We should add 
a "SHOW VIEWS" implementation to it.

   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jane Chan

+1 to Jark's proposal

To make it clearer, will `module#getFunctionDefinition()` return empty
if the module is loaded but not enabled?

Best,
Jane

On Mon, Feb 1, 2021 at 10:02 PM Timo Walther  wrote:

> +1 to Jark's proposal
>
> I like the difference between just loading and actually enabling these
> modules.
>
> @Rui: I would use the same behavior as catalogs here. You cannot `USE` a
> catalog without creating it before.
>
> Another question is whether a LOAD operation also adds the module to the
> enabled list by default?
>
> Regards,
> Timo
>
> On 01.02.21 13:52, Rui Li wrote:
> > If `USE MODULES` implies unloading modules that are not listed, does it
> > also imply loading modules that are not previously loaded, especially
> since
> > we're mapping modules by name now?
> >
> > On Mon, Feb 1, 2021 at 8:20 PM Jark Wu  wrote:
> >
> >> I agree with Timo that the USE implies the specified modules are in use
> in
> >> the specified order and others are not used.
> >> This would be easier to know what's the result list and order after the
> USE
> >> statement.
> >> That means: if current modules in order are x, y, z. And `USE MODULES
> z, y`
> >> means current modules in order are z, y.
> >>
> >> But I would like to not unload the unmentioned modules in the USE
> >> statement. Because it seems strange that USE
> >> will implicitly remove modules. In the above example, the user may type
> the
> >> wrong modules list using USE by mistake
> >>   and would like to declare the list again, the user has to create the
> >> module again with some properties he may don't know. Therefore, I
> propose
> >> the USE statement just specifies the current module lists and doesn't
> >> unload modules.
> >> Besides that, we may need a new syntax to list all the modules including
> >> not used but loaded.
> >> We can introduce SHOW FULL MODULES for this purpose with an additional
> >> `used` column.
> >>
> >> For example:
> >>
> >> Flink SQL> list modules:
> >> ---
> >> | modules |
> >> ---
> >> | x   |
> >> | y   |
> >> | z   |
> >> ---
> >> Flink SQL> USE MODULES z, y;
> >> Flink SQL> show modules:
> >> ---
> >> | modules |
> >> ---
> >> | z   |
> >> | y   |
> >> ---
> >> Flink SQL> show FULL modules;
> >> ---
> >> | modules |  used |
> >> ---
> >> | z   | true  |
> >> | y   | true  |
> >> | x   | false |
> >> ---
> >> Flink SQL> USE MODULES z, y, x;
> >> Flink SQL> show modules;
> >> ---
> >> | modules |
> >> ---
> >> | z   |
> >> | y   |
> >> | x   |
> >> ---
> >>
> >> What do you think?
> >>
> >> Best,
> >> Jark
> >>
> >> On Mon, 1 Feb 2021 at 19:02, Jane Chan  wrote:
> >>
> >>> Hi Timo, thanks for the discussion.
> >>>
> >>> It seems to reach an agreement regarding #3 that <1> Module name should
> >>> better be a simple identifier rather than a string literal. <2>
> Property
> >>> `type` is redundant and should be removed, and mapping will rely on the
> >>> module name because loading a module multiple times just using a
> >> different
> >>> module name doesn't make much sense. <3> We should migrate to the newer
> >> API
> >>> rather than the deprecated `TableFactory` class.
> >>>
> >>> Regarding #1, I think the point lies in whether changing the resolution
> >>> order implies an `unload` operation explicitly (i.e., users could sense
> >>> it). What do others think?
> >>>
> >>> Best,
> >>> Jane
> >>>
> >>> On Mon, Feb 1, 2021 at 6:41 PM Timo Walther 
> wrote:
> >>>
>  IMHO I would rather unload the not mentioned modules. The statement
>  expresses `USE` that implicilty implies that the other modules are
> "not
>  used". What do others think?
> 
>  Regards,
>  Timo
> 
> 
>  On 01.02.21 11:28, Jane Chan wrote:
> > Hi Jark and Rui,
> >
> > Thanks for the discussions.
> >
> > Regarding #1, I'm fine with `USE MODULES` syntax, and
> >
> >> It can be interpreted as "setting the current order of modules",
> >> which
>  is
> >> similar to "setting the current catalog" for `USE CATALOG`.
> >>
> > I would like to confirm that the unmentioned modules remain in the
> >> same
> > relative order? E.g., if there are three loaded modules `X`, `Y`,
> >> `Z`,
>  then
> > `USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.
> >
> > Regarding #3, I'm fine with mapping modules purely by name, and I
> >> think
> > Jark raised a good point on making the module name a simple
> >> identifier
> > instead of a string literal. For backward compatibility, since we
> >>> haven't
> > supported this syntax yet, the affected users are those who defined
>  modules
> > in the YAML configuration file. Maybe we can eliminate the 'type'
> >> from
>  the
> > 'requiredContext' to make it optional. Thus the proposed mapping
>  mechanism
> > could use th

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Timo Walther


Not the module itself but the ModuleManager should handle this case, yes.

Regards,
Timo


On 01.02.21 17:35, Jane Chan wrote:

+1 to Jark's proposal

  To make it clearer,  will `module#getFunctionDefinition()` return empty
suppose the module is loaded but not enabled?

Best,
Jane

On Mon, Feb 1, 2021 at 10:02 PM Timo Walther  wrote:


+1 to Jark's proposal

I like the difference between just loading and actually enabling these
modules.

@Rui: I would use the same behavior as catalogs here. You cannot `USE` a
catalog without creating it before.

Another question is whether a LOAD operation also adds the module to the
enabled list by default?

Regards,
Timo

On 01.02.21 13:52, Rui Li wrote:

If `USE MODULES` implies unloading modules that are not listed, does it
also imply loading modules that are not previously loaded, especially since
we're mapping modules by name now?

On Mon, Feb 1, 2021 at 8:20 PM Jark Wu  wrote:


I agree with Timo that the USE implies the specified modules are in use in
the specified order and others are not used.
This would be easier to know what's the result list and order after the USE
statement.
That means: if current modules in order are x, y, z. And `USE MODULES z, y`
means current modules in order are z, y.

But I would like to not unload the unmentioned modules in the USE
statement. Because it seems strange that USE
will implicitly remove modules. In the above example, the user may type the
wrong modules list using USE by mistake
and would like to declare the list again, the user has to create the
module again with some properties he may don't know. Therefore, I propose
the USE statement just specifies the current module lists and doesn't
unload modules.
Besides that, we may need a new syntax to list all the modules including
not used but loaded.
We can introduce SHOW FULL MODULES for this purpose with an additional
`used` column.

For example:

Flink SQL> list modules:
-----------
| modules |
-----------
| x       |
| y       |
| z       |
-----------
Flink SQL> USE MODULES z, y;
Flink SQL> show modules:
-----------
| modules |
-----------
| z       |
| y       |
-----------
Flink SQL> show FULL modules;
-------------------
| modules | used  |
-------------------
| z       | true  |
| y       | true  |
| x       | false |
-------------------
Flink SQL> USE MODULES z, y, x;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
| x       |
-----------

What do you think?

Best,
Jark

On Mon, 1 Feb 2021 at 19:02, Jane Chan  wrote:


Hi Timo, thanks for the discussion.

It seems to reach an agreement regarding #3 that <1> Module name should
better be a simple identifier rather than a string literal. <2> Property
`type` is redundant and should be removed, and mapping will rely on the
module name because loading a module multiple times just using a different
module name doesn't make much sense. <3> We should migrate to the newer API
rather than the deprecated `TableFactory` class.

Regarding #1, I think the point lies in whether changing the resolution
order implies an `unload` operation explicitly (i.e., users could sense
it). What do others think?

Best,
Jane

On Mon, Feb 1, 2021 at 6:41 PM Timo Walther wrote:

IMHO I would rather unload the not mentioned modules. The statement
expresses `USE` that implicitly implies that the other modules are "not
used". What do others think?

Regards,
Timo


On 01.02.21 11:28, Jane Chan wrote:

Hi Jark and Rui,

Thanks for the discussions.

Regarding #1, I'm fine with `USE MODULES` syntax, and

It can be interpreted as "setting the current order of modules", which is
similar to "setting the current catalog" for `USE CATALOG`.

I would like to confirm that the unmentioned modules remain in the same
relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then
`USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.

Regarding #3, I'm fine with mapping modules purely by name, and I think
Jark raised a good point on making the module name a simple identifier
instead of a string literal. For backward compatibility, since we haven't
supported this syntax yet, the affected users are those who defined modules
in the YAML configuration file. Maybe we can eliminate the 'type' from the
'requiredContext' to make it optional. Thus the proposed mapping mechanism
could use the module name to lookup the suitable factory, and in the
meanwhile updating documentation to encourage users to simplify their YAML
configuration. And in the long run, we can deprecate the 'type'.

Best,
Jane

On Mon, Feb 1, 2021 at 4:19 PM Rui Li  wrote:


Thanks Jane for starting the discussion.

Regarding #1, I also prefer `USE MODULES` syntax. It can be interpreted as
"setting the current order of modules", which is similar to "setting the
current catalog" for `USE CATALOG`.

Regarding #3, I'm fine to map modules purely by name because I think it
satisfies all the use case

[jira] [Created] (FLINK-21232) Introduce pluggable Hadoop delegation token providers

2021-02-01 Thread jackwangcs (Jira)

jackwangcs created FLINK-21232:
--

 Summary: Introduce pluggable Hadoop delegation token providers
 Key: FLINK-21232
 URL: https://issues.apache.org/jira/browse/FLINK-21232
 Project: Flink
  Issue Type: New Feature
  Components: Deployment / YARN
Reporter: jackwangcs


Introduce a pluggable delegation provider via SPI. 

The delegation provider could be placed in connector-related code, which is 
more extensible than obtaining DTs (delegation tokens) via reflection.

Email discussion thread:

[https://lists.apache.org/thread.html/rbedb6e769358a10c6426c4c42b3b51cdbed48a3b6537e4ebde912bc0%40%3Cdev.flink.apache.org%3E]

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [Announce] Documentation Freeze Feb 2nd

2021-02-01 Thread Seth Wiesman

Reminder

On Thu, Jan 28, 2021 at 9:07 AM Seth Wiesman  wrote:

> Hi Everyone,
>
> As part of migrating the flink documentation to Hugo, I need to ask the
> community for a short documentation freeze.  This will keep us from losing
> any contributions during the migration. I am proposing the freeze begin
> next week February 2nd with the goal to get the change merged in that week.
> I have been working to have everything ready to go to keep this as
> unobtrusive as possible.
>
> If you have a pending documentation PR please do not rush it. If it is not
> merged before next Tuesday you will simply need to rebase after the
> migration is completed.
>
> Please let me know if you have any questions.
>
> Seth
>

Re: [DISCUSS] Support obtaining Hive delegation tokens when submitting application to Yarn

2021-02-01 Thread Jack W

Hi Rui,

I agree with you that we can implement pluggable DT providers first. I have 
created a new ticket to track it: 
https://issues.apache.org/jira/browse/FLINK-21232. 
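
As a rough illustration of what "pluggable" could look like, here is a small Python sketch. It is neither Flink nor Spark code: all class and method names are hypothetical, the token strings are fake, and an explicit provider list stands in for the service discovery that SPI would do on the JVM.

```python
from abc import ABC, abstractmethod


class DelegationTokenProvider(ABC):
    """Hypothetical provider interface; one implementation per external service."""

    @abstractmethod
    def service_name(self) -> str: ...

    @abstractmethod
    def tokens_required(self, conf: dict) -> bool: ...

    @abstractmethod
    def obtain_tokens(self, conf: dict) -> dict: ...


class FakeHiveProvider(DelegationTokenProvider):
    """Stand-in that would live next to the Hive connector code."""

    def service_name(self):
        return "hive"

    def tokens_required(self, conf):
        return "hive.metastore.uris" in conf

    def obtain_tokens(self, conf):
        return {"hive": "token-for-" + conf["hive.metastore.uris"]}


class FakeHBaseProvider(DelegationTokenProvider):
    """Stand-in for a second service, skipped when it is not configured."""

    def service_name(self):
        return "hbase"

    def tokens_required(self, conf):
        return "hbase.zookeeper.quorum" in conf

    def obtain_tokens(self, conf):
        return {"hbase": "token-for-" + conf["hbase.zookeeper.quorum"]}


class DelegationTokenManager:
    """Asks every registered provider for tokens; knows no service specifics."""

    def __init__(self, providers):
        self._providers = {p.service_name(): p for p in providers}

    def obtain_all(self, conf):
        tokens = {}
        for provider in self._providers.values():
            if provider.tokens_required(conf):
                tokens.update(provider.obtain_tokens(conf))
        return tokens


manager = DelegationTokenManager([FakeHiveProvider(), FakeHBaseProvider()])
tokens = manager.obtain_all({"hive.metastore.uris": "thrift://host:9083"})
```

The point of the design is that the manager never references Hive or HBase classes directly, so no reflection is needed in the core code.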

Spark's HadoopDelegationTokenManager can run on both the client and the 
driver (application master) side. On the client side, 
HadoopDelegationTokenManager is used to obtain tokens when users use a keytab 
or `kinit` (credential cache); on the driver side, it is used to obtain and 
renew DTs. To explain this, some background is needed. Currently, Flink 
distributes the keytab to the JobManager and TaskManagers, and the Kerberos 
credentials are renewed with the keytab on the JobManager and TaskManagers. 
Spark, however, takes a different approach: it ships the keytab only to the 
driver, and the driver uses this keytab to renew all delegation tokens 
periodically and then distributes the renewed tokens to the executors. In this 
way, Spark can reduce the load on the KDC. You can refer to this doc for 
details: 
https://docs.google.com/document/d/10V7LiNlUJKeKZ58mkR7oVv1t6BrC6TZi3FGf2Dm6-i8/edit

Thanks,
Jie

On 2021/01/27 03:33:37, Rui Li  wrote: 
> Hi Jie,
> 
> Thanks for the investigation. I think we can first implement pluggable DT
> providers, and add renewal abilities incrementally. I'm also curious where
> Spark runs its HadoopDelegationTokenManager when renewal is enabled?
> Because it seems HadoopDelegationTokenManager needs access to keytab to
> create new tokens, does that mean it can only run on the client side?
> 
> On Mon, Jan 25, 2021 at 10:32 AM 王 杰  wrote:
> 
> > Hi Till,
> >
> > Sorry for late response, I just did some investigations about Spark. Spark
> > adopted the SPI way to obtain delegations for different components. It has
> > a HadoopDelegationTokenManager.scala<
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala>
> > to manage all Hadoop delegation tokens including obtaining and renewing the
> > delegation tokens.
> >
> > When the HadoopDelegationTokenManager is initializing, it will use
> > ServiceLoader to load all HadoopDelegationTokenProviders in different
> > connectors. As for Hive, the provider implementation is
> > HadoopDelegationTokenProvider<
> > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/security/HadoopDelegationTokenManager.scala
> > >.
> >
> > Thanks,
> > Jie
> >
> >
> > On 2021/01/13 08:51:29, Till Rohrmann  > trohrm...@apache.org>> wrote:
> > > Hi Jie Wang,
> > >
> > > thanks for starting this discussion. To me the SPI approach sounds better
> > > because it is not as brittle as using reflection. Concerning the
> > > configuration, we could think about introducing some Hive specific
> > > configuration options which allow us to specify these paths. How are
> > other
> > > projects which integrate with Hive are solving this problem?
> > >
> > > Cheers,
> > > Till
> > >
> > > On Tue, Jan 12, 2021 at 4:13 PM 王 杰  > jackwan...@outlook.com>> wrote:
> > >
> > > > Hi everyone,
> > > >
> > > > Currently, Hive delegation token is not obtained when Flink submits the
> > > > application in Yarn mode using kinit way. The ticket is
> > > > https://issues.apache.org/jira/browse/FLINK-20714. I'd like to start a
> > > > discussion about how to support this feature.
> > > >
> > > > Maybe we have two options:
> > > > 1. Using a reflection way to construct a Hive client to obtain the
> > token,
> > > > just same as the org.apache.flink.yarn.Utils.obtainTokenForHBase
> > > > implementation.
> > > > 2. Introduce a pluggable delegation provider via SPI. Delegation
> > provider
> > > > could be placed in connector related code, so reflection is not needed
> > and
> > > > is more extendable.
> > > >
> > > >
> > > >
> > > > Both options have to handle how to specify the HiveConf to use. In Hive
> > > > connector, user could specify both hiveConfDir and hadoopConfDir when
> > > > creating HiveCatalog. The hadoopConfDir may not the same as the Hadoop
> > > > configuration in HadoopModule.
> > > >
> > > > Looking forward to your suggestions.
> > > >
> > > > --
> > > > Best regards!
> > > > Jie Wang
> > > >
> > > >
> > >
> >
> 
> 
> -- 
> Best regards!
> Rui Li
>

[jira] [Created] (FLINK-21233) Race condition in CheckpointCoordinator in finishing sync savepoint

2021-02-01 Thread Roman Khachatryan (Jira)

Roman Khachatryan created FLINK-21233:
-

 Summary: Race condition in CheckpointCoordinator in finishing sync 
savepoint
 Key: FLINK-21233
 URL: https://issues.apache.org/jira/browse/FLINK-21233
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Checkpointing
Affects Versions: 1.12.1, 1.11.3, 1.13.0
Reporter: Roman Khachatryan


I'm writing an integration test and see a failure from time to time (about 1 
in 100 runs on my machine):
{code:java}
Caused by: java.util.concurrent.CompletionException: 
org.apache.flink.runtime.checkpoint.CheckpointException: CheckpointCoordinator 
shutdown.
{code}
 

Consider the final stage of the synchronous savepoint (started by the 
stop-with-savepoint command):
 # The last subtask ACKs the checkpoint
 # CheckpointCoordinator finalizes the checkpoint and sends out confirmations
 # EndOfPartition is generated on sources and flows through the graph
 # Each Subtask notifies the Scheduler about its completion
 # Upon receiving the last notification, the Scheduler shuts down 
CheckpointCoordinator
 # CheckpointCoordinator aborts all pending checkpoints

Note that the Scheduler and CheckpointCoordinator run in different threads.

So if savepoint finalization takes longer than the remaining steps, it can be 
aborted before completion.
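
The ordering problem can be sketched with a simplified, single-threaded Python model. This is not the actual Flink implementation: the class below is a hypothetical stand-in, and the two thread interleavings are replayed explicitly so that the outcome is deterministic.

```python
class CheckpointCoordinator:
    """Hypothetical, heavily simplified stand-in for the real coordinator."""

    def __init__(self):
        self.pending = {}  # checkpoint id -> state

    def trigger(self, cp_id):
        self.pending[cp_id] = "PENDING"

    def begin_finalize(self, cp_id):
        # All ACKs have arrived; finalization has started but not finished.
        self.pending[cp_id] = "FINALIZING"

    def complete_finalize(self, cp_id):
        if self.pending.get(cp_id) != "FINALIZING":
            raise RuntimeError("CheckpointCoordinator shutdown.")
        del self.pending[cp_id]
        return "COMPLETED"

    def shutdown(self):
        # Called from the scheduler side: aborts everything still pending.
        self.pending.clear()


def run(interleaving):
    """Replay one ordering of the two concurrent steps, return the savepoint outcome."""
    coordinator = CheckpointCoordinator()
    coordinator.trigger(1)
    coordinator.begin_finalize(1)
    steps = {
        "finalize": lambda: coordinator.complete_finalize(1),
        "shutdown": coordinator.shutdown,
    }
    result = None
    for step in interleaving:
        try:
            outcome = steps[step]()
        except RuntimeError as error:
            outcome = f"FAILED: {error}"
        if step == "finalize":
            result = outcome
    return result


ok = run(["finalize", "shutdown"])     # finalization wins the race
raced = run(["shutdown", "finalize"])  # shutdown aborts the pending savepoint first
```

In the second ordering the savepoint fails with the same message as in the stack trace above, although every subtask had already acknowledged it.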



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Jark Wu

Hi Timo,

> Another question is whether a LOAD operation also adds the module to the
enabled list by default?

I would like to add the module to the enabled list by default. The main
reasons are:
1) Reordering is an advanced requirement. Needing an additional USE
statement that also lists the "core" module every time a module is added
sounds too burdensome. Most users should be satisfied with only LOAD
statements.
2) We should stay compatible with TableEnvironment#loadModule().
3) We are using the LOAD statement instead of CREATE, so I think it's fine
that it does some implicit things.
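
To summarize the semantics discussed so far, here is a small Python model. It is only a sketch of the proposal, not the real ModuleManager: the class, its method names, and the sample function definitions are all made up for illustration.

```python
class ModuleManager:
    """Hypothetical model: 'loaded' holds every module, 'used' the resolution order."""

    def __init__(self):
        self.loaded = {}  # module name -> {function name: definition}
        self.used = []    # enabled module names, in resolution order

    def load(self, name, functions):
        if name in self.loaded:
            raise ValueError(f"module {name} is already loaded")
        self.loaded[name] = functions
        self.used.append(name)  # LOAD enables the module by default

    def unload(self, name):
        del self.loaded[name]
        if name in self.used:
            self.used.remove(name)

    def use(self, *names):
        missing = [n for n in names if n not in self.loaded]
        if missing:
            raise ValueError(f"modules not loaded: {missing}")
        self.used = list(names)  # unmentioned modules stay loaded but unused

    def show_full(self):
        unused = [n for n in self.loaded if n not in self.used]
        return [(n, True) for n in self.used] + [(n, False) for n in unused]

    def get_function_definition(self, function_name):
        # Only used modules take part in resolution, in their current order.
        for name in self.used:
            if function_name in self.loaded[name]:
                return self.loaded[name][function_name]
        return None


manager = ModuleManager()
for module_name in ("x", "y", "z"):
    manager.load(module_name, {f"{module_name}_fn": f"definition from {module_name}"})
manager.use("z", "y")
```

After `use("z", "y")`, module `x` is still loaded but no longer resolves functions, and a later `use("z", "y", "x")` brings it back without loading it again.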

Best,
Jark

On Tue, 2 Feb 2021 at 00:48, Timo Walther  wrote:

> Not the module itself but the ModuleManager should handle this case, yes.
>
> Regards,
> Timo
>
>
> On 01.02.21 17:35, Jane Chan wrote:
> > +1 to Jark's proposal
> >
> >   To make it clearer,  will `module#getFunctionDefinition()` return empty
> > suppose the module is loaded but not enabled?
> >
> > Best,
> > Jane
> >
> > On Mon, Feb 1, 2021 at 10:02 PM Timo Walther  wrote:
> >
> >> +1 to Jark's proposal
> >>
> >> I like the difference between just loading and actually enabling these
> >> modules.
> >>
> >> @Rui: I would use the same behavior as catalogs here. You cannot `USE` a
> >> catalog without creating it before.
> >>
> >> Another question is whether a LOAD operation also adds the module to the
> >> enabled list by default?
> >>
> >> Regards,
> >> Timo
> >>
> >> On 01.02.21 13:52, Rui Li wrote:
> >>> If `USE MODULES` implies unloading modules that are not listed, does it
> >>> also imply loading modules that are not previously loaded, especially
> >> since
> >>> we're mapping modules by name now?
> >>>
> >>> On Mon, Feb 1, 2021 at 8:20 PM Jark Wu  wrote:
> >>>
>  I agree with Timo that the USE implies the specified modules are in
> use
> >> in
>  the specified order and others are not used.
>  This would be easier to know what's the result list and order after
> the
> >> USE
>  statement.
>  That means: if current modules in order are x, y, z. And `USE MODULES
> >> z, y`
>  means current modules in order are z, y.
> 
>  But I would like to not unload the unmentioned modules in the USE
>  statement. Because it seems strange that USE
>  will implicitly remove modules. In the above example, the user may
> type
> >> the
>  wrong modules list using USE by mistake
> and would like to declare the list again, the user has to create
> the
>  module again with some properties he may don't know. Therefore, I
> >> propose
>  the USE statement just specifies the current module lists and doesn't
>  unload modules.
>  Besides that, we may need a new syntax to list all the modules
> including
>  not used but loaded.
>  We can introduce SHOW FULL MODULES for this purpose with an additional
>  `used` column.
> 
>  For example:
> 
>  Flink SQL> list modules:
>  ---
>  | modules |
>  ---
>  | x   |
>  | y   |
>  | z   |
>  ---
>  Flink SQL> USE MODULES z, y;
>  Flink SQL> show modules:
>  ---
>  | modules |
>  ---
>  | z   |
>  | y   |
>  ---
>  Flink SQL> show FULL modules;
>  ---
>  | modules |  used |
>  ---
>  | z   | true  |
>  | y   | true  |
>  | x   | false |
>  ---
>  Flink SQL> USE MODULES z, y, x;
>  Flink SQL> show modules;
>  ---
>  | modules |
>  ---
>  | z   |
>  | y   |
>  | x   |
>  ---
> 
>  What do you think?
> 
>  Best,
>  Jark
> 
>  On Mon, 1 Feb 2021 at 19:02, Jane Chan  wrote:
> 
> > Hi Timo, thanks for the discussion.
> >
> > It seems to reach an agreement regarding #3 that <1> Module name
> should
> > better be a simple identifier rather than a string literal. <2>
> >> Property
> > `type` is redundant and should be removed, and mapping will rely on
> the
> > module name because loading a module multiple times just using a
>  different
> > module name doesn't make much sense. <3> We should migrate to the
> newer
>  API
> > rather than the deprecated `TableFactory` class.
> >
> > Regarding #1, I think the point lies in whether changing the
> resolution
> > order implies an `unload` operation explicitly (i.e., users could
> sense
> > it). What do others think?
> >
> > Best,
> > Jane
> >
> > On Mon, Feb 1, 2021 at 6:41 PM Timo Walther 
> >> wrote:
> >
> >> IMHO I would rather unload the not mentioned modules. The statement
> >> expresses `USE` that implicilty implies that the other modules are
> >> "not
> >> used". What do others think?
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 01.02.21 11:28, Jane Chan wrote:
> >>> Hi Jark and Rui,
> >>>
> >

Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-01 Thread Jark Wu

Hi Leonard, Timo,

I just did some investigation and found that all the other batch processing
systems evaluate the time functions at query start, including Snowflake,
Hive, Spark, and Trino.
I'm wondering whether the default 'per-record' mode will still be weird for
batch users.
I know we proposed the option for batch users to change the behavior.
However, if 90% of users need to set this config before submitting batch
jobs, why not use this mode for batch by default? The other 10% of users
with special needs can still set the config to per-record before submitting
batch jobs. I believe this can greatly improve the usability for batch
cases.

Therefore, what do you think about using "auto" as the default option
value?

It evaluates time functions per record in streaming mode and at query start
in batch mode.
I think this can make both streaming users and batch users happy. IIUC, the
reason why we proposed the default "per-record" mode is consistency between
batch and streaming.
However, I think time functions are special cases because they are
naturally non-deterministic.
Even if streaming jobs and batch jobs all use "per-record" mode, they still
can't provide consistent results. Thus, I think we may need to think more
from the users' perspective.
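
The "auto" default could resolve to an effective strategy roughly like this. This is a Python sketch only: the function name and the mode strings "streaming" and "batch" are assumptions for illustration, not an existing Flink API.

```python
def resolve_time_function_evaluation(option: str, runtime_mode: str) -> str:
    """Return the effective strategy for the proposed option under a runtime mode."""
    if runtime_mode not in ("streaming", "batch"):
        raise ValueError(f"unknown runtime mode: {runtime_mode}")
    if option in ("per-record", "query-start"):
        return option  # an explicit setting wins in either mode
    if option == "auto":
        return "per-record" if runtime_mode == "streaming" else "query-start"
    raise ValueError(f"unknown option value: {option}")


effective = {
    (option, mode): resolve_time_function_evaluation(option, mode)
    for option in ("auto", "per-record", "query-start")
    for mode in ("streaming", "batch")
}
```

Users who need a fixed behavior in both modes would still set "per-record" or "query-start" explicitly.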

Best,
Jark


On Mon, 1 Feb 2021 at 23:06, Timo Walther  wrote:

> Hi Leonard,
>
> thanks for considering this issue as well. +1 for the proposed config
> option. Let's start a voting thread once the FLIP document has been
> updated if there are no other concerns?
>
> Thanks,
> Timo
>
>
> On 01.02.21 15:07, Leonard Xu wrote:
> > Hi, all
> >
> > I’ve discussed with @Timo @Jark about the time function evaluation
> further. We reach a consensus that we’d better address the time function
> evaluation(function value materialization) in this FLIP as well.
> >
> > We’re fine with introducing an option
> table.exec.time-function-evaluation to control the materialize time point
> of time function value. The time function includes
> > LOCALTIME
> > LOCALTIMESTAMP
> > CURRENT_DATE
> > CURRENT_TIME
> > CURRENT_TIMESTAMP
> > NOW()
> > The default value of table.exec.time-function-evaluation is
> 'per-record', which means Flink evaluates the function value per record, we
> recommend users config this option value for their streaming pipe lines.
> > Another valid option value is ’query-start’, which means Flink evaluates
> the function value at the query start, we recommend users config this
> option value for their batch pipelines.
> > In the future, more valid evaluation option value like ‘auto' may be
> supported if there’re new requirements, e.g： support ‘auto’ option which
> evaluates time function value per-record in streaming mode and evaluates
> > time function value at query start in batch mode.
> >
> > Alternative1:
> >   Introduce function like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW
> which evaluates function value at query start. This may confuse users a bit
> that we provide two similar functions but with different return value.
>
> >
> > Alternative2:
> > Do not introduce any configuration/function, control the
> function evaluation by pipeline execution mode. This may produce different
> result when user use their  streaming pipeline sql to run a batch
> pipeline(e.g backfilling), and user also
> > can not control these function behavior.
> >
> >
> > How do you think ?
> >
> > Thanks,
> > Leonard
> >
> >
> >> On Feb 1, 2021, at 18:23, Timo Walther wrote:
> >>
> >> Parts of the FLIP can already be implemented without a completed
> voting, e.g. there is no doubt that we should support TIME(9).
> >>
> >> However, I don't see a benefit of reworking the time functions to
> rework them again later. If we lock the time on query-start the
> implementation of the previsouly mentioned functions will be completely
> different.
> >>
> >> Regards,
> >> Timo
> >>
> >>
> >> On 01.02.21 02:37, Kurt Young wrote:
> >>> I also prefer to not expand this FLIP further, but we could open a
> >>> discussion thread
> >>> right after this FLIP being accepted and start coding & reviewing. Make
> >>> technique
> >>> discussion and coding more pipelined will improve efficiency.
> >>> Best,
> >>> Kurt
> >>> On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu  wrote:
>  Hi, Timo
> 
> > I do think that this topic must be part of the FLIP as well. Esp. if
> the
>  FLIP has the title "time function behavior" and this is clearly a
>  behavioral aspect. We are performing a heavy refactoring of the SQL
> query
>  semantics in Flink here which will affect a lot of users. We cannot
> rework
>  the time functions a third time after this.
> > I checked a couple of other vendors. It seems that they all lock the
>  timestamp when the query is started. And as you said, in this case
> both
>  mature (Oracle) and less mature systems (Hive, MySQL) have the same
>  behavior.
> 
>  FLIP-162> “These problems come from the fact that lots of time-related
>  functions like

Re: [DISCUSS]FLIP-163: SQL Client Improvements

2021-02-01 Thread Jingsong Li

Thanks for the proposal. Yes, the sql-client is too outdated. +1 for
improving it.

About "SET" and "RESET": why not "SET" and "UNSET"?

Best,
Jingsong

On Mon, Feb 1, 2021 at 2:46 PM Rui Li  wrote:

> Thanks Shengkai for the update! The proposed changes look good to me.
>
> On Fri, Jan 29, 2021 at 8:26 PM Shengkai Fang  wrote:
>
> > Hi, Rui.
> > You are right. I have already modified the FLIP.
> >
> > The main changes:
> >
> > # -f parameter has no restriction about the statement type.
> > Sometimes, users use the pipe to redirect the result of queries to debug
> > when submitting job by -f parameter. It's much convenient comparing to
> > writing INSERT INTO statements.
> >
> > # Add a new sql client option `sql-client.job.detach` .
> > Users prefer to execute jobs one by one in the batch mode. Users can set
> > this option false and the client will process the next job until the
> > current job finishes. The default value of this option is false, which
> > means the client will execute the next job when the current job is
> > submitted.
> >
> > Best,
> > Shengkai
> >
> >
> >
> > On Fri, Jan 29, 2021 at 4:52 PM Rui Li wrote:
> >
> >> Hi Shengkai,
> >>
> >> Regarding #2, maybe the -f options in flink and hive have different
> >> implications, and we should clarify the behavior. For example, if the
> >> client just submits the job and exits, what happens if the file contains
> >> two INSERT statements? I don't think we should treat them as a statement
> >> set, because users should explicitly write BEGIN STATEMENT SET in that
> >> case. And the client shouldn't asynchronously submit the two jobs,
> because
> >> the 2nd may depend on the 1st, right?
> >>
> >> On Fri, Jan 29, 2021 at 4:30 PM Shengkai Fang 
> wrote:
> >>
> >>> Hi Rui,
> >>> Thanks for your feedback. I agree with your suggestions.
> >>>
> >>> For suggestion 1: Yes, we plan to strengthen the SET command. In
> >>> the implementation, it will just put the key-value pair into the
> >>> `Configuration`, which will be used to generate the table config. If
> >>> Hive supports reading settings from the table config, users will be able
> >>> to set the Hive-related settings.
> >>>
> >>> For suggestion 2: The -f parameter will submit the job and exit. If
> >>> the queries never end, users have to cancel the job by themselves, which
> >>> is not reliable (people may forget their jobs). In most cases, queries are
> >>> used to analyze the data. Users should use queries in the interactive mode.
> >>>
> >>> Best,
> >>> Shengkai
> >>>
> >>> On Fri, Jan 29, 2021 at 3:18 PM Rui Li wrote:
> >>>
>  Thanks Shengkai for bringing up this discussion. I think it covers a
>  lot of useful features which will dramatically improve the usability of
>  our SQL Client. I have two questions regarding the FLIP.
>
>  1. Do you think we can let users set arbitrary configurations via the
>  SET command? A connector may have its own configurations and we don't
>  have a way to dynamically change such configurations in SQL Client. For
>  example, users may want to be able to change hive conf when using the
>  hive connector [1].
>  2. Any reason why we have to forbid queries in SQL files specified with
>  the -f option? Hive supports a similar -f option but allows queries in
>  the file. And a common use case is to run some query and redirect the
>  results to a file. So I think maybe flink users would like to do the
>  same, especially in batch scenarios.
>
>  [1] https://issues.apache.org/jira/browse/FLINK-20590
>
>  On Fri, Jan 29, 2021 at 10:46 AM Sebastian Liu wrote:
> 
> > Hi Shengkai,
> >
> > Glad to see this improvement. And I have some additional suggestions:
> >
> > #1. Unify the TableEnvironment in ExecutionContext to
> > StreamTableEnvironment for both streaming and batch SQL.
> > #2. Improve the way results are retrieved: the SQL Client currently
> > collects the results locally all at once using accumulators,
> >   which may cause memory issues in the JM or locally for a big query
> >   result. Accumulators are only suitable for testing purposes.
> >   We may change to use SelectTableSink, which is based
> >   on CollectSinkOperatorCoordinator.
> > #3. Do we need to consider the Flink SQL gateway in FLIP-91 [1]? It seems
> > that this FLIP has not moved forward for a long time.
> >   Providing a long-running service out of the box to facilitate SQL
> >   submission is necessary.
> >
> > What do you think of these?
> >
> > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-91%3A+Support+SQL+Client+Gateway
> >
> >
> > On Thu, Jan 28, 2021 at 8:54 PM Shengkai Fang wrote:
> >
> > > Hi devs,
> > >
> > > Jark and I want to start a discussion about FLIP-163:SQL Client
> > > Improvements.
> > >
> > > Many users have complained about the problems of t

Re: [DISCUSS] FLIP-162: Consistent Flink SQL time function behavior

2021-02-01 Thread Jingsong Li

+1 for "auto" as the default value of "table.exec.time-function-evaluation".

From the definition of these functions, in my opinion:
- Batch is the instant execution of all records, which is the meaning of
the word "BATCH", so there is only one time at query-start.
- Stream only executes a single record in a moment, so time is generated by
each record.

On the other hand, we should be more careful about consistency with other
systems.

Best,
Jingsong
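
A minimal sketch of how the three option values discussed in this thread ('per-record', 'query-start', and the proposed 'auto') could differ in behavior. The class `TimeFunctionEvaluationSketch`, the `Evaluation` enum, and the injected clock are hypothetical names chosen for illustration; they are not Flink APIs, and the real planner may materialize the value quite differently.

```java
import java.time.Instant;
import java.util.function.Supplier;

/** Hypothetical sketch; none of these names are Flink APIs. */
final class TimeFunctionEvaluationSketch {

    /** Mirrors the option values 'per-record', 'query-start' and 'auto'. */
    enum Evaluation { PER_RECORD, QUERY_START, AUTO }

    /**
     * Returns a supplier standing in for a time function such as CURRENT_TIMESTAMP.
     * The clock is injected so the behavior can be observed deterministically.
     */
    static Supplier<Instant> currentTimestamp(
            Evaluation mode, boolean batchMode, Supplier<Instant> clock) {
        boolean atQueryStart =
                mode == Evaluation.QUERY_START || (mode == Evaluation.AUTO && batchMode);
        if (atQueryStart) {
            // Materialize the value once; every record sees the same instant.
            Instant fixed = clock.get();
            return () -> fixed;
        }
        // Re-read the clock for every record.
        return clock;
    }

    public static void main(String[] args) {
        Supplier<Instant> batch = currentTimestamp(Evaluation.AUTO, true, Instant::now);
        // The value was materialized once, so both reads are equal: prints true.
        System.out.println(batch.get().equals(batch.get()));
    }
}
```

With 'auto', the same SQL would see a single timestamp in batch mode and a fresh timestamp per record in streaming mode, which is the trade-off against batch/streaming consistency raised in the quoted messages.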

On Tue, Feb 2, 2021 at 11:24 AM Jark Wu  wrote:

> Hi Leonard, Timo,
>
> I just did some investigation and found that all the other batch processing
> systems evaluate the time functions at query start, including Snowflake, Hive,
> Spark, and Trino.
> I'm wondering whether the default 'per-record' mode will still be weird for
> batch users.
> I know we proposed the option for batch users to change the behavior.
> However, if 90% of users need to set this config before submitting batch jobs,
> why not use this mode for batch by default? The other 10% of special users
> can still set the config to per-record before submitting batch jobs. I believe
> this can greatly improve the usability for batch cases.
>
> Therefore, what do you think about using "auto" as the default option
> value?
>
> It evaluates time functions per-record in streaming mode and evaluates at
> query start in batch mode.
> I think this can make both streaming users and batch users happy. IIUC, the
> reason why we proposed the default "per-record" mode is batch/streaming
> consistency.
> However, I think time functions are special cases because they are
> naturally non-deterministic.
> Even if streaming jobs and batch jobs all use "per-record" mode, they still
> can't provide consistent results. Thus, I think we may need to think more from
> the users' perspective.
>
> Best,
> Jark
>
>
> On Mon, 1 Feb 2021 at 23:06, Timo Walther  wrote:
>
> > Hi Leonard,
> >
> > thanks for considering this issue as well. +1 for the proposed config
> > option. Let's start a voting thread once the FLIP document has been
> > updated if there are no other concerns?
> >
> > Thanks,
> > Timo
> >
> >
> > On 01.02.21 15:07, Leonard Xu wrote:
> > > Hi, all
> > >
> > > I've discussed the time function evaluation further with @Timo and @Jark.
> > > We reached a consensus that we'd better address the time function
> > > evaluation (function value materialization) in this FLIP as well.
> > >
> > > We're fine with introducing an option
> > > table.exec.time-function-evaluation to control the point in time at which
> > > a time function value is materialized. The time functions include:
> > > LOCALTIME
> > > LOCALTIMESTAMP
> > > CURRENT_DATE
> > > CURRENT_TIME
> > > CURRENT_TIMESTAMP
> > > NOW()
> > > The default value of table.exec.time-function-evaluation is
> > > 'per-record', which means Flink evaluates the function value per record; we
> > > recommend that users configure this value for their streaming pipelines.
> > > Another valid option value is 'query-start', which means Flink evaluates
> > > the function value at the query start; we recommend that users configure
> > > this value for their batch pipelines.
> > > In the future, more evaluation option values such as 'auto' may be
> > > supported if there are new requirements, e.g. an 'auto' option that
> > > evaluates the time function value per record in streaming mode and
> > > at query start in batch mode.
> > >
> > > Alternative 1:
> > >   Introduce functions like CURRENT_TIMESTAMP2/CURRENT_TIMESTAMP_NOW
> > >   which evaluate the function value at query start. This may confuse users
> > >   a bit, since we would provide two similar functions with different return
> > >   values.
> > >
> > > Alternative 2:
> > >   Do not introduce any configuration/function; control the
> > >   function evaluation by the pipeline execution mode. This may produce
> > >   different results when users run their streaming pipeline SQL as a batch
> > >   pipeline (e.g. backfilling), and users also cannot control the behavior of
> > >   these functions.
> > >
> > >
> > > What do you think?
> > >
> > > Thanks,
> > > Leonard
> > >
> > >
> > >> On Feb 1, 2021, at 18:23, Timo Walther wrote:
> > >>
> > >> Parts of the FLIP can already be implemented without a completed
> > >> vote, e.g. there is no doubt that we should support TIME(9).
> > >>
> > >> However, I don't see a benefit in reworking the time functions only to
> > >> rework them again later. If we lock the time at query start, the
> > >> implementation of the previously mentioned functions will be completely
> > >> different.
> > >>
> > >> Regards,
> > >> Timo
> > >>
> > >>
> > >> On 01.02.21 02:37, Kurt Young wrote:
> > >>> I also prefer not to expand this FLIP further, but we could open a
> > >>> discussion thread right after this FLIP is accepted and start coding &
> > >>> reviewing. Making technical discussion and coding more pipelined will
> > >>> improve efficiency.
> > >>> Best,
> > >>> Kurt
> > >>> On Sat, Jan 30, 2021 at 3:47 PM Leonard Xu wrote:
> >  H

[jira] [Created] (FLINK-21234) testKafkaSourceSinkWithKeyAndPartialValue[legacy = false, format = csv] hang

2021-02-01 Thread Guowei Ma (Jira)

Guowei Ma created FLINK-21234:
-

 Summary: testKafkaSourceSinkWithKeyAndPartialValue[legacy = false, format = csv] hang
 Key: FLINK-21234
 URL: https://issues.apache.org/jira/browse/FLINK-21234
 Project: Flink
  Issue Type: Bug
  Components: Connectors / Kafka
Affects Versions: 1.13.0
Reporter: Guowei Ma


https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12758&view=logs&j=c5f0071e-1851-543e-9a45-9ac140befc32&t=1fb1a56f-e8b5-5a82-00a0-a2db7757b4f5



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

[jira] [Created] (FLINK-21235) leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader hang

2021-02-01 Thread Guowei Ma (Jira)

Guowei Ma created FLINK-21235:
-

 Summary: leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader hang
 Key: FLINK-21235
 URL: https://issues.apache.org/jira/browse/FLINK-21235
 Project: Flink
  Issue Type: Bug
  Components: Runtime / Coordination
Affects Versions: 1.11.3
Reporter: Guowei Ma


[https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=12759&view=logs&j=3b6ec2fd-a816-5e75-c775-06fb87cb6670&t=b33fdd4f-3de5-542e-2624-5d53167bb672]
{code:java}
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1742)
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908)
at org.apache.flink.util.AutoCloseableAsync.close(AutoCloseableAsync.java:36)
at org.apache.flink.runtime.dispatcher.runner.DefaultDispatcherRunnerITCase.leaderChange_withBlockingJobManagerTermination_doesNotAffectNewLeader(DefaultDispatcherRunnerITCase.java:211)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:55)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48)
at org.junit.rules.RunRules.evaluate(RunRules.java:20)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:365)
at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithReru{code}




[jira] [Created] (FLINK-21236) Don't explicitly use HeapMemorySegment in row format serde

2021-02-01 Thread Kurt Young (Jira)

Kurt Young created FLINK-21236:
--

 Summary: Don't explicitly use HeapMemorySegment in row format serde
 Key: FLINK-21236
 URL: https://issues.apache.org/jira/browse/FLINK-21236
 Project: Flink
  Issue Type: Improvement
  Components: Table SQL / Planner
Affects Versions: 1.12.0
Reporter: Kurt Young
 Fix For: 1.13.0


`RawFormatDeserializationSchema` and `RawFormatSerializationSchema` explicitly
use `HeapMemorySegment`, and in a typical batch job, `HybridMemorySegment`
will also be loaded and used as managed memory. This prevents Class
Hierarchy Analysis (CHA) from optimizing the function calls of MemorySegment.
More details can be found here:
[https://flink.apache.org/news/2015/09/16/off-heap-memory.html]

We can use `ByteBuffer` instead of `HeapMemorySegment`.
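
As a rough illustration of the suggested direction, the sketch below reads and writes a single `int` through `java.nio.ByteBuffer` with an explicit byte order, without touching any `MemorySegment` subclass. The class name `RawIntSerdeSketch` and its methods are hypothetical and are not the code of `RawFormatDeserializationSchema` or `RawFormatSerializationSchema`; the actual fix may be structured differently.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch; not the actual raw format serde implementation. */
final class RawIntSerdeSketch {

    /** Serializes an int into 4 bytes using the given byte order. */
    static byte[] serializeInt(int value, ByteOrder order) {
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES).order(order);
        buffer.putInt(value);
        return buffer.array();
    }

    /** Deserializes an int from the first 4 bytes using the given byte order. */
    static int deserializeInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] bytes = serializeInt(42, ByteOrder.LITTLE_ENDIAN);
        // Round trip through the same byte order: prints 42.
        System.out.println(deserializeInt(bytes, ByteOrder.LITTLE_ENDIAN));
    }
}
```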




[jira] [Created] (FLINK-21237) Reflects the actual running state of the job

2021-02-01 Thread liuzhuo (Jira)

liuzhuo created FLINK-21237:
---

 Summary: Reflects the actual running state of the job
 Key: FLINK-21237
 URL: https://issues.apache.org/jira/browse/FLINK-21237
 Project: Flink
  Issue Type: Improvement
  Components: Runtime / Task
Reporter: liuzhuo


 
{code:java}
public enum JobStatus {
    ...
    /** Some tasks are scheduled or running, some may be pending, some may be finished. */
    RUNNING(TerminalState.NON_TERMINAL),
    ...
}
{code}
According to the comment on RUNNING, some tasks may not actually be in the
RUNNING state yet; they may take a while to reach RUNNING, or even fail due to
errors. Why not provide a state that truly reflects that the tasks are RUNNING,
indicating that all tasks are RUNNING and can process data correctly in this
state?
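
A minimal sketch of the check this request asks for, assuming a simplified, hypothetical `TaskState` enum rather than Flink's real task execution states. It only illustrates the distinction between "the job is RUNNING" and "every task is RUNNING"; how such a state would be exposed on `JobStatus` is left open.

```java
import java.util.Arrays;
import java.util.Collection;

/** Hypothetical sketch; TaskState is a simplified stand-in, not a Flink class. */
final class AllTasksRunningSketch {

    enum TaskState { SCHEDULED, DEPLOYING, RUNNING, FINISHED, FAILED }

    /** True only if there is at least one task and every task is RUNNING. */
    static boolean allTasksRunning(Collection<TaskState> taskStates) {
        return !taskStates.isEmpty()
                && taskStates.stream().allMatch(state -> state == TaskState.RUNNING);
    }

    public static void main(String[] args) {
        // One task is still deploying, so the job is not "truly" running: prints false.
        System.out.println(
                allTasksRunning(Arrays.asList(TaskState.RUNNING, TaskState.DEPLOYING)));
    }
}
```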

 




Re: [DISCUSS] FLINK-21045: Support 'load module' and 'unload module' SQL syntax

2021-02-01 Thread Timo Walther


+1

@Jane Can you summarize our discussion in the JIRA issue?

Thanks,
Timo


On 02.02.21 03:50, Jark Wu wrote:

Hi Timo,


> Another question is whether a LOAD operation also adds the module to the
> enabled list by default?

I would like to add the module to the enabled list by default. The main
reasons are:
1) Reordering is an advanced requirement; requiring additional USE statements
that include the "core" module whenever a module is added sounds too burdensome.
Most users should be satisfied with only LOAD statements.
2) We should stay compatible with TableEnvironment#loadModule().
3) We are using the LOAD statement instead of CREATE, so I think it's fine
that it does some implicit things.

Best,
Jark

On Tue, 2 Feb 2021 at 00:48, Timo Walther  wrote:


Not the module itself but the ModuleManager should handle this case, yes.

Regards,
Timo


On 01.02.21 17:35, Jane Chan wrote:

+1 to Jark's proposal

To make it clearer: will `module#getFunctionDefinition()` return empty
if the module is loaded but not enabled?

Best,
Jane

On Mon, Feb 1, 2021 at 10:02 PM Timo Walther  wrote:


+1 to Jark's proposal

I like the difference between just loading and actually enabling these
modules.

@Rui: I would use the same behavior as catalogs here. You cannot `USE` a
catalog without creating it before.

Another question is whether a LOAD operation also adds the module to the
enabled list by default?

Regards,
Timo

On 01.02.21 13:52, Rui Li wrote:

If `USE MODULES` implies unloading modules that are not listed, does it
also imply loading modules that are not previously loaded, especially since
we're mapping modules by name now?

On Mon, Feb 1, 2021 at 8:20 PM Jark Wu  wrote:


I agree with Timo that the USE implies the specified modules are in use in
the specified order and others are not used.
This would make it easier to know the resulting list and order after the USE
statement.
That means: if the current modules in order are x, y, z, then `USE MODULES z, y`
means the current modules in order are z, y.

But I would like to not unload the unmentioned modules in the USE
statement, because it seems strange that USE will implicitly remove modules.
In the above example, the user may type the wrong module list using USE by
mistake and would like to declare the list again; the user then has to create
the module again with some properties they may not know. Therefore, I propose
that the USE statement just specifies the current module list and doesn't
unload modules.
Besides that, we may need a new syntax to list all the modules, including
those loaded but not used.
We can introduce SHOW FULL MODULES for this purpose with an additional
`used` column.

For example:

Flink SQL> list modules:
-----------
| modules |
-----------
| x       |
| y       |
| z       |
-----------
Flink SQL> USE MODULES z, y;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
-----------
Flink SQL> show FULL modules;
-------------------
| modules | used  |
-------------------
| z       | true  |
| y       | true  |
| x       | false |
-------------------
Flink SQL> USE MODULES z, y, x;
Flink SQL> show modules;
-----------
| modules |
-----------
| z       |
| y       |
| x       |
-----------

What do you think?

Best,
Jark
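
The semantics proposed in this message (USE sets the used list and its order without unloading anything, and SHOW FULL MODULES lists loaded-but-unused modules too) can be sketched as a small in-memory model. `ModuleListSketch` and its methods are hypothetical and are not Flink's `ModuleManager`; enabling a module on LOAD follows the default suggested elsewhere in this thread and is an assumption here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed semantics; not Flink's ModuleManager. */
final class ModuleListSketch {
    private final Set<String> loaded = new LinkedHashSet<>(); // all loaded modules
    private final List<String> used = new ArrayList<>();      // resolution order

    /** LOAD MODULE: registers the module and (by assumption) enables it. */
    void load(String name) {
        if (!loaded.add(name)) {
            throw new IllegalStateException("Module already loaded: " + name);
        }
        used.add(name);
    }

    /** USE MODULES: replaces the used list and order; nothing is unloaded. */
    void use(String... names) {
        for (String name : names) {
            if (!loaded.contains(name)) {
                throw new IllegalStateException("Module not loaded: " + name);
            }
        }
        used.clear();
        used.addAll(Arrays.asList(names));
    }

    /** SHOW MODULES: only the used modules, in resolution order. */
    List<String> showModules() {
        return new ArrayList<>(used);
    }

    /** SHOW FULL MODULES: used modules first, then loaded-but-unused ones. */
    List<String> showFullModules() {
        List<String> rows = new ArrayList<>();
        for (String name : used) {
            rows.add(name + ":true");
        }
        for (String name : loaded) {
            if (!used.contains(name)) {
                rows.add(name + ":false");
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        ModuleListSketch modules = new ModuleListSketch();
        modules.load("x");
        modules.load("y");
        modules.load("z");
        modules.use("z", "y");
        System.out.println(modules.showFullModules()); // prints [z:true, y:true, x:false]
    }
}
```

Because `x` stays in the loaded set, a later `use("z", "y", "x")` succeeds without reloading it, which is the recovery path from a mistyped USE that the message argues for.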

On Mon, 1 Feb 2021 at 19:02, Jane Chan  wrote:


Hi Timo, thanks for the discussion.

It seems we have reached an agreement regarding #3 that <1> the module name
should be a simple identifier rather than a string literal; <2> the property
`type` is redundant and should be removed, and the mapping will rely on the
module name, because loading a module multiple times under different
module names doesn't make much sense; <3> we should migrate to the newer API
rather than the deprecated `TableFactory` class.

Regarding #1, I think the point lies in whether changing the resolution
order implies an `unload` operation explicitly (i.e., users could sense
it). What do others think?

Best,
Jane

On Mon, Feb 1, 2021 at 6:41 PM Timo Walther wrote:



IMHO I would rather unload the unmentioned modules. The statement
says `USE`, which implicitly implies that the other modules are "not
used". What do others think?

Regards,
Timo


On 01.02.21 11:28, Jane Chan wrote:

Hi Jark and Rui,

Thanks for the discussions.

Regarding #1, I'm fine with `USE MODULES` syntax, and

> It can be interpreted as "setting the current order of modules", which is
> similar to "setting the current catalog" for `USE CATALOG`.

I would like to confirm that the unmentioned modules remain in the same
relative order? E.g., if there are three loaded modules `X`, `Y`, `Z`, then
`USE MODULES Y, Z` means shifting the order to `Y`, `Z`, `X`.

Regarding #3, I'm fine with mapping modules purely by name, and I think
Jark raised a good point on making the module name a simple identifier
instead of a string literal. For backward compatibility, since we haven't
supported this syntax yet, the affected users are those who defined modules
in

[jira] [Created] (FLINK-21238) Support to close PythonFunctionFactory manually

2021-02-01 Thread Dian Fu (Jira)

Dian Fu created FLINK-21238:
---

 Summary: Support to close PythonFunctionFactory manually
 Key: FLINK-21238
 URL: https://issues.apache.org/jira/browse/FLINK-21238
 Project: Flink
  Issue Type: Improvement
  Components: API / Python
Reporter: Dian Fu
Assignee: Dian Fu
 Fix For: 1.13.0


PythonFunctionFactory is used to convert a Python class to a Java
PythonFunction representation which can then be used as a user-defined
function. Underneath PythonFunctionFactory, there is a Python process which
performs the actual conversion work. Currently, the Python process is
registered with a shutdown hook and closed when the JVM exits. The aim of this
JIRA is to provide more flexibility for users by introducing a close method on
PythonFunctionFactory so that it can be closed manually.
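
One way the combination of "closed by a shutdown hook" and "closable manually" could look is sketched below. `ClosableFactorySketch` and its `stopProcess` callback are hypothetical stand-ins for the factory and its Python process; this is not the PythonFunctionFactory implementation, only an illustration of an idempotent close that also unregisters its own hook.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; not the actual PythonFunctionFactory implementation. */
final class ClosableFactorySketch implements AutoCloseable {
    private final AtomicBoolean closed = new AtomicBoolean(false);
    private final Runnable stopProcess; // stand-in for stopping the Python process
    private final Thread shutdownHook;

    ClosableFactorySketch(Runnable stopProcess) {
        this.stopProcess = stopProcess;
        // Fallback: stop the process at JVM exit if nobody closed it manually.
        this.shutdownHook = new Thread(this::closeInternal);
        Runtime.getRuntime().addShutdownHook(shutdownHook);
    }

    private void closeInternal() {
        // compareAndSet guarantees the process is stopped at most once.
        if (closed.compareAndSet(false, true)) {
            stopProcess.run();
        }
    }

    /** Manual close: stops the process now and drops the shutdown hook. */
    @Override
    public void close() {
        closeInternal();
        try {
            Runtime.getRuntime().removeShutdownHook(shutdownHook);
        } catch (IllegalStateException e) {
            // The JVM is already shutting down; the hook can no longer be removed.
        }
    }

    public static void main(String[] args) {
        try (ClosableFactorySketch factory =
                new ClosableFactorySketch(() -> System.out.println("process stopped"))) {
            System.out.println("factory in use");
        }
    }
}
```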



