[jira] [Created] (FLINK-14003) Add access to customized state other than built in state types

2019-09-08 Thread Shimin Yang (Jira)
Shimin Yang created FLINK-14003:
---

 Summary: Add access to customized state other than built in state 
types
 Key: FLINK-14003
 URL: https://issues.apache.org/jira/browse/FLINK-14003
 Project: Flink
  Issue Type: Wish
  Components: Runtime / State Backends
Reporter: Shimin Yang


Currently the keyed state backend supports built-in state types such as value, 
list, and map. Although these are quite useful, they may not fit all use cases. 

Users can develop a customized state backend based on a different data store 
as a plugin, but they have no access to customized state if its type is not 
one of the built-in types. 

I propose adding a getPartitionedState method to the RuntimeContext and 
KeyedStateStore interfaces to allow users to access their customized state; 
it will not affect the current code path.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-08 Thread myasuka
Congratulations Kostas! 

Best
Yun Tang



--
Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/


[ANNOUNCE] Weekly Community Update 2019/33-36

2019-09-08 Thread Konstantin Knauf
Dear Community,

happy to share this "week's" community update, back after a three-week
summer break. It has been a very busy time in the Flink community, as a lot of
FLIP discussions and votes for Apache Flink 1.10 are under way. I will
try to cover a good part of them in this update, along with bugs in Flink
1.9.0 and more...

Flink Development
==

* [roadmap] There are currently two great resources for an overview of *Flink's
Roadmap* for 1.10 and beyond. The first is the recently updated roadmap
on the project website [1], and the other is a discussion thread
launched by Gary on the features for Flink 1.10 [2]. Gary and Yu Li stepped
up as release managers for Flink 1.10 and proposed a feature freeze around the
end of November 2019 and a release at the beginning of January 2020. Most of the
FLIP discussions covered in this update are mentioned in these roadmaps.

* [releases] The vote for *Apache Flink 1.8.2* RC1 [3] is currently
ongoing. Check out the corresponding discussion thread [4] for a list of
fixes.

* [development] Following up on the repository split discussion, the
community is now looking into other ways to *reduce the build time* of
Apache Flink. Chesnay has proposed several options, some of which are being
investigated in more detail as of this writing. Among these are sharing JVMs
between tests for more modules, moving to Gradle as a build system (better
incremental builds) and moving to a different CI system (Azure Pipelines?).
[5]

* [state] Yu Li proposes to add a new state backend to Flink, the
*SpillableHeapStateBackend.* [6] State will primarily live on the Java heap,
but the coldest state will be spilled to disk if memory becomes scarce. The
vote has already passed. [7]

* [python] Jincheng has started a discussion on adding support for *user-defined
functions* in the Python Table API. The high-level architecture follows the
approach of Beam's portability framework of executing user-defined
functions in a separate language-specific environment. The first FLIP
(FLIP-58) will only deal with stateless user-defined functions and will lay
the groundwork. [8]

* [sql] Xu Forward has started a discussion on adding functions to *construct
and query JSON* objects in Flink SQL. The proposal has generally been
well-received, but there is no FLIP yet. [9]

* [sql] Bowen has started a discussion on reworking the *function catalog*,
which among other goals aims to support external built-in functions (Hive),
to revisit the resolution order of function names and to support fully
qualified function names. [10]

* [connectors] Yijie Shen proposes to contribute the *Apache Pulsar
connector* (currently in Apache Pulsar) back to Apache Flink. While
everyone agrees that a strong Apache Pulsar connector is a valuable
contribution to the project, there are concerns about build time,
maintainability in the long run, and dependencies on FLIP-27 (New Source
Interface). The discussion is ongoing. [11]

* [connectors] From Apache Flink 1.10 onwards, the *Kinesis Connector* will
be part of the Apache Flink release. In the past this was blocked by the
license of its dependencies, which have recently been relicensed under Apache
2.0. [12]

* [recovery] Till has published two small FLIPs on *Flink's restart
strategies*. The first one, FLIP-61, proposes to ignore restart strategy
configuration properties when the corresponding restart strategy was not
set via "restart-strategy". The other one, FLIP-62, proposes to change the
default restart delay for all strategies from 0s to 1s. The vote has passed
for both of them [13, 14].
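For illustration, a flink-conf.yaml fragment that spells out a fixed-delay strategy explicitly might look as follows (a sketch; verify the exact keys against the documentation of your Flink version):

```yaml
# Sketch: explicit fixed-delay restart strategy. After FLIP-62, a 1 s delay
# is the default; key names should be double-checked for your Flink version.
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
restart-strategy.fixed-delay.delay: 1 s
```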

* [resource management] Following up on FLIP-49, Xintong Song has started a
discussion on FLIP-53 to add *fine grained operator resource management* to
Flink [15]. If I understand it correctly, the feature will only be
available via the Blink Planner of the Table API at first, and might later
be extended to the DataStream API. The DataSet API will not be affected.
The vote [16] is currently ongoing.

* [configuration] Dawid introduced a FLIP that adds support for configuring
the ExecutionConfig (and similar classes) from a file or, more generally, from
layers above the StreamExecutionEnvironment, to which you currently need
access in order to change these configurations. [17]

* [development] Stephan proposed to switch to *Java's Duration class* instead
of Flink's Time class for non-API parts of Flink (the API parts maybe in Flink
2.0). [18]

* [development] Gyula started a discussion to unify the implementation of
the *Builder pattern in Flink*. Following the discussion he will add some
guidelines to the code style guide. [19]
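The guidelines were not yet written at the time of this thread, so the following is only a generic sketch of the builder pattern under discussion, with illustrative names (SinkConfig and its fields are not real Flink classes):

```java
// Hedged sketch of a builder convention: private constructor, static
// builder() entry point, fluent with* setters, validation in build().
// All names here are illustrative, not an actual Flink guideline.
public final class SinkConfig {
    private final String path;
    private final int batchSize;

    private SinkConfig(Builder b) {
        this.path = b.path;
        this.batchSize = b.batchSize;
    }

    public static Builder builder() {
        return new Builder();
    }

    public String path() { return path; }
    public int batchSize() { return batchSize; }

    public static final class Builder {
        private String path;
        private int batchSize = 100; // illustrative default

        public Builder withPath(String path) {
            this.path = path;
            return this;
        }

        public Builder withBatchSize(int batchSize) {
            this.batchSize = batchSize;
            return this;
        }

        public SinkConfig build() {
            if (path == null) {
                throw new IllegalStateException("path is required");
            }
            return new SinkConfig(this);
        }
    }

    public static void main(String[] args) {
        SinkConfig cfg = SinkConfig.builder()
                .withPath("/tmp/out")
                .withBatchSize(500)
                .build();
        System.out.println(cfg.path() + " " + cfg.batchSize());
    }
}
```

Validating required fields in build() keeps misconfiguration errors close to construction time, which is one of the motivations usually cited for this pattern.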

* [releases] *Apache Flink-shaded 8.0* has been released. [20]

[1] https://flink.apache.org/roadmap.html
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Features-for-Apache-Flink-1-10-tp32824p32844.html
[3]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-Release-1-8-2-release-candidate-1-tp32808.html
[4]

[jira] [Created] (FLINK-14004) Define SourceReader interface to verify the integration with StreamOneInputProcessor

2019-09-08 Thread zhijiang (Jira)
zhijiang created FLINK-14004:


 Summary: Define SourceReader interface to verify the integration 
with StreamOneInputProcessor
 Key: FLINK-14004
 URL: https://issues.apache.org/jira/browse/FLINK-14004
 Project: Flink
  Issue Type: Sub-task
  Components: Runtime / Task
Reporter: zhijiang
Assignee: zhijiang


We already refactored the task input and output sides based on the new source 
interface in FLIP-27. In order to further verify that the new source reader 
works well with the unified StreamOneInputProcessor in the mailbox model, we 
would design a unit test integrating the whole process. In detail:
 * Define SourceReader and SourceOutput relevant interfaces based on FLIP-27

 * Implement an example of stateless SourceReader (bounded sequence of integers)

 * Define SourceReaderOperator to integrate the SourceReader with 
StreamOneInputProcessor

 * Define SourceReaderStreamTask to execute the source input and implement a 
unit test for it.
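A much-simplified, hypothetical sketch of the bounded integer-sequence reader described above (the real FLIP-27 SourceReader/SourceOutput interfaces are richer; these minimal versions only illustrate the idea):

```java
// Hedged sketch: minimal stand-ins for SourceReader/SourceOutput, plus a
// stateless reader emitting the bounded sequence [from, to]. Not the actual
// FLIP-27 API.
interface SourceOutput<T> {
    void collect(T record);
}

interface SourceReader<T> {
    // Returns true while more records may follow, false once drained.
    boolean emitNext(SourceOutput<T> output);
}

final class BoundedIntReader implements SourceReader<Integer> {
    private int next;
    private final int to;

    BoundedIntReader(int from, int to) {
        this.next = from;
        this.to = to;
    }

    @Override
    public boolean emitNext(SourceOutput<Integer> output) {
        if (next > to) {
            return false; // source exhausted, emit nothing
        }
        output.collect(next++);
        return next <= to;
    }
}

public class SourceReaderSketch {
    public static void main(String[] args) {
        java.util.List<Integer> seen = new java.util.ArrayList<>();
        SourceReader<Integer> reader = new BoundedIntReader(1, 5);
        while (reader.emitNext(seen::add)) { }
        System.out.println(seen); // [1, 2, 3, 4, 5]
    }
}
```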



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

2019-09-08 Thread Yu Li
Hi Shimin,

Thanks for bringing this discussion up.

First of all, I'd like to confirm/clarify that this discussion is mainly
about managed state with a customized state descriptor rather than raw state,
right? I'm asking because raw state was the very first thing that came to my
mind when seeing the title.

And this is actually the first topic/question we need to discuss: whether
we should support user-defined state descriptors and still ask the
framework to manage the state life cycle. Personally I'm +1 on this, since
the "official" state (data-structure) types (currently mainly value, list
and map) may not be optimized for every user's case, but we'd better ask
for others' opinions.

Secondly, if the answer to the first question is "yes", then it is truly a
problem that "although we can implement customized StateDescriptors for
different kinds of data structures, we do not really have access to such
customized state in RichFunctions", and how to resolve that is the second
topic/question to discuss.

I've noticed your proposal in JIRA (FLINK-14003) of exposing the
"getPartitionedState" method in "RuntimeContext" and "KeyedStateStore",
but IMO adding a specific interface like the one below is better than
exposing the internal one:

    State getCustomizedState(StateDescriptor stateProperties);
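The generic parameters of the snippet above were likely stripped by the mail archive. A self-contained toy sketch of one possible shape (purely an assumption; none of these types are Flink's actual classes) could look like:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: toy State/StateDescriptor/KeyedStateStore types that only
// illustrate how a customized descriptor could create and expose its own
// state object through a single generic accessor. Not Flink's API.
interface State {
    void clear();
}

abstract class StateDescriptor<S extends State> {
    final String name;

    StateDescriptor(String name) { this.name = name; }

    // A customized descriptor knows how to build its own state implementation.
    abstract S createState();
}

interface KeyedStateStore {
    // The proposed accessor: works for any descriptor, built-in or customized.
    <S extends State> S getCustomizedState(StateDescriptor<S> stateProperties);
}

final class ToyStateStore implements KeyedStateStore {
    private final Map<String, State> states = new HashMap<>();

    @Override
    @SuppressWarnings("unchecked")
    public <S extends State> S getCustomizedState(StateDescriptor<S> d) {
        return (S) states.computeIfAbsent(d.name, n -> d.createState());
    }
}

final class CounterState implements State {
    long count;
    @Override
    public void clear() { count = 0; }
}

public class CustomStateSketch {
    public static void main(String[] args) {
        KeyedStateStore store = new ToyStateStore();
        StateDescriptor<CounterState> desc = new StateDescriptor<CounterState>("hits") {
            @Override
            CounterState createState() { return new CounterState(); }
        };
        store.getCustomizedState(desc).count++;
        store.getCustomizedState(desc).count++;
        System.out.println(store.getCustomizedState(desc).count); // 2
    }
}
```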

Finally, I think this is a user-facing and definitely worthwhile
discussion, and it requires a FLIP to document the conclusion and
design/implementation (if any). What's your opinion?

Thanks.

Best Regards,
Yu


On Fri, 6 Sep 2019 at 13:27, shimin yang  wrote:

> Hi everyone,
>
> I would like to start a discussion on supporting customized state
> in a customized KeyedStateBackend.
>
> In Flink, users can customize the KeyedStateBackend to support different
> types of data stores. Although we can implement customized StateDescriptors
> for different kinds of data structures, we do not really have access to
> such customized state in RichFunctions.
>
> I propose to add a getOtherState method in RuntimeContext and
> DefaultKeyedStateStore which directly takes a StateDescriptor as parameter
> to allow users to get customized state.
>
> What do you think?
>
> Thanks.
>
> Best,
> Shimin
>


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

2019-09-08 Thread Yun Tang
Hi all,

First of all, I agree with Yu that we should make state types pluggable.

If we take a look at the current Flink implementation, users can already 
implement a pluggable state backend to satisfy their own needs. However, even 
though users can define their own state descriptors, they cannot store the 
customized state within their state backend. The root cause is that the 
current StateBackendFactory can accept a user-defined state backend factory 
while the StateFactory (within HeapKeyedStateBackend [1] and 
RocksDBKeyedStateBackend [2]) cannot.

If we agree that we should leave the right to implement a customized state 
backend to users, it is natural to agree that we should also leave the right 
to implement customized states to users.

[1] 
https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L79
[2] 
https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L114


Best
Yun Tang



From: Yu Li 
Sent: Monday, September 9, 2019 2:24
To: dev 
Subject: Re: [DISCUSS] Support customize state in customized KeyedStateBackend



Re: Checkpointing clarification

2019-09-08 Thread Dominik Wosiński
Okay, thanks for clarifying. I have a follow-up question here. If we
consider Kafka offset commits, this basically means that
the offsets committed during the checkpoint are not necessarily the
offsets that were really processed by the pipeline and written to the sink? I
mean, if there is a window in the pipeline, then the records are saved in
the window state if the window has not been emitted yet, but they are
considered processed and thus will not be replayed in case of a restart,
because Flink considers them processed when committing offsets. Am I correct?

Best,
Dom.


Confirm cancellation

2019-09-08 Thread Lynn Chen
Confirm cancellation

Re: [VOTE] FLIP-58: Flink Python User-Defined Function for Table API

2019-09-08 Thread jincheng sun
Hi all,

It looks like everyone in this VOTE agrees with the current FLIP.

Hi Timo & Aljoscha, do you have any other comments after the ML discussion?
[1]

Hi Dian, could you announce the VOTE result and create a JIRA for the FLIP
later today, if there is no other feedback?

Cheers,
Jincheng

[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673i20.html#a32669

Becket Qin wrote on Monday, September 2, 2019 at 1:32 PM:

> +1
>
> It is extremely useful for ML users.
>
> On Mon, Sep 2, 2019 at 9:46 AM Shaoxuan Wang  wrote:
>
> > +1 (binding)
> >
> > This will be a great feature for Flink users, especially for the data
> > science and AI engineers.
> >
> > Regards,
> > Shaoxuan
> >
> >
> > On Fri, Aug 30, 2019 at 1:35 PM Jeff Zhang  wrote:
> >
> > > +1, really looking forward to this feature in Flink 1.10
> > >
> > >
> > > Yu Li wrote on Friday, August 30, 2019 at 11:08 AM:
> > >
> > > > +1 (non-binding)
> > > >
> > > > Thanks for driving this!
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > >
> > > > On Fri, 30 Aug 2019 at 11:01, Terry Wang  wrote:
> > > >
> > > > > +1. That would be very helpful.
> > > > > Best,
> > > > > Terry Wang
> > > > >
> > > > >
> > > > >
> > > > > > On Aug 30, 2019, at 10:18 AM, Jark Wu wrote:
> > > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks for the great work!
> > > > > >
> > > > > > On Fri, 30 Aug 2019 at 10:04, Xingbo Huang 
> > > wrote:
> > > > > >
> > > > > >> Hi Dian,
> > > > > >>
> > > > > >> +1,
> > > > > >> Thanks a lot for driving this.
> > > > > >>
> > > > > >> Best,
> > > > > >> Xingbo
> > > > > >>> On Aug 30, 2019, at 9:39 AM, Wei Zhong wrote:
> > > > > >>>
> > > > > >>> Hi Dian,
> > > > > >>>
> > > > > >>> +1 non-binding
> > > > > >>> Thanks for driving this!
> > > > > >>>
> > > > > >>> Best, Wei
> > > > > >>>
> > > > >  On Aug 29, 2019, at 09:25, Hequn Cheng wrote:
> > > > > 
> > > > >  Hi Dian,
> > > > > 
> > > > >  +1
> > > > >  Thanks a lot for driving this.
> > > > > 
> > > > >  Best, Hequn
> > > > > 
> > > > >  On Wed, Aug 28, 2019 at 2:01 PM jincheng sun <
> > > > > sunjincheng...@gmail.com>
> > > > >  wrote:
> > > > > 
> > > > > > Hi Dian,
> > > > > >
> > > > > > +1, Thanks for your great job!
> > > > > >
> > > > > > Best,
> > > > > > Jincheng
> > > > > >
> > > > > > Dian Fu wrote on Wednesday, August 28, 2019 at 11:04 AM:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I'd like to start a voting thread for FLIP-58 [1] since that
> > we
> > > > have
> > > > > >> reached an agreement on the design in the discussion thread
> > [2],
> > > > > >>
> > > > > >> This vote will be open for at least 72 hours. Unless there
> is
> > an
> > > > > >> objection, I will try to close it by Sept 2, 2019 00:00 UTC
> if
> > > we
> > > > > have
> > > > > >> received sufficient votes.
> > > > > >>
> > > > > >> PS: This doesn't mean that we cannot further improve the
> > design.
> > > > We
> > > > > >> can
> > > > > >> still discuss the implementation details case by case in the
> > > JIRA
> > > > as
> > > > > >> long
> > > > > >> as it doesn't affect the overall design.
> > > > > >>
> > > > > >> [1]
> > > > > >>
> > > > > >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58%3A+Flink+Python+User-Defined+Function+for+Table+API
> > > > > >> <
> > > > > >>
> > > > > >
> > > > > >>
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-58:+Flink+Python+User-Defined+Function+for+Table+API
> > > > > >>>
> > > > > >> [2]
> > > > > >>
> > > > > >
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> > > > > >> <
> > > > > >>
> > > > > >
> > > > > >>
> > > > >
> > > >
> > >
> >
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Flink-Python-User-Defined-Function-for-Table-API-td31673.html
> > > > > >>>
> > > > > >>
> > > > > >> Thanks,
> > > > > >> Dian
> > > > > >
> > > > > >>>
> > > > > >>
> > > > > >>
> > > > >
> > > > >
> > > >
> > >
> > >
> > > --
> > > Best Regards
> > >
> > > Jeff Zhang
> > >
> >
>


Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-08 Thread Becket Qin
Congrats, Kostas!

On Sun, Sep 8, 2019 at 11:48 PM myasuka  wrote:

> Congratulations Kostas!
>
> Best
> Yun Tang
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>


Re: Checkpointing clarification

2019-09-08 Thread Paul Lam
Hi Dom,

There is a sync phase and an async phase in checkpointing. When an operator 
receives a barrier, it takes a snapshot, aka the sync phase. And when the 
barriers have passed through all the operators, including the sinks, the 
operators get a notification, after which they do the async part, like 
committing the Kafka offsets. WRT your question, the offsets are only 
committed when the whole checkpoint has successfully finished. For more 
information, you can refer to this post [1].

[1] 
https://flink.apache.org/features/2018/03/01/end-to-end-exactly-once-apache-flink.html
 


Best,
Paul Lam

> On Sep 9, 2019, at 07:06, Dominik Wosiński wrote:

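Paul's two-step description (stage the offset during the sync phase, commit it only when the checkpoint completes) can be simulated with a small toy class; all names here are illustrative, not Flink's API:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged toy simulation: the offset is staged when the snapshot (sync phase)
// is taken and only committed to "Kafka" once notifyCheckpointComplete
// arrives. Records read after the barrier are not covered by that checkpoint.
final class ToyOffsetCommitter {
    private long currentOffset;                              // next record to read
    private final Map<Long, Long> staged = new HashMap<>();  // checkpointId -> offset
    private long committedOffset = -1;                       // what "Kafka" has committed

    void processRecord() {
        currentOffset++;  // record may land in window state but counts as processed
    }

    void snapshotState(long checkpointId) {
        staged.put(checkpointId, currentOffset);   // sync phase: stage only
    }

    void notifyCheckpointComplete(long checkpointId) {
        Long offset = staged.remove(checkpointId); // completion: commit now
        if (offset != null) {
            committedOffset = offset;
        }
    }

    long committedOffset() { return committedOffset; }

    public static void main(String[] args) {
        ToyOffsetCommitter c = new ToyOffsetCommitter();
        c.processRecord();
        c.processRecord();
        c.snapshotState(1);       // barrier arrives after 2 records
        c.processRecord();        // read after the barrier, not in checkpoint 1
        System.out.println(c.committedOffset()); // -1: nothing committed yet
        c.notifyCheckpointComplete(1);
        System.out.println(c.committedOffset()); // 2: commit happens only now
    }
}
```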


Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

2019-09-08 Thread Yun Gao
  Congratulations, Kostas!

 Best,
 Yun




--
From:Becket Qin 
Send Time:2019 Sep. 9 (Mon.) 10:47
To:dev 
Subject:Re: [ANNOUNCE] Kostas Kloudas joins the Flink PMC

Congrats, Kostas!

On Sun, Sep 8, 2019 at 11:48 PM myasuka  wrote:

> Congratulations Kostas!
>
> Best
> Yun Tang
>
>
>
> --
> Sent from: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/
>



[jira] [Created] (FLINK-14005) Support Hive version 2.2.0

2019-09-08 Thread Xuefu Zhang (Jira)
Xuefu Zhang created FLINK-14005:
---

 Summary: Support Hive version 2.2.0
 Key: FLINK-14005
 URL: https://issues.apache.org/jira/browse/FLINK-14005
 Project: Flink
  Issue Type: Improvement
  Components: Connectors / Hive
Reporter: Xuefu Zhang
Assignee: Xuefu Zhang
 Fix For: 1.10.0


Including 2.0.0 and 2.0.1.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [DISCUSS] Features for Apache Flink 1.10

2019-09-08 Thread Xuefu Z
Looking at the feature list, I don't see an item for completing the data type
support. Specifically, high-precision timestamps are needed for Hive
integration, as they are so common. Missing this would damage the completeness
of our Hive effort.

Thanks,
Xuefu

On Sat, Sep 7, 2019 at 7:06 PM Xintong Song  wrote:

> Thanks Gary and Yu for compiling the feature list and kicking off this
> discussion.
>
> +1 for Gary and Yu being the release managers for Flink 1.10.
>
> Thank you~
>
> Xintong Song
>
>
>
> On Sat, Sep 7, 2019 at 4:58 PM Till Rohrmann  wrote:
>
> > Thanks for compiling the list of 1.10 efforts for the community Gary. I
> > think this helps a lot to better understand what the community is
> currently
> > working on.
> >
> > Thanks for volunteering as the release managers for the next major
> > release. +1 for Gary and Yu being the RMs for Flink 1.10.
> >
> > Cheers,
> > Till
> >
> > On Sat, Sep 7, 2019 at 7:26 AM Zhu Zhu  wrote:
> >
> > > Thanks Gary for kicking off this discussion.
> > > Really appreciate that you and Yu offer to help to manage 1.10 release.
> > >
> > > +1 for Gary and Yu as release managers.
> > >
> > > Thanks,
> > > Zhu Zhu
> > >
> > > Dian Fu wrote on Saturday, September 7, 2019 at 12:26 PM:
> > >
> > > > Hi Gary,
> > > >
> > > > Thanks for kicking off the release schedule of 1.10. +1 for you and
> Yu
> > Li
> > > > as the release manager.
> > > >
> > > > The feature freeze/release time sounds reasonable.
> > > >
> > > > Thanks,
> > > > Dian
> > > >
> > > > > On Sep 7, 2019, at 11:30 AM, Jark Wu wrote:
> > > > >
> > > > > Thanks Gary for kicking off the discussion for 1.10 release.
> > > > >
> > > > > +1 for Gary and Yu as release managers. Thank you for you effort.
> > > > >
> > > > > Best,
> > > > > Jark
> > > > >
> > > > >
> > > > >> On Sep 7, 2019, at 00:52, zhijiang wrote:
> > > > >>
> > > > >> Hi Gary,
> > > > >>
> > > > >> Thanks for kicking off the features for the next release, 1.10. I am
> > > > >> very supportive of you and Yu Li being the release managers.
> > > > >>
> > > > >> Just mentioning another two improvements which I want covered in
> > > > >> Flink 1.10; I already confirmed with Piotr and reached an agreement
> > > > >> before.
> > > > >>
> > > > >> 1. Serialize and copy data only once for broadcast partitions [1]: it
> > > > >> would greatly improve throughput in broadcast mode and was actually
> > > > >> proposed for Flink 1.8. Most of the work was already done before, and
> > > > >> only the last critical JIRA/PR is left. It will not take much effort
> > > > >> to make it ready.
> > > > >>
> > > > >> 2. Let Netty use Flink's buffers directly in credit-based mode [2]: it
> > > > >> could avoid a memory copy from the Netty stack to Flink's managed
> > > > >> network buffers. The obvious benefit is a great decrease in direct
> > > > >> memory overhead in large-scale jobs. I have also heard of user cases
> > > > >> that encountered direct OOM caused by Netty memory overhead. This
> > > > >> improvement was actually proposed by Nico for Flink 1.7, but there was
> > > > >> no time to focus on it then. Yun Gao submitted a PR half a year ago,
> > > > >> but it has not been reviewed yet. I could help review the design and
> > > > >> the PR code to make it ready.
> > > > >>
> > > > >> And you could give these two items the lowest priority if possible.
> > > > >>
> > > > >> [1] https://issues.apache.org/jira/browse/FLINK-10745
> > > > >> [2] https://issues.apache.org/jira/browse/FLINK-10742
> > > > >>
> > > > >> Best,
> > > > >> Zhijiang
> > > > >> --
> > > > >> From:Gary Yao 
> > > > >> Send Time: Friday, September 6, 2019, 17:06
> > > > >> To:dev 
> > > > >> Cc:carp84 
> > > > >> Subject:[DISCUSS] Features for Apache Flink 1.10
> > > > >>
> > > > >> Hi community,
> > > > >>
> > > > >> Since Apache Flink 1.9.0 has been released more than 2 weeks ago,
> I
> > > > want to
> > > > >> start kicking off the discussion about what we want to achieve for
> > the
> > > > 1.10
> > > > >> release.
> > > > >>
> > > > >> Based on discussions with various people as well as observations
> > from
> > > > >> mailing
> > > > >> list threads, Yu Li and I have compiled a list of features that we
> > > deem
> > > > >> important to be included in the next release. Note that the
> features
> > > > >> presented
> > > > >> here are not meant to be exhaustive. As always, I am sure that
> there
> > > > will be
> > > > >> other contributions that will make it into the next release. This
> > > email
> > > > >> thread
> > > > >> is merely to kick off a discussion, and to give users and
> > contributors
> > > > an
> > > > >> understanding where the focus of the next release lies. If there
> is
> > > > anything
> > > > >> we have missed that somebody is working on, please reply to this
> > > thread.
> > > > >>
> > > > >>
> > > > >> ** Proposed features and focus
> > > > >>
> > > > >> Following the contribution of Blink to Apache Flink, the community
> > > > released
> > > > >> a
> > > > >> preview of the Blink SQL Query Process

[jira] [Created] (FLINK-14006) Add doc for how to using Java UDFs in Python API

2019-09-08 Thread sunjincheng (Jira)
sunjincheng created FLINK-14006:
---

 Summary: Add doc for how to using Java UDFs in Python API
 Key: FLINK-14006
 URL: https://issues.apache.org/jira/browse/FLINK-14006
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Reporter: sunjincheng
 Fix For: 1.10.0


Currently, users cannot find documentation on how to use Java UDFs in the 
Python API, so we should add detailed documentation.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Created] (FLINK-14007) Add doc for how to using Java user-defined source/sink in Python API

2019-09-08 Thread sunjincheng (Jira)
sunjincheng created FLINK-14007:
---

 Summary: Add doc for how to using Java user-defined source/sink in 
Python API
 Key: FLINK-14007
 URL: https://issues.apache.org/jira/browse/FLINK-14007
 Project: Flink
  Issue Type: Improvement
  Components: API / Python, Documentation
Reporter: sunjincheng
 Fix For: 1.10.0


Currently, users cannot find documentation on how to use a Java user-defined 
source/sink in the Python API, so we should add detailed documentation.

 



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


Re: [DISCUSS] FLIP-57 - Rework FunctionCatalog

2019-09-08 Thread Bowen Li
Hi,

W.r.t. temp functions, I feel both options have their benefits and can
theoretically achieve similar functionality one way or another. In the
end, it's more about use cases, user habits, and trade-offs.

Re> Not always users are in full control of the catalog functions. There is
also the case where different teams manage the catalog & use the catalog.

Temp functions live within a session, not within a catalog. Having
3-part paths may imply that temp functions are tied to a catalog, in two
aspects.
1) It may indicate that each catalog manages its own temp functions, which is
not true, as we all seem to agree they should reside in a central place,
either FunctionCatalog or CatalogManager.
2) It may indicate that there is some access control. When users are forbidden
to manipulate some objects in a catalog that's managed by other teams, but
are allowed to manipulate other objects (temp functions in this case)
belonging to the catalog's namespaces, users may think we introduced extra
complexity and confusion by mixing some kind of access control into the
problem. It doesn't feel intuitive enough for end users.

Thus, I'd be in favor of a 1-part path for temporary functions and other
temporary objects.
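As a toy illustration of the 1-part resolution order as I read this thread (my own sketch, not Flink's actual resolution logic):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FunctionResolverSketch {
    // Hedged sketch: session temp functions shadow built-ins; catalog
    // functions are reached via a qualified name. Illustrative names only.
    private final Map<String, String> tempFunctions = new HashMap<>();
    private final Map<String, String> builtinFunctions = new HashMap<>();
    private final Map<String, String> catalogFunctions = new HashMap<>(); // "db.name" keys

    void registerTemp(String name, String impl) {
        if (name.contains(".")) { // enforce ONLY 1-part names for temp functions
            throw new IllegalArgumentException("temp function name must be 1-part");
        }
        tempFunctions.put(name, impl);
    }

    Optional<String> resolve(String name) {
        if (tempFunctions.containsKey(name)) {
            return Optional.of(tempFunctions.get(name));    // 1. temp functions
        }
        if (builtinFunctions.containsKey(name)) {
            return Optional.of(builtinFunctions.get(name)); // 2. built-ins
        }
        return Optional.ofNullable(catalogFunctions.get(name)); // 3. catalog (qualified)
    }

    public static void main(String[] args) {
        FunctionResolverSketch r = new FunctionResolverSketch();
        r.builtinFunctions.put("explode", "builtin:explode");
        r.catalogFunctions.put("db.explode", "catalog:db.explode");

        System.out.println(r.resolve("explode").get());    // builtin:explode
        r.registerTemp("explode", "temp:explode");
        System.out.println(r.resolve("explode").get());    // temp:explode (shadows built-in)
        System.out.println(r.resolve("db.explode").get()); // catalog:db.explode
    }
}
```

Note how the temp function shadows the built-in, while the catalog version stays reachable under its qualified name, mirroring Dawid's `db.explode` example above.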

Thanks,
Bowen



On Fri, Sep 6, 2019 at 2:16 AM Dawid Wysakowicz 
wrote:

> I agree the consequences of the decision are substantial. Let's see what
> others think.
>
> -- Catalog functions are defined by users, and we suppose they can
> drop/alter it in any way they want. Thus, overwriting a catalog function
> doesn't seem to be a strong use case that we should be concerned about.
> Rather, there are known use case for overwriting built-in functions.
>
> Users are not always in full control of the catalog functions. There is
> also the case where different teams manage the catalog and use the catalog.
> As for overriding built-in functions, with the 3-part approach a user can
> always use an equally named function from a catalog. E.g. to override
>
> *SELECT explode(arr) FROM ...*
>
> user can always write:
>
> *SELECT db.explode(arr) FROM ...*
>
> Best,
>
> Dawid
> On 06/09/2019 10:54, Xuefu Z wrote:
>
> Hi Dawid,
>
> Thank you for your summary. While the only difference between the two proposals
> is one- or three-part naming, the consequences would be substantial.
>
> To me, there are two major use cases of temporary functions compared to
> persistent ones:
> 1. They are temporary in nature and auto-managed by the session. More often
> than not, the admin doesn't even allow users to create persistent functions.
> 2. They provide an opportunity to overwrite system built-in functions.
>
> Since built-in functions have one-part names, requiring three-part names for
> temporary functions eliminates the overwriting opportunity.
>
> One-part naming essentially puts all temp functions under a single
> namespace and simplifies function resolution; for instance, we don't need to
> consider the case of a temp function and a persistent function with the
> same name under the same database.
>
> I agree that having three parts does have its merits, such as consistency
> with other temporary objects (tables) and a minor difference between temp and
> catalog functions. However, there is a slight difference between tables and
> functions in that there are no built-in tables in SQL, so there is no need to
> overwrite them.
>
> I'm not sure I fully agree with the benefits you listed as advantages of
> three-part naming for temp functions.
>   -- Allowing overwriting of built-in functions is a benefit, and the solution
> for disallowing certain overwrites shouldn't be to ban them entirely.
>   -- Catalog functions are defined by users, and we suppose they can
> drop/alter them in any way they want. Thus, overwriting a catalog function
> doesn't seem to be a strong use case that we should be concerned about.
> Rather, there are known use cases for overwriting built-in functions.
>
> Thus, personally I would prefer one-part names for temporary functions. In
> the absence of a SQL standard on this, I would certainly like to get opinions
> from others to see if a consensus can eventually be reached.
>
> (To your point on modular approach to support external built-in functions,
> we saw the value and are actively looking into it. Thanks for sharing your
> opinion on that.)
>
> Thanks,
> Xuefu
>
> On Fri, Sep 6, 2019 at 3:48 PM Dawid Wysakowicz wrote:
>
>
> Hi Xuefu,
>
> Thank you for your answers.
>
> Let me summarize my understanding. In principle we differ only in regard
> to whether a temporary function is identified by a 1-part or a 3-part name.
> I can reconfirm that if the community decides it prefers the
> 1-part approach I will commit to that, with the assumption that we will
> enforce ONLY 1-part function names. (We will parse the identifier and throw
> an exception if a user tries to register e.g. db.temp_func.)
>
> My preference, though, is the 3-part approach:
>
>- there are some functions that it makes no sense to override, e.g.
>CAST, moreover I'm afraid that allowing overriding 

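For illustration, the 1-part resolution order debated in this thread (a temporary function, if present, shadows a built-in of the same name; multi-part temporary names are rejected) can be sketched as below. All class and method names here are hypothetical, not Flink API.

```java
// Sketch of the 1-part temporary-function proposal: temporary functions
// take precedence over built-ins, and registering a multi-part temporary
// name is rejected. Names are hypothetical, not Flink API.
import java.util.HashMap;
import java.util.Map;

public class FunctionResolver {
    private final Map<String, String> builtins = new HashMap<>();
    private final Map<String, String> temporaries = new HashMap<>();

    FunctionResolver() {
        // A stand-in built-in function.
        builtins.put("concat", "builtin:concat");
    }

    void registerTemporary(String name, String impl) {
        if (name.contains(".")) {
            // The compromise discussed above: only 1-part names allowed.
            throw new IllegalArgumentException(
                "Temporary function names must be 1-part: " + name);
        }
        temporaries.put(name, impl);
    }

    String resolve(String name) {
        // A temporary function shadows a built-in of the same name.
        String impl = temporaries.get(name);
        return impl != null ? impl : builtins.get(name);
    }

    public static void main(String[] args) {
        FunctionResolver r = new FunctionResolver();
        System.out.println(r.resolve("concat")); // prints "builtin:concat"
        r.registerTemporary("concat", "temp:concat");
        System.out.println(r.resolve("concat")); // prints "temp:concat"
    }
}
```

Whether overriding built-ins like CAST should be allowed at all is exactly the open question above; the sketch only shows the shadowing mechanics.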
Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

2019-09-08 Thread JingsongLee
Thanks Jark and Dian:
1. Jark's approach: do the work in task-0. A simple way.
2. Dian's approach: use StreamingRuntimeContext#getGlobalAggregateManager. It can do
more operations. But are these accumulators fault-tolerant?

Best,
Jingsong Lee
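A minimal sketch of the "task-0" approach mentioned above, with all Flink types elided: notifyCheckpointComplete is modeled as a plain callback and the metastore as a counter, so only subtask 0 performs the commit. This illustrates the pattern only; it is not the actual Flink API.

```java
// Sketch of the "commit from subtask 0 only" pattern: every parallel
// subtask receives the checkpoint-complete notification, but only one
// of them touches the (hypothetical) metastore, avoiding the
// distributed-access bottleneck described in the thread.
import java.util.concurrent.atomic.AtomicInteger;

public class SingleTaskCommit {
    // Counts how often the stand-in metastore was updated.
    static final AtomicInteger metastoreCommits = new AtomicInteger();

    // Called on every parallel subtask when a checkpoint completes.
    static void notifyCheckpointComplete(int subtaskIndex, long checkpointId) {
        if (subtaskIndex == 0) {
            // Only subtask 0 talks to the metastore.
            metastoreCommits.incrementAndGet();
        }
    }

    public static void main(String[] args) {
        int parallelism = 4;
        for (int i = 0; i < parallelism; i++) {
            notifyCheckpointComplete(i, 42L);
        }
        System.out.println("commits = " + metastoreCommits.get()); // prints "commits = 1"
    }
}
```

Note this only reduces contention; it does not by itself make the commit atomic with the checkpoint, which is why the GlobalAggregateManager and a master-side hook are also being discussed.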


--
From:shimin yang 
Send Time:2019年9月6日(星期五) 15:21
To:dev 
Subject:Re: [DISCUSS] Support notifyOnMaster for notifyCheckpointComplete

Hi Fu,

That'll be nice.

Thanks.

Best,
Shimin

Dian Fu  于2019年9月6日周五 下午3:17写道:

> Hi Shimin,
>
> It can be guaranteed to be an atomic operation. This is ensured by the RPC
> framework. You could take a look at RpcEndpoint for more details.
>
> Regards,
> Dian
>
> > 在 2019年9月6日,下午2:35,shimin yang  写道:
> >
> > Hi Fu,
> >
> > Thank you for the remind. I think it would work in my case as long as
> it's
> > an atomic operation.
> >
> > Dian Fu  于2019年9月6日周五 下午2:22写道:
> >
> >> Hi Jingsong,
> >>
> >> Thanks for bring up this discussion. You can try to look at the
> >> GlobalAggregateManager to see if it can meet your requirements. It can
> be
> >> got via StreamingRuntimeContext#getGlobalAggregateManager().
> >>
> >> Regards,
> >> Dian
> >>
> >>> 在 2019年9月6日,下午1:39,shimin yang  写道:
> >>>
> >>> Hi Jingsong,
> >>>
> >>> Big fan of this idea. We faced the same problem and resolved by adding
> a
> >>> distributed lock. It would be nice to have this feature in JobMaster,
> >> which
> >>> can replace the lock.
> >>>
> >>> Best,
> >>> Shimin
> >>>
> >>> JingsongLee  于2019年9月6日周五 下午12:20写道:
> >>>
>  Hi devs:
> 
>  I try to implement streaming file sink for table[1] like
> >> StreamingFileSink.
>  If the underlying is a HiveFormat, or a format that updates visibility
>  through a metaStore, I have to update the metaStore in the
>  notifyCheckpointComplete, but this operation occurs on the task side,
>  which will lead to distributed access to the metaStore, which will
>  lead to bottleneck.
> 
>  So I'm curious if we can support notifyOnMaster for
>  notifyCheckpointComplete like FinalizeOnMaster.
> 
>  What do you think?
> 
>  [1]
> 
> >>
> https://docs.google.com/document/d/15R3vZ1R_pAHcvJkRx_CWleXgl08WL3k_ZpnWSdzP7GY/edit?usp=sharing
> 
>  Best,
>  Jingsong Lee
> >>
> >>
>
>


Re: [DISCUSS] Features for Apache Flink 1.10

2019-09-08 Thread Yu Li
Hi Xuefu,

If I understand it correctly, the data type support work should be included
in the "Table API improvements->Finish type system" part, please check it
and let us know if anything missing there. Thanks.

Best Regards,
Yu


On Mon, 9 Sep 2019 at 11:14, Xuefu Z  wrote:

> Looking at the feature list, I don't see an item for completing the data type
> support. Specifically, high precision timestamp is needed for Hive
> integration, as it's so common. Missing it would damage the completeness of
> our Hive effort.
>
> Thanks,
> Xuefu
>
> On Sat, Sep 7, 2019 at 7:06 PM Xintong Song  wrote:
>
> > Thanks Gary and Yu for compiling the feature list and kicking off this
> > discussion.
> >
> > +1 for Gary and Yu being the release managers for Flink 1.10.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Sat, Sep 7, 2019 at 4:58 PM Till Rohrmann 
> wrote:
> >
> > > Thanks for compiling the list of 1.10 efforts for the community Gary. I
> > > think this helps a lot to better understand what the community is
> > currently
> > > working on.
> > >
> > > Thanks for volunteering as the release managers for the next major
> > > release. +1 for Gary and Yu being the RMs for Flink 1.10.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Sat, Sep 7, 2019 at 7:26 AM Zhu Zhu  wrote:
> > >
> > > > Thanks Gary for kicking off this discussion.
> > > > Really appreciate that you and Yu offer to help to manage 1.10
> release.
> > > >
> > > > +1 for Gary and Yu as release managers.
> > > >
> > > > Thanks,
> > > > Zhu Zhu
> > > >
> > > > Dian Fu  于2019年9月7日周六 下午12:26写道:
> > > >
> > > > > Hi Gary,
> > > > >
> > > > > Thanks for kicking off the release schedule of 1.10. +1 for you and
> > Yu
> > > Li
> > > > > as the release manager.
> > > > >
> > > > > The feature freeze/release time sounds reasonable.
> > > > >
> > > > > Thanks,
> > > > > Dian
> > > > >
> > > > > > 在 2019年9月7日,上午11:30,Jark Wu  写道:
> > > > > >
> > > > > > Thanks Gary for kicking off the discussion for 1.10 release.
> > > > > >
> > > > > > +1 for Gary and Yu as release managers. Thank you for you effort.
> > > > > >
> > > > > > Best,
> > > > > > Jark
> > > > > >
> > > > > >
> > > > > >> 在 2019年9月7日,00:52,zhijiang 
> > 写道:
> > > > > >>
> > > > > >> Hi Gary,
> > > > > >>
> > > > > >> Thanks for kicking off the features for next release 1.10.  I am
> > > very
> > > > > supportive of you and Yu Li to be the release managers.
> > > > > >>
> > > > > >> Let me mention another two improvements which we want to be covered
> in
> > > > > Flink 1.10; I already confirmed with Piotr and reached an agreement
> > > > before.
> > > > > >>
> > > > > >> 1. Data serialize and copy only once for broadcast partition
> [1]:
> > It
> > > > > would improve the throughput performance greatly in broadcast mode
> > and
> > > > was
> > > > > actually proposed in Flink 1.8. Most of the work was already done before,
and
> > > > only
> > > > > the last critical jira/PR is left. It will not take much effort to
make
> > > it
> > > > > ready.
> > > > > >>
> > > > > >> 2. Let Netty use Flink's buffers directly in credit-based mode
> > [2] :
> > > > It
> > > > > could avoid memory copy from netty stack to flink managed network
> > > buffer.
> > > > > The obvious benefit is decreasing the direct memory overhead
> greatly
> > in
> > > > > large-scale jobs. I also heard of some user cases encountering direct
> > OOM
> > > > > caused by netty memory overhead. Actually this improvement was
> > proposed
> by
> > > > > Nico in Flink 1.7, but there was no time to focus on it then. Yun Gao already
> > > > > submitted a PR half a year ago but it has not been reviewed yet. I
> > could
> > > > > help review the design and PR code to make it ready.
> > > > > >>
> > > > > >> And you could make these two items as lowest priority if
> possible.
> > > > > >>
> > > > > >> [1] https://issues.apache.org/jira/browse/FLINK-10745
> > > > > >> [2] https://issues.apache.org/jira/browse/FLINK-10742
> > > > > >>
> > > > > >> Best,
> > > > > >> Zhijiang
> > > > > >>
> --
> > > > > >> From:Gary Yao 
> > > > > >> Send Time:2019年9月6日(星期五) 17:06
> > > > > >> To:dev 
> > > > > >> Cc:carp84 
> > > > > >> Subject:[DISCUSS] Features for Apache Flink 1.10
> > > > > >>
> > > > > >> Hi community,
> > > > > >>
> > > > > >> Since Apache Flink 1.9.0 has been released more than 2 weeks
> ago,
> > I
> > > > > want to
> > > > > >> start kicking off the discussion about what we want to achieve
> for
> > > the
> > > > > 1.10
> > > > > >> release.
> > > > > >>
> > > > > >> Based on discussions with various people as well as observations
> > > from
> > > > > >> mailing
> > > > > >> list threads, Yu Li and I have compiled a list of features that
> we
> > > > deem
> > > > > >> important to be included in the next release. Note that the
> > features
> > > > > >> presented
> > > > > >> here are not meant to be exhaustive. As always, I am sure that
> > there
> > > > > will be
> > > > > >> other contributions that

Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

2019-09-08 Thread JingsongLee
Hi Dawid:

It is difficult to describe specific examples.
Sometimes users will generate some Java converters through
 Java code, or generate some Java classes through third-party
 libraries. Of course, these could all be expressed through properties,
but this requires additional work from users. My suggestion is to
 keep this way of registering a Java instance, which is user-friendly.

Best,
Jingsong Lee


--
From:Dawid Wysakowicz 
Send Time:2019年9月6日(星期五) 16:21
To:dev 
Subject:Re: [DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
@Jingsong Could you elaborate a bit more on what you mean by
"some Connectors are difficult to convert all states to properties"?
All the Flink-provided connectors will definitely be expressible with
properties (in the end you should be able to use them from DDL). I think a
TableSource complex enough that it handles filter push down, partition
support etc. should rather be made available both from DDL & java/scala code.
I'm happy to reconsider adding registerTemporaryTable(String path, TableSource
source) if you have some concrete examples in mind.


@Xuefu: We also considered the ObjectIdentifier (or actually introducing a new 
identifier representation to differentiate between resolved and unresolved 
identifiers) with the same concerns. We decided to suggest the string & parsing 
logic because of usability.
tEnv.from("cat.db.table")
is shorter and easier to write than
tEnv.from(Identifier.for("cat", "db", "name"))
It also implicitly solves the problem of what happens if a user (e.g. used to
other systems) uses that API in the following manner:
tEnv.from(Identifier.for("db.name"))
I'm happy to revisit it if the general consensus is that it's better to use the
OO approach.
Best,
Dawid
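As an illustration of the usability trade-off above, a naive version of the string-path parsing could look like the sketch below. Note this is exactly the logic that risks drifting from the DDL parser (for instance, it ignores quoted identifiers containing dots); the class and method names are hypothetical, not Flink API.

```java
// Naive sketch of string-path parsing for object identifiers:
// "cat.db.table" is split into at most three parts; anything longer
// is rejected. A real implementation would have to share tokenizing
// rules with the SQL parser (quoted identifiers etc.), which is the
// duplication concern raised in this thread.
import java.util.Arrays;

public class PathParser {
    static String[] parse(String path) {
        String[] parts = path.split("\\.");
        if (parts.length > 3) {
            throw new IllegalArgumentException(
                "Expected 1 to 3 dot-separated parts, got: " + path);
        }
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse("cat.db.table"))); // prints "[cat, db, table]"
        System.out.println(Arrays.toString(parse("table")));        // prints "[table]"
    }
}
```

The OO alternative (`Identifier.for("cat", "db", "table")`) avoids this parsing entirely, at the cost of a more verbose call site.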

On 06/09/2019 10:00, Xuefu Z wrote:

Thanks to Dawid for starting the discussion and writeup. It looks pretty
good to me except that I'm a little concerned about the object reference
and string parsing in the code, which seems to be an anti-pattern for OOP. Have
we considered using ObjectIdentifier with optional catalog and db parts,
esp. if we are worried about arguments of variable length or method
overloading? It's quite likely that the result of string parsing is an
ObjectIdentifier instance anyway.

Having string parsing logic in the code is a little dangerous as it
duplicates part of the DDL/DML parsing, and they can easily get out of sync.

Thanks,
Xuefu

On Fri, Sep 6, 2019 at 1:57 PM JingsongLee 
wrote:


Thanks Dawid, +1 for this approach.

One concern is the removal of registerTableSink & registerTableSource
 in TableEnvironment. It has two alternatives:
1.the properties approach (DDL, descriptor).
2.from/toDataStream.

#1 can only be properties, not java states, and some Connectors
 are difficult to convert all states to properties.
#2 can contain Java state, but can't use TableSource-related features
like project & filter push down, partition support, etc.

Any idea about this?

Best,
Jingsong Lee


--
From:Dawid Wysakowicz 
Send Time:2019年9月4日(星期三) 22:20
To:dev 
Subject:[DISCUSS] FLIP-64: Support for Temporary Objects in Table module

Hi all,
As part of FLIP-30 a Catalog API was introduced that enables storing table
meta objects permanently. At the same time the majority of current APIs
create temporary objects that cannot be serialized. We should clarify the
creation of meta objects (tables, views, functions) in a unified way.
Another current problem in the API is that all the temporary objects are
stored in a special built-in catalog, which is not very intuitive for many
users, as they must be aware of that catalog to reference temporary objects.
Lastly, different APIs have different ways of providing object paths:

String path…,
String path, String pathContinued…
String name
We should choose one approach and unify it across all APIs.
I suggest a FLIP to address the above issues.
Looking forward to your opinions.
FLIP link:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-64%3A+Support+for+Temporary+Objects+in+Table+module



Re: [VOTE] Release 1.8.2, release candidate #1

2019-09-08 Thread jincheng sun
+1 (binding)

- checked signatures [SUCCESS]
- built from source without tests [SUCCESS]
- ran some tests in IDE [SUCCESS]
- start local cluster and submit word count example [SUCCESS]
- announcement PR for website looks good! (I have left a few comments)

Best,
Jincheng

Jark Wu  于2019年9月6日周五 下午8:47写道:

>  Hi everyone,
>
> Please review and vote on the release candidate #1 for the version 1.8.2,
> as follows:
> [ ] +1, Approve the release
> [ ] -1, Do not approve the release (please provide specific comments)
>
>
> The complete staging area is available for your review, which includes:
> * JIRA release notes [1],
> * the official Apache source release and binary convenience releases to be
> deployed to dist.apache.org [2], which are signed with the key with
> fingerprint E2C45417BED5C104154F341085BACB5AEFAE3202 [3],
> * all artifacts to be deployed to the Maven Central Repository [4],
> * source code tag "release-1.8.2-rc1" [5],
> * website pull request listing the new release and adding announcement blog
> post [6].
>
> The vote will be open for at least 72 hours.
> Please cast your votes before *Sep. 11th 2019, 13:00 UTC*.
>
> It is adopted by majority approval, with at least 3 PMC affirmative votes.
>
> Thanks,
> Jark
>
> [1]
>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12315522&version=12345670
> [2] https://dist.apache.org/repos/dist/dev/flink/flink-1.8.2-rc1/
> [3] https://dist.apache.org/repos/dist/release/flink/KEYS
> [4] https://repository.apache.org/content/repositories/orgapacheflink-1262
> [5]
>
> https://github.com/apache/flink/commit/6322618bb0f1b7942d86cb1b2b7bc55290d9e330
> [6] https://github.com/apache/flink-web/pull/262
>


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

2019-09-08 Thread shimin yang
Hi Yu,

For the first question, I would say yes. I was talking about managed
states, to be more specific, managed keyed states. And the reason why
we need the framework to manage the life cycle is that we need checkpoints to
guarantee exactly-once semantics in our customized keyed state backend.

For the second question, I quite agree with your proposal.

Finally, I would be glad to provide documentation if needed.

Best,
Shimin

Yun Tang  于2019年9月9日周一 上午2:46写道:

> Hi all
>
> First of all, I agree with Yu that we should support making the state type
> pluggable.
>
> If we take a look at the current Flink implementation, users can implement
> their pluggable state backend to satisfy their own needs now. However,
> even though users can define their own state descriptor, they cannot store the
> customized state within their state backend. The root cause behind this is
> that the current StateBackendFactory can accept a user-defined state backend
> factory while StateFactory (within HeapKeyedStateBackend [1] and
> RocksDBKeyedStateBackend [2] ) cannot.
>
> If we agree that we should leave the right of implementing a customized
> state backend to users, it's natural to agree that we should also leave
> the right of implementing customized states to users.
>
> [1]
> https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L79
> [2]
> https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L114
>
>
> Best
> Yun Tang
>
>
> 
> From: Yu Li 
> Sent: Monday, September 9, 2019 2:24
> To: dev 
> Subject: Re: [DISCUSS] Support customize state in customized
> KeyedStateBackend
>
> Hi Shimin,
>
> Thanks for bringing this discussion up.
>
> First of all, I'd like to confirm/clarify that this discussion is mainly
> about managed state with customized state descriptor rather than raw state,
> right? Asking because raw state was the very first thing came to my mind
> when seeing the title.
>
> And this is actually the first topic/question we need to discuss, that
> whether we should support user-defined state descriptor and still ask
> framework to manage the state life cycle. Personally I'm +1 on this since
> the "official" state (data-structure) types (currently mainly value, list
> and map) may not be optimized for the customer's case, but we'd better ask
> others' opinion.
>
> Secondly, if the result of the first question is "Yes", then it's truly a
> problem that "Although we can implement customized StateDescriptors for
> different kind of data structures, we do not really have access to such
> customized state in RichFunctions", and how to resolve it is the second
> topic/question to discuss.
>
> I've noticed your proposal of exposing the "getPartitionedState" method in
> "RuntimeContext" and "KeyedStateStore" in JIRA (FLINK-14003), but IMO
> adding a specific interface like below is better than exposing the internal
> one:
> <S extends State> S getCustomizedState(StateDescriptor<S, ?> stateProperties);
>
> Finally, I think this is a user-facing and definitely worthwhile
> discussion, and requires a FLIP to document the conclusion and
> design/implementation (if any) down. What's your opinion?
>
> Thanks.
>
> Best Regards,
> Yu
>
>
> On Fri, 6 Sep 2019 at 13:27, shimin yang  wrote:
>
> > Hi everyone,
> >
> > I would like to start a discussion on supporting customized state
> > in a customized KeyedStateBackend.
> >
> > In Flink, users can customize the KeyedStateBackend to support different types
> > of data stores. Although we can implement customized StateDescriptors for
> > different kinds of data structures, we do not really have access to such
> > customized state in RichFunctions.
> >
> > I propose to add a getOtherState method in RuntimeContext and
> > DefaultKeyedStateStore which directly takes a StateDescriptor as a parameter
> to
> > allow users to get customized state.
> >
> > What do you think?
> >
> > Thanks.
> >
> > Best,
> > Shimin
> >
>
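To make the proposal concrete, here is a self-contained mock of the interface sketched above: a state store that dispatches on a user-defined descriptor, so a backend can hand back state types beyond the built-in value/list/map. Every class here is a stand-in for the corresponding Flink type, not actual Flink API.

```java
// Mock of a customized keyed state store: a user-defined StateDescriptor
// creates a user-defined State type, and the store resolves it by name.
// All classes are stand-ins for Flink's State/StateDescriptor/KeyedStateStore.
import java.util.HashMap;
import java.util.Map;

public class CustomStateDemo {
    interface State {}

    static abstract class StateDescriptor<S extends State> {
        final String name;
        StateDescriptor(String name) { this.name = name; }
        abstract S create();
    }

    // A user-defined state type not covered by the built-in kinds.
    static class BloomFilterState implements State {
        boolean mightContain(String key) { return false; } // stub
    }

    static class BloomFilterStateDescriptor extends StateDescriptor<BloomFilterState> {
        BloomFilterStateDescriptor(String name) { super(name); }
        BloomFilterState create() { return new BloomFilterState(); }
    }

    // Stand-in for a customized KeyedStateBackend / KeyedStateStore.
    static class StateStore {
        private final Map<String, State> states = new HashMap<>();

        @SuppressWarnings("unchecked")
        <S extends State> S getCustomizedState(StateDescriptor<S> descriptor) {
            // The backend decides how to materialize the custom state;
            // here we simply create it once and cache it by name.
            return (S) states.computeIfAbsent(descriptor.name, n -> descriptor.create());
        }
    }

    public static void main(String[] args) {
        StateStore store = new StateStore();
        BloomFilterState s = store.getCustomizedState(new BloomFilterStateDescriptor("filter"));
        System.out.println(s.mightContain("k")); // prints "false"
    }
}
```

In real Flink, checkpointing such state would additionally require the backend to snapshot it, which is the life-cycle point raised earlier in the thread.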


Re: [DISCUSS] Support customize state in customized KeyedStateBackend

2019-09-08 Thread shimin yang
Hi Tang,

Actually in my case we implement a totally different KeyedStateBackend and
its factory based on a data store other than Heap or RocksDB.

Also, regarding the state factory of heap and rocksdb, you've made quite a good point
and I agree with your opinion.

Best,
Shimin

shimin yang  于2019年9月9日周一 下午2:31写道:

> Hi Yu,
>
> For the first question, I would say yes. I was talking about managed
> states, to be more specific, it's managed keyed states. And the reason why
> we need the framework to manage life cycle is that we need checkpoint to
> guarantee exact once semantic in our customized keyed state backend.
>
> For the second question, I am quite agree with your proposal.
>
> Finally, I would be glad to provide documentation if needed.
>
> Best,
> Shimin
>
> Yun Tang  于2019年9月9日周一 上午2:46写道:
>
>> Hi all
>>
>> First of all, I agree with Yu that we should support making the state type
>> pluggable.
>>
>> If we take a look at the current Flink implementation, users can implement
>> their pluggable state backend to satisfy their own needs now. However,
>> even though users can define their own state descriptor, they cannot store the
>> customized state within their state backend. The root cause behind this is
>> that the current StateBackendFactory can accept a user-defined state backend
>> factory while StateFactory (within HeapKeyedStateBackend [1] and
>> RocksDBKeyedStateBackend [2] ) cannot.
>>
>> If we agree that we should leave the right of implementing a customized
>> state backend to users, it's natural to agree that we should also leave
>> the right of implementing customized states to users.
>>
>> [1]
>> https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-runtime/src/main/java/org/apache/flink/runtime/state/heap/HeapKeyedStateBackend.java#L79
>> [2]
>> https://github.com/apache/flink/blob/576228651382db040aaa006cf9142f6568930cb1/flink-state-backends/flink-statebackend-rocksdb/src/main/java/org/apache/flink/contrib/streaming/state/RocksDBKeyedStateBackend.java#L114
>>
>>
>> Best
>> Yun Tang
>>
>>
>> 
>> From: Yu Li 
>> Sent: Monday, September 9, 2019 2:24
>> To: dev 
>> Subject: Re: [DISCUSS] Support customize state in customized
>> KeyedStateBackend
>>
>> Hi Shimin,
>>
>> Thanks for bringing this discussion up.
>>
>> First of all, I'd like to confirm/clarify that this discussion is mainly
>> about managed state with customized state descriptor rather than raw
>> state,
>> right? Asking because raw state was the very first thing came to my mind
>> when seeing the title.
>>
>> And this is actually the first topic/question we need to discuss, that
>> whether we should support user-defined state descriptor and still ask
>> framework to manage the state life cycle. Personally I'm +1 on this since
>> the "official" state (data-structure) types (currently mainly value, list
>> and map) may not be optimized for the customer's case, but we'd better ask
>> others' opinion.
>>
>> Secondly, if the result of the first question is "Yes", then it's truly a
>> problem that "Although we can implement customized StateDescriptors for
>> different kind of data structures, we do not really have access to such
>> customized state in RichFunctions", and how to resolve it is the second
>> topic/question to discuss.
>>
>> I've noticed your proposal of exposing the "getPartitionedState" method in
>> "RuntimeContext" and "KeyedStateStore" in JIRA (FLINK-14003), but IMO
>> adding a specific interface like below is better than exposing the
>> internal
>> one:
>> <S extends State> S getCustomizedState(StateDescriptor<S, ?> stateProperties);
>>
>> Finally, I think this is a user-facing and definitely worthwhile
>> discussion, and requires a FLIP to document the conclusion and
>> design/implementation (if any) down. What's your opinion?
>>
>> Thanks.
>>
>> Best Regards,
>> Yu
>>
>>
>> On Fri, 6 Sep 2019 at 13:27, shimin yang  wrote:
>>
>> > Hi everyone,
>> >
>> > I would like to start a discussion on supporting customized state
>> > in a customized KeyedStateBackend.
>> >
>> > In Flink, users can customize the KeyedStateBackend to support different
>> types
>> > of data stores. Although we can implement customized StateDescriptors for
>> > different kinds of data structures, we do not really have access to such
>> > customized state in RichFunctions.
>> >
>> > I propose to add a getOtherState method in RuntimeContext and
>> > DefaultKeyedStateStore which directly takes a StateDescriptor as
>> a parameter to
>> > allow users to get customized state.
>> >
>> > What do you think?
>> >
>> > Thanks.
>> >
>> > Best,
>> > Shimin
>> >
>>
>