Hi all,
the FLIP looks awesome. However, I would like to discuss the changes to
the user-facing parts again. Some feedback:
1. DataViews: With the current non-annotation design for DataViews, we
cannot perform eager state declaration, right? At which point during
execution do we know which state is required by the function? We need to
instantiate the function first, right?
2. Serializability of functions: How do we ensure serializability of
functions for catalog persistence? In the Scala/Java API, we would like
to register classes instead of instances soon. This is the only way to
store a function properly in a catalog; otherwise we need some
serialization/deserialization logic in the function interfaces to
convert an instance to string properties.
3. TableEnvironment: What is the signature of `register_function(self,
name, function)`? Does it accept both a class and a function, like
`class Sum` and `def split()`? Could you add some examples for
registering both kinds of functions?
4. FunctionDefinition: `FunctionDefinition` is not a user-defined
function definition. It is the top-level interface for both user-defined
and built-in functions. I'm not sure if getLanguage() should be part of
this interface or one level down, which would be `UserDefinedFunction`.
Built-in functions will never be implemented in a different language. In
any case, I would vote for removing the UNKNOWN language, because it
does not solve anything. Why should a user declare a function that the
runtime cannot handle? I also find the term `JAVA` confusing for Scala
users. How about `FunctionLanguage.JVM` instead?
5. Function characteristics: In the current design, function classes do
not extend from any upper class. How can users declare characteristics
that are present in `FunctionDefinition`, such as determinism,
requirements, or soon also monotonicity?
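To make points 3 and 5 concrete, here is a minimal sketch of what
accepting both kinds of functions could look like. Everything in it is
an assumption for illustration and not the FLIP's actual API: the
`ScalarFunction` base class, the `deterministic` attribute, and the
`FunctionRegistry` stand-in for the table environment are hypothetical,
and `Sum` is simplified to a two-argument scalar function.

```python
from typing import Any, Callable, Dict


class ScalarFunction:
    """Hypothetical base class for class-style functions (name assumed)."""

    # Hypothetical characteristic that subclasses may override (point 5).
    deterministic = True

    def eval(self, *args: Any) -> Any:
        raise NotImplementedError


class Sum(ScalarFunction):
    """Class-style function, simplified here to a two-argument scalar."""

    def eval(self, a: int, b: int) -> int:
        return a + b


def split(s: str, sep: str) -> list:
    """Plain Python function; it has no place to declare characteristics."""
    return s.split(sep)


class FunctionRegistry:
    """Stand-in for a table environment; only the dispatch logic is shown."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Any]] = {}
        self._deterministic: Dict[str, bool] = {}

    def register_function(self, name: str, function: Any) -> None:
        if isinstance(function, type) and issubclass(function, ScalarFunction):
            # A class was passed: instantiate it with a no-arg constructor.
            function = function()
        if isinstance(function, ScalarFunction):
            self._functions[name] = function.eval
            self._deterministic[name] = function.deterministic
        elif callable(function):
            self._functions[name] = function
            # Assumed default, since a plain function cannot declare it.
            self._deterministic[name] = True
        else:
            raise TypeError(f"cannot register {function!r} as a function")

    def call(self, name: str, *args: Any) -> Any:
        return self._functions[name](*args)

    def is_deterministic(self, name: str) -> bool:
        return self._deterministic[name]


registry = FunctionRegistry()
registry.register_function("sum_from_class", Sum)       # a class
registry.register_function("sum_from_instance", Sum())  # an instance
registry.register_function("split", split)              # a plain function
```

In this sketch a class can be stored by name and instantiated later,
while a plain function has to fall back to default characteristics,
which is the gap point 5 asks about.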
Thanks,
Timo
On 02.09.19 03:38, Shaoxuan Wang wrote:
Hi Jincheng, Fudian, and Aljoscha,
I am assuming the proposed Python UDX can also be applied to Flink SQL.
Is this correct? If yes, I would suggest titling the FLIP "Flink Python
User-Defined Function" or "Flink Python User-Defined Function for Table".
Regards,
Shaoxuan
On Wed, Aug 28, 2019 at 12:22 PM jincheng sun <sunjincheng...@gmail.com> wrote:
Thanks for the feedback, Bowen!
Many thanks for creating the FLIP and bringing up the VOTE, Dian!
Best, Jincheng
On Wed, Aug 28, 2019 at 11:32 AM Dian Fu <dian0511...@gmail.com> wrote:
Hi all,
I have started a voting thread [1]. Thanks a lot for your help in
creating the FLIP, @Jincheng.
Hi Bowen,
I really appreciate your comments. I have replied to you in the design
doc. As it seems that the comments don't affect the overall design, I'll
not cancel the vote for now and we can continue the discussion in the
design doc.
[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/VOTE-FLIP-58-Flink-Python-User-Defined-Function-for-Table-API-td32295.html
Regards,
Dian
On Aug 28, 2019, at 11:05 AM, Bowen Li <bowenl...@gmail.com> wrote:
Hi Jincheng and Dian,
Sorry for being late to the party. I took a glance at the proposal, LGTM
in general, and I left only a couple of comments.
Thanks,
Bowen
On Mon, Aug 26, 2019 at 8:05 PM Dian Fu <dian0511...@gmail.com> wrote:
Hi Jincheng,
Thanks! It works.
Thanks,
Dian
On Aug 27, 2019, at 10:55 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
Hi Dian, can you check if you have edit access? :)
On Mon, Aug 26, 2019 at 10:52 AM Dian Fu <dian0511...@gmail.com> wrote:
Hi Jincheng,
I appreciate the kind tips and the offer of help. I definitely need it!
Could you grant me write permission for Confluence? My ID: Dian Fu
Thanks,
Dian
On Aug 26, 2019, at 9:53 AM, jincheng sun <sunjincheng...@gmail.com> wrote:
Thanks for your feedback, Hequn & Dian.
Dian, I am glad to see that you want to help create the FLIP!
Everyone has a first time, and I am very willing to help you complete
your first FLIP. Here are some tips:
- First, I'll give your account write permission for Confluence.
- Before creating the FLIP, please have a look at the FLIP Template [1].
  (It's better to learn more about FLIPs by reading [2].)
- Create the Flink Python UDF related JIRAs after the VOTE on the FLIP
  has completed. (I think you can also bring up the VOTE thread, if you
  want!)
If you encounter any problems during this period, feel free to tell me
so that we can solve them together. :)
Best,
Jincheng
[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP+Template
[2]
https://cwiki.apache.org/confluence/display/FLINK/Flink+Improvement+Proposals
On Fri, Aug 23, 2019 at 11:54 AM Hequn Cheng <chenghe...@gmail.com> wrote:
+1 for starting the vote.
Thanks a lot, Jincheng, for the discussion.
Best, Hequn
On Fri, Aug 23, 2019 at 10:06 AM Dian Fu <dian0511...@gmail.com> wrote:
Hi Jincheng,
+1 to starting the FLIP creation and the VOTE on this feature. I'm
willing to help with creating the FLIP if you don't mind. As I haven't
created a FLIP before, it would be great if you could help with this. :)
Regards,
Dian
On Aug 22, 2019, at 11:41 PM, jincheng sun <sunjincheng...@gmail.com> wrote:
Hi all,
Thanks a lot for your feedback. If there are no more suggestions and
comments, I think it's better to initiate a vote to create a FLIP for
Apache Flink Python UDFs.
What do you think?
Best, Jincheng
On Thu, Aug 15, 2019 at 12:54 AM jincheng sun <sunjincheng...@gmail.com> wrote:
Hi Thomas,
Thanks for your confirmation and the very important reminder about
bundle processing.
I have added a description of how to perform bundle processing from the
perspective of checkpoints and watermarks. Feel free to leave comments
if anything is not described clearly.
Best,
Jincheng
On Wed, Aug 14, 2019 at 10:08 AM Dian Fu <dian0511...@gmail.com> wrote:
Hi Thomas,
Thanks a lot for the suggestions.
Regarding bundle processing, there is a section "Checkpoint" [1] in the
design doc which talks about how to handle checkpoints.
However, I think you are right that we should say more about it, such as
what bundle processing is, how it affects checkpoints and watermarks,
how to handle checkpoints and watermarks, etc.
[1]
https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit#heading=h.urladt565yo3
Regards,
Dian
On Aug 14, 2019, at 1:01 AM, Thomas Weise <t...@apache.org> wrote:
Hi Jincheng,
Thanks for putting this together. The proposal is very detailed,
thorough and, for me as a Beam Flink runner contributor, easy to
understand :)
One thing that you should probably detail more is the bundle processing.
It is critically important for performance that multiple elements are
processed in a bundle. The default bundle size in the Flink runner is 1s
or 1000 elements, whichever comes first. And for streaming, you can find
the logic necessary to align the bundle processing with watermarks and
checkpointing here:
https://github.com/apache/beam/blob/release-2.14.0/runners/flink/src/main/java/org/apache/beam/runners/flink/translation/wrappers/streaming/ExecutableStageDoFnOperator.java
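As a rough illustration of the "1s or 1000 elements, whichever comes
first" rule, here is a minimal Python sketch. It is not the Beam
operator's code: the `Bundler` class, its method names, and the
injectable `clock` are hypothetical, and flushing the open bundle before
a checkpoint or before forwarding a watermark is only one assumed way of
doing the alignment.

```python
import time
from typing import Any, Callable, List, Optional


class Bundler:
    """Hypothetical helper that groups elements into bundles."""

    def __init__(
        self,
        process_bundle: Callable[[List[Any]], None],
        max_size: int = 1000,
        max_time_s: float = 1.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._process_bundle = process_bundle
        self._max_size = max_size
        self._max_time_s = max_time_s
        self._clock = clock
        self._buffer: List[Any] = []
        self._started_at: Optional[float] = None

    def add(self, element: Any) -> None:
        if not self._buffer:
            # The first element of a bundle starts the time limit.
            self._started_at = self._clock()
        self._buffer.append(element)
        too_big = len(self._buffer) >= self._max_size
        too_old = self._clock() - self._started_at >= self._max_time_s
        # The age is only checked when an element arrives; a real
        # operator would also need a timer to close idle bundles.
        if too_big or too_old:
            self.finish_bundle()

    def finish_bundle(self) -> None:
        if self._buffer:
            bundle, self._buffer = self._buffer, []
            self._started_at = None
            self._process_bundle(bundle)

    def on_checkpoint(self) -> None:
        # Assumption: no element may stay buffered across a checkpoint.
        self.finish_bundle()

    def on_watermark(self, watermark: int) -> int:
        # Assumption: results of buffered elements go out before the
        # watermark that follows them is forwarded.
        self.finish_bundle()
        return watermark
```

The size and time limits trade latency for throughput: larger bundles
amortize the per-bundle overhead, while the checkpoint and watermark
hooks bound how long an element can sit in the buffer.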
Thomas
On Tue, Aug 13, 2019 at 7:05 AM jincheng sun <sunjincheng...@gmail.com> wrote:
Hi all,
The Python Table API (without Python UDF support) is already supported
and will be available in the upcoming 1.9 release.
As Python UDFs are very important for Python users, we'd like to start
the discussion about Python UDF support in the Python Table API.
Aljoscha Krettek, Dian Fu and I have discussed offline and have drafted
a design doc [1]. It includes the following items:
- The user-defined function interfaces.
- The user-defined function execution architecture.
As many people mentioned in the previous discussion thread [2], a
portability framework was introduced in Apache Beam in recent releases.
It provides well-defined, language-neutral data structures and protocols
for language-neutral user-defined function execution. This design is
based on Beam's portability framework. We will introduce how to make use
of Beam's portability framework for user-defined function execution:
data transmission, state access, checkpoint, metrics, logging, etc.
Considering that the design relies on Beam's portability framework for
Python user-defined function execution and not all the contributors in
the Flink community are familiar with Beam's portability framework, we
have done a prototype [3] as a proof of concept and to make the design
easier to understand.
Any feedback is welcome.
Best,
Jincheng
[1]
https://docs.google.com/document/d/1WpTyCXAQh8Jr2yWfz7MWCD2-lou05QaQFb810ZvTefY/edit?usp=sharing
[2]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-38-Support-python-language-in-flink-TableAPI-td28061.html
[3] https://github.com/dianfu/flink/commits/udf_poc