Hi everyone,
I updated FLIP-28 according to the feedback that I received (online and
offline).
The biggest change is that a user now needs to add two dependencies (api
and planner) if a table program should be runnable in an IDE (as
Aljoscha suggested). This allows for a clear separation of API and
planner/runtime. It might even be possible to *not* expose Calcite
through the API and thus have minimal external dependencies.
Furthermore, I renamed `flink-table-spi` back to `flink-table-common`
because `spi` looks too similar to `api` and could cause confusion.
Aljoscha and Stephan both mentioned that `common` would fit better in
our current naming scheme.
I will open a PR for FLIP-28 step 1 shortly and am looking forward to your feedback.
Thanks,
Timo
On 11.12.18 at 09:10, Timo Walther wrote:
Hi Aljoscha,
thanks for your feedback. I also don't like the fact that an API
depends on runtime. I will try to come up with a better design while
implementing a PoC. The general goal should be to make table programs
still runnable in an IDE. So maybe there is a better way of doing it.
Regards,
Timo
On 07.12.18 at 16:20, Aljoscha Krettek wrote:
Hi,
this is a very nice effort!
There is one thing that we should change, though. In the batch API we have a clear separation between API and runtime, and using the API (depending on flink-batch) does not "expose" the runtime classes that are in flink-runtime. For the streaming API, we made the mistake of letting flink-streaming depend on flink-runtime. This means that depending on flink-streaming pulls in flink-runtime transitively, which enlarges the surface that users see from Flink and (for example) makes it harder to package a user fat jar (hence the excludes, provided scopes, and whatnot).
We should avoid repeating this mistake and have flink-table-api not depend on flink-table-runtime, but the other way around, as we have it for the batch API.
Btw, another project that has achieved this separation very nicely is Beam, where there is an sdk package that has all the user-facing API that people use to create programs, and they see nothing of the runner/runtime specifics. In that project it comes out of necessity, because there can be widely different runners, but we should still strive for this here.
Off topic: we also have to achieve this for the streaming API.
Best,
Aljoscha
On 29. Nov 2018, at 16:58, Timo Walther <twal...@apache.org> wrote:
Thanks for the feedback, everyone!
I created a FLIP for these efforts:
https://cwiki.apache.org/confluence/display/FLINK/FLIP-28%3A+Long-term+goal+of+making+flink-table+Scala-free
I will open an umbrella Jira ticket for FLIP-28 with concrete
subtasks shortly.
Thanks,
Timo
On 29.11.18 at 12:44, Jark Wu wrote:
Thanks Timo,
That makes sense to me. And I left a comment about code generation in the doc.
Looking forward to participating in it!
Best,
Jark
On Thu, 29 Nov 2018 at 16:42, Timo Walther <twal...@apache.org> wrote:
@Kurt: Yes, I don't think that forks of Flink will have a hard time keeping up with the porting. That is also why I called this a `long-term goal`: I don't see big resources for the porting to happen quicker. But at least new features, the API, and the runtime profit from the Scala-to-Java conversion.
@Jark: I updated the document:
1. flink-table-common has been renamed to flink-table-spi by request.
2. Yes, good point. flink-sql-client can be moved there as well.
3. I added a paragraph to the document. Porting the code generation to Java only makes sense if acceptable tooling for it is in place.
Thanks for the feedback,
Timo
On 29.11.18 at 08:28, Jark Wu wrote:
Hi Timo,
Thanks for the great work!
Moving flink-table to Java is a long-awaited change but will involve much effort. I agree that we should make it a long-term goal.
I have read the google doc and +1 for the proposal. Here I have some
questions:
1. Where should the flink-table-common module be placed? Will we move the flink-table-common classes to the new modules?
2. Should flink-sql-client also be a sub-module under flink-table?
3. The flink-table-planner contains code generation and will be converted to Java. Actually, I prefer using Scala for code generation because of the multiline-string and string-interpolation (i.e. s"hello $user") features in Scala. They make the code-generation code more readable. Do we really want to migrate the code generation to Java?
Best,
Jark
On Wed, 28 Nov 2018 at 09:14, Kurt Young <ykt...@gmail.com> wrote:
Hi Timo and Vino,
I agree that flink-table is very active and that there is no guarantee of avoiding conflicts if you decide to develop based on the community version. I think this part is a risk we can anticipate in the first place. But a massive language replacement is something you cannot anticipate and be ready for: there is no feature added, no refactoring done, yet simply changing from Scala to Java will cause lots of conflicts.
But I also agree that this is a "technical debt" that we should eventually pay. As you said, we can do this slowly, even one file at a time, to let other people have more time to resolve the conflicts.
Best,
Kurt
On Tue, Nov 27, 2018 at 8:37 PM Timo Walther <twal...@apache.org> wrote:
Hi Kurt,
I understand your concerns. However, there is no concrete roadmap for Flink 2.0 and (as Vino said) flink-table is developed very actively. Major refactorings happened in the past and will also happen with or without the Scala migration. A good example is the proper catalog support, which will refactor big parts of the TableEnvironment class. Or the introduction of "retractions", which needed a big refactoring of the planning phase. Stability is only guaranteed for the API and the general behavior; however, currently flink-table is not using @Public or @PublicEvolving annotations for a reason.
I think the migration will still happen slowly because it needs people who allocate time for it. Therefore, even Flink forks can slowly adapt to the evolving Scala-to-Java code base.
Regards,
Timo
On 27.11.18 at 13:16, vino yang wrote:
Hi Kurt,
Currently, there is still a long time to go until Flink 2.0. Considering that flink-table is one of the most active modules in the current Flink project, each version has a number of changes and features added. I think that refactoring faster will reduce subsequent complexity and workload, and this may be a gradual and long process. We should regard it as "technical debt": if we do not change it, it will also affect the decision-making on other issues.
Thanks, vino.
On Tue, Nov 27, 2018 at 7:34 PM, Kurt Young <ykt...@gmail.com> wrote:
Hi Timo,
Thanks for writing up the document. I'm +1 for reorganizing the module structure and making flink-table Scala-free. But I have a little concern about the timing. Would it be more appropriate to get this done when Flink decides to bump to the next big version, like 2.x?
It's true that you can keep all the classes' package paths as they are and not introduce API changes. But companies that develop their own Flink and sync with the community version by rebasing may face a lot of conflicts. Although you can avoid the conflicts that come from moving source code between packages, I assume you still need to delete the original Scala file and add a new Java file whenever you want to change the programming language.
Best,
Kurt
On Tue, Nov 27, 2018 at 5:57 PM Timo Walther <twal...@apache.org> wrote:
Hi Hequn,
thanks for your feedback. Yes, migrating the test cases is
another
issue
that is not represented in the document but should naturally go
along
with the migration.
I agree that we should migrate the main API classes quickly
within
this
1.8 release after the module split has been performed. Help
here is
highly appreciated!
I forgot that Java supports static methods in interfaces now, but actually I don't like the design of calling `TableEnvironment.get(env)`, because people often use `TableEnvironment tEnv = TableEnvironment.get(env)` and then wonder why there is no `toAppendStream` or `toDataSet`: they are using the base class. However, things like that can be discussed in the corresponding issue when it comes to implementation.
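For reference, a minimal self-contained Java sketch of the pattern in question; all type names below are stand-ins, not the real flink-table classes:

    // Stand-ins for the real Flink types, only to keep the sketch compilable.
    interface StreamExecutionEnvironment {}
    interface Table {}
    interface DataStream<T> {}

    interface TableEnvironment {
        // Java 8 static interface method acting as a factory.
        static TableEnvironment get(StreamExecutionEnvironment env) {
            return new StreamTableEnvironmentImpl();
        }
        Table sqlQuery(String query);
    }

    interface StreamTableEnvironment extends TableEnvironment {
        <T> DataStream<T> toAppendStream(Table table, Class<T> clazz);
    }

    class StreamTableEnvironmentImpl implements StreamTableEnvironment {
        public Table sqlQuery(String query) { return new Table() {}; }
        public <T> DataStream<T> toAppendStream(Table table, Class<T> clazz) {
            return new DataStream<T>() {};
        }
    }

    // The concern above: the static type is the base interface, so the
    // stream-specific methods are invisible without a downcast.
    //   TableEnvironment tEnv = TableEnvironment.get(env);
    //   tEnv.toAppendStream(...);   // does not compile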
@Vino: I think your work fits nicely to these efforts.
@everyone: I will wait for more feedback until the end of this week. Then I will convert the design document into a FLIP and open subtasks in Jira, if there are no objections.
Regards,
Timo
On 24.11.18 at 13:45, vino yang wrote:
Hi hequn,
I am very glad to hear that you are interested in this work. As we all know, this process involves a lot. The migration work has already begun: I started with the Kafka connector's dependency on flink-table and moved the related dependencies to flink-table-common. This work is tracked by FLINK-9461 [1].
I don't know if it will conflict with what you expect to do, but from the impact I have observed, it will involve many classes that are currently in flink-table. *Just a statement to prevent unnecessary conflicts.*
Thanks, vino.
[1]: https://issues.apache.org/jira/browse/FLINK-9461
On Sat, Nov 24, 2018 at 7:20 PM, Hequn Cheng <chenghe...@gmail.com> wrote:
Hi Timo,
Thanks for the effort and for writing up this document. I like the idea of making flink-table Scala-free, so +1 for the proposal!
It's good to make Java the first-class citizen. For a long time, we have neglected Java, so many Table features are missing from the Java test cases, such as this one [1] I found recently. I think we may also need to migrate our test cases, i.e., add Java tests.
This definitely is a big change and will break API compatibility. To reduce the impact on users, I think we should move fast when we migrate the user-facing APIs. It's better to introduce the user-sensitive changes within a single release. However, that may not be easy. I can help to contribute.
Separation of interface and implementation is a good idea. This may allow a minimum of dependencies or even no dependencies at all. I saw your reply in the Google doc. Java 8 already supports static methods in interfaces; I think we can make use of that?
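As a rough Java sketch of that separation (made-up members; the real interfaces would be richer): the `Table` interface would live in the API module, while `TableImpl` lives with the planner, so user programs only compile against the interface:

    // API module: pure interface, no planner dependencies.
    interface Table {
        Table select(String fields);
        Table filter(String predicate);
    }

    // Planner module: the implementation users never depend on directly.
    class TableImpl implements Table {
        private final String logicalPlan; // stand-in for a real plan representation

        TableImpl(String logicalPlan) { this.logicalPlan = logicalPlan; }

        public Table select(String fields) {
            return new TableImpl(logicalPlan + " -> select(" + fields + ")");
        }

        public Table filter(String predicate) {
            return new TableImpl(logicalPlan + " -> filter(" + predicate + ")");
        }
    }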
Best,
Hequn
[1] https://issues.apache.org/jira/browse/FLINK-11001
On Fri, Nov 23, 2018 at 5:36 PM Timo Walther <twal...@apache.org> wrote:
Hi everyone,
thanks for the great feedback. I updated the document with the input I have received so far.
@Fabian: I moved the porting of flink-table-runtime
classes up in
the
list.
@Xiaowei: Could you elaborate on what "interface only" means to you? Do you mean a module containing pure Java `interface`s? Or is the validation logic also part of the API module? Are the 50+ expression classes part of the API interface or already too implementation-specific?
@Xuefu: I extended the document by almost a page to
clarify when
we
should develop in Scala and when in Java. As Piotr said,
every
new
Scala
line is instant technical debt.
Thanks,
Timo
On 23.11.18 at 10:29, Piotr Nowojski wrote:
Hi Timo,
Thanks for writing this down +1 from my side :)
I'm wondering whether we can have a rule, in the interim when Java and Scala coexist, that dependencies can only be one-way. I found that in the current code base there are cases where a Scala class extends a Java one and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.
Xuefu: I'm also not sure what's the best approach here; probably we will have to work it out as we go. One thing to consider is that from now on, every single new code line written in Scala anywhere in flink-table (except for flink-table-api-scala) is instant technological debt. From this perspective I would be in favour of tolerating quite big inconveniences just to avoid any new Scala code.
Piotrek
On 23 Nov 2018, at 03:25, Zhang, Xuefu <xuef...@alibaba-inc.com> wrote:
Hi Timo,
Thanks for the effort and the Google writeup. During our
external
catalog rework, we found much confusion between Java and
Scala,
and
this
Scala-free roadmap should greatly mitigate that.
I'm wondering whether we can have a rule, in the interim when Java and Scala coexist, that dependencies can only be one-way. I found that in the current code base there are cases where a Scala class extends a Java one and vice versa. This is quite painful. I'm thinking we could say that extension can only be from Java to Scala, which would help the situation. However, I'm not sure if this is practical.
Thanks,
Xuefu
------------------------------------------------------------------
Sender: jincheng sun <sunjincheng...@gmail.com>
Sent at: 2018 Nov 23 (Fri) 09:49
Recipient: dev <dev@flink.apache.org>
Subject: Re: [DISCUSS] Long-term goal of making flink-table Scala-free
Hi Timo,
Thanks for initiating this great discussion.
Currently, using SQL/Table API requires including many dependencies. In particular, it should not be necessary to introduce the specific implementation dependencies that users do not care about. So I am glad to see your proposal, and I hope we consider splitting the API interface into a separate module so that users can get by with a minimum of dependencies.
So, +1 to [separation of interface and implementation; e.g. `Table` & `TableImpl`] which you mentioned in the Google doc.
Best,
Jincheng
On Thu, Nov 22, 2018 at 10:50 PM, Xiaowei Jiang <xiaow...@gmail.com> wrote:
Hi Timo, thanks for driving this! I think that this is a nice thing to do.
While we are doing this, can we also keep in mind that we eventually want a Table API interface-only module which users can take a dependency on, without including any implementation details?
Xiaowei
On Thu, Nov 22, 2018 at 6:37 PM Fabian Hueske <fhue...@gmail.com> wrote:
Hi Timo,
Thanks for writing up this document.
I like the new structure and agree to prioritize the porting of the flink-table-common classes.
Since flink-table-runtime is (or should be) independent of the API and planner modules, we could start porting these classes once the code is split into the new module structure.
The benefit of a Scala-free flink-table-runtime would be a Scala-free execution JAR.
Best, Fabian
On Thu, 22 Nov 2018 at 10:54, Timo Walther <twal...@apache.org> wrote:
Hi everyone,
I would like to continue this discussion thread and
convert
the
outcome
into a FLIP such that users and contributors know
what to
expect
in
the
upcoming releases.
I created a design document [1] that clarifies our motivation for doing this, shows what a Maven module structure could look like, and suggests a migration plan.
It would be great to start with these efforts in the 1.8 release so that new features can be developed in Java and major refactorings, such as improvements to the connectors and external catalog support, are not blocked.
Please let me know what you think.
Regards,
Timo
[1]
https://docs.google.com/document/d/1PPo6goW7tOwxmpFuvLSjFnx7BF8IVz0w3dcmPPyqvoY/edit?usp=sharing
On 02.07.18 at 17:08, Fabian Hueske wrote:
Hi Piotr,
thanks for bumping this thread and thanks to Xingcan for the comments.
I think the first step would be to separate the flink-table module into multiple sub-modules. These could be:
- flink-table-api: all API-facing classes. Can later be divided further into Java/Scala Table API/SQL.
- flink-table-planning: involves all planning (basically everything we do with Calcite).
- flink-table-runtime: the runtime code.
IMO, a realistic mid-term goal is to have the runtime module and certain parts of the planning module ported to Java.
The api module will be much harder to port because of several dependencies on Scala core classes (the parser framework, tree iterations, etc.). I'm not saying we should not port it to Java, but it is not clear to me (yet) how to do it.
I think flink-table-runtime should not be too hard to port. The code does not make use of many Scala features, i.e., it is written in a very Java-like style. Also, there are not many dependencies, and operators can be ported individually, step by step.
For flink-table-planning, we can have certain packages that we port to Java, like planning rules or plan nodes. The related classes mostly extend Calcite's Java interfaces/classes and would be natural choices for being ported (see the sketch below). The code generation classes will require more effort to port. There are also some dependencies of planning on the api module that we would need to resolve somehow.
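To illustrate why the rule classes are natural candidates, here is a minimal hypothetical planner rule written in Java directly against Calcite's API (the rule itself is made up and performs no useful rewrite):

    import org.apache.calcite.plan.RelOptRule;
    import org.apache.calcite.plan.RelOptRuleCall;
    import org.apache.calcite.rel.core.Filter;

    // A made-up rule that matches any Filter; a real rule would
    // transform the matched node in onMatch().
    public class MyFilterRule extends RelOptRule {
        public static final MyFilterRule INSTANCE = new MyFilterRule();

        private MyFilterRule() {
            super(operand(Filter.class, any()), "MyFilterRule");
        }

        @Override
        public void onMatch(RelOptRuleCall call) {
            Filter filter = call.rel(0);
            // transformation logic would go here, e.g. call.transformTo(...)
        }
    }

Since RelOptRule and friends are plain Java classes, such code ports one-to-one.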
For SQL, most of the work when adding new features is done in the planning and runtime modules. So this separation should already reduce the "technological debt" quite a lot.
The Table API depends much more on Scala than SQL does.
Cheers, Fabian
2018-07-02 16:26 GMT+02:00 Xingcan Cui <xingc...@gmail.com>:
Hi all,
I have also been thinking about this problem these days, and here are my thoughts.
1) We must admit that it's really a tough task to interoperate between Java and Scala. E.g., they have different collection types (Scala collections vs. java.util.*), and in Java it's hard to implement a method which takes Scala functions as parameters (a small sketch follows below). Considering that the major part of the code base is implemented in Java, +1 for this goal from a long-term view.
2) The ideal solution would be to expose only a Scala API module and make all the other parts Scala-free. But I am not sure this could be achieved even in the long term. Thus, as Timo suggested, keeping the Scala code in "flink-table-core" would be a compromise solution.
3) If the community makes the final decision, maybe any new features should be added in Java (regardless of the module), in order to prevent the Scala code from growing.
Best,
Xingcan
On Jul 2, 2018, at 9:30 PM, Piotr Nowojski <pi...@data-artisans.com> wrote:
Bumping the topic.
If we want to do this, the sooner we decide, the less code we will have to rewrite. I have some objections/counter-proposals to Fabian's proposal of doing it module-wise, one module at a time.
First, I do not see a problem with having Java/Scala code even within one module, especially not if there are clean boundaries. For example, we could have the API in Scala and optimizer rules/logical nodes written in Java in the same module. However, I haven't maintained mixed Scala/Java code bases before, so I might be missing something here.
Secondly, this whole migration might, and most likely will, take longer than expected, which creates a problem for the new code we will be writing in the meantime. After making the decision to migrate to Java, almost any new Scala line of code is immediately technological debt that we will have to rewrite in Java later.
Thus I would propose to first state our end goal - the module structure and which parts of the modules we eventually want to be Scala-free - and secondly to take all the steps necessary to allow us to write new code compliant with that end goal. Only after that should we focus on incrementally rewriting the old code. Otherwise we could be stuck/blocked for years writing new code in Scala (and increasing the technological debt), because nobody has found the time to rewrite some unimportant and not actively developed part of some module.
Piotrek
On 14 Jun 2018, at 15:34, Fabian Hueske <fhue...@gmail.com> wrote:
Hi,
In general, I think this is a good effort. However, it won't be easy, and I think we have to plan it well.
I don't like the idea of having the whole code base
fragmented
into
Java
and Scala code for too long.
I think we should do this one step at a time and
focus
on
migrating
one
module at a time.
IMO, the easiest start would be to port the runtime to Java.
Extracting the API classes into their own module, porting them to Java, and removing the Scala dependency won't be possible without breaking the API, since a few classes depend on the Scala Table API.
Best, Fabian
2018-06-14 10:33 GMT+02:00 Till Rohrmann <trohrm...@apache.org>:
I think that is a noble and honorable goal and we should strive for it. This, however, must be an iterative process given the sheer size of the code base. I like the approach of defining common Java modules which are used by more specific Scala modules and slowly moving classes from Scala to Java. Thus +1 for the proposal.
Cheers,
Till
On Wed, Jun 13, 2018 at 12:01 PM Piotr Nowojski <pi...@data-artisans.com> wrote:
Hi,
I do not have experience with how Scala and Java interact with each other, so I cannot fully validate your proposal, but generally speaking, +1 from me.
Does it also mean that we should slowly migrate `flink-table-core` to Java? How would you envision that? It would be nice to be able to add new classes/features written in Java so that they can coexist with the old Scala code until we have gradually switched from Scala to Java.
Piotrek
On 13 Jun 2018, at 11:32, Timo Walther <twal...@apache.org> wrote:
Hi everyone,
as you all know, the Table & SQL API is currently implemented in Scala. This decision was made a long time ago when the initial code base was created as part of a master's thesis. The community kept Scala because of the nice language features that enable a fluent Table API like table.select('field.trim()) and because Scala allows for quick prototyping (e.g. multi-line strings for code generation). The committers enforced not splitting the code base into two programming languages.
However, nowadays the flink-table module is becoming a more and more important part of the Flink ecosystem. Connectors, formats, and the SQL client are actually implemented in Java but need to interoperate with flink-table, which makes these modules dependent on Scala. As mentioned in an earlier mail thread, using Scala for API classes also exposes member variables and methods in Java that should not be exposed to users [1]. Java is still the most important API language, and right now we treat it as a second-class citizen. I just noticed that you even need to add Scala if you just want to implement a ScalarFunction, because of method clashes between `public String toString()` and `public scala.Predef.String toString()`.
Given the size of the current code base, reimplementing the entire flink-table code in Java is a goal that we might never reach. However, we should at least treat the symptoms and keep this as a long-term goal in mind. My suggestion would be to convert user-facing and runtime classes and split the code base into multiple modules:
flink-table-java {depends on flink-table-core}
Implemented in Java. Java users can use this. This would require converting classes like TableEnvironment and Table.

flink-table-scala {depends on flink-table-core}
Implemented in Scala. Scala users can use this.

flink-table-common
Implemented in Java. Connectors, formats, and UDFs can use this. It contains interface classes such as descriptors, table sink, and table source.

flink-table-core {depends on flink-table-common and flink-table-runtime}
Implemented in Scala. Contains the current main code base.

flink-table-runtime
Implemented in Java. This would require converting classes in o.a.f.table.runtime but could potentially improve the runtime.
What do you think?
Regards,
Timo
[1]
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Convert-main-Table-API-classes-into-traits-tp21335.html