Great work! Thank you for sharing.

On Thu, May 12, 2022 at 5:19 PM Jeff Zhang <zjf...@gmail.com> wrote:

> That's true, the Scala shell was removed from Flink. Fortunately, Apache
> Zeppelin has its own Scala REPL for Flink. So if Flink can support Scala
> 2.13, I am wondering whether it would be possible to integrate it into the
> Scala shell so that users can run Flink Scala code in a notebook, as with
> Spark.
>
> On Thu, May 12, 2022 at 11:06 PM Roman Grebennikov <g...@dfdx.me> wrote:
>
>> Hi,
>>
>> AFAIK the Scala REPL was removed completely in Flink 1.15 (
>> https://issues.apache.org/jira/browse/FLINK-24360), so there is nothing
>> to cross-build.
>>
>> Roman Grebennikov | g...@dfdx.me
>>
>>
>> On Thu, May 12, 2022, at 14:55, Jeff Zhang wrote:
>>
>> Great work, Roman. Do you think it is possible to run it in the Scala
>> shell as well?
>>
>> On Thu, May 12, 2022 at 10:43 PM Roman Grebennikov <g...@dfdx.me> wrote:
>>
>>
>> Hello,
>>
>> As far as I understand the discussions on this mailing list, there is
>> currently almost no one maintaining the official Scala API in Apache
>> Flink. Due to some technical complexities it will probably be stuck on
>> Scala 2.12 (which is not EOL yet, but quite close to it) for a very long
>> time:
>> * The Traversable serializer relies heavily on CanBuildFrom (so it's read
>> and compiled on restore), which is missing in Scala 2.13 and 3.x;
>> migrating off this approach while maintaining savepoint compatibility can
>> be quite a complex task.
>> * The Scala API uses an implicitly generated TypeInformation, produced by
>> a giant scary mkTypeInfo macro, which would have to be completely
>> rewritten for Scala 3.x.
>>
>> But even in its current state, Scala support in Flink has some issues:
>> ADTs (sealed traits, a popular data-modelling pattern) are not natively
>> supported, so if you use them, you have to fall back to Kryo, which is
>> not that fast: we've seen 3x-4x throughput drops in performance tests.
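As a concrete illustration of the pattern being discussed, here is a minimal sealed-trait ADT; the type and field names are made up for this example and are not from Flink. It is types like these that the stock Scala API cannot derive a serializer for, causing the Kryo fallback mentioned above:

```scala
// A sketch of a sealed-trait ADT, a common data-modelling pattern in Scala.
// All names here are illustrative, not taken from Flink or flink-adt.
sealed trait PaymentEvent
case class Authorized(orderId: String, amount: Double) extends PaymentEvent
case class Declined(orderId: String, reason: String) extends PaymentEvent

object AdtExample {
  // Pattern matching over a sealed trait is exhaustively checked by the
  // compiler, which is why ADTs are popular for modelling event streams.
  def describe(e: PaymentEvent): String = e match {
    case Authorized(id, amount) => s"order $id authorized for $amount"
    case Declined(id, reason)   => s"order $id declined: $reason"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Authorized("o-1", 42.0)))
    println(describe(Declined("o-2", "insufficient funds")))
  }
}
```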
>>
>> At my current company we made a library (
>> https://github.com/findify/flink-adt) which uses Magnolia (
>> https://github.com/softwaremill/magnolia) to do all the compile-time
>> TypeInformation derivation, making Scala ADTs nice & fast in Flink. With
>> a couple of community contributions it is now also possible to
>> cross-build it for Scala 3.
>>
>> As the Flink 1.15 core is Scala-free, we extracted the DataStream part
>> of the Flink Scala API into a separate project, glued it together with
>> flink-adt and the ClosureCleaner from Spark 3.2 (which supports Scala
>> 2.13 and JVM 17), and cross-compiled it for 2.12/2.13/3.x. You can check
>> out the result in this GitHub project:
>> https://github.com/findify/flink-scala-api
>>
>> So technically speaking, it's now possible to migrate a Scala Flink job
>> from 2.12 to 3.x by:
>> * replacing the flink-streaming-scala dependency with flink-scala-api
>> (optional, both libraries can co-exist on the classpath on 2.12)
>> * replacing all imports of org.apache.flink.streaming.api.scala._ with
>> ones from the new library
>> * rebuilding the job for 3.x
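A sketch of what the three steps above might look like in an sbt build. The artifact coordinates, import package, and version numbers are assumptions based on the project's repository, not verified claims; check the flink-scala-api README for the current values:

```scala
// build.sbt: swap the dependency (step 1). Coordinates and version are
// illustrative assumptions; see the project README for current ones.
// libraryDependencies += "org.apache.flink" %% "flink-streaming-scala" % "1.15.0"
libraryDependencies += "io.findify" %% "flink-scala-api" % "<version>"

// In job code: swap the imports (step 2). The replacement package name is
// an assumption taken from the project's repository layout.
// import org.apache.flink.streaming.api.scala._
import io.findify.flink.api._

// build.sbt: rebuild for the target Scala version (step 3).
scalaVersion := "3.1.2"
```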
>>
>> The main drawback is that there is no savepoint compatibility, due to
>> CanBuildFrom and a different way of handling ADTs. But if you can afford
>> re-bootstrapping the state, the migration is quite straightforward.
>>
>> The README on GitHub (https://github.com/findify/flink-scala-api#readme)
>> has some more details on how and why this project was done this way. The
>> project is a bit experimental, so if you're interested in Scala 3 on
>> Flink, you're welcome to share your feedback and ideas.
>>
>> with best regards,
>> Roman Grebennikov | g...@dfdx.me
>>
>>
>>
>> --
>> Best Regards
>>
>> Jeff Zhang
>>
>>
>>
>
> --
> Best Regards
>
> Jeff Zhang
>


-- 
https://twitter.com/snntrable
https://github.com/knaufk
