Great work, Roman. Do you think it is possible to run it in the Scala shell as well?

On Thu, May 12, 2022 at 10:43 PM Roman Grebennikov <g...@dfdx.me> wrote:

> Hello,
>
> As far as I understand from discussions on this mailing list, there are now
> almost no people maintaining the official Scala API in Apache Flink. Due to
> some technical complexities, it will probably be stuck for a very long time
> on Scala 2.12 (which is not EOL yet, but quite close to it):
> * The Traversable serializer relies a lot on CanBuildFrom (so it's read and
> compiled on restore), which is missing in Scala 2.13 and 3.x - migrating
> away from this approach while maintaining savepoint compatibility can be
> quite a complex task.
> * The Scala API uses an implicitly generated TypeInformation, which is
> generated by a giant scary mkTypeInfo macro, which would have to be
> completely rewritten for Scala 3.x.
>
> But even in its current state, Scala support in Flink has some issues: ADTs
> (sealed traits, a popular data modelling pattern) are not natively
> supported, so if you use them, you have to fall back to Kryo, which is not
> that fast: we've seen 3x-4x throughput drops in performance tests.
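
To make the ADT point concrete, here is a minimal sketch of the pattern in
plain Scala, with no Flink dependency. The names (Event, Click, Purchase,
Heartbeat, AdtSketch) are hypothetical and are not taken from the message or
from flink-adt. Going by the description above, a stream whose element type is
a sealed trait like Event is the case that ends up on the Kryo fallback with
the official Scala API.

```scala
// Hypothetical event model for illustration only.
sealed trait Event
final case class Click(userId: String, url: String) extends Event
final case class Purchase(userId: String, amount: Double) extends Event
case object Heartbeat extends Event

object AdtSketch {
  // Because Event is sealed, the compiler can check that this match
  // covers every subtype.
  def describe(e: Event): String = e match {
    case Click(user, url)       => s"$user clicked $url"
    case Purchase(user, amount) => s"$user paid $amount"
    case Heartbeat              => "heartbeat"
  }
}
```

The closed set of subtypes is what lets a compile-time derivation (such as the
Magnolia-based one mentioned below) enumerate every case up front.
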
>
> In my current company we made a library (
> https://github.com/findify/flink-adt) which uses Magnolia (
> https://github.com/softwaremill/magnolia) to do all the compile-time
> TypeInformation generation to make Scala ADTs nice & fast in Flink. With a
> couple of community contributions it is now possible to cross-build it
> also for Scala 3.
>
> As the Flink 1.15 core is Scala-free, we extracted the DataStream part of
> the Flink Scala API into a separate project, glued it together with flink-adt
> and the ClosureCleaner from Spark 3.2 (supporting Scala 2.13 and JVM 17) and
> cross-compiled it for 2.12/2.13/3.x. You can check out the result in this
> GitHub project: https://github.com/findify/flink-scala-api
>
> So technically speaking, it's now possible to migrate a Scala Flink job
> from 2.12 to 3.x as follows:
> * replace the flink-streaming-scala dependency with flink-scala-api (optional,
> both libs can co-exist on the classpath on 2.12)
> * replace all imports of org.apache.flink.streaming.api.scala._ with ones
> from the new library
> * rebuild the job for 3.x
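
The import swap in the second step might look roughly like the sketch below.
The two io.findify package names are an assumption recalled from the project
README, not something stated in this thread, so please verify them there
before relying on them. The job itself (MigratedJob, the sample elements, the
job name) is hypothetical, and it needs flink-scala-api and Flink on the
classpath to compile.

```scala
// Before: official Scala API (Scala 2.12 only)
// import org.apache.flink.streaming.api.scala._

// After: ASSUMED package names, check the flink-scala-api README.
import io.findify.flink.api._
import io.findify.flinkadt.api._

// Hypothetical job, used only to show that the job body is meant to stay
// unchanged once the imports are swapped.
object MigratedJob {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env
      .fromElements(1, 2, 3)
      .map(_ * 2)
      .print()
    env.execute("migrated-job")
  }
}
```
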
>
> The main drawback is that there is no savepoint compatibility due to
> CanBuildFrom and a different way of handling ADTs. But if you can afford
> re-bootstrapping the state, migration is quite straightforward.
>
> The README on GitHub (https://github.com/findify/flink-scala-api#readme)
> has some more details on how and why this project was done in this way. The
> project is a bit experimental, so if you're interested in Scala 3 on
> Flink, you're welcome to share your feedback and ideas.
>
> with best regards,
> Roman Grebennikov | g...@dfdx.me
>
>

-- 
Best Regards

Jeff Zhang
