Great work Roman, do you think it is possible to run in scala shell as well?
On Thu, May 12, 2022 at 10:43 PM Roman Grebennikov <g...@dfdx.me> wrote: > Hello, > > As far as I understand discussions in this mailist, now there is almost no > people maintaining the official Scala API in Apache Flink. Due to some > technical complexities it will be probably stuck for a very long time on > Scala 2.12 (which is not EOL yet, but quite close to): > * Traversable serializer relies a lot on CanBuildFrom (so it's read and > compiled on restore), which is missing in Scala 2.13 and 3.x - migrating > off from this approach maintaining a savepoint compatibility can be quite a > complex task. > * Scala API uses an implicitly generated TypeInformation, which is > generated by a giant scary mkTypeInfo macro, which should be completely > rewritten for Scala 3.x. > > But even in the current state, scala support in Flink has some issues with > ADT (sealed traits, popular data modelling pattern) not being natively > supported, so if you use them, you have to fall back to Kryo, which is not > that fast: we've seed 3x-4x throughput drops in performance tests. > > In my current company we made a library ( > https://github.com/findify/flink-adt) which used Magnolia ( > https://github.com/softwaremill/magnolia) to do all the compile-time > TypeInformation generation to make Scala ADT nice & fast in Flink. With a > couple of community contributions it was now possible to cross-build it > also for scala3. > > As Flink 1.15 core is scala free, we extracted the DataStream part of > Flink Scala API into a separate project, glued it together with flink-adt > and ClosureCleaner from Spark 3.2 (supporting Scala 2.13 and jvm17) and > cross-compiled it for 2.12/2.13/3.x. You can check out the result on this > github project: https://github.com/findify/flink-scala-api > > So technically speaking, now it's possible to migrate a scala flink job > from 2.12 to 3.x with: > * replace flink-streaming-scala dependency with flink-scala-api (optional, > both libs can co-exist in classpath on 2.12) > * replace all imports of org.apache.flink.streaming.api.scala._ with ones > from the new library > * rebuild the job for 3.x > > The main drawback is that there is no savepoint compatibility due to > CanBuildFrom and different way of handling ADTs. But if you can afford > re-bootstrapping the state - migration is quite straightforward. > > The README on github https://github.com/findify/flink-scala-api#readme > has some more details on how and why this project was done in this way. And > the project is a bit experimental, so if you're interested in scala3 on > Flink, you're welcome to share your feedback and ideas. > > with best regards, > Roman Grebennikov | g...@dfdx.me > > -- Best Regards Jeff Zhang