I like the idea of supporting partial functions in Flink's Scala API. However, I think that breaking the API and making it inconsistent with the Java API is not the best option. I would rather be in favour of the first proposal, where we add new xxxWith methods via implicit conversions.
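To make it concrete, the mail count example from the thread below would then read something like this (just a sketch, assuming a non-overloaded mapWith added by such a conversion):

    // Hypothetical usage: `mapWith` would come from the implicit
    // conversion; `mails`, `extractMonth` and `extractEmail` are taken
    // from the example quoted below.
    val monthsAndEmails =
      mails.mapWith {
        case (date, sender) => (extractMonth(date), extractEmail(sender))
      }

And since PartialFunction[A, B] extends A => B, a named PartialFunction value could be passed to such a method as well.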
Cheers,
Till

On Sun, Feb 7, 2016 at 12:44 PM, Stefano Baghino <
stefano.bagh...@radicalbit.io> wrote:

> It took me a little time but I was able to put together some code.
>
> In this commit I just added a few methods, renamed to prevent overloading
> and thus usable with PartialFunctions instead of plain functions:
>
> https://github.com/radicalbit/flink/commit/aacd59e0ce98cccb66d48a30d07990ac8f345748
>
> In this other commit I coded the original proposal, renaming the methods
> to obtain the same effect as before, but with lower friction for Scala
> developers (and provided some usage examples):
>
> https://github.com/radicalbit/flink/commit/33403878eebba70def42f73a1cb671d13b1521b5
>
> On Thu, Jan 28, 2016 at 6:16 PM, Stefano Baghino <
> stefano.bagh...@radicalbit.io> wrote:
>
> > Hi Stephan,
> >
> > thank you for the quick reply and for your feedback; I agree with you
> > that breaking changes have to be taken very seriously.
> >
> > The rationale behind my proposal is that Scala users are already
> > accustomed to higher-order functions that manipulate collections, and
> > it would be beneficial for them to have an API that adheres as closely
> > as possible to the interface of the Scala Collections API. IMHO, being
> > able to manipulate a DataSet or DataStream idiomatically, like a Scala
> > collection, would appeal to developers and reduce the friction of
> > learning Flink.
> >
> > If we want to pursue the renaming path, I think these changes (and
> > porting the rest of the codebase, like `flink-ml` and `flink-contrib`,
> > to the new method names) can be done in relatively little time. Since
> > Flink is approaching a major release, I think it's a good time to
> > consider this change, if the community deems it relevant.
> >
> > While we await feedback on the proposal, I can start working on both
> > paths to see how they would affect the codebase. What do you think?
> >
> > On Thu, Jan 28, 2016 at 2:14 PM, Stephan Ewen <se...@apache.org> wrote:
> >
> >> Hi!
> >>
> >> Would be nice to support that, agreed.
> >>
> >> Such a fundamental break in the API worries me a bit, though - I would
> >> opt for a non-breaking addition.
> >> Wrapping the RichFunctions into Scala functions (which are in turn
> >> wrapped as rich functions again) with implicits seems like a
> >> workaround for something that should be very simple. It would probably
> >> also cost a bit of performance.
> >>
> >> I like the idea of "mapWith(...)" - if that were a simple,
> >> non-overloaded function accepting a Scala function, it should accept
> >> case-style functions, right?
> >> Simply adding that would probably solve things, but it would add a
> >> second variant of each function to the DataSet. An implicit conversion
> >> from DataSet to a DataSetExtended (which implements the mapWith,
> >> reduceWith, ... methods) could help there...
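> >>
> >> Something along these lines, perhaps (an untested sketch; the names
> >> are placeholders):
> >>
> >> import scala.reflect.ClassTag
> >> import org.apache.flink.api.common.typeinfo.TypeInformation
> >> import org.apache.flink.api.scala.DataSet
> >>
> >> // Sketch only: the extra methods live outside DataSet, so nothing
> >> // breaks. Each xxxWith simply forwards to the existing variant that
> >> // takes a Scala function.
> >> class DataSetExtended[T](ds: DataSet[T]) {
> >>   // Not overloaded, so case-style (pattern-matching) literals compile.
> >>   def mapWith[R: TypeInformation: ClassTag](fun: T => R): DataSet[R] =
> >>     ds.map(fun)
> >>   def filterWith(fun: T => Boolean): DataSet[T] =
> >>     ds.filter(fun)
> >>   def reduceWith(fun: (T, T) => T): DataSet[T] =
> >>     ds.reduce(fun)
> >> }
> >>
> >> // e.g. in the org.apache.flink.api.scala package object:
> >> implicit def extendDataSet[T](ds: DataSet[T]): DataSetExtended[T] =
> >>   new DataSetExtended(ds)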
> >>
> >> What do you think?
> >>
> >> Greetings,
> >> Stephan
> >>
> >> On Thu, Jan 28, 2016 at 2:05 PM, Stefano Baghino <
> >> stefano.bagh...@radicalbit.io> wrote:
> >>
> >> > Hello everybody,
> >> >
> >> > as I'm getting familiar with Flink, I've found a possible
> >> > improvement to the Scala APIs: in Scala it's a common pattern to
> >> > perform tuple extraction using pattern matching, making functions
> >> > that work on tuples more readable, like this:
> >> >
> >> > // referring to the mail count example in the training
> >> > // assuming `mails` is a DataSet[(String, String)],
> >> > // a pair of a date and a string with username and email
> >> > val monthsAndEmails =
> >> >   mails.map {
> >> >     case (date, sender) =>
> >> >       (extractMonth(date), extractEmail(sender))
> >> >   }
> >> >
> >> > However, this is not possible when using the Scala APIs because of
> >> > the overloading of the `map` function in the `DataSet` and
> >> > `DataStream` classes (along with other higher-order functions such
> >> > as `flatMap` and `filter`). My understanding is that the main reason
> >> > for having two overloaded functions is to provide support for
> >> > `RichFunction`s.
> >> > I've found out there has been some interest around the issue in the
> >> > past ([FLINK-1159] <https://issues.apache.org/jira/browse/FLINK-1159>).
> >> > In the past couple of days, my colleague Andrea and I have tried
> >> > several ways to address the problem, coming up with two possible
> >> > solutions:
> >> >
> >> >    1. don't overload and use different names, e.g. `map` taking a
> >> >    Scala function and `mapWith` taking a Flink MapFunction
> >> >    2. keep only the method taking a Scala function (which would be
> >> >    more idiomatic from a Scala perspective, IMHO) and provide an
> >> >    implicit conversion from the Flink function to the Scala function
> >> >    within the `org.apache.flink.api.scala` package object
> >> >
> >> > We've also evaluated several other approaches using union types and
> >> > type classes, but we've found them to be too complex. Regarding the
> >> > two approaches I've cited: the first would imply a breaking change
> >> > to the APIs, while the second is giving me a hard time figuring out
> >> > some compilation errors in `flink-libraries` and `flink-contrib`;
> >> > moreover, as we tested it, we found out that `RichMapFunction`s lose
> >> > state, possibly because of the double conversion: first to a Scala
> >> > function, then back to a simple `MapFunction`.
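> >> >
> >> > In code, the conversion in the second approach boils down to
> >> > something like this (a simplified sketch):
> >> >
> >> > import org.apache.flink.api.common.functions.MapFunction
> >> >
> >> > // Simplified sketch (living in the package object): every Flink
> >> > // MapFunction becomes a plain Scala function. The Scala-side
> >> > // `map(fun: T => R)` then wraps that function into a *new*
> >> > // MapFunction, so a RichMapFunction passed this way loses its
> >> > // RichFunction nature (open/close, runtime context, and thus state).
> >> > implicit def mapFunctionToFunction[T, R](mapper: MapFunction[T, R]): T => R =
> >> >   (t: T) => mapper.map(t)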
> >> > > >> > -- > >> > BR, > >> > Stefano Baghino > >> > > >> > Software Engineer @ Radicalbit > >> > > >> > > > > > > > > -- > > BR, > > Stefano Baghino > > > > Software Engineer @ Radicalbit > > > > > > -- > BR, > Stefano Baghino > > Software Engineer @ Radicalbit >