[ https://issues.apache.org/jira/browse/FLINK-1159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193199#comment-15193199 ]
ASF GitHub Bot commented on FLINK-1159:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/1704#discussion_r55993999

--- Diff: docs/apis/scala_api_extensions.md ---
@@ -0,0 +1,392 @@
+---
+title: "Scala API Extensions"
+# Top-level navigation
+top-nav-group: apis
+top-nav-pos: 11
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+In order to keep a fair amount of consistency between the Scala and Java APIs, some
+of the features that allow a high level of expressiveness in Scala have been left
+out of the standard APIs for both batch and streaming.
+
+If you want to _enjoy the full Scala experience_, you can choose to opt in to
+extensions that enhance the Scala API via implicit conversions.
+
+To use all the available extensions, you can just add a simple `import` for the
+DataSet API
+
+{% highlight scala %}
+import org.apache.flink.api.scala.extensions._
+{% endhighlight %}
+
+or the DataStream API
+
+{% highlight scala %}
+import org.apache.flink.streaming.api.scala.extensions._
+{% endhighlight %}
+
+Alternatively, you can import individual extensions _à la carte_ to use only those
+you prefer.
+
+## Accept partial functions
+
+Normally, neither the DataSet nor the DataStream API accepts anonymous pattern
+matching functions to deconstruct tuples, case classes or collections, as in the
+following:
+
+{% highlight scala %}
+val data: DataSet[(Int, String, Double)] = // [...]
+data.map {
+  case (id, name, temperature) => // [...]
+  // The previous line causes the following compilation error:
+  // "The argument types of an anonymous function must be fully known. (SLS 8.5)"
+}
+{% endhighlight %}
+
+This extension introduces new methods in both the DataSet and DataStream Scala APIs
+that correspond one-to-one to methods in the standard API. These delegating methods
+do support anonymous pattern matching functions.
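+For example, with the extension import in scope, the failing snippet above can be
+rewritten with `mapWith` (documented in the tables below). A minimal sketch (the
+sample elements and the string built in the body are just illustrations):
+
+{% highlight scala %}
+import org.apache.flink.api.scala._
+import org.apache.flink.api.scala.extensions._
+
+val env = ExecutionEnvironment.getExecutionEnvironment
+val data: DataSet[(Int, String, Double)] =
+  env.fromElements((1, "Berlin", 21.5), (2, "Boston", 17.0))
+// The pattern-matching anonymous function now compiles, because mapWith
+// expects a plain T => R whose argument type is fully known.
+data.mapWith {
+  case (id, name, temperature) => s"$name: $temperature"
+}
+{% endhighlight %}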
+
+#### DataSet API
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Method</th>
+      <th class="text-left" style="width: 20%">Original</th>
+      <th class="text-center">Example</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td><strong>mapWith</strong></td>
+      <td><strong>map (DataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.mapWith {
+  case (_, value) => value.toString
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>mapPartitionWith</strong></td>
+      <td><strong>mapPartition (DataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.mapPartitionWith {
+  case head +: _ => head
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>flatMapWith</strong></td>
+      <td><strong>flatMap (DataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.flatMapWith {
+  case (_, name, visitTimes) => visitTimes.map(name -> _)
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>filterWith</strong></td>
+      <td><strong>filter (DataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.filterWith {
+  case Train(_, isOnTime) => isOnTime
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>reduceWith</strong></td>
+      <td><strong>reduce (DataSet, GroupedDataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.reduceWith {
+  case ((_, amount1), (_, amount2)) => amount1 + amount2
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>reduceGroupWith</strong></td>
+      <td><strong>reduceGroup (GroupedDataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.reduceGroupWith {
+  case id +: value +: _ => id -> value
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>groupingBy</strong></td>
+      <td><strong>groupBy (DataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data.groupingBy {
+  case (id, _, _) => id
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>sortGroupWith</strong></td>
+      <td><strong>sortGroup (GroupedDataSet)</strong></td>
+      <td>
+{% highlight scala %}
+grouped.sortGroupWith(Order.ASCENDING) {
+  case House(_, value) => value
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>combineGroupWith</strong></td>
+      <td><strong>combineGroup (GroupedDataSet)</strong></td>
+      <td>
+{% highlight scala %}
+grouped.combineGroupWith {
+  case header +: amounts => amounts.sum
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>projecting</strong></td>
+      <td><strong>apply (JoinDataSet, CrossDataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data1.join(data2).where(0).equalTo(1).projecting {
+  case ((pk, tx), (products, fk)) => tx -> products
+}
+
+data1.cross(data2).projecting {
+  case ((a, _), (_, b)) => a -> b
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>projecting</strong></td>
+      <td><strong>apply (CoGroupDataSet)</strong></td>
+      <td>
+{% highlight scala %}
+data1.coGroup(data2).where(0).equalTo(1).projecting {
+  case (head1 +: _, head2 +: _) => head1 -> head2
+}
+{% endhighlight %}
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+#### DataStream API
+
+<table class="table table-bordered">
+  <thead>
+    <tr>
+      <th class="text-left" style="width: 20%">Method</th>
+      <th class="text-left" style="width: 20%">Original</th>
+      <th class="text-center">Example</th>
+    </tr>
+  </thead>
+
+  <tbody>
+    <tr>
+      <td><strong>mapWith</strong></td>
+      <td><strong>map (DataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.mapWith {
+  case (_, value) => value.toString
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>flatMapWith</strong></td>
+      <td><strong>flatMap (DataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.flatMapWith {
+  case (_, name, visits) => visits.map(name -> _)
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>filterWith</strong></td>
+      <td><strong>filter (DataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.filterWith {
+  case Train(_, isOnTime) => isOnTime
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>keyingBy</strong></td>
+      <td><strong>keyBy (DataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.keyingBy {
+  case (id, _, _) => id
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>mapWith</strong></td>
+      <td><strong>map (ConnectedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.mapWith(
+  map1 = { case (_, value) => value.toString },
+  map2 = { case (_, _, value, _) => value + 1 }
+)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>flatMapWith</strong></td>
+      <td><strong>flatMap (ConnectedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.flatMapWith(
+  flatMap1 = { case (_, json) => parse(json) },
+  flatMap2 = { case (_, _, json, _) => parse(json) }
+)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>keyingBy</strong></td>
+      <td><strong>keyBy (ConnectedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.keyingBy(
+  key1 = { case (_, timestamp) => timestamp },
+  key2 = { case (id, _, _) => id }
+)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>reduceWith</strong></td>
+      <td><strong>reduce (KeyedDataStream, WindowedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.reduceWith {
+  case ((_, sum1), (_, sum2)) => sum1 + sum2
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>foldWith</strong></td>
+      <td><strong>fold (KeyedDataStream, WindowedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.foldWith(User(bought = 0)) {
+  case (User(b), (_, items)) => User(b + items.size)
+}
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>applyWith</strong></td>
+      <td><strong>apply (WindowedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data.applyWith(0)(
+  foldFunction = { case (sum, amount) => sum + amount },
+  windowFunction = { case (k, w, sum) => // [...]
+  }
+)
+{% endhighlight %}
+      </td>
+    </tr>
+    <tr>
+      <td><strong>projecting</strong></td>
+      <td><strong>apply (JoinedDataStream)</strong></td>
+      <td>
+{% highlight scala %}
+data1.join(data2).where(0).equalTo(1).projecting {
+  case ((pk, tx), (products, fk)) => tx -> products
+}
+{% endhighlight %}
+      </td>
+    </tr>
+  </tbody>
+</table>
+
+For more information on the semantics of each method, please refer to the
+[DataSet](batch/index.html) and [DataStream](streaming/index.html) API documentation.
+
+To use this extension exclusively, you can add the following `import`:
+
+{% highlight scala %}
+import org.apache.flink.api.scala.extensions.acceptPartialFunctions

--- End diff --

Does this really work? Don't you have to import `o.a.f.api.scala.extensions.acceptPartialFunctionsOnDataSet` etc.?
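For reference, whichever import path turns out to be the right one, what it must bring into scope is an implicit conversion that adds the delegating methods. A minimal sketch of that pattern, for a single method (the object and class names here are illustrative, not Flink's actual internals):

{code}
import scala.reflect.ClassTag

import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala.DataSet

// Illustrative only: a container whose members, once imported, enrich DataSet.
object AcceptPartialFunctionsSketch {
  implicit class OnDataSet[T](val ds: DataSet[T]) extends AnyVal {
    // Because the parameter type `T => R` is fully known, an anonymous
    // pattern-matching function { case ... => ... } is accepted here (SLS 8.5)
    // and simply delegated to the regular `map`.
    def mapWith[R: TypeInformation: ClassTag](fun: T => R): DataSet[R] =
      ds.map(fun)
  }
}
{code}

With `import AcceptPartialFunctionsSketch._` in scope, `data.mapWith { case (id, name) => name }` compiles where the equivalent call to the plain `map` does not.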
> Case style anonymous functions not supported by Scala API
> ----------------------------------------------------------
>
>                 Key: FLINK-1159
>                 URL: https://issues.apache.org/jira/browse/FLINK-1159
>             Project: Flink
>          Issue Type: Bug
>          Components: Scala API
>            Reporter: Till Rohrmann
>            Assignee: Stefano Baghino
>
> In Scala it is very common to define anonymous functions of the following form:
> {code}
> {
>   case foo: Bar => foobar(foo)
>   case _ => throw new RuntimeException()
> }
> {code}
> These case-style anonymous functions are not yet supported by the Scala API.
> Thus, one has to write redundant code to name the function parameter.
> The following pattern works, but it is not intuitive for someone coming from Scala:
> {code}
> dataset.map {
>   _ match {
>     case foo: Bar => ...
>   }
> }
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)