Re: NegativeArraySizeException / segfault

2016-06-08 Thread Andres Perez
We were able to reproduce it with a minimal example. I've opened a JIRA issue: https://issues.apache.org/jira/browse/SPARK-15825

On Wed, Jun 8, 2016 at 12:43 PM, Koert Kuipers wrote:
> great!
>
> we weren't able to reproduce it because the unit tests use a
> broadcast-join while on the cluster

Dataset reduceByKey

2016-05-19 Thread Andres Perez
Hi all, We were in the process of porting an RDD program to one that uses Datasets. Most things were easy to transition, but one gap in functionality we found was the ability to reduce a Dataset by key, something akin to PairRDDFunctions.reduceByKey. Our first attempt at adding the functionality

right outer joins on Datasets

2016-05-19 Thread Andres Perez
Hi all, I'm getting some odd behavior when using the joinWith functionality for Datasets. Here is a small test case:

val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
val joined = left.toDF("k", "v").as[(String, Int)].