I filed https://issues.apache.org/jira/browse/SPARK-15441
On Thu, May 19, 2016 at 8:48 AM, Andres Perez <and...@tresata.com> wrote:

> Hi all, I'm getting some odd behavior when using the joinWith
> functionality for Datasets. Here is a small test case:
>
> val left = List(("a", 1), ("a", 2), ("b", 3), ("c", 4)).toDS()
> val right = List(("a", "x"), ("b", "y"), ("d", "z")).toDS()
>
> val joined = left.toDF("k", "v").as[(String, Int)].alias("left")
>   .joinWith(right.toDF("k", "u").as[(String, String)].alias("right"),
>     functions.col("left.k") === functions.col("right.k"), "right_outer")
>   .as[((String, Int), (String, String))]
>   .map { case ((k, v), (_, u)) => (k, (v, u)) }.as[(String, (Int, String))]
>
> I would expect the result of this right join to be:
>
> (a,(1,x))
> (a,(2,x))
> (b,(3,y))
> (d,(null,z))
>
> but instead I'm getting:
>
> (a,(1,x))
> (a,(2,x))
> (b,(3,y))
> (null,(-1,z))
>
> Note that the key for the final tuple is null instead of "d". (Also, is
> there a reason the value for the left side of the last tuple is -1 and not
> null?)
>
> -Andy
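For reference, here is a minimal sketch of the right-outer-join semantics the test case expects, written with plain Scala collections and no Spark. The names `RightOuterJoinSketch` and `rightOuterJoin` are hypothetical and introduced only for illustration. The missing left value is modelled as `Option[Int]` because a Scala `Int` cannot be null; that is an assumption of this sketch, not a description of how Spark encodes the row.

```scala
// Plain-Scala illustration of right outer join semantics; this is not Spark code.
object RightOuterJoinSketch {
  val left: List[(String, Int)] = List(("a", 1), ("a", 2), ("b", 3), ("c", 4))
  val right: List[(String, String)] = List(("a", "x"), ("b", "y"), ("d", "z"))

  // Hypothetical helper: keeps every right-hand row. The key always comes from
  // the right side, and the left value is None when no left row has that key.
  def rightOuterJoin(
      l: List[(String, Int)],
      r: List[(String, String)]
  ): List[(String, (Option[Int], String))] =
    r.flatMap { case (k, u) =>
      // All left values whose key matches the current right-hand key.
      val matches: List[Int] = l.filter(_._1 == k).map(_._2)
      if (matches.isEmpty) List((k, (Option.empty[Int], u)))
      else matches.map(v => (k, (Option(v), u)))
    }

  def main(args: Array[String]): Unit =
    rightOuterJoin(left, right).foreach(println)
}
```

Taking the key from the right-hand row is what keeps "d" in the final tuple, and the unmatched left key "c" is dropped, as a right outer join should do.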