Re: Spark SQL DataFrame: Nullable column and filtering

2015-08-01 Thread Martin Senne
Dear all, after some fiddling I have arrived at this solution:

    /**
     * Customized left outer join on common column.
     */
    def leftOuterJoinWithRemovalOfEqualColumn(leftDF: DataFrame, rightDF: DataFrame, commonColumnName: String): DataFrame = {
      val joinedDF = leftDF.as('left).join(rightDF.as('right

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-31 Thread Martin Senne
Dear Michael, dear all, a minimal example is listed below. After some further analysis I found that the problem is related to the *leftOuterJoinWithRemovalOfEqualColumn* method, as I use columns of both the left and right dataframes when doing the select on the joined table. /** * Cu

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
Dear Michael, dear all, distinguishing those records that have a match in mapping from those that don't is the crucial point.

    Record(x: Int, a: String)
    Mapping(x: Int, y: Int)

Thus

    Record(1, "hello")
    Record(2, "bob")
    Mapping(2, 5)

yield (2, "bob", 5) on an inner join. BUT I'm also interested
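The distinction between matched and unmatched records can be illustrated without Spark. The following is a minimal sketch, not the code from the thread: it models the two tables as plain Scala collections and represents the nullable column y as an Option. The object name JoinSketch and the assumption of at most one mapping per x are illustrative only.

```scala
// Plain Scala collections stand in for the DataFrames; no Spark is involved.
// JoinSketch is a hypothetical name used only for this illustration.
object JoinSketch {
  case class Record(x: Int, a: String)
  case class Mapping(x: Int, y: Int)

  val records: Seq[Record]   = Seq(Record(1, "hello"), Record(2, "bob"))
  val mappings: Seq[Mapping] = Seq(Mapping(2, 5))

  // Index the mappings by the join key (assumes at most one mapping per x).
  val yByX: Map[Int, Int] = mappings.map(m => m.x -> m.y).toMap

  // Left outer join: every record is kept; a missing match becomes None,
  // which plays the role of a null in the y column.
  val leftOuter: Seq[(Int, String, Option[Int])] =
    records.map(r => (r.x, r.a, yByX.get(r.x)))

  // Records with a match: equivalent to the inner join result.
  val matched: Seq[(Int, String, Int)] =
    leftOuter.collect { case (x, a, Some(y)) => (x, a, y) }

  // Records without a match: the rows an inner join would drop.
  val unmatched: Seq[(Int, String)] =
    leftOuter.collect { case (x, a, None) => (x, a) }
}
```

With the sample data, matched contains only (2, "bob", 5) and unmatched contains only (1, "hello"), so a single left outer join is enough to recover both groups.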

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Michael Armbrust
Perhaps I'm missing what you are trying to accomplish, but if you'd like to avoid the null values, do an inner join instead of an outer join. Additionally, I'm confused about how the result of joinedDF.filter(joinedDF("y").isNotNull).show still contains null values in the column y. This doesn't re

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Martin Senne
Dear Michael, dear all, motivation:

    object OtherEntities {
      case class Record(x: Int, a: String)
      case class Mapping(x: Int, y: Int)
      val records = Seq(Record(1, "hello"), Record(2, "bob"))
      val mappings = Seq(Mapping(2, 5))
    }

Now I want to perform a *left outer join* on records and

Re: Spark SQL DataFrame: Nullable column and filtering

2015-07-30 Thread Michael Armbrust
We don't yet update nullability information based on predicates, as we don't actually leverage this information in many places yet. Why do you want to update the schema?

On Thu, Jul 30, 2015 at 11:19 AM, martinibus77 wrote:
> Hi all,
>
> 1. *Columns in dataframes can be nullable and not nullabl