Re: Dataframe schema...

2016-10-26 Thread Michael Armbrust
On Fri, Oct 21, 2016 at 8:40 PM, Koert Kuipers wrote:
> This rather innocent looking optimization flag nullable has caused a lot
> of bugs... Makes me wonder if we are better off without it
>
Yes... my most regretted design decision :( Please give thoughts here: https://issues.apache.org/jira/b

Re: Dataframe schema...

2016-10-21 Thread Koert Kuipers
This rather innocent looking optimization flag nullable has caused a lot of bugs... Makes me wonder if we are better off without it

On Oct 21, 2016 8:37 PM, "Muthu Jayakumar" wrote:
> Thanks Cheng Lian for opening the JIRA. I found this with Spark 2.0.0.
>
> Thanks,
> Muthu
>
> On Fri, Oct 21, 2

Re: Dataframe schema...

2016-10-21 Thread Muthu Jayakumar
Thanks Cheng Lian for opening the JIRA. I found this with Spark 2.0.0.

Thanks,
Muthu

On Fri, Oct 21, 2016 at 3:30 PM, Cheng Lian wrote:
> Yea, confirmed. While analyzing unions, we treat StructTypes with
> different field nullabilities as incompatible types and throw this error.
>
> Opened htt

Re: Dataframe schema...

2016-10-21 Thread Cheng Lian
Yea, confirmed. While analyzing unions, we treat StructTypes with different field nullabilities as incompatible types and throw this error.

Opened https://issues.apache.org/jira/browse/SPARK-18058 to track this issue.

Thanks for reporting!

Cheng

On 10/21/16 3:15 PM, Cheng Lian wrote:
Hi
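The incompatibility described in this message can be illustrated with a toy model. This is a minimal sketch in plain Python, not Spark's implementation: the classes Field, Struct and Array and the helpers as_nullable and equal_ignoring_nullability are hypothetical names invented for the illustration, and the element type "double" is an assumption because the schema quoted in the thread is cut off.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Field:
    # One named field of a struct, with its own nullable flag.
    name: str
    dtype: "DType"
    nullable: bool = True


@dataclass(frozen=True)
class Struct:
    fields: Tuple[Field, ...]


@dataclass(frozen=True)
class Array:
    # contains_null mirrors the containsNull flag mentioned in the thread.
    element: "DType"
    contains_null: bool = True


# Primitive types are modelled as plain strings such as "string" or "double".
DType = Union[str, Struct, Array]


def as_nullable(dtype):
    """Return a copy of dtype with every nullability flag set to True."""
    if isinstance(dtype, Struct):
        return Struct(tuple(
            Field(f.name, as_nullable(f.dtype), True) for f in dtype.fields
        ))
    if isinstance(dtype, Array):
        return Array(as_nullable(dtype.element), True)
    return dtype


def equal_ignoring_nullability(a, b):
    """Compare two types after relaxing all nullability flags."""
    return as_nullable(a) == as_nullable(b)


# Hypothetical schemas: the same nested struct, differing only in contains_null.
from_parquet = Struct((Field("values", Array("double", True), True),))
in_memory = Struct((Field("values", Array("double", False), True),))

strict = from_parquet == in_memory            # flags take part in equality
relaxed = equal_ignoring_nullability(from_parquet, in_memory)
print(strict, relaxed)
```

In this model the strict comparison fails while the relaxed one succeeds, which is the shape of the problem reported here: two schemas that differ only in a nested nullability flag are treated as different types.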

Re: Dataframe schema...

2016-10-21 Thread Cheng Lian
Hi Muthu,

Which version of Spark are you using? This seems to be a bug in the analysis phase.

Cheng

On 10/21/16 12:50 PM, Muthu Jayakumar wrote:
Sorry for the late response. Here is what I am seeing...

Schema from parquet file.

d1.printSchema()
root
 |-- task_id: string (nullable =

Re: Dataframe schema...

2016-10-21 Thread Muthu Jayakumar
Sorry for the late response. Here is what I am seeing...

Schema from parquet file.

d1.printSchema()
root
 |-- task_id: string (nullable = true)
 |-- task_name: string (nullable = true)
 |-- some_histogram: struct (nullable = true)
 |    |-- values: array (nullable = true)
 |    |    |-- element

Re: Dataframe schema...

2016-10-20 Thread Michael Armbrust
What is the issue you see when unioning?

On Wed, Oct 19, 2016 at 6:39 PM, Muthu Jayakumar wrote:
> Hello Michael,
>
> Thank you for looking into this query. In my case there seems to be an
> issue when I union a parquet file read from disk with another dataframe
> that I construct in-memory. Th

Re: Dataframe schema...

2016-10-19 Thread Muthu Jayakumar
Hello Michael,

Thank you for looking into this query. In my case there seems to be an issue when I union a parquet file read from disk with another dataframe that I construct in-memory. The only difference I see is the containsNull = true. In fact, I do not see any errors with union on the simple

Re: Dataframe schema...

2016-10-19 Thread Michael Armbrust
Nullable is just a hint to the optimizer that it's impossible for there to be a null value in this column, so that it can avoid generating code for null-checks. When in doubt, we set nullable=true since it is always safer to check. Why in particular are you trying to change the nullability of the
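The point about null-checks can be sketched with a toy example. This is plain Python under stated assumptions, not Spark's code generator: make_upper is a hypothetical helper that builds a per-value function for a string column and leaves out the null check when the column is declared non-nullable.

```python
def make_upper(nullable):
    """Build a per-value uppercase function, like a tiny code generator."""
    if nullable:
        # Safe path: check for a missing value before touching it.
        def upper(value):
            return None if value is None else value.upper()
    else:
        # Fast path: the null check is omitted because the flag promises
        # that no value is ever missing.
        def upper(value):
            return value.upper()
    return upper


checked = make_upper(nullable=True)
unchecked = make_upper(nullable=False)

print(checked(None))      # the checked version passes the missing value through
print(unchecked("spark"))
```

Calling unchecked(None) raises AttributeError in this model. That is why it is safer to keep the flag at true when in doubt: the unchecked path is only correct if the promise of no nulls actually holds.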