Re: How to set nullable field when create DataFrame using case class

Mich Talebzadeh Fri, 05 Aug 2016 02:30:13 -0700

Hi Jacek,

Is this line correct?


spark.createDataset(Seq(MyProduct(new Timestamp(0), 10))).printSchema
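
For context, a minimal sketch of that line as it might be run end to end.
It assumes a Spark 2.x spark-shell, where `spark` (the SparkSession) and
spark.implicits._ are already in scope. The schema-rewrite step at the end
is not from this thread; it is one commonly used workaround, shown here as
an assumption rather than a recommended fix.

```scala
// Assumes spark-shell 2.x: `spark` and spark.implicits._ are predefined.
import java.sql.Timestamp
import org.apache.spark.sql.types.StructType

// Same case class as in the original question.
case class MyProduct(t: Timestamp, a: Float)

// The line in question: builds a Dataset[MyProduct] and prints its schema.
// Per the original post, `t` is inferred as nullable and `a` as non-nullable.
val ds = spark.createDataset(Seq(MyProduct(new Timestamp(0), 10)))
ds.printSchema()

// Workaround sketch: copy the inferred schema, marking `t` as non-nullable,
// then rebuild a DataFrame from the same rows with the adjusted schema.
val strictSchema = StructType(ds.schema.map { field =>
  if (field.name == "t") field.copy(nullable = false) else field
})
val strict = spark.createDataFrame(ds.toDF().rdd, strictSchema)
strict.printSchema()
```

As far as I know, Spark does not check the rows against the declared
nullability here, so this is only safe if `t` is never actually null.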

Thanks


Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



On 5 August 2016 at 10:21, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi Michael,
>
> Since we're at it, could you please point me to the code where the
> optimization happens? I assume you're talking about Catalyst's
> whole-stage code generation for queries. Is this perhaps nullability
> (NULL value) propagation? I'd appreciate it, as it would improve my
> understanding of the low-level bits quite substantially. Thanks!
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
>
> On Fri, Aug 5, 2016 at 1:24 AM, Michael Armbrust <mich...@databricks.com>
> wrote:
> > Nullability is an optimization hint for Spark SQL: nullable = false
> > tells Spark not to even do an if (null) check when accessing that field.
> >
> > In this case, your data is nullable, because Timestamp is an object in
> > Java and you could put null there.
> >
> > On Thu, Aug 4, 2016 at 2:56 PM, luismattor <luismat...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> Consider the following case:
> >>
> >> import java.sql.Timestamp
> >> case class MyProduct(t: Timestamp, a: Float)
> >> val rdd = sc.parallelize(List(MyProduct(new Timestamp(0), 10))).toDF()
> >> rdd.printSchema()
> >>
> >> The output is:
> >> root
> >>  |-- t: timestamp (nullable = true)
> >>  |-- a: float (nullable = false)
> >>
> >> How can I set the timestamp column to be NOT nullable?
> >>
> >> Regards,
> >> Luis
> >>
> >
>
