There are two type systems in play here: Spark SQL's and Scala's.

From the Scala side, this is type-safe. After calling as[String], the
Dataset will only return Strings. It is impossible to get a
ClassCastException unless you do your own incorrect casting after the fact.
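
For example (a minimal sketch in spark-shell, assuming Spark 2.0 with the
session implicits in scope; the res numbers are illustrative):

scala> val ds = (0 to 9).toDF("num").as[String]
ds: org.apache.spark.sql.Dataset[String] = [num: int]

scala> ds.collect()
res0: Array[String] = Array(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)

The collected values are real Strings, which is why mapping _.toUpperCase
over them in your second example compiles and runs.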

Under the covers, calling as[String] causes Spark SQL to implicitly
insert an "upcast".  An upcast automatically performs safe (lossless)
casts (e.g., Int -> Long, Number -> String).  Where there is no safe
conversion, we throw an AnalysisException and require you to do the
conversion explicitly.  This upcasting happens when you specify a
primitive type or when you specify a more complicated class that maps
multiple columns to fields.
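
To see both sides of that rule in spark-shell (a sketch; the exact
exception text may vary by version):

scala> (0 to 9).toDF("num").as[Long]  // Int -> Long is lossless, so the upcast is inserted silently
res1: org.apache.spark.sql.Dataset[Long] = [num: int]

scala> Seq("one", "two").toDS.as[Int]  // String -> Int could truncate, so analysis rejects it
org.apache.spark.sql.AnalysisException: Cannot up-cast `value` from string to int ...

The first line succeeds for the same reason your as[String] example does;
the second fails eagerly, at analysis time, before any job runs.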

On Sat, Aug 13, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Just ran into it and can't explain why it works. Please help me understand
> it.
>
> Q1: Why can I `as[String]` with Ints? Is this type safe?
>
> scala> (0 to 9).toDF("num").as[String]
> res12: org.apache.spark.sql.Dataset[String] = [num: int]
>
> Q2: Why can I map over strings even though they are really ints?
>
> scala> (0 to 9).toDF("num").as[String].map(_.toUpperCase)
> res11: org.apache.spark.sql.Dataset[String] = [value: string]
>
> Why are the two lines possible?
>
> Best regards,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
>
