There are two type systems in play here: Spark SQL's and Scala's. From the Scala side, this is type-safe: after calling as[String], the Dataset will only return Strings. It is impossible to ever get a ClassCastException unless you do your own incorrect casting after the fact.
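For example, a minimal spark-shell sketch (assumes Spark 2.x with spark.implicits._ in scope; the output in the comments is paraphrased, not verbatim):

    // After as[String], the element type the compiler sees is String:
    val ds: org.apache.spark.sql.Dataset[String] =
      (0 to 9).toDF("num").as[String]

    ds.map(_.toUpperCase)  // type-checks: each element is statically a String
    ds.collect()           // Array("0", "1", ..., "9") -- the Int column is
                           // genuinely converted when rows are deserialized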
Underneath the covers, calling as[String] causes Spark SQL to implicitly insert an "upcast". An upcast automatically performs safe (lossless) casts (e.g., Int -> Long, Number -> String). When there is no safe conversion, we throw an AnalysisException and require you to do the conversion explicitly. This upcasting happens when you specify a primitive type or when you specify a more complicated class that maps multiple columns to fields. A short sketch of both the safe and the failing cases follows after the quoted message below.

On Sat, Aug 13, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl> wrote:

> Hi,
>
> Just ran into it and can't explain why it works. Please help me
> understand it.
>
> Q1: Why can I `as[String]` with Ints? Is this type safe?
>
> scala> (0 to 9).toDF("num").as[String]
> res12: org.apache.spark.sql.Dataset[String] = [num: int]
>
> Q2: Why can I map over strings even though there are really ints?
>
> scala> (0 to 9).toDF("num").as[String].map(_.toUpperCase)
> res11: org.apache.spark.sql.Dataset[String] = [value: string]
>
> Why are these two lines possible?
>
> Pozdrawiam,
> Jacek Laskowski
> ----
> https://medium.com/@jaceklaskowski/
> Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark
> Follow me at https://twitter.com/jaceklaskowski
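To make the upcast rules concrete, here is a minimal sketch (assuming a local Spark 2.x session; the Person case class, the column names, and the session setup are illustrative, not from the thread):

    import org.apache.spark.sql.{AnalysisException, SparkSession}

    // Illustrative local session; in spark-shell, `spark` already exists.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("upcast-sketch")
      .getOrCreate()
    import spark.implicits._

    val nums = (0 to 9).toDF("num")

    // Safe (lossless) upcasts are inserted implicitly:
    nums.as[Long]    // Int -> Long
    nums.as[String]  // Number -> String

    // The same happens field-by-field when a class maps multiple columns:
    case class Person(name: String, age: Long)
    val people = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")
    people.as[Person]  // the Int "age" column is upcast to the Long field

    // Where no safe conversion exists, analysis fails before any job runs:
    try {
      Seq("a", "b").toDF("s").as[Int].collect()
    } catch {
      case e: AnalysisException =>
        println("rejected: " + e.getMessage)  // e.g. "Cannot up cast ..."
    }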