Hey Michael,

Thanks for the clarification. I was actually assuming the query would fail. Ok, so this means I will have to do the validation in an RDD transformation feeding into the SchemaRDD.
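Roughly what I have in mind (just a sketch against the applySchema API; the schema fields, the conformsTo helper, and the rawRows/sqlContext names are placeholders for my real code):

import org.apache.spark.rdd.RDD
import org.apache.spark.sql._

// Hand-crafted schema (field names here are made up).
val schema = StructType(Seq(
  StructField("name", StringType, nullable = false),
  StructField("age",  IntegerType, nullable = false)))

// Structural check of a single Row against the schema.
def conformsTo(row: Row, schema: StructType): Boolean =
  row.length == schema.fields.length &&
    schema.fields.zipWithIndex.forall { case (field, i) =>
      if (row.isNullAt(i)) field.nullable
      else field.dataType match {
        case StringType  => row(i).isInstanceOf[String]
        case IntegerType => row(i).isInstanceOf[Int]
        case _           => true // extend for whatever types the real schema uses
      }
    }

// rawRows: RDD[Row] is whatever produces the possibly-malformed rows.
val validated: RDD[Row] = rawRows.filter(conformsTo(_, schema))
val schemaRDD = sqlContext.applySchema(validated, schema)

Rows that fail the check I can divert to a separate RDD for logging rather than silently dropping them, and then use the typeCheck() call you mention below to confirm nothing slipped through.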
On Wed, Dec 10, 2014 at 6:27 PM, Michael Armbrust <mich...@databricks.com> wrote:

> As the scala doc for applySchema says, "It is important to make sure that
> the structure of every [[Row]] of the provided RDD matches the provided
> schema. Otherwise, there will be runtime exceptions." We don't check as
> doing runtime reflection on all of the data would be very expensive. You
> will only get errors if you try to manipulate the data, but otherwise it
> will pass it through.
>
> I have written some debugging code (developer API, not guaranteed to be
> stable) though that you can use:
>
> import org.apache.spark.sql.execution.debug._
> schemaRDD.typeCheck()
>
> On Wed, Dec 10, 2014 at 6:19 PM, Alessandro Baretta <alexbare...@gmail.com> wrote:
>
>> Hello,
>>
>> I defined a SchemaRDD by applying a hand-crafted StructType to an RDD. Some
>> of the Rows in the RDD are malformed--that is, they do not conform to the
>> schema defined by the StructType. When running a select statement on this
>> SchemaRDD I would expect SparkSQL to either reject the malformed rows or
>> fail. Instead, it returns whatever data it finds, even if malformed. Is
>> this the desired behavior? Is there no method in SparkSQL to check for
>> validity with respect to the schema?
>>
>> Thanks.
>>
>> Alex