So, there might be a shorter path to success; I'd be curious to hear of one too. What I was able to do is:
1. Create the RDD
2. Apply a schema that is one column wider
3. Register it as a table
4. Insert the new data with the extra column

I believe you'd have to do step 2 -- if you're inserting into a schema and you have extra columns, it seems logical that they get dropped.

In a scenario where this is done over time, I believe you'd have a step 1a where you register your table, but once your schema grows you'd have to register the table again, this time from a SchemaRDD that has more columns. (There's a rough code sketch of these steps below the quoted message.)

On Mon, Dec 22, 2014 at 12:11 AM, Adam Gilmore <dragoncu...@gmail.com> wrote:

> Hi all,
>
> I understand that parquet allows for schema versioning automatically in
> the format; however, I'm not sure whether Spark supports this.
>
> I'm saving a SchemaRDD to a parquet file, registering it as a table, then
> doing an insertInto with a SchemaRDD with an extra column.
>
> The second SchemaRDD does in fact get inserted, but the extra column isn't
> present when I try to query it with Spark SQL.
>
> Is there anything I can do to get this working how I'm hoping?
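
For reference, here's roughly what those four steps look like with the Spark 1.2-era SchemaRDD API. It's only a sketch of the workaround I described, not a definitive recipe: the paths, table name, and column names are made up, and it assumes the table you insertInto is Parquet-backed, as in your setup.

import org.apache.spark.sql._

val sqlContext = new SQLContext(sc)  // assuming an existing SparkContext `sc`

// The wider schema -- "extra" is the new column (all names here are made up).
val widerSchema = StructType(Seq(
  StructField("id",    IntegerType, nullable = true),
  StructField("name",  StringType,  nullable = true),
  StructField("extra", StringType,  nullable = true)))

// Steps 1 and 2: re-read the existing Parquet data and pad it out to the wider schema.
val oldRows = sqlContext.parquetFile("/data/events").map(r => Row(r(0), r(1), null))
val widened = sqlContext.applySchema(oldRows, widerSchema)

// Re-save as Parquet so the re-registered table is still Parquet-backed (and insertable).
widened.saveAsParquetFile("/data/events_v2")

// Step 3: register the widened data as the table.
sqlContext.parquetFile("/data/events_v2").registerTempTable("events")

// Step 4: new data that already carries the extra column now matches the table's schema.
val newRows = sc.parallelize(Seq(Row(3, "new name", "new value")))
sqlContext.applySchema(newRows, widerSchema).insertInto("events")

The round trip through saveAsParquetFile is the clunky part; if the schema could be widened in place, that would be the shorter path I mentioned above.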