So, there might be a shorter path to success; I'd be curious too. What I
was able to do is:

1. Create the RDD
2. Apply a schema that is 1 column wider
3. Register it as a table
4. Insert new data with 1 extra column

I believe you do need step 2 -- if you're inserting into an existing schema
and your rows carry extra columns, it's logical that those columns get
dropped. In a scenario where the schema grows over time, you'd have a step
1a where you register the table initially; once the schema widens, you'd
have to register the table again, this time from a SchemaRDD that has the
additional columns.
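In code, the flow is roughly like this. This is only a rough sketch against
the Spark 1.2-era SchemaRDD API; the table name, column names, and the /tmp
path are made up, and I'm padding the old rows with nulls in the new column:

import org.apache.spark.SparkContext
import org.apache.spark.sql._

val sc = new SparkContext("local", "schema-widening")
val sqlContext = new SQLContext(sc)

// The widened schema: one more column ("extra") than the old data carries.
val wideSchema = StructType(Seq(
  StructField("name", StringType, nullable = true),
  StructField("count", IntegerType, nullable = true),
  StructField("extra", StringType, nullable = true)))

// 1 + 2. Build the RDD of old rows (padded with null for the new column)
//        and apply the wider schema to it.
val oldRows = sc.parallelize(Seq(Row("a", 1, null), Row("b", 2, null)))
val widened = sqlContext.applySchema(oldRows, wideSchema)

// 3. Save as parquet and register the parquet file as a table.
widened.saveAsParquetFile("/tmp/widened.parquet")
sqlContext.parquetFile("/tmp/widened.parquet").registerTempTable("t")

// 4. Insert new data that actually has a value in the extra column.
val newRows = sqlContext.applySchema(
  sc.parallelize(Seq(Row("c", 3, "now populated"))), wideSchema)
newRows.insertInto("t")

sqlContext.sql("SELECT name, extra FROM t").collect().foreach(println)

The point is just that both the old and the new data go through the same,
wider schema before they hit the table.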


On Mon, Dec 22, 2014 at 12:11 AM, Adam Gilmore <dragoncu...@gmail.com>
wrote:

> Hi all,
>
> I understand that parquet allows for schema versioning automatically in
> the format; however, I'm not sure whether Spark supports this.
>
> I'm saving a SchemaRDD to a parquet file, registering it as a table, then
> doing an insertInto with a SchemaRDD with an extra column.
>
> The second SchemaRDD does in fact get inserted, but the extra column isn't
> present when I try to query it with Spark SQL.
>
> Is there anything I can do to get this working the way I'm hoping?
>
