Re: Parquet schema migrations

Michael Armbrust Wed, 08 Oct 2014 13:21:07 -0700

>
> The kind of change we've made that it probably makes most sense to support
> is adding a nullable column. I think that also implies supporting
> "removing" a nullable column, as long as you don't end up with columns of
> the same name but different type.
>


Filed here: https://issues.apache.org/jira/browse/SPARK-3851


> I'm not sure it makes semantic sense to do schema merging as part
> of union all, and it definitely doesn't make sense to do it by default.  I
> wouldn't want two accidentally compatible schemas to get merged without
> warning.  It's also a little odd since, unlike in a normal SQL database,
> union all can happen before there are any projections or filters... e.g.
> what order do columns come back in if someone does select *?
>

I was proposing that you manually convert each different format into one
unified format (by adding literal nulls and such for missing columns) and
then union these converted datasets.  It would be weird to have union all
try to do this automatically.
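
To make the manual conversion concrete, here is a rough sketch in plain
Python. It is not Spark SQL code: rows are modelled as dicts, schemas as
name-to-type maps, and the helper names (unified_schema, convert, union_all)
and the sample data are made up for illustration. It only shows the shape of
the idea: pick a unified schema explicitly, fail on same-name/different-type
columns, pad missing columns with nulls, then do a plain union.

```python
from typing import Dict, List, Optional

Schema = Dict[str, str]            # column name -> type name (hypothetical model)
Row = Dict[str, Optional[object]]  # one record, keyed by column name


def unified_schema(schemas: List[Schema]) -> Schema:
    """Merge schemas column by column; reject same-name, different-type clashes."""
    merged: Schema = {}
    for schema in schemas:
        for name, type_name in schema.items():
            if name in merged and merged[name] != type_name:
                raise ValueError(
                    f"column {name!r} has conflicting types: "
                    f"{merged[name]} vs {type_name}"
                )
            merged.setdefault(name, type_name)
    return merged


def convert(rows: List[Row], target: Schema) -> List[Row]:
    """Project rows onto the target schema, filling missing columns with None."""
    return [{name: row.get(name) for name in target} for row in rows]


def union_all(datasets: List[List[Row]]) -> List[Row]:
    """Plain concatenation; assumes every dataset already has the same columns."""
    return [row for rows in datasets for row in rows]


# Hypothetical data: a later version of the format added a nullable "email" column.
v1_schema = {"id": "int", "name": "string"}
v2_schema = {"id": "int", "name": "string", "email": "string"}
v1_rows = [{"id": 1, "name": "a"}]
v2_rows = [{"id": 2, "name": "b", "email": "b@example.com"}]

# The conversion is an explicit step done by the caller, not by the union.
target = unified_schema([v1_schema, v2_schema])
combined = union_all([convert(v1_rows, target), convert(v2_rows, target)])
```

Because the target schema is chosen up front, the column order for a
select * is whatever order that schema lists, and nothing gets merged unless
you asked for it.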
