Hi Michael,
Does this affect people who use Hive for their metadata store as well? I'm
wondering if the issue is as bad as I think it is, namely that if you build
up a year's worth of data, adding a field forces you to migrate that entire
year's data.
Gary
On Wed, Oct 8, 2014 at 3:19 PM, Michael Armbrust wrote:
>
> I was proposing that you manually convert each different format into one
> unified format (by adding literal nulls and such for missing columns) and
> then union these converted datasets. It would be weird to have union all
> try and do this.
>
> The kind of change we've made that probably makes the most sense to
> support is adding a nullable column. I think that also implies supporting
> "removing" a nullable column, as long as you don't end up with columns of
> the same name but different types.
>
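A minimal sketch of the manual conversion Michael describes, assuming a
Spark 1.1-era Scala setup; the paths and the "userAgent" column are made up
for illustration:

    val v1 = sqlContext.parquetFile("hdfs:///data/events_v1") // old schema
    val v2 = sqlContext.parquetFile("hdfs:///data/events_v2") // adds userAgent
    v1.registerTempTable("events_v1")

    // Pad the old data with a literal null so both sides share one schema.
    val v1Padded = sqlContext.sql(
      "SELECT *, CAST(NULL AS STRING) AS userAgent FROM events_v1")

    // unionAll matches columns by position, so the padded column order
    // must line up with v2's schema.
    val unified = v1Padded.unionAll(v2)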
Filed here: https://issues.apache.org
Sorry, by "raw parquet" I just meant there is no external metadata store,
only the schema written as part of the parquet format.
We've made several different kinds of changes, including renaming a column
and widening the data type of an existing column. I don't think it's
feasible to support those.
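One way to cope with those in the meantime is to handle them manually at
read time. A convert-on-read sketch, again assuming Spark 1.1-era APIs; the
"ts" to "eventTime" rename and the INT to BIGINT widening are made-up
examples:

    val old = sqlContext.parquetFile("hdfs:///data/events_old")
    old.registerTempTable("events_old")

    // Apply the rename and the widening in the query instead of
    // rewriting the underlying files.
    val converted = sqlContext.sql(
      "SELECT ts AS eventTime, CAST(count AS BIGINT) AS count, payload " +
      "FROM events_old")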
Hi Cody,
Assuming you are talking about 'safe' changes to the schema (i.e., existing
column names are never reused with incompatible types), this is something
I'd love to support. Perhaps you can describe more what sorts of changes
you are making, and whether simple merging of the schemas would be
sufficient.
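As a sketch of what merged-schema reads could look like, assuming the
mergeSchema option that later Spark releases (1.4+) expose on the Parquet
reader; the paths are made up:

    val merged = sqlContext.read
      .option("mergeSchema", "true")
      .parquet("hdfs:///data/events_v1", "hdfs:///data/events_v2")

    // Files that lack a column yield nulls for it, which covers the
    // add-a-nullable-column case.
    merged.printSchema()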
Hi Cody,
I wasn't aware there were different versions of the parquet format. What's
the difference between "raw parquet" and the Hive-written parquet files?
As for your migration question, the approaches I've often seen are
convert-on-read and convert-all-at-once. Apache Cassandra, for example,
does a mix of both: old-format SSTables stay readable and are rewritten
during normal compaction, and you can force a one-time rewrite of
everything with nodetool upgradesstables.