I have not tried it on Spark, but a column added in Hive to an existing
table cannot be updated for existing rows. In other words, the new column is
set to null for those rows, which does not require any change to the length
of the existing data files.
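
For example, something along these lines (untested, and "mytable" /
"new_column" are just placeholder names):

# only the Hive metastore is updated; the existing data files are untouched
sqlContext.sql("ALTER TABLE mytable ADD COLUMNS (new_column STRING)")
# rows written before the ALTER simply come back with new_column = NULL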

So basically, as I understand it, when a column is added to an existing table:

1.    The metadata for the underlying table will be updated
2.    The new column will by default have a null value
3.    Existing rows cannot have the new column updated to a non-null value
4.    New rows can have non-null values set for the new column
5.    No SQL operation can be done on that column, for example select *
from <TABLE> where new_column IS NOT NULL
6.    The easiest option is to create a new table with the new column and
do an insert/select from the existing table, with values set for the new
column (a sketch follows this list)
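
A rough sketch of option 6 (untested, with placeholder table/column names and
a literal default value for the new column):

# the new table carries the extra column from the start
sqlContext.sql("""
    CREATE TABLE mytable_new (id INT, name STRING, new_column STRING)
    STORED AS PARQUET
""")

# copy the old rows across, supplying a value for new_column so the
# historical rows are no longer null
sqlContext.sql("""
    INSERT INTO TABLE mytable_new
    SELECT id, name, 'some_value' AS new_column
    FROM mytable
""")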

HTH

Dr Mich Talebzadeh



LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com



On 10 April 2016 at 05:06, Maurin Lenglart <mau...@cuberonlabs.com> wrote:

> Hi,
> I am trying to add columns to table that I created with the “saveAsTable”
> api.
> I update the columns using sqlContext.sql(‘alter table myTable add columns
> (mycol string)’).
> The next time I create a df with the new columns and save it into the same
> table, I get:
> “ParquetRelation
>  requires that the query in the SELECT clause of the INSERT INTO/OVERWRITE
> statement generates the same number of columns as its schema.”
>
> Also these two commands don't return the same columns:
> 1. sqlContext.table(‘myTable’).schema.fields    <— wrong result
> 2. sqlContext.sql(’show columns in mytable’)  <—— good results
>
> It seems to be a known bug:
> https://issues.apache.org/jira/browse/SPARK-9764 (see related bugs)
>
> But I am wondering, how else can I update the columns or make sure that
> Spark picks up the new columns?
>
> I already tried refreshTable and restarting Spark.
>
> thanks
>
>
