Hi Ayan,

It depends on the version of Spark you are using.
Have you tried updating the stats in Hive?

    ANALYZE TABLE ${DATABASE}.${TABLE} PARTITION (${PARTITION_NAME}) COMPUTE STATISTICS FOR COLUMNS;

and then do

    SHOW CREATE TABLE ${TABLE};

HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com

On 7 October 2016 at 02:37, ayan guha <guha.a...@gmail.com> wrote:

> Hi
>
> Faced one issue:
>
> - Writing a Hive partitioned table using
>
>       (df.withColumn("partition_date", to_date(df["INTERVAL_DATE"]))
>          .write.partitionBy("partition_date")
>          .saveAsTable("sometable", mode="overwrite"))
>
> - Data got written to HDFS fine. I can see the folders with partition
>   names such as
>
>       /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-28
>       /app/somedb/hive/somedb.db/sometable/partition_date=2016-09-29
>
>   and so on.
> - Also, the _common_metadata and _metadata files are written properly.
> - I can read the data from Spark fine using
>   read.parquet("/app/somedb/hive/somedb.db/sometable");
>   printSchema shows all the columns.
> - However, I cannot read the table from Hive:
>
>   Problem 1: Hive does not think the table is partitioned.
>   Problem 2: Hive sees only one column, array<string>, from the deserializer.
>   Problem 3: MSCK REPAIR TABLE failed, saying the partitions are not in the metastore.
>
> Question: Is this a known issue with Spark writing to Hive partitioned
> tables?
>
> --
> Best Regards,
> Ayan Guha
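
If SHOW CREATE TABLE comes back with a single col array<string> column, the likely cause is that saveAsTable created a Spark SQL datasource table: the real schema and partitioning are stored in Spark-specific table properties that Hive cannot interpret, which would explain all three problems above. The usual workaround is to create the table with Hive DDL first and then load it with insertInto, so the metastore records genuine Hive partitions. A minimal sketch, assuming Spark 2.0 with Hive support enabled; the database, table, and partition column names are taken from the mail above, and interval_date/some_value are hypothetical stand-ins for the real columns:

    from datetime import datetime
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import to_date

    spark = (SparkSession.builder
             .appName("hive-partitioned-write")
             .enableHiveSupport()  # register tables in the Hive metastore
             .getOrCreate())

    # Allow dynamic-partition inserts.
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

    # Hive DDL, so Hive sees the real schema and the partition column.
    spark.sql("""
        CREATE TABLE IF NOT EXISTS somedb.sometable (
            interval_date TIMESTAMP,
            some_value    STRING
        )
        PARTITIONED BY (partition_date DATE)
        STORED AS PARQUET
    """)

    # Hypothetical source data standing in for the original DataFrame.
    df = spark.createDataFrame(
        [(datetime(2016, 9, 28, 10, 0), "a"),
         (datetime(2016, 9, 29, 11, 0), "b")],
        ["INTERVAL_DATE", "some_value"])

    # insertInto matches columns by position, partition column last,
    # so select them in the table's column order before writing.
    (df.withColumn("partition_date", to_date(df["INTERVAL_DATE"]))
       .select("INTERVAL_DATE", "some_value", "partition_date")
       .write
       .insertInto("somedb.sometable", overwrite=True))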
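
Once the load goes through the metastore this way, the partitions are registered properly, so Hive should see them and MSCK REPAIR TABLE should no longer be needed. A quick check from the same session (or run the equivalent statements in the Hive CLI):

    # Both should now show the real schema and the daily partitions.
    spark.sql("SHOW PARTITIONS somedb.sometable").show(truncate=False)
    spark.sql("DESCRIBE FORMATTED somedb.sometable").show(50, truncate=False)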