Actually, how many tables are involved here? And what version of Hive is used? Sorry, I have no idea about the Cloudera 5.5.1 spec.
HTH

Dr Mich Talebzadeh

LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

http://talebzadehmich.wordpress.com

On 13 April 2016 at 19:20, pseudo oduesp <pseudo20...@gmail.com> wrote:

> Hi guys,
>
> I get this error after 5 hours of processing. I do a lot of joins, 14 left
> joins with small tables.
>
> In the Spark UI and the console log everything looked OK, but when it
> saves the last join I get this error:
>
> Py4JJavaError: An error occurred while calling o115.parquet. _metadata is
> not a Parquet file (too small)
>
> I use 4 containers with 26 GB each and 8 cores. I increased the number of
> partitions and used broadcast joins, without success. I have a log file,
> but it is 57 MB, so I can't share it with you.
>
> I use PySpark 1.5.0 on Cloudera 5.5.1 with YARN, and I use
> HiveContext for dealing with the data.
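
For what it's worth, here is a minimal sketch of the kind of pipeline described above. The table names (big_table, dim1, dim2, ...), the join key, and the output path are all hypothetical placeholders; disabling Parquet summary files via parquet.enable.summary-metadata is one commonly suggested workaround when a _metadata summary file is empty or corrupt, not something confirmed for this particular case.

from pyspark import SparkContext
from pyspark.sql import HiveContext
from pyspark.sql.functions import broadcast

sc = SparkContext(appName="left-joins-parquet-sketch")
sqlContext = HiveContext(sc)

# Possible workaround (assumption): stop Parquet from writing the
# _metadata/_common_metadata summary files at all, so a truncated
# summary file cannot break later reads of the output directory.
sc._jsc.hadoopConfiguration().set("parquet.enable.summary-metadata", "false")

# Hypothetical fact table and small dimension tables standing in for
# the 14 left joins described in the original message.
big = sqlContext.table("big_table")
for name in ["dim1", "dim2", "dim3"]:  # ... up to 14 small tables
    small = sqlContext.table(name)
    # broadcast() hints Spark to ship the small table to every executor
    # instead of shuffling the big one ("join_key" is a placeholder).
    big = big.join(broadcast(small), "join_key", "left_outer")

# Repartition before writing to keep the number of output files sane.
big.repartition(200).write.parquet("/user/hive/warehouse/result_parquet")

If an earlier run already left a truncated _metadata file in the target directory, deleting that file (or writing to a fresh directory) before re-reading the data is another thing worth trying.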