To: user@spark.apache.org
Subject: Inconsistent Persistence of DataFrames in Spark 1.5
We recently switched to Spark 1.5.0 from 1.4.1 and have noticed some
inconsistent behavior in persisting DataFrames.
df1 = sqlContext.read.parquet("df1.parquet")
df1.count()
> 161,100,982
df2 = sqlContext.read.parquet("df2.parquet")
df2.count()
> 67,498,706
join_df = df1.join(df2, 'id')
join_df.