Re: Inconsistent Persistence of DataFrames in Spark 1.5

2015-10-29 Thread Michael Armbrust
> *To:* user@spark.apache.org
> *Subject:* Inconsistent Persistence of DataFrames in Spark 1.5
>
> We recently switched to Spark 1.5.0 from 1.4.1 and have noticed some
> inconsistent behavior in persisting DataFrames.
>
> df1 = sqlContext.read.parquet("df1.parquet")

RE: Inconsistent Persistence of DataFrames in Spark 1.5

2015-10-28 Thread Saif.A.Ellafi
To: user@spark.apache.org
Subject: Inconsistent Persistence of DataFrames in Spark 1.5

We recently switched to Spark 1.5.0 from 1.4.1 and have noticed some inconsistent behavior in persisting DataFrames.

df1 = sqlContext.read.parquet("df1.parquet")
df1.count()
> 161,100,982

Inconsistent Persistence of DataFrames in Spark 1.5

2015-10-28 Thread Colin Alstad
We recently switched to Spark 1.5.0 from 1.4.1 and have noticed some inconsistent behavior in persisting DataFrames.

df1 = sqlContext.read.parquet("df1.parquet")
df1.count()
> 161,100,982

df2 = sqlContext.read.parquet("df2.parquet")
df2.count()
> 67,498,706

join_df = df1.join(df2, 'id')
join_df.