It worked fine, and I was looking for this only because I do not want to
cache the dataframe, as the data in some of the partitions will change.
However, I have a much larger number of partitions (the column is not just
country but something whose values can be in the hundreds of thousands).
Now the metadata is much bigger than in the original question.
Thanks
Yong
From: mich...@databricks.com
Date: Tue, 5 Apr 2016 13:28:46 -0700
Subject: Re: Partition pruning in spark 1.5.2
To: darshan.m...@gmail.com
CC: user@spark.apache.org
The following should ensure partition pruning happens:
df.write.partitionBy("country").save("/path/to/data")
sqlContext.read.load("/path/to/data").where("country = 'UK'")
Thanks a lot. I will try this one as well.
On Tue, Apr 5, 2016 at 9:28 PM, Michael Armbrust
wrote:
> The following should ensure partition pruning happens:
>
> df.write.partitionBy("country").save("/path/to/data")
> sqlContext.read.load("/path/to/data").where("country = 'UK'")
>
> On Tue, Apr 5, 2016 at 1:13 PM, Darshan Singh wrote:
The following should ensure partition pruning happens:
df.write.partitionBy("country").save("/path/to/data")
sqlContext.read.load("/path/to/data").where("country = 'UK'")
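A slightly fuller sketch of the same pattern (the path and the use of the parquet shorthand are illustrative, not from the thread) that also prints the physical plan so pruning can be confirmed:

```scala
// Sketch only: assumes an existing SQLContext, a dataframe `df` with a
// "country" column, and a writable path. Each distinct country value
// becomes its own directory, e.g. /path/to/data/country=UK/.
df.write.partitionBy("country").save("/path/to/data")

// Read it back and filter on the partition column. Because "country" is
// encoded in the directory layout, only matching directories need scanning.
val uk = sqlContext.read.load("/path/to/data").where("country = 'UK'")

// Inspect the physical plan: a pruned scan should list only country=UK files.
uk.explain()
```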
On Tue, Apr 5, 2016 at 1:13 PM, Darshan Singh
wrote:
> Thanks for the reply.
>
> Now I saved the part_movies as parquet file
Thanks for the reply.
Now I saved part_movies as a parquet file.
Then I created a new dataframe from the saved parquet file, and I did not
persist it. Then I ran the same query. It still read all 20 partitions, and
this time from HDFS.
So what will be the exact scenario when it will prune partitions? I a
For the in-memory cache, we still launch tasks, we just skip blocks when
possible using statistics about those blocks.
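A minimal sketch of the cached case Michael describes (table and column names taken from the thread; the rest is illustrative): tasks are still launched for every partition, but the in-memory scan keeps per-block statistics and skips blocks that cannot match the filter.

```scala
// Sketch only: assumes an existing SQLContext with part_movies registered.
val part_movies = sqlContext.table("part_movies")
part_movies.cache()

// Filtering the cached dataframe still schedules a task per partition, but
// the in-memory columnar scan uses min/max statistics per cached block to
// skip blocks that cannot contain movie = 10.
val filtered = part_movies.where("movie = 10")
filtered.explain()  // look for InMemoryColumnarTableScan in the plan
```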
On Tue, Apr 5, 2016 at 12:14 PM, Darshan Singh
wrote:
> Thanks. It is not my exact scenario but I have tried to reproduce it. I
> have used 1.5.2.
>
> I have a part-movies data-
Thanks. It is not my exact scenario, but I have tried to reproduce it. I
have used 1.5.2.
I have a part_movies dataframe which has 20 partitions, one for each movie.
I created the following query:
val part_sql = sqlContext.sql("select * from part_movies where movie = 10")
part_sql.count()
I expect t
Can you show your full code? How are you partitioning the data? How are
you reading it? What is the resulting query plan (run explain() or
EXPLAIN)?
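Both forms can be run like this, using the query from the thread:

```scala
// DataFrame API: print the physical plan for the filtered query.
val part_sql = sqlContext.sql("select * from part_movies where movie = 10")
part_sql.explain()

// SQL form: EXPLAIN returns the plan as rows that can be printed.
sqlContext.sql("EXPLAIN select * from part_movies where movie = 10")
  .collect().foreach(println)
```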
On Tue, Apr 5, 2016 at 10:02 AM, dsing001 wrote:
> Hi,
>
> I am using 1.5.2. I have a dataframe which is partitioned based on the
> country. So I