I meant using |saveAsParquetFile|. As for the number of partitions, you
can always control it with the |spark.sql.shuffle.partitions| property.
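A minimal sketch of what this could look like (Spark 1.2-era API; the SQLContext setup and the output path are assumptions, not from the thread):

```scala
import org.apache.spark.sql.SQLContext

// Assumes an existing SparkContext `sc` and a computed SchemaRDD
// `processedSchemaRdd`; the HDFS path is hypothetical.
val sqlContext = new SQLContext(sc)

// Control how many partitions shuffles (joins, aggregations) produce
// before the result is written out.
sqlContext.setConf("spark.sql.shuffle.partitions", "200")

// Persist the result table to HDFS in Parquet format.
processedSchemaRdd.saveAsParquetFile("hdfs:///tmp/processed.parquet")
```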
Cheng
On 2/23/15 1:38 PM, nitin wrote:
I believe calling processedSchemaRdd.persist(DISK) and
processedSchemaRdd.checkpoint() only persists the data; I will lose all the
RDD metadata, and when I restart my driver, that data is essentially useless
to me (correct me if I am wrong).
I thought of doing processedSchemaRdd.saveAsParquetFile (hdf
How about persisting the computed result table before caching it? That
way, after restarting your service, you only need to re-cache the result
table rather than recompute it. Somewhat like checkpointing.
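Concretely, that could look like the following sketch (Spark 1.2-era API; the table name and path are assumptions):

```scala
// First run: compute the result table and persist it as Parquet.
// `processedSchemaRdd` and `sqlContext` are assumed to exist already.
processedSchemaRdd.saveAsParquetFile("hdfs:///tmp/result.parquet")

// After a driver restart: reload the persisted table and cache it,
// avoiding the original computation entirely.
val restored = sqlContext.parquetFile("hdfs:///tmp/result.parquet")
restored.registerTempTable("result")
sqlContext.cacheTable("result")
```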
Cheng
On 2/22/15 12:55 AM, nitin wrote:
Hi All,
I intend to build a long running spark ap