I think this is a common issue (the 22-field limit on Scala 2.10 case classes), but I need help finding a way around it if it is still unresolved. I have a dataset with more than 70 columns. To get all the columns into my RDD, I am experimenting with the following. (I intend to use an InputData case class to parse the file and split the columns into 3 or 4 column sets to query.)
I am running spark 1.0.0, Tachyon 0.5 and Hadoop 1.0.4.
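For reference, here is a minimal sketch of what I am trying, assuming a comma-delimited input file; the InputData fields, the file path, and the query are placeholders for illustration:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Placeholder case class for one column set; the real dataset has 70+
// columns split across 3 or 4 such case classes (field names hypothetical).
case class InputData(col1: String, col2: Int, col3: Double)

object CacheSubset {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("CacheSubset"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.createSchemaRDD  // implicit RDD -> SchemaRDD conversion

    // Parse the raw file, keeping only the columns for this column set
    val data = sc.textFile("hdfs:///path/to/dataset")  // placeholder path
      .map(_.split(","))
      .map(p => InputData(p(0), p(1).trim.toInt, p(2).trim.toDouble))

    data.registerAsTable("input_data")   // Spark 1.0.0 API
    sqlContext.cacheTable("input_data")  // cache the schema RDD in memory

    // Queries now run against the cached table
    sqlContext.sql("SELECT col1, col2 FROM input_data WHERE col3 > 0.0")
      .collect()
      .foreach(println)
  }
}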
I am selecting a subset of this large dataset and trying to run queries on the cached schema RDD. Strangely, in the web UI I see the following.
150 Partitions
Block Name | Storage Level | Size in Memory | Size on Disk
Executors
rdd_