You keep mentioning that you're viewing this after the fact in the Spark
history server. Also, the spark-shell isn't a UI, so I'm not sure what you
mean by saying that the storage tab is blank in the spark-shell. Just so
I'm clear about what you're doing: are you looking at this info while your
application is still running, or only afterward in the history server?
The same problem is mentioned here:
https://forums.databricks.com/questions/117/why-is-my-rdd-not-showing-up-in-the-storage-tab-of.html
https://stackoverflow.com/questions/44792213/blank-storage-tab-in-spark-history-server
On Tue, Oct 16, 2018 at 8:06 AM Venkat Dabri wrote:
I did try that mechanism before, but the data never shows up in the
storage tab. The storage tab is always blank. I have tried it in
Zeppelin as well as in the spark-shell.
scala> val classCount = spark.read.parquet("s3:// /classCount")
scala> classCount.persist
scala> classCount.count
Nothing shows up in the Storage tab.
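One way to check whether the persist actually took effect on the running application, independent of the Storage tab, is to inspect the Dataset's storage level directly (a sketch in the spark-shell, using the `classCount` DataFrame from above; `Dataset.storageLevel` is available from Spark 2.1 onward):

```scala
scala> classCount.storageLevel
// StorageLevel(1 replicas) -- NONE before persist is called

scala> classCount.persist()
scala> classCount.count()          // an action forces materialization

scala> classCount.storageLevel.useMemory
// true once the plan is actually cached in memory
```

If `storageLevel` reports a real level here but the history server's Storage tab is still empty, the problem is on the event-logging side rather than with persist itself.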
In your program, persist the smaller table and use count to force it to
materialize. Then, in the Spark UI, go to the Storage tab. The size of your
table as Spark sees it should be displayed there. Out of curiosity, which
version and language of Spark are you using?
On Mon, Oct 15, 2018 at 11:53 AM Venkat Dabri wrote:
I am trying to do a broadcast join on two tables. The size of the
smaller table will vary based upon the parameters, but the size of the
larger table is close to 2 TB. What I have noticed is that if I don't
set spark.sql.autoBroadcastJoinThreshold to 10G, some of these
operations do a SortMergeJoin instead.
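If raising spark.sql.autoBroadcastJoinThreshold that high is undesirable, an alternative is the explicit broadcast hint, which forces a BroadcastHashJoin for that one join regardless of the threshold. A minimal sketch, assuming two DataFrames named `small` and `large` joined on a hypothetical key column `"id"`:

```scala
import org.apache.spark.sql.functions.broadcast

// small and large are placeholder DataFrames; "id" is an assumed join key.
val joined = large.join(broadcast(small), Seq("id"))

// Inspect the physical plan to confirm the hint was honored:
joined.explain()   // look for BroadcastHashJoin rather than SortMergeJoin
```

Note that the broadcast still has to fit in driver and executor memory, and it is subject to spark.sql.broadcastTimeout, so this only helps when the smaller table genuinely stays small.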