Have you tried calling SizeEstimator.estimate() on a DataFrame ? I did the following in REPL:
scala> SizeEstimator.estimate(df) res1: Long = 17769680 FYI On Fri, Aug 7, 2015 at 6:48 AM, Srikanth <srikanth...@gmail.com> wrote: > Hello, > > Is there a way to estimate the approximate size of a dataframe? I know we > can cache and look at the size in UI but I'm trying to do this > programatically. With RDD, I can sample and sum up size using > SizeEstimator. Then extrapolate it to the entire RDD. That will give me > approx size of RDD. With dataframes, its tricky due to columnar storage. > How do we do it? > > On a related note, I see size of RDD object to be ~60MB. Is that the > footprint of RDD in driver JVM? > > scala> val temp = sc.parallelize(Array(1,2,3,4,5,6)) > scala> SizeEstimator.estimate(temp) > res13: Long = 69507320 > > Srikanth >