Is this nested data or flat data?

On Mon, Feb 9, 2015 at 1:53 PM, Manoj Samel <manojsamelt...@gmail.com> wrote:
> Hi Michael,
>
> The storage tab shows the RDD resides fully in memory (10 partitions) with
> zero disk usage. Tasks for subsequent selects on this table in cache show
> minimal overheads (GC, queueing, shuffle write, etc.), so overhead is not
> the issue. However, it is still twice as slow as reading the uncached
> table.
>
> I have spark.rdd.compress = true,
> spark.sql.inMemoryColumnarStorage.compressed = true, and
> spark.serializer = org.apache.spark.serializer.KryoSerializer.
>
> Something that may be of relevance ...
>
> The underlying table is Parquet, 10 partitions totaling ~350 MB. The
> mapPartition phase of the query on the uncached table shows an input size
> of 351 MB. However, after the table is cached, the storage tab shows the
> cache size as 12 GB. So the in-memory representation seems much bigger
> than on-disk, even with the compression options turned on. Any thoughts on
> this?
>
> The mapPartition phase of the same query on the cached table shows an
> input size of 12 GB (the full size of the cached table) and takes twice
> the time of the mapPartition for the uncached query.
>
> Thanks,
>
>
> On Fri, Feb 6, 2015 at 6:47 PM, Michael Armbrust <mich...@databricks.com>
> wrote:
>
>> Check the storage tab. Does the table actually fit in memory? Otherwise
>> you are rebuilding column buffers in addition to reading the data off of
>> the disk.
>>
>> On Fri, Feb 6, 2015 at 4:39 PM, Manoj Samel <manojsamelt...@gmail.com>
>> wrote:
>>
>>> Spark 1.2
>>>
>>> Data stored in a Parquet table (large number of rows)
>>>
>>> Test 1
>>>
>>> select a, sum(b), sum(c) from table
>>>
>>> Test 2
>>>
>>> sqlContext.cacheTable("table")
>>> select a, sum(b), sum(c) from table - "seed cache". First time slow
>>> since it is loading the cache?
>>> select a, sum(b), sum(c) from table - Second time it should be faster,
>>> as it should read from the cache, not HDFS. But it is slower than Test 1.
>>>
>>> Any thoughts? Should a different query be used to seed the cache?
>>>
>>> Thanks,
>>>
>>
>
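
---

For anyone trying to reproduce the two tests, here is a minimal spark-shell
sketch against the Spark 1.2 SQLContext API. The Parquet path is a
placeholder, and the "group by a" clause is an assumption on my part (the
queries in the thread omit it, but the aggregate needs a grouping to parse);
everything else follows the thread.

    // Configs from the thread; these are normally set before the shell
    // starts, e.g. in spark-defaults.conf or via --conf:
    //   spark.rdd.compress=true
    //   spark.sql.inMemoryColumnarStorage.compressed=true
    //   spark.serializer=org.apache.spark.serializer.KryoSerializer

    import org.apache.spark.sql.SQLContext

    val sqlContext = new SQLContext(sc)

    // Register the Parquet data as a table (the path is hypothetical).
    val parquetData = sqlContext.parquetFile("hdfs:///path/to/table.parquet")
    parquetData.registerTempTable("table")

    // Test 1: aggregate straight off Parquet.
    sqlContext.sql(
      "select a, sum(b), sum(c) from table group by a").collect()

    // Test 2: cache the table, seed the cache with one run, then re-run.
    sqlContext.cacheTable("table")
    // First run materializes the in-memory column buffers.
    sqlContext.sql(
      "select a, sum(b), sum(c) from table group by a").collect()
    // Second run should be served from the columnar cache, not HDFS.
    sqlContext.sql(
      "select a, sum(b), sum(c) from table group by a").collect()

Comparing the mapPartition input size for the last run against the storage
tab, as described above, should show whether the cached representation has
blown up relative to the ~350 MB on disk.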