Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache

2016-11-28 Thread Nitin Goyal
+Cheng

Hi Reynold,

I think you are referring to bucketing in the in-memory columnar cache. What I am proposing is that if we have a parquet structure like the following:

//file1/id=1/
//file1/id=2/

and we read and cache it, it should create 2 RDD[CachedBatch] (one per value of "id"). Is this what you we…
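[Editor's note: the directory layout above is what Spark itself produces when writing with partitionBy. A minimal Scala sketch of the scenario being discussed; the /tmp/file1 path and the toy dataset are illustrative assumptions, not taken from the thread.]

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("partitioned-cache-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Write a small dataset partitioned by "id", producing the layout
    // /tmp/file1/id=1/... and /tmp/file1/id=2/... as in the message above.
    Seq((1, "a"), (1, "b"), (2, "c"), (2, "d"))
      .toDF("id", "value")
      .write
      .mode("overwrite")
      .partitionBy("id")
      .parquet("/tmp/file1")

    // Reading the root path recovers "id" as a partition column.
    val df = spark.read.parquet("/tmp/file1")
    df.cache()
    df.count()   // materialize the cache

    // Today the cached data is batched without regard to "id"; under the
    // proposal, each value of "id" would map to its own RDD[CachedBatch],
    // so this filter could drop the entire id=2 portion of the cache
    // instead of scanning every batch.
    df.filter($"id" === 1).show()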

Re: Parquet-like partitioning support in Spark SQL's in-memory columnar cache

2016-11-24 Thread Reynold Xin
It's already there, isn't it? The in-memory columnar cache format.

On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal wrote:
> Hi,
>
> Do we have any plan of supporting parquet-like partitioning in the
> Spark SQL in-memory cache? Something like one RDD[CachedBatch] per
> in-memory cache partition…
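[Editor's note: Reynold's reply may refer to the batch-level pruning the in-memory columnar cache already performs using per-batch column statistics, controlled in Spark 2.x by the spark.sql.inMemoryColumnarStorage.partitionPruning flag. A short continuation of the sketch above, reusing the same spark session and df, to inspect this; the exact plan output will vary by Spark version.]

    // Batch-level pruning against per-batch column stats is governed by
    // this flag (default true in Spark 2.x):
    println(spark.conf.get("spark.sql.inMemoryColumnarStorage.partitionPruning"))

    // The physical plan shows the cached scan (InMemoryTableScan in
    // Spark 2.x). The filter can skip CachedBatches whose "id" stats
    // rule them out, but the batches are not organized one
    // RDD[CachedBatch] per "id" value as the proposal suggests.
    df.filter($"id" === 1).explain()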