+Cheng
Hi Reynold,
I think you are referring to bucketing in the in-memory columnar cache.
I am proposing that if we have a parquet structure like the following:
//file1/id=1/
//file1/id=2/
and if we read and cache it, it should create 2 RDD[CachedBatch] instances
(one per value of "id").
Is this what you were referring to?
It's already there, isn't it? The in-memory columnar cache format.
On Thu, Nov 24, 2016 at 9:06 PM, Nitin Goyal wrote:
> Hi,
>
> Do we have any plan to support parquet-like partitioning in the Spark SQL
> in-memory cache? Something like one RDD[CachedBatch] per in-memory cache
> partition?