Re: [Spark SQL] off-heap columnar store

2014-09-02 Thread Evan Chan
On Sun, Aug 31, 2014 at 8:27 PM, Ian O'Connell wrote:
> I'm not sure what you mean here? Parquet is at its core just a format, you
> could store that data anywhere.
>
> Though it sounds like you're saying, correct me if I'm wrong: you basically
> want a columnar abstraction layer where you can provid

Re: [Spark SQL] off-heap columnar store

2014-08-31 Thread Ian O'Connell
I'm not sure what you mean here? Parquet is at its core just a format; you could store that data anywhere.

Though it sounds like you're saying, correct me if I'm wrong: you basically want a columnar abstraction layer where you can provide a different backing implementation to keep the columns rather

Re: [Spark SQL] off-heap columnar store

2014-08-28 Thread Evan Chan
>> The reason I'm asking about the columnar compressed format is that
>> there are some problems for which Parquet is not practical.
>
> Can you elaborate?

Sure.

- Organization or co has no Hadoop, but significant investment in some other NoSQL store.
- Need to efficiently add a new column to

Re: [Spark SQL] off-heap columnar store

2014-08-26 Thread Michael Armbrust
> Any initial proposal or design about the caching to Tachyon that you
> can share so far?

Caching Parquet files in Tachyon with saveAsParquetFile and then reading them with parquetFile should already work. You can use SQL on these tables by using registerTempTable. Some of the general parquet

Re: [Spark SQL] off-heap columnar store

2014-08-26 Thread Evan Chan
What would be the timeline for the Parquet caching work?

The reason I'm asking about the columnar compressed format is that there are some problems for which Parquet is not practical.

On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust wrote:
>> What is the plan for getting Tachyon/off-heap suppor

Re: [Spark SQL] off-heap columnar store

2014-08-25 Thread Henry Saputra
Hi Michael,

This is great news. Any initial proposal or design about the caching to Tachyon that you can share so far?

I don't think there is a JIRA ticket open to track this feature yet.

- Henry

On Mon, Aug 25, 2014 at 1:13 PM, Michael Armbrust wrote:
>>
>> What is the plan for getting Tachyo

Re: [Spark SQL] off-heap columnar store

2014-08-25 Thread Michael Armbrust
> What is the plan for getting Tachyon/off-heap support for the columnar
> compressed store? It's not in 1.1 is it?

It is not in 1.1 and there are no concrete plans for adding it at this point. Currently, there is more engineering investment going into caching Parquet data in Tachyon instead