I'm not sure what you mean here. Parquet is, at its core, just a format;
you could store that data anywhere.

Though it sounds like you're saying, and correct me if I'm wrong: you basically
want a columnar abstraction layer where you can provide a different backing
implementation to hold the columns, rather than parquet-mr?

I.e., you want to be able to produce a SchemaRDD from something like
Vertica, where updates act as a write-through cache back to Vertica
itself?

I'm sorry, it just sounds like it's worth clearly defining what your key
requirement/goal is.
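
For reference, here is a rough sketch of what I'd picture with the current
APIs, assuming the external store is reachable over JDBC and its driver is
on the classpath (the Vertica URL, credentials, table, and columns below
are just placeholders):

  import java.sql.DriverManager
  import org.apache.spark.rdd.JdbcRDD
  import org.apache.spark.sql._

  val sqlContext = new SQLContext(sc)

  // Pull rows out of the external store into a plain RDD.
  // The two '?' in the query are the partition bounds JdbcRDD fills in.
  val rowRDD = new JdbcRDD(
    sc,
    () => DriverManager.getConnection("jdbc:vertica://host:5433/db", "user", "pw"),
    "SELECT id, name FROM events WHERE id >= ? AND id <= ?",
    lowerBound = 1, upperBound = 1000000, numPartitions = 10,
    rs => Row(rs.getInt(1), rs.getString(2)))

  // Attach a schema to get a SchemaRDD, register it, and cache it --
  // cacheTable stores the table in the in-memory columnar format.
  val schema = StructType(Seq(
    StructField("id", IntegerType, nullable = false),
    StructField("name", StringType, nullable = true)))
  val events = sqlContext.applySchema(rowRDD, schema)
  events.registerTempTable("events")
  sqlContext.cacheTable("events")

That only covers the read side, of course; the write-through part (pushing
updates back to Vertica) is what doesn't exist today and would need the
abstraction layer you're describing.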


On Thu, Aug 28, 2014 at 11:31 PM, Evan Chan <velvia.git...@gmail.com> wrote:

> >
> >> The reason I'm asking about the columnar compressed format is that
> >> there are some problems for which Parquet is not practical.
> >
> >
> > Can you elaborate?
>
> Sure.
>
> - Organization or co has no Hadoop, but significant investment in some
> other NoSQL store.
> - Need to efficiently add a new column to existing data
> - Need to mark some existing rows as deleted or replace small bits of
> existing data
>
> For these use cases, it would be much more efficient and practical if
> we didn't have to take the data out of the datastore and convert it to
> Parquet first.  Doing so adds significant latency and causes Ops
> headaches in having to maintain HDFS.  It would be great to be able to
> load data directly into the columnar format, into the
> InMemoryColumnarCache.
>
