I'm not sure what you mean here. Parquet is, at its core, just a format; you could store that data anywhere.
Though it sounds like you're saying, correct me if I'm wrong: you basically want a columnar abstraction layer where you can provide a different backing implementation to keep the columns, rather than parquet-mr? I.e. you want to be able to produce a SchemaRDD from something like Vertica, where updates act as a write-through cache back to Vertica itself? Sorry, it just sounds like it's worth clearly defining what your key requirement/goal is. If that is the gist, I've put a rough sketch of how far the current API already gets you below the quoted message.

On Thu, Aug 28, 2014 at 11:31 PM, Evan Chan <velvia.git...@gmail.com> wrote:
>>> The reason I'm asking about the columnar compressed format is that
>>> there are some problems for which Parquet is not practical.
>>
>> Can you elaborate?
>
> Sure.
>
> - Organization or co has no Hadoop, but significant investment in some
>   other NoSQL store.
> - Need to efficiently add a new column to existing data
> - Need to mark some existing rows as deleted or replace small bits of
>   existing data
>
> For these use cases, it would be much more efficient and practical if
> we didn't have to take the original data from the datastore and
> convert it to Parquet first. Doing so adds significant latency and
> causes Ops headaches in having to maintain HDFS. It would be great
> to be able to load data directly into the columnar format, into the
> InMemoryColumnarCache.
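For what it's worth, here is a minimal sketch of what can already be done with the public SQL API (1.1-style), with no Parquet or HDFS in the picture. fetchFromVertica and the "orders" table are hypothetical stand-ins for whatever client call actually pulls rows out of your store:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql._

    object ColumnarCacheSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("columnar-cache-sketch"))
        val sqlContext = new SQLContext(sc)

        // Hypothetical: pull (id, name, amount) rows out of the external store
        // (Vertica, some other NoSQL store, ...) with whatever client you already have.
        def fetchFromVertica(): Seq[(Int, String, Double)] =
          Seq((1, "a", 10.0), (2, "b", 20.0))

        // Declare the columns explicitly instead of relying on Parquet metadata.
        val schema = StructType(Seq(
          StructField("id", IntegerType, nullable = false),
          StructField("name", StringType, nullable = true),
          StructField("amount", DoubleType, nullable = true)))

        val rowRDD = sc.parallelize(fetchFromVertica()).map {
          case (id, name, amount) => Row(id, name, amount)
        }

        // Attach the schema and register the result as a table.
        sqlContext.applySchema(rowRDD, schema).registerTempTable("orders")

        // This is the step that materializes the in-memory columnar
        // representation directly from the source rows.
        sqlContext.cacheTable("orders")

        sqlContext.sql("SELECT name, SUM(amount) FROM orders GROUP BY name")
          .collect()
          .foreach(println)

        sc.stop()
      }
    }

What this doesn't give you is the write-through part: there's no hook today for pushing updates on the cached columns back to the source, or for adding a column in place, so you'd be uncaching, mutating the store, and re-caching. That's why I'd like to pin down which of your three bullets is the hard requirement.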