Relevant link: http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files
On Wed, Nov 11, 2015 at 7:31 PM, Reynold Xin <r...@databricks.com> wrote:

> Thanks for the email. Can you explain what the difference is between this
> and existing formats such as Parquet/ORC?
>
>
> On Wed, Nov 11, 2015 at 4:59 AM, Cristian O <
> cristian.b.op...@googlemail.com> wrote:
>
>> Hi,
>>
>> I was wondering if there's any planned support for local disk columnar
>> storage.
>>
>> This could be an extension of the in-memory columnar store, or possibly
>> something similar to the recently added local checkpointing for RDDs
>>
>> This could also have the added benefit of enabling iterative usage for
>> DataFrames by pruning the query plan through local checkpoints.
>>
>> A further enhancement would be to add update support to the columnar
>> format (in the immutable copy-on-write sense of course), by maintaining
>> references to unchanged row blocks and only copying and mutating the ones
>> that have changed.
>>
>> A use case here is streaming and merging updates in a large dataset that
>> can be efficiently stored internally in a columnar format, rather than
>> accessing a more inefficient external data store like HDFS or Cassandra.
>>
>> Thanks,
>> Cristian
>>
>
>
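To make the copy-on-write update idea from the original message concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the ColumnarTable class, its from_rows/update/get methods, and the fixed block size are hypothetical names made up for this example, not Spark APIs, and nothing here touches disk. It shows an update producing a new table that copies only the affected column of the affected row block, while every other block is shared by reference with the old table.

```python
from typing import Dict, Sequence, Tuple

# One row block, stored column-wise: column name -> immutable tuple of values.
Block = Dict[str, tuple]


class ColumnarTable:
    """Hypothetical immutable table made of fixed-size columnar row blocks."""

    def __init__(self, blocks: Tuple[Block, ...], block_size: int):
        self.blocks = blocks
        self.block_size = block_size

    @classmethod
    def from_rows(cls, rows: Sequence[tuple], columns: Sequence[str],
                  block_size: int) -> "ColumnarTable":
        blocks = []
        for start in range(0, len(rows), block_size):
            chunk = rows[start:start + block_size]
            # Pivot the chunk of rows into one tuple per column.
            blocks.append({name: tuple(row[i] for row in chunk)
                           for i, name in enumerate(columns)})
        return cls(tuple(blocks), block_size)

    def get(self, row_index: int, column: str):
        block_no, offset = divmod(row_index, self.block_size)
        return self.blocks[block_no][column][offset]

    def update(self, row_index: int, column: str, value) -> "ColumnarTable":
        """Return a new table; the receiver is left unchanged."""
        block_no, offset = divmod(row_index, self.block_size)
        old_block = self.blocks[block_no]

        # Copy and mutate only the one column that changes.
        new_column = list(old_block[column])
        new_column[offset] = value

        # Shallow copy: untouched columns keep pointing at the same tuples.
        new_block = dict(old_block)
        new_block[column] = tuple(new_column)

        # Untouched blocks are carried over by reference, not copied.
        new_blocks = (self.blocks[:block_no] + (new_block,)
                      + self.blocks[block_no + 1:])
        return ColumnarTable(new_blocks, self.block_size)


table_v1 = ColumnarTable.from_rows(
    [(1, "a"), (2, "b"), (3, "c"), (4, "d")], ["id", "name"], block_size=2)
table_v2 = table_v1.update(2, "name", "z")
```

After the update, table_v1 still reads "c" at row 2 while table_v2 reads "z", and the first row block is the same object in both versions. A real implementation would also need persistence and garbage collection of unreferenced blocks, which this sketch leaves out.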