Re: Support for local disk columnar storage for DataFrames

Andrew Duffy Thu, 12 Nov 2015 00:05:39 -0800

Relevant link:
http://spark.apache.org/docs/latest/sql-programming-guide.html#parquet-files


On Wed, Nov 11, 2015 at 7:31 PM, Reynold Xin <r...@databricks.com> wrote:

> Thanks for the email. Can you explain what the difference is between this
> and existing formats such as Parquet/ORC?
>
>
> On Wed, Nov 11, 2015 at 4:59 AM, Cristian O <
> cristian.b.op...@googlemail.com> wrote:
>
>> Hi,
>>
>> I was wondering if there's any planned support for local disk columnar
>> storage.
>>
>> This could be an extension of the in-memory columnar store, or possibly
>> something similar to the recently added local checkpointing for RDDs
>>
>> This could also have the added benefit of enabling iterative usage for
>> DataFrames by pruning the query plan through local checkpoints.
>>
>> A further enhancement would be to add update support to the columnar
>> format (in the immutable copy-on-write sense of course), by maintaining
>> references to unchanged row blocks and only copying and mutating the ones
>> that have changed.
>>
>> A use case here is streaming and merging updates in a large dataset that
>> can be efficiently stored internally in a columnar format, rather than
>> accessing a more inefficient external  data store like HDFS or Cassandra.
>>
>> Thanks,
>> Cristian
>>
>
>

Re: Support for local disk columnar storage for DataFrames

Reply via email to