Re: Comparing with Parquet

2016-02-26 Thread Sourav Mazumder
riginal Message- > From: Reynold Xin [mailto:r...@databricks.com] > Sent: Thursday, February 25, 2016 2:46 PM > To: dev@arrow.apache.org > Subject: Re: Comparing with Parquet > > To put it even more layman, on-disk formats are typically designed for > more permanent storage on di

Re: Comparing with Parquet

2016-02-25 Thread Venkat Krishnamurthy
but the performance couldn't support fast query. > > > > So for PB level data and interactively query(second level), both couldn't > > solve? > > > > Regards > > Liang > > -----Original Message----- > > From: Henry Robinson [mailto:he...@cloudera.com] > >

Re: Comparing with Parquet

2016-02-25 Thread Pedro Miguel Duarte
rds > Liang > -----Original Message----- > From: Henry Robinson [mailto:he...@cloudera.com] > Sent: February 26, 2016 0:20 > To: dev@arrow.apache.org > Subject: Re: Comparing with Parquet > > Think of Parquet as a format well-suited to writing very large datasets to > disk, whereas Arrow is a for

Re: Comparing with Parquet

2016-02-25 Thread Chenliang (Liang, DataSight)
ery(second level), both couldn't solve? Regards Liang -----Original Message----- From: Henry Robinson [mailto:he...@cloudera.com] Sent: February 26, 2016 0:20 To: dev@arrow.apache.org Subject: Re: Comparing with Parquet Think of Parquet as a format well-suited to writing very large datasets to disk, whereas Arrow i

Re: Comparing with Parquet

2016-02-25 Thread Jason Altekruse
] > Sent: Thursday, February 25, 2016 2:46 PM > To: dev@arrow.apache.org > Subject: Re: Comparing with Parquet > > To put it even more layman, on-disk formats are typically designed for > more permanent storage on disks/ssds, and as a result the format would want > to reduce the size,

RE: Comparing with Parquet

2016-02-25 Thread Andrew Brust
Also extremely helpful; thank you! -Original Message- From: Reynold Xin [mailto:r...@databricks.com] Sent: Thursday, February 25, 2016 2:46 PM To: dev@arrow.apache.org Subject: Re: Comparing with Parquet To put it even more layman, on-disk formats are typically designed for more

Re: Comparing with Parquet

2016-02-25 Thread Reynold Xin
:23 PM > To: dev@arrow.apache.org > Subject: Re: Comparing with Parquet > > I would say that another key difference is that Parquet puts a lot of > effort on encodings and compression, and Arrow is mostly about efficient > representation to directly run operators over. eg simple ar

RE: Comparing with Parquet

2016-02-25 Thread Andrew Brust
That's extremely helpful, thank you Todd. (And nice to "see" you again. I interviewed you years ago.) -Original Message- From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, February 25, 2016 2:23 PM To: dev@arrow.apache.org Subject: Re: Comparing with Parquet I

Re: Comparing with Parquet

2016-02-25 Thread Todd Lipcon
. :-) > > -Original Message- > From: Wes McKinney [mailto:w...@cloudera.com] > Sent: Thursday, February 25, 2016 12:12 PM > To: dev@arrow.apache.org > Subject: Re: Comparing with Parquet > > We wrote about this in a recent blog post: > > http://blog.cloudera.com/blog/

RE: Comparing with Parquet

2016-02-25 Thread Andrew Brust
inney [mailto:w...@cloudera.com] Sent: Thursday, February 25, 2016 12:12 PM To: dev@arrow.apache.org Subject: Re: Comparing with Parquet We wrote about this in a recent blog post: http://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-sta

Re: Comparing with Parquet

2016-02-25 Thread Wes McKinney
We wrote about this in a recent blog post: http://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/ "Apache Parquet is a compact, efficient columnar data storage designed for storing large amounts of data stored in HDFS. Arrow

Re: Comparing with Parquet

2016-02-25 Thread Henry Robinson
Think of Parquet as a format well-suited to writing very large datasets to disk, whereas Arrow is a format most suited to efficient storage in memory. You might read Parquet files from disk, and then materialize them in memory in Arrow's format. Both formats are designed around the idiosyncras

Comparing with Parquet

2016-02-25 Thread Sourav Mazumder
Hi All, New to this, and still trying to figure out where exactly Arrow fits in the ecosystem of various Big Data technologies. In that respect, the first thing that came to my mind is how Arrow compares with Parquet. In my understanding Parquet also supports a very efficient columnar format (wi