> -Original Message-
> From: Reynold Xin [mailto:r...@databricks.com]
> Sent: Thursday, February 25, 2016 2:46 PM
> To: dev@arrow.apache.org
> Subject: Re: Comparing with Parquet
>
> To put it in even more layman's terms, on-disk formats are typically designed for
> more permanent storage on di
but the performance couldn't support fast queries.
> >
> > So for petabyte-scale data with interactive (second-level) queries, neither
> > can solve the problem?
> >
> > Regards
> > Liang
> > -Original Message-
> > From: Henry Robinson [mailto:he...@cloudera.com]
> >
> Regards
> Liang
> -Original Message-
> From: Henry Robinson [mailto:he...@cloudera.com]
> Sent: February 26, 2016 0:20
> To: dev@arrow.apache.org
> Subject: Re: Comparing with Parquet
>
> Think of Parquet as a format well-suited to writing very large datasets to
> disk, whereas Arrow is a for
]
> Sent: Thursday, February 25, 2016 2:46 PM
> To: dev@arrow.apache.org
> Subject: Re: Comparing with Parquet
>
> To put it in even more layman's terms, on-disk formats are typically designed for
> more permanent storage on disks/ssds, and as a result the format would want
> to reduce the size,
Also extremely helpful; thank you!
-Original Message-
From: Reynold Xin [mailto:r...@databricks.com]
Sent: Thursday, February 25, 2016 2:46 PM
To: dev@arrow.apache.org
Subject: Re: Comparing with Parquet
To put it in even more layman's terms, on-disk formats are typically designed for more
t mostly around aligning things for
> SIMD/vectorization?
> >
> > There is probably some ignorance in my question, but I'm comfortable
> > with that. :-)
> >
> > -Original Message-
> > From: Wes McKinney [mailto:w...@cloudera.com]
> > Sent: Th
That's extremely helpful, thank you Todd.
(And nice to "see" you again. I interviewed you years ago.)
-Original Message-
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, February 25, 2016 2:23 PM
To: dev@arrow.apache.org
Subject: Re: Comparing with Parquet
I
. :-)
>
> -Original Message-
> From: Wes McKinney [mailto:w...@cloudera.com]
> Sent: Thursday, February 25, 2016 12:12 PM
> To: dev@arrow.apache.org
> Subject: Re: Comparing with Parquet
>
> We wrote about this in a recent blog post:
>
> http://blog.cloudera.com/blog/
From: Wes McKinney [mailto:w...@cloudera.com]
Sent: Thursday, February 25, 2016 12:12 PM
To: dev@arrow.apache.org
Subject: Re: Comparing with Parquet
We wrote about this in a recent blog post:
http://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-sta
We wrote about this in a recent blog post:
http://blog.cloudera.com/blog/2016/02/introducing-apache-arrow-a-fast-interoperable-in-memory-columnar-data-structure-standard/
"Apache Parquet is a compact, efficient columnar data storage designed
for storing large amounts of data stored in HDFS. Arrow
Think of Parquet as a format well-suited to writing very large datasets to
disk, whereas Arrow is a format most suited to efficient storage in memory. You
might read Parquet files from disk, and then materialize them in memory in
Arrow's format.
Both formats are designed around the idiosyncras
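
The flow described above (write compactly to disk, then materialize in memory for fast access) can be sketched roughly as follows. This is only a toy illustration using the Python standard library: zlib-compressed bytes stand in for an on-disk columnar file and `array.array` stands in for an in-memory columnar buffer. Neither is the actual Parquet or Arrow format, and the helper names `write_column` and `materialize_column` are made up for this example.

```python
import zlib
from array import array


def write_column(values):
    """Encode a column of 64-bit ints compactly, as an on-disk format might.

    Hypothetical helper: compression trades CPU time for smaller storage.
    """
    raw = array("q", values).tobytes()
    return zlib.compress(raw)


def materialize_column(blob):
    """Decode stored bytes into a contiguous in-memory column.

    Hypothetical helper: the result supports direct indexed access
    with no further decoding.
    """
    column = array("q")
    column.frombytes(zlib.decompress(blob))
    return column


# A repetitive column compresses well in its stored form.
values = [i % 10 for i in range(100_000)]
stored = write_column(values)        # stand-in for the "Parquet side"
column = materialize_column(stored)  # stand-in for the "Arrow side"

stored_size = len(stored)
memory_size = column.itemsize * len(column)
```

Here `stored_size` should come out much smaller than `memory_size`, while `column[i]` is a plain array lookup. That is the trade-off the thread describes: the stored form optimizes for size, the in-memory form for access speed.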