Re: pandas and Apache Arrow in context

Ted Dunning Mon, 22 Feb 2016 15:22:45 -0800

Put that answer on the front page of the web site.

Well said.




On Mon, Feb 22, 2016 at 2:05 PM, Wes McKinney <w...@cloudera.com> wrote:

> hi Stuart,
>
> Currently pandas and NumPy only support flat, non-nested data. Nested
> data means column value types such as arrays, structs, maps, and
> unions. Arrow's support for these types enables you to analyze
> JSON-like data natively in-memory without pre-flattening or
> normalization.
>
> There's also an open question about how to handle nested data results
> from SQL engines like Spark SQL, Drill, and Impala, since there are
> currently no native / C-level data structures (outside of pure Python
> data structures) in wide use to place the data arriving via RPC. Arrow
> serves to fill this need.
>
> - Wes
>
> On Mon, Feb 22, 2016 at 1:51 PM, Stuart Axelbrooke
> <stu...@axelbrooke.com> wrote:
> > Hey Wes,
> >
> > Very exciting to see things moving along on the Python front.  As you
> > state in your post, fast, ubiquitous columnar data will be a great
> > foundation, especially for more modern data processing and ETL tools.
> > Though I am a bit curious what you mean by nested columnar data...
> >
> > Thanks,
> > Stuart
> >
> > On Mon, Feb 22, 2016 at 10:25 AM Wes McKinney <w...@cloudera.com> wrote:
> >
> >> hi all,
> >>
> >> I did a little bit of analysis of the costs of serialization
> >> bottlenecks in data access for Python pandas users and how (at a high
> >> level, no perf numbers yet!) Apache Arrow will help:
> >>
> >> http://wesmckinney.com/blog/pandas-and-apache-arrow/
> >>
> >> Feedback and comments welcome.
> >>
> >> cheers,
> >> Wes
> >>
>
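
To make "nested columnar data" from the quoted reply concrete, here is a minimal pure-Python sketch. It assumes a made-up set of JSON-like records and contrasts pre-flattening with a columnar list encoding (one offsets array plus one flat values array), which is the general idea behind list-typed columns. The record contents and the helper names (flatten, to_list_column, get_list) are hypothetical and are not part of pandas or Arrow.

```python
# Hypothetical JSON-like records: each row has a scalar and a list-valued field.
records = [
    {"user": "a", "tags": ["x", "y"]},
    {"user": "b", "tags": []},
    {"user": "c", "tags": ["z"]},
]


def flatten(rows):
    """Pre-flatten for a flat table: one output row per (user, tag) pair."""
    return [(r["user"], t) for r in rows for t in r["tags"]]


def to_list_column(rows, field):
    """Encode a list-valued field as an offsets array plus a flat values array."""
    offsets, values = [0], []
    for r in rows:
        values.extend(r[field])
        offsets.append(len(values))
    return offsets, values


def get_list(offsets, values, i):
    """Recover the list stored for row i by slicing the flat values array."""
    return values[offsets[i]:offsets[i + 1]]


flat_rows = flatten(records)
offsets, values = to_list_column(records, "tags")
```

In this sketch the flattened form has no row for user "b", whose tag list is empty, while the offsets/values form keeps one entry per original row and stores all list elements contiguously.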
