Put that answer on the front page of the web site. Well said.

On Mon, Feb 22, 2016 at 2:05 PM, Wes McKinney <w...@cloudera.com> wrote:
> hi Stuart,
>
> Currently pandas and NumPy only support flat, non-nested data. Nested
> data includes column value types including arrays, structs, maps, and
> unions. This enables you to analyze JSON-like data natively in-memory
> without pre-flattening or normalization.
>
> There's also an open question about how to handle nested data results
> from SQL engines like Spark SQL, Drill, and Impala, since there are
> currently native / C-level data structures (outside of pure Python
> data structures) in wide use to place the data arriving via RPC. Arrow
> serves to fill this need.
>
> - Wes
>
> On Mon, Feb 22, 2016 at 1:51 PM, Stuart Axelbrooke
> <stu...@axelbrooke.com> wrote:
> > Hey Wes,
> >
> > Very exciting to see things moving along on the Python front. As you state
> > in your post, fast, ubiquitous columnar data will be a great foundation,
> > especially for more modern data processing and ETL tools. Though I am a
> > bit curious what you mean by nested columnar data...
> >
> > Thanks,
> > Stuart
> >
> > On Mon, Feb 22, 2016 at 10:25 AM Wes McKinney <w...@cloudera.com> wrote:
> >
> >> hi all,
> >>
> >> I did a little bit of analysis of the costs of serialization bottlenecks in
> >> data access for Python pandas users and how (at a high level, no perf
> >> numbers yet!) Apache Arrow will help:
> >>
> >> http://wesmckinney.com/blog/pandas-and-apache-arrow/
> >>
> >> Feedback and comments welcome.
> >>
> >> cheers,
> >> Wes
> >>