Hi @Li, same as Jieun, I'd like to start with a single machine, but I can imagine that there are use cases for a distributed approach. @Wes, thanks, I'll look into it.
Richard

On Wed, 25 Jul 2018 at 03:59, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Richard,
>
> I might start here in the Spark codebase to see how Spark SQL tables
> are converted to Arrow record batches:
>
> https://github.com/apache/spark/blob/d8aaa771e249b3f54b57ce24763e53fd65a0dbf7/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
>
> The code has been developed to send payloads over a socket to PySpark,
> but it could be adapted for your needs perhaps without too much
> effort. Li and Bryan and others have worked on this so should be able
> to answer your questions about it.
>
> - Wes
>
> On Tue, Jul 24, 2018 at 8:21 AM, Li Jin <ice.xell...@gmail.com> wrote:
> > Hi,
> >
> > Do you want to collect a Spark DataFrame into Arrow format on a single
> > machine or do you still want to keep the data distributed?
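
For the single-machine case, here is a minimal sketch of one way to get Arrow data out of a Spark DataFrame without touching the ArrowConverters internals Wes links to. It assumes Spark 2.3+ with pyarrow installed on the driver, and goes through the public Arrow-backed toPandas() path (enabled by the spark.sql.execution.arrow.enabled config), then rebuilds an Arrow Table from the resulting pandas DataFrame:

    # Sketch only: collect a Spark DataFrame to Arrow on the driver.
    # Assumes Spark 2.3+ and pyarrow installed; this is the public
    # toPandas() route, not the internal ArrowConverters API.
    import pyarrow as pa
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("arrow-collect-sketch").getOrCreate()

    # Enable Arrow-based columnar transfer between the JVM and Python.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    df = spark.range(0, 1000).selectExpr("id", "id * 2 AS doubled")

    # With the flag set, toPandas() uses Arrow record batches under the
    # hood; the result lands on the driver as a pandas DataFrame.
    pdf = df.toPandas()

    # From there an Arrow Table can be built with pyarrow.
    table = pa.Table.from_pandas(pdf)
    print(table.schema)

This keeps everything on the driver, so it only suits data that fits on a single machine; for the distributed case the ArrowConverters code in the Spark repo is the place to look, as Wes suggests.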