hi Richard,

I might start here in the Spark codebase to see how Spark SQL tables are converted to Arrow record batches:

https://github.com/apache/spark/blob/d8aaa771e249b3f54b57ce24763e53fd65a0dbf7/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala

The code was developed to send payloads over a socket to PySpark, but it could perhaps be adapted to your needs without too much effort. Li, Bryan, and others have worked on this, so they should be able to answer your questions about it.

- Wes

On Tue, Jul 24, 2018 at 8:21 AM, Li Jin <ice.xell...@gmail.com> wrote:
> Hi,
>
> Do you want to collect a Spark DataFrame into Arrow format on a single
> machine or do you still want to keep the data distributed?