Hi Richard,

I might start here in the Spark codebase to see how Spark SQL tables
are converted to Arrow record batches:

https://github.com/apache/spark/blob/d8aaa771e249b3f54b57ce24763e53fd65a0dbf7/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala

The code was developed to send payloads over a socket to PySpark, but
it could perhaps be adapted to your needs without too much effort. Li,
Bryan, and others have worked on this, so they should be able to
answer your questions about it.

- Wes

On Tue, Jul 24, 2018 at 8:21 AM, Li Jin <ice.xell...@gmail.com> wrote:
> Hi,
>
> Do you want to collect a Spark DataFrame into Arrow format on a single
> machine or do you still want to keep the data distributed?
