Hi @Li, same as Jieun, I'd like to start with a single machine, but I can imagine that there are use cases for a distributed approach. @Wes, thanks, I'll look into it.
Richard

On Wed, 25 Jul 2018 at 03:59, Wes McKinney <wesmck...@gmail.com> wrote:
> hi Richard,
>
> I might start here in the Spark codebase to see how Spark SQL tables
> are converted to Arrow record batches:
>
> https://github.com/apache/spark/blob/d8aaa771e249b3f54b57ce24763e53fd65a0dbf7/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala
>
> The code has been developed to send payloads over a socket to PySpark,
> but it could be adapted for your needs perhaps without too much
> effort. Li and Bryan and others have worked on this so should be able
> to answer your questions about it.
>
> - Wes
>
> On Tue, Jul 24, 2018 at 8:21 AM, Li Jin <ice.xell...@gmail.com> wrote:
> > Hi,
> >
> > Do you want to collect a Spark DataFrame into Arrow format on a single
> > machine or do you still want to keep the data distributed?
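
For the single-machine case, here is a minimal sketch of one way to get Arrow data out of a Spark DataFrame without touching the ArrowConverters internals Wes links to. It assumes Spark 2.3+ with pyarrow installed on the driver, and goes through the public Arrow-backed toPandas() path (enabled by the spark.sql.execution.arrow.enabled config), then rebuilds an Arrow Table from the resulting pandas DataFrame:

    # Sketch only: collect a Spark DataFrame to Arrow on the driver.
    # Assumes Spark 2.3+ and pyarrow installed; this is the public
    # toPandas() route, not the internal ArrowConverters API.
    import pyarrow as pa
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("arrow-collect-sketch").getOrCreate()

    # Enable Arrow-based columnar transfer between the JVM and Python.
    spark.conf.set("spark.sql.execution.arrow.enabled", "true")

    df = spark.range(0, 1000).selectExpr("id", "id * 2 AS doubled")

    # With the flag set, toPandas() uses Arrow record batches under the
    # hood; the result lands on the driver as a pandas DataFrame.
    pdf = df.toPandas()

    # From there an Arrow Table can be built with pyarrow.
    table = pa.Table.from_pandas(pdf)
    print(table.schema)

This keeps everything on the driver, so it only suits data that fits on a single machine; for the distributed case the ArrowConverters code in the Spark repo is the place to look, as Wes suggests.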