Hi Jeetendra and Liya,

I actually have a similar use case. We have some data stored in *Parquet
format in HDFS* and would like to use Apache Arrow to improve
compute performance if possible. So far, I have not found a direct
way to do this in Java with Spark.

I have searched the Spark documentation, and it looks like Python support
was added as of 2.3.0 (
https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html).
Is there any plan from the Apache Arrow team to provide *Spark integration for Java*?

Thank you very much.


*Best Regards,*
*WANG GAOXIANG (Eric)*
National University of Singapore Graduate ::
API Craft Singapore Co-organiser ::
Singapore Python User Group Co-organiser
+6597685360 (P) :: wgx...@gmail.com (E) ::
https://medium.com/@wgx731 (W)


On Thu, Dec 5, 2019 at 6:58 PM Fan Liya <liya.fa...@gmail.com> wrote:

> Hi Jeetendra,
>
> I am not sure if I understand your question correctly.
>
> Arrow is an in-memory columnar data format, and Spark has its own in-memory
> data format for DataFrame, which is invisible to end users.
> So the Spark user has no control over the underlying in-memory layout.
>
> If you really want to convert a DataFrame into Arrow format, maybe you can
> save the results of a Spark job to some external store (e.g. in ORC
> format), and then load it back into memory in Arrow format (if this is what
> you want).
>
> Best,
> Liya Fan
>
> On Thu, Dec 5, 2019 at 5:53 PM Jeetendra Kumar Jaiswal
> <jeetendra.jais...@impetus.co.in.invalid> wrote:
>
> > Hi Dev Team,
> >
> > Can someone please let me know how to convert a Spark DataFrame to Arrow
> > format? I am coding in Java.
> >
> > The Java documentation for Arrow only has API reference information. It is
> > a little hard to develop without proper documentation.
> >
> > Is there a way to directly convert a Spark DataFrame to Arrow format?
> >
> > Thanks,
> > Jeetendra
> >
>
