hi folks, I understand the question to be about serialization.
see * https://github.com/apache/spark/blob/master/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java * https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala * https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/util/ArrowUtils.scala This code is used to convert between Spark Data Frames and Arrow columnar format for UDF evaluation purposes On Thu, Dec 5, 2019 at 6:58 AM GaoXiang Wang <wgx...@gmail.com> wrote: > > Hi Jeetendra and Liya, > > I am actually having a similar use case. We have some data stored as *parquet > format in HDFS* and would like to make use of Apache Arrow to improve > compute performance if possible. Right now, I didn't see there is a direct > way to do in Java with Spark. > > I have search the Spark documentation, it looks like python support is > added after 2.3.0 ( > https://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html), > any plan from Apache Arrow team to provide *Spark integration for Java*? > > Thank you very much. > > > *Best Regards,WANG GAOXIANG* > * (Eric) * > National University of Singapore Graduate :: > API Craft Singapore Co-organiser :: > Singapore Python User Group Co-organiser > *+6597685360 (P) :: wgx...@gmail.com <wgx...@gmail.com> (E) :: > **https://medium.com/@wgx731 > <https://medium.com/@wgx731> **(W)* > > > On Thu, Dec 5, 2019 at 6:58 PM Fan Liya <liya.fa...@gmail.com> wrote: > > > Hi Jeetendra, > > > > I am not sure if I understand your question correctly. > > > > Arrow is an in-memory columnar data format, and Spark has its own in-memory > > data format for DataFrame, which is invisible to end users. > > So the Spark user has no control over the underlying in-memory layout. > > > > If you really want to convert a DataFrame into Arrow format, maybe you can > > save the results of a Spark job to some external store (e.g. in ORC > > format), and then load it back to memory in Arrow format (if this is what > > you want). > > > > Best, > > Liya Fan > > > > On Thu, Dec 5, 2019 at 5:53 PM Jeetendra Kumar Jaiswal > > <jeetendra.jais...@impetus.co.in.invalid> wrote: > > > > > Hi Dev Team, > > > > > > Can someone please let me know how to convert spark data frame to Arrow > > > format. I am coding in Java. > > > > > > Java documentation of Arrow just has function API information. It is > > > little hard to develop without proper documentation. > > > > > > Is there a way to directly convert spark dataframe to Arrow format > > > dataframes. > > > > > > Thanks, > > > Jeetendra > > > > > > ________________________________ > > > > > > > > > > > > > > > > > > > > > NOTE: This message may contain information that is confidential, > > > proprietary, privileged or otherwise protected by law. The message is > > > intended solely for the named addressee. If received in error, please > > > destroy and notify the sender. Any use of this email is prohibited when > > > received in error. Impetus does not represent, warrant and/or guarantee, > > > that the integrity of this communication has been maintained nor that the > > > communication is free of errors, virus, interception or interference. > > > > >