Hi Wes,

Thanks for raising the ticket. So it seems that Spark 2.0 will not have support for Arrow. Also, does SPARK-13534 cover Arrow serialization for Spark's Java API, or do we need to raise a separate ticket for that?
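To make the question a bit more concrete, here is a rough sketch of the kind of Row-to-Arrow copy I have in mind on the Java side. This is purely my guess: the vector classes are my best reading of Arrow's Java API, and the helper class and method names (RowsToArrowSketch, toIntVector) are made up for illustration, not anything that exists in Spark or Arrow today.

    import java.util.List;

    import org.apache.arrow.memory.BufferAllocator;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.IntVector;
    import org.apache.spark.sql.Row;

    public class RowsToArrowSketch {

        // Copies one integer column of the collected rows into an Arrow IntVector.
        // The caller owns the returned vector and is responsible for close().
        static IntVector toIntVector(List<Row> rows, int columnIndex, BufferAllocator allocator) {
            IntVector vector = new IntVector("col_" + columnIndex, allocator);
            vector.allocateNew(rows.size());
            for (int i = 0; i < rows.size(); i++) {
                Row row = rows.get(i);
                if (row.isNullAt(columnIndex)) {
                    vector.setNull(i);
                } else {
                    vector.setSafe(i, row.getInt(columnIndex));
                }
            }
            vector.setValueCount(rows.size());
            return vector;
        }

        public static void main(String[] args) {
            // In a real job the rows would come from dataFrame.collectAsList() on the driver,
            // e.g. List<Row> rows = df.collectAsList();
            try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
                // IntVector v = toIntVector(rows, 0, allocator);
                // ... hand the vector's buffers to Tachyon/Alluxio, then v.close();
            }
        }
    }

Obviously a real serializer would need to cover all column types and work per partition rather than collecting everything to the driver, but hopefully it shows the shape of what I'd like to help with.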
As of now I only have a high-level understanding of Arrow and its data structures, but I'm willing to dive deeper and provide any help I can, mainly with testing, a Java serializer, or additional examples. Let me know how I can help.

Thanks,
Dima

On 1 March 2016 at 00:46, Wes McKinney <w...@cloudera.com> wrote:
> hi Dmitriy,
>
> I created the following JIRA
> https://issues.apache.org/jira/browse/SPARK-13534 related to PySpark,
> which seems relevant. I would be happy to collaborate with you on
> this. Since I understand that the Spark developers are exploring an
> in-memory columnar layout for Spark DataFrames/Datasets and Spark SQL,
> any conversion code we write right now may end up being temporary.
> Hopefully the Spark columnar memory layout will end up being very
> nearly the same as the official Arrow layout, so that limited or no
> conversion will be necessary.
>
> Thanks
> Wes
>
> On Wed, Feb 24, 2016 at 12:38 PM, Dmitriy Morozov <int.2...@gmail.com> wrote:
> > Hello everyone,
> >
> > I'm just starting with Arrow. I'd like to see how good Arrow is at caching
> > when used in conjunction with Alluxio (Tachyon). The use case that I'm going
> > to validate involves reading data from a Spark DataFrame, storing it in
> > Tachyon in Arrow format, and then reading it back into a DataFrame. I checked
> > the source code of Arrow but couldn't find any examples or tests. Can anyone
> > please guide me on where I should start looking in order to convert a
> > DataFrame to an Arrow struct?
> >
> > Thanks!
> > Dmitriy

--
Kind regards,
Dima