Hi Dmitriy,

I created the following JIRA, which relates to PySpark and seems relevant: https://issues.apache.org/jira/browse/SPARK-13534. I would be happy to collaborate with you on this.

As I understand it, the Spark developers are exploring an in-memory columnar layout for Spark DataFrames/Datasets and Spark SQL, so any conversion code we write right now may end up being temporary. Hopefully the Spark columnar memory layout will end up being very nearly the same as the official Arrow layout, so that limited or no conversion will be necessary.
Thanks,
Wes

On Wed, Feb 24, 2016 at 12:38 PM, Dmitriy Morozov <int.2...@gmail.com> wrote:
> Hello everyone,
>
> I'm just starting with Arrow. I'd like to see how good Arrow is at caching
> when used in conjunction with Alluxio (Tachyon). The use case that I'm
> going to validate involves reading data from Spark's DataFrame, storing it in
> Tachyon in Arrow format, and then reading it back into a DataFrame. I checked
> the source code of Arrow but couldn't find any examples or tests. Can anyone
> guide me to where I should start looking in order to convert a DataFrame to
> an Arrow struct?
>
> Thanks!
> Dmitriy