I believe we would benefit from modified language to describe the nature and scope of the Arrow project.
Currently, our GitHub project description (and what we use in release announcements) states: "Apache Arrow is a columnar in-memory analytics layer designed to accelerate big data. It houses a set of canonical in-memory representations of flat and hierarchical data along with multiple language-bindings for structure manipulation. It also provides IPC and common algorithm implementations." I think this could be perhaps restated in the following way: "Apache Arrow is a cross-language development platform for in-memory structured data access and analytics. It specifies a standardized language-independent columnar memory format for flat and hierarchical data, with support for zero-copy streaming messaging and interprocess communication. It also provides computational libraries for efficient in-memory analytics on modern hardware." It is true that we have been mostly focused on hardening the details of the Arrow format and related issues around messaging and IPC, which are necessary for everything else we may contemplate building in the future. Since I plan to be building a library of computational tools in C++ for the native code community (Python, Ruby, R, etc.), I think it would be a good idea to clearly state that building general purpose analytics implementations (i.e. the sorts of things you find in "data frame libraries" like pandas) is part of the mission of the project. Feedback on the above would be appreciated how we could do a better job representing our past, present, and future community goals. Thanks Wes