Your proposed version is definitely an improvement.

> "Apache Arrow is a cross-language development platform for in-memory
> structured data access and analytics. It specifies a standardized
> language-independent columnar memory format for flat and hierarchical
> data, with support for zero-copy streaming messaging and interprocess
> communication. It also provides computational libraries for efficient
> in-memory analytics on modern hardware."
I propose a few tweaks.

Simplify sentence 1 to "Apache Arrow is a cross-language development platform for in-memory data." This is easier to parse, captures the gist, and the other parts are covered in later sentences.

To me, the cache-efficient format is more fundamental than streaming and IPC (you can build the latter on top of it). Therefore I'd change sentence 2 to "It specifies a standardized language-independent columnar memory format for flat and hierarchical data, organized for efficient analytic operations on modern hardware."

That leaves sentence 3 as "It also provides computational libraries for zero-copy streaming messaging and interprocess communication."

And add sentence 4: "Languages supported include C and C++, Java, and Python."

Julian

> On Oct 21, 2017, at 10:58 AM, Wes McKinney <wesmck...@gmail.com> wrote:
>
> I believe we would benefit from modified language to describe the
> nature and scope of the Arrow project.
>
> Currently, our GitHub project description (and what we use in release
> announcements) states:
>
> "Apache Arrow is a columnar in-memory analytics layer designed to
> accelerate big data. It houses a set of canonical in-memory
> representations of flat and hierarchical data along with multiple
> language-bindings for structure manipulation. It also provides IPC and
> common algorithm implementations."
>
> I think this could perhaps be restated in the following way:
>
> "Apache Arrow is a cross-language development platform for in-memory
> structured data access and analytics. It specifies a standardized
> language-independent columnar memory format for flat and hierarchical
> data, with support for zero-copy streaming messaging and interprocess
> communication. It also provides computational libraries for efficient
> in-memory analytics on modern hardware."
>
> It is true that we have been mostly focused on hardening the details
> of the Arrow format and related issues around messaging and IPC, which
> are necessary for everything else we may contemplate building in the
> future. Since I plan to be building a library of computational tools
> in C++ for the native code community (Python, Ruby, R, etc.), I think
> it would be a good idea to clearly state that building general purpose
> analytics implementations (i.e. the sorts of things you find in "data
> frame libraries" like pandas) is part of the mission of the project.
>
> Feedback on the above, including how we could do a better job
> representing our past, present, and future community goals, would be
> appreciated.
>
> Thanks
> Wes