Your proposed version is definitely an improvement.

> "Apache Arrow is a cross-language development platform for in-memory
> structured data access and analytics. It specifies a standardized
> language-independent columnar memory format for flat and hierarchical
> data, with support for zero-copy streaming messaging and interprocess
> communication. It also provides computational libraries for efficient
> in-memory analytics on modern hardware."

I propose a few tweaks:

Simplify sentence 1 to

  Apache Arrow is a cross-language development platform for in-memory
  data.

This is easier to parse and captures the gist; the other parts are covered
in later sentences.

To me, the cache-efficient format is more fundamentally important than
streaming and IPC (you can build the latter). Therefore I’d change
sentence 2 to

  It specifies a standardized language-independent columnar memory
  format for flat and hierarchical data, organized for efficient analytic
  operations on modern hardware.

Which leaves sentence 3 as

  It also provides computational libraries for zero-copy streaming
  messaging and interprocess communication.

And add sentence 4,

  Languages supported include C and C++, Java, and Python.

Julian

> On Oct 21, 2017, at 10:58 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> 
> I believe we would benefit from modified language to describe the
> nature and scope of the Arrow project.
> 
> Currently, our GitHub project description (and what we use in release
> announcements) states:
> 
> "Apache Arrow is a columnar in-memory analytics layer designed to
> accelerate big data. It houses a set of canonical in-memory
> representations of flat and hierarchical data along with multiple
> language-bindings for structure manipulation. It also provides IPC and
> common algorithm implementations."
> 
> I think this could perhaps be restated in the following way:
> 
> "Apache Arrow is a cross-language development platform for in-memory
> structured data access and analytics. It specifies a standardized
> language-independent columnar memory format for flat and hierarchical
> data, with support for zero-copy streaming messaging and interprocess
> communication. It also provides computational libraries for efficient
> in-memory analytics on modern hardware."
> 
> It is true that we have been mostly focused on hardening the details
> of the Arrow format and related issues around messaging and IPC, which
> are necessary for everything else we may contemplate building in the
> future. Since I plan to be building a library of computational tools
> in C++ for the native code community (Python, Ruby, R, etc.), I think
> it would be a good idea to clearly state that building general purpose
> analytics implementations (i.e. the sorts of things you find in "data
> frame libraries" like pandas) is part of the mission of the project.
> 
> Feedback on the above would be appreciated, as would thoughts on how we
> could do a better job representing our past, present, and future
> community goals.
> 
> Thanks
> Wes
