Thank you, Wes and Julian, for taking the initiative to improve the elevator pitch. I really like the changes. Still, I would like to see "columnar" used in the first sentence, as the columnar format is the main focus of the project.
Uwe

On Sat, Oct 21, 2017, at 10:32 PM, Wes McKinney wrote:
> Thanks Julian, I like the changes.
>
> For the last part I agree listing languages is good; we would do well
> to include JavaScript and Ruby in that list. Hopefully the list will
> keep growing longer!
>
> On Sat, Oct 21, 2017 at 4:20 PM, Julian Hyde <jh...@apache.org> wrote:
> > Your proposed version is definitely an improvement.
> >
> >> "Apache Arrow is a cross-language development platform for in-memory
> >> structured data access and analytics. It specifies a standardized
> >> language-independent columnar memory format for flat and hierarchical
> >> data, with support for zero-copy streaming messaging and interprocess
> >> communication. It also provides computational libraries for efficient
> >> in-memory analytics on modern hardware."
> >
> > I propose a few tweaks:
> >
> > Simplify sentence 1 to
> >
> > Apache Arrow is a cross-language development platform for in-memory
> > data.
> >
> > This is easier to parse, captures the gist, and the other parts are covered
> > in later sentences.
> >
> > To me, the cache-efficient format is more fundamentally important than
> > streaming and IPC (you can build the latter). Therefore I'd change
> > sentence 2 to
> >
> > It specifies a standardized language-independent columnar memory
> > format for flat and hierarchical data, organized for efficient analytic
> > operations on modern hardware.
> >
> > Which leaves sentence 3 as
> >
> > It also provides computational libraries for zero-copy streaming
> > messaging and interprocess communication.
> >
> > And add sentence 4,
> >
> > Languages supported include C and C++, Java, and Python.
> >
> > Julian
> >
> >> On Oct 21, 2017, at 10:58 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> I believe we would benefit from modified language to describe the
> >> nature and scope of the Arrow project.
> >>
> >> Currently, our GitHub project description (and what we use in release
> >> announcements) states:
> >>
> >> "Apache Arrow is a columnar in-memory analytics layer designed to
> >> accelerate big data. It houses a set of canonical in-memory
> >> representations of flat and hierarchical data along with multiple
> >> language-bindings for structure manipulation. It also provides IPC and
> >> common algorithm implementations."
> >>
> >> I think this could perhaps be restated in the following way:
> >>
> >> "Apache Arrow is a cross-language development platform for in-memory
> >> structured data access and analytics. It specifies a standardized
> >> language-independent columnar memory format for flat and hierarchical
> >> data, with support for zero-copy streaming messaging and interprocess
> >> communication. It also provides computational libraries for efficient
> >> in-memory analytics on modern hardware."
> >>
> >> It is true that we have been mostly focused on hardening the details
> >> of the Arrow format and related issues around messaging and IPC, which
> >> are necessary for everything else we may contemplate building in the
> >> future. Since I plan to be building a library of computational tools
> >> in C++ for the native code community (Python, Ruby, R, etc.), I think
> >> it would be a good idea to clearly state that building general purpose
> >> analytics implementations (i.e. the sorts of things you find in "data
> >> frame libraries" like pandas) is part of the mission of the project.
> >>
> >> Feedback on the above would be appreciated, particularly on how we
> >> could do a better job of representing our past, present, and future
> >> community goals.
> >>
> >> Thanks
> >> Wes
> >