Thank you, Wes and Julian, for taking the initiative to improve the elevator
pitch. I really like the improvements. Still, I would like to see
"columnar" used in the first sentence as this is the main focus of the
project.

Uwe

On Sat, Oct 21, 2017, at 10:32 PM, Wes McKinney wrote:
> Thanks Julian, I like the changes.
> 
> For the last part I agree listing languages is good; we would do well
> to include JavaScript and Ruby in that list. Hopefully the list will
> keep growing longer!
> 
> On Sat, Oct 21, 2017 at 4:20 PM, Julian Hyde <jh...@apache.org> wrote:
> > Your proposed version is definitely an improvement.
> >
> >> "Apache Arrow is a cross-language development platform for in-memory
> >> structured data access and analytics. It specifies a standardized
> >> language-independent columnar memory format for flat and hierarchical
> >> data, with support for zero-copy streaming messaging and interprocess
> >> communication. It also provides computational libraries for efficient
> >> in-memory analytics on modern hardware."
> >
> > I propose a few tweaks:
> >
> > Simplify sentence 1 to
> >
> >   Apache Arrow is a cross-language development platform for in-memory
> >   data.
> >
> > This is easier to parse, captures the gist, and the other parts are covered
> > in later sentences.
> >
> > To me, the cache-efficient format is more fundamental than streaming
> > and IPC (you can build the latter on top of the former). Therefore I’d
> > change sentence 2 to
> >
> >   It specifies a standardized language-independent columnar memory
> >   format for flat and hierarchical data, organized for efficient analytic
> >   operations on modern hardware.
> >
> > That leaves sentence 3 as
> >
> >   It also provides computational libraries for zero-copy streaming
> >   messaging and interprocess communication.
> >
> > And I’d add sentence 4:
> >
> >   Languages supported include C and C++, Java, and Python.
> >
> > Julian
> >
> >> On Oct 21, 2017, at 10:58 AM, Wes McKinney <wesmck...@gmail.com> wrote:
> >>
> >> I believe we would benefit from modified language to describe the
> >> nature and scope of the Arrow project.
> >>
> >> Currently, our GitHub project description (and what we use in release
> >> announcements) states:
> >>
> >> "Apache Arrow is a columnar in-memory analytics layer designed to
> >> accelerate big data. It houses a set of canonical in-memory
> >> representations of flat and hierarchical data along with multiple
> >> language-bindings for structure manipulation. It also provides IPC and
> >> common algorithm implementations."
> >>
> >> I think this could perhaps be restated as follows:
> >>
> >> "Apache Arrow is a cross-language development platform for in-memory
> >> structured data access and analytics. It specifies a standardized
> >> language-independent columnar memory format for flat and hierarchical
> >> data, with support for zero-copy streaming messaging and interprocess
> >> communication. It also provides computational libraries for efficient
> >> in-memory analytics on modern hardware."
> >>
> >> It is true that we have been mostly focused on hardening the details
> >> of the Arrow format and related issues around messaging and IPC, which
> >> are necessary for everything else we may contemplate building in the
> >> future. Since I plan to build a library of computational tools
> >> in C++ for the native code community (Python, Ruby, R, etc.), I think
> >> it would be a good idea to clearly state that building general-purpose
> >> analytics implementations (i.e. the sorts of things you find in "data
> >> frame libraries" like pandas) is part of the mission of the project.
> >>
> >> I would appreciate feedback on the above, and on how we could do a
> >> better job of representing our past, present, and future community goals.
> >>
> >> Thanks
> >> Wes
> >
