One more suggestion for the bucket: "Apache Arrow is a computational platform for efficient in-memory data representation and processing."
On Mon, May 17, 2021 at 2:49 PM Wes McKinney <wesmck...@gmail.com> wrote:

> I think less is better in the description, but unfortunately the
> association of Arrow as being "just a data format" has been actively
> harmful in some ways to community growth. We have a data format, yes,
> but we are also creating a computational platform to go hand-in-hand
> with the data format to make it easier to build fast applications that
> use the data format. So the description needs to capture both of these
> ideas.
>
> On Mon, May 17, 2021 at 12:15 PM Julian Hyde <jhyde.apa...@gmail.com> wrote:
> >
> > I think that the “cross-language development platform for” is noise.
> > (I’m sure that JPEG developers think that JPEG is a “cross-language
> > development platform” too. But it isn’t. It is an image format.)
> >
> > “Apache Arrow is a data format for efficient in-memory processing.”
> >
> > I’ll note that, in marketing speak, we are developing a high-concept
> > pitch [1] here. Every company needs a name, a brand, a high-concept
> > pitch, and a 3- or 4-sentence description. But every Apache project
> > needs these too. It’s worth spending the time on the description as
> > well, and then using them in all the places that we describe Arrow.
> >
> > Julian
> >
> > [1] https://www.growthink.com/content/whats-your-high-concept-pitch
> >
> > > On May 17, 2021, at 7:38 AM, Eduardo Ponce <edponc...@gmail.com> wrote:
> > >
> > > I agree with Nate's and Brian's suggestions, but would like to add
> > > that we can make it a one-liner for more conciseness and consistency
> > > with other Apache projects.
> > > Apologies if it seems I am going around the suggestions loop again.
> > >
> > > "Apache Arrow is a cross-language development platform enabling
> > > efficient in-memory data processing and transport."
> > >
> > > On Mon, May 17, 2021 at 10:11 AM Brian Hulette <bhule...@apache.org> wrote:
> > >
> > >> Thank you for bringing this up Dominik. I sampled some of the
> > >> descriptions for other Apache projects I frequent; the ones with a
> > >> meaningful description have a single sentence:
> > >>
> > >> github.com/apache/spark - Apache Spark - A unified analytics engine for large-scale data processing
> > >> github.com/apache/beam - Apache Beam is a unified programming model for Batch and Streaming
> > >> github.com/apache/avro - Apache Avro is a data serialization system
> > >>
> > >> Several others (Flink, Hadoop, ...) just have "[Mirror of] Apache
> > >> <name>" as the description.
> > >>
> > >> +1 for Nate's suggestion "Apache Arrow is a cross-language development
> > >> platform for in-memory data. It enables systems to process and
> > >> transport data more efficiently."
> > >>
> > >> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>
> > >>> It's probably best for the description to limit mentions of specific
> > >>> features. There are some high level features mentioned in the
> > >>> description now ("computational libraries and zero-copy streaming
> > >>> messaging and interprocess communication"), but now in 2021 since the
> > >>> project has grown so much, it could leave people with a limited view
> > >>> of what they might find here.
> > >>>
> > >>> On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > >>> <mauri...@ursacomputing.com> wrote:
> > >>>>
> > >>>> How about
> > >>>> 'Apache Arrow is a cross-language development platform for in-memory
> > >>>> data. It enables systems to process and transport data efficiently,
> > >>>> providing a simple and fast library for partitioning of large tables'?
> > >>>>
> > >>>> Sorry for the delay, long election day
> > >>>>
> > >>>> On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind
> > >>>> <natebauernfe...@deephaven.io> wrote:
> > >>>>
> > >>>>> Suggestion: faster -> more efficiently
> > >>>>>
> > >>>>> "Apache Arrow is a cross-language development platform for in-memory
> > >>>>> data. It enables systems to process and transport data more
> > >>>>> efficiently."
> > >>>>>
> > >>>>> On Sun, May 16, 2021 at 11:35 AM Wes McKinney <wesmck...@gmail.com> wrote:
> > >>>>>
> > >>>>>> Here's what's there now:
> > >>>>>>
> > >>>>>> "Apache Arrow is a cross-language development platform for
> > >>>>>> in-memory data. It specifies a standardized language-independent
> > >>>>>> columnar memory format for flat and hierarchical data, organized
> > >>>>>> for efficient analytic operations on modern hardware. It also
> > >>>>>> provides computational libraries and zero-copy streaming messaging
> > >>>>>> and interprocess communication…"
> > >>>>>>
> > >>>>>> How about something shorter like
> > >>>>>>
> > >>>>>> "Apache Arrow is a cross-language development platform for
> > >>>>>> in-memory data. It enables systems to process and transport data
> > >>>>>> faster."
> > >>>>>>
> > >>>>>> Suggestions / refinements from others welcome
> > >>>>>>
> > >>>>>> On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <domor...@cmu.edu> wrote:
> > >>>>>>>
> > >>>>>>> Super minor issue but could someone make the description on
> > >>>>>>> GitHub shorter?
> > >>>>>>>
> > >>>>>>> GitHub puts the description into the title of the page and makes
> > >>>>>>> it hard to find it in URL autocomplete.