I agree with Nate's and Brian's suggestions, but would like to add that we can make it a one-liner, to be more concise and consistent with other Apache projects. Apologies if it seems I am going around the suggestions loop again.

"Apache Arrow is a cross-language development platform enabling efficient in-memory data processing and transport."

On Mon, May 17, 2021 at 10:11 AM Brian Hulette <bhule...@apache.org> wrote:

> Thank you for bringing this up Dominik. I sampled some of the descriptions
> for other Apache projects I frequent, the ones with a meaningful
> description have a single sentence:
>
> github.com/apache/spark - Apache Spark - A unified analytics engine for
> large-scale data processing
> github.com/apache/beam - Apache Beam is a unified programming model for
> Batch and Streaming
> github.com/apache/avro - Apache Avro is a data serialization system
>
> Several others (Flink, Hadoop, ...) just have "[Mirror of] Apache <name>"
> as the description.
>
> +1 for Nate's suggestion "Apache Arrow is a cross-language development
> platform for in-memory data. It enables systems to process and transport
> data more efficiently."
>
> On Mon, May 17, 2021 at 5:23 AM Wes McKinney <wesmck...@gmail.com> wrote:
>
> > It's probably best for description to limit mentions of specific
> > features. There are some high level features mentioned in the
> > description now ("computational libraries and zero-copy streaming
> > messaging and interprocess communication"), but now in 2021 since the
> > project has grown so much, it could leave people with a limited view
> > of what they might find here.
> >
> > On Mon, May 17, 2021 at 12:14 AM Mauricio Vargas
> > <mauri...@ursacomputing.com> wrote:
> >
> > > How about
> > > 'Apache Arrow is a cross-language development platform for in-memory
> > > data. It enables systems to process and transport data efficiently,
> > > providing a simple and fast library for partitioning of large tables'?
> > >
> > > Sorry the delay, long election day
> > >
> > > On Sun, May 16, 2021, 2:27 PM Nate Bauernfeind
> > > <natebauernfe...@deephaven.io> wrote:
> > >
> > > > Suggestion: faster -> more efficiently
> > > >
> > > > "Apache Arrow is a cross-language development platform for in-memory
> > > > data. It enables systems to process and transport data more
> > > > efficiently."
> > > >
> > > > On Sun, May 16, 2021 at 11:35 AM Wes McKinney <wesmck...@gmail.com>
> > > > wrote:
> > > >
> > > > > Here's what there now:
> > > > >
> > > > > "Apache Arrow is a cross-language development platform for in-memory
> > > > > data. It specifies a standardized language-independent columnar
> > > > > memory format for flat and hierarchical data, organized for efficient
> > > > > analytic operations on modern hardware. It also provides
> > > > > computational libraries and zero-copy streaming messaging and
> > > > > interprocess communication…"
> > > > >
> > > > > How about something shorter like
> > > > >
> > > > > "Apache Arrow is a cross-language development platform for in-memory
> > > > > data. It enables systems to process and transport data faster."
> > > > >
> > > > > Suggestions / refinements from others welcome
> > > > >
> > > > > On Sat, May 15, 2021 at 9:12 PM Dominik Moritz <domor...@cmu.edu>
> > > > > wrote:
> > > > >
> > > > > > Super minor issue but could someone make the description on GitHub
> > > > > > shorter?
> > > > > >
> > > > > > GitHub puts the description into the title of the page and makes it
> > > > > > hard to find it in URL autocomplete.