Re: Usage of the name Feather?

2022-08-30 Thread Dewey Dunnington
Ian has a very good point...I would be in favour of calling them "Arrow
files" wherever possible since there's no need to know or care what
interprocess communication is to use them!

On Mon, Aug 29, 2022 at 6:50 PM Ian Cook  wrote:

> +1 We should explicitly discourage further use of “Feather” to refer to
> Arrow IPC files.
>
> In this spirit of simplifying terminology: Does the “IPC” in the term
> “Arrow IPC files” serve a truly necessary purpose? Is there another type of
> “Arrow file” that the “IPC” serves to disambiguate? If not, can we simply
> refer to these files as “Arrow files” in most places in the documentation
> and website? (In a few important places we should clarify that when we say
> “Arrow file” we are referring to a file that uses the Arrow IPC file
> format.)
>
> Ian
>
> On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei  wrote:
>
> > +1 for 1.
> >
> > Thanks,
> > --
> > kou
> >
> > In 
> >   "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200,
> >   Jorge Cardoso Leitão  wrote:
> >
> > > I agree.
> > >
> > > I suspect that the most widely used API with "feather" is Pandas'
> > > read_feather.
> > >
> > >
> > >
> > > On Mon, 29 Aug 2022, 19:55 Weston Pace,  wrote:
> > >
> > >> I agree as well.  I think most lingering uses of the term "feather"
> > >> are in pyarrow and R however, so it might be good to hear from some of
> > >> those maintainers.
> > >>
> > >>
> > >>
> > >> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou 
> > wrote:
> > >> >
> > >> >
> > >> > I agree with this as well.
> > >> >
> > >> > Regards
> > >> >
> > >> > Antoine.
> > >> >
> > >> >
> > >> > On Mon, 29 Aug 2022 11:29:45 -0400
> > >> > Andrew Lamb  wrote:
> > >> > > In the rust implementation we use the term "Arrow IPC" and I
> support
> > >> your
> > >> > > option 1:
> > >> > >
> > >> > > > The name Feather V2 is deprecated. Only the extension ".arrow"
> > will
> > >> be
> > >> > > used for IPC files.
> > >> > >
> > >> > > Andrew
> > >> > >
> > >> > > On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol
> > >> 
> > >> > > wrote:
> > >> > >
> > >> > > > When I wrote "In-Memory Analytics with Apache Arrow" I
> definitely
> > >> > > > treated "Feather" as deprecated and mentioned it only in passing
> > >> > > > specifically indicating "Arrow IPC" as the terminology to use. I
> > only
> > >> > > > even mentioned "Feather" at all because there are still methods
> in
> > >> > > > pyarrow that reference it by name.
> > >> > > >
> > >> > > > That's just my opinion though...
> > >> > > >
> > >> > > > On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li
> > >> > > >  wrote:
> > >> > > > > This has come up before, e.g. see [1] [2] [3].
> > >> > > > >
> > >> > > > > I would say "Feather" is effectively deprecated and we are
> using
> > >> > > > > "Arrow IPC" now but I am not sure what others think. (From
> that
> > >> > > > > GitHub link, it seems to be mixed.) And ".arrow" is the
> official
> > >> > > > > extension now (since it is registered as part of our MIME
> type).
> > >> But
> > >> > > > > there's existing documentation and not everything has been
> > updated
> > >> to
> > >> > > > > be consistent (as you saw).
> > >> > > > >
> > >> > > > > [1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
> > >> > > > > [2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
> > >> > > > > [3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
> > >> > > > >
> > >> > > > >
> > >> > > > > -David
> > >> > > > >
> > >> > > > > On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
> > >> > > > >>  Hi all.
> > >> > > > >>
> > >> > > > >>  I know the documentation (mainly pyarrow documentation)
> > sometimes
> > >> > > > >> refers
> > >> > > > >>  to IPC files as Feather files, but are there any guidelines
> > for
> > >> > > > >> when to
> > >> > > > >>  refer to an IPC file as a Feather file and when to refer to
> > it as
> > >> > > > >> an IPC
> > >> > > > >>  file?
> > >> > > > >>  I believe that calling the same file an Arrow IPC file at
> > times
> > >> and
> > >> > > > >> a
> > >> > > > >>  Feather file at other times is confusing to those unfamiliar
> > with
> > >> > > > >> Apache
> > >> > > > >>  Arrow (myself included).
> > >> > > > >>  Surprisingly, these files may even have completely different
> > >> > > > >> extensions,
> > >> > > > >>  ".arrow" and ".feather", which are not similar.
> > >> > > > >>
> > >> > > > >>  Perhaps there are several options for future use of the name
> > >> > > > >> Feather,
> > >> > > > >>  such as
> > >> > > > >>
> > >> > > > >>   1. The name Feather V2 is deprecated. Only the extension
> > >> ".arrow"
> > >> > > > >> will
> > >> > > > >>  be used for IPC files.
> > >> > > > >>   2. In some contexts(?), IPC files are referred to as
> Feather;
> > >> only
> > >> > > > >>  ".arrow" is used for the IPC file extension to clearly
> > >> > > > >> distinguis

Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation

2022-08-30 Thread Antoine Pitrou



Hello Kae,

Le 29/08/2022 à 19:28, Kae Suarez a écrit :


I personally like the idea of using namespace directives in Sphinx to keep
things less cluttered and easier to write, then using the class directive
each time so links are always available.


I would agree with this. As for the namespace, though, one question is 
how the user might derive in which namespace a class lives?


For example, 
https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7DatasetE 
doesn't mention that the class lives in the arrow::dataset namespace.
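
For what it is worth, the namespace can be recovered from the anchor in that URL. The sketch below assumes the anchor follows the length-prefixed nested-name scheme that the fragment above appears to use (a `_CPPv4` prefix, then `N`, then `<length><identifier>` pairs, then `E`). The helper name `decode_sphinx_cpp_anchor` is made up for illustration, and it only handles plain nested names, not templates or overloads.

```python
import re


def decode_sphinx_cpp_anchor(anchor: str) -> str:
    """Turn an anchor like _CPPv4N5arrow7dataset7DatasetE into a qualified name.

    Assumption: the anchor is a simple nested name made of
    <length><identifier> pairs wrapped in N ... E.
    """
    match = re.fullmatch(r"_CPPv\d+N(.+)E", anchor)
    if match is None:
        raise ValueError(f"not a simple nested-name anchor: {anchor!r}")

    body = match.group(1)
    parts = []
    pos = 0
    while pos < len(body):
        # Read the decimal length prefix of the next identifier.
        digits = re.match(r"\d+", body[pos:])
        if digits is None:
            raise ValueError(f"expected a length prefix at offset {pos}")
        length = int(digits.group())
        pos += len(digits.group())

        # Read exactly `length` characters as the identifier.
        name = body[pos:pos + length]
        if len(name) != length:
            raise ValueError("anchor ended in the middle of an identifier")
        parts.append(name)
        pos += length

    return "::".join(parts)


print(decode_sphinx_cpp_anchor("_CPPv4N5arrow7dataset7DatasetE"))
```

This is not something a reader should have to do, of course; it only shows that the information exists in the generated docs even though the rendered page does not display it.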


As for functions, I'd like to keep them in "orange text," unless they are
unconnected to a class in use in the article -- if they are, I would like to
use directives there, as well.


I would rather see functions/methods hyperlinked as well, as much as 
possible.


Regards

Antoine.


[RESULT][VOTE] Format: Rules and procedures for Canonical extension types

2022-08-30 Thread Antoine Pitrou



Hello,

With 3 binding +1 votes, 3 non-binding +1 votes, and no -1 vote, the
vote has passed.

Also, vote discussion has shown that the first rule should be updated to 
mandate the name starts with "arrow." instead of "org.apache.arrow.".


The next step will be to prepare a PR adding these rules to the specs 
chapter of the project documentation.


Regards

Antoine.


Le 24/08/2022 à 17:24, Antoine Pitrou a écrit :


Hello,

I would like to propose we vote for the following set of rules for
registering well-known ("canonical") extension types.


* Canonical extension types are described and maintained in a separate
document under the format specifications directory:
https://github.com/apache/arrow/tree/master/docs/source/format (note
this gets turned into HTML docs by Sphinx =>
https://arrow.apache.org/docs/index.html)

* Each canonical extension type requires a separate discussion and vote
on the mailing-list

* The specification text to be added *must* follow these requirements

1) It *must* have a well-defined name starting with "org.apache.arrow."
2) Its parameters, if any, *must* be described in the proposal
3) Its serialization *must* be described in the proposal and should
not require undue work or unusual software dependencies (for example, a
trivial custom text format or JSON would be acceptable)
4) Its expected semantics *should* be described as well and any
potential ambiguities or pain points addressed or at least mentioned

* The extension type *should* have one implementation submitted;
preferably two if non-trivial (for example if parameterized)
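
As a rough illustration of requirements 1) to 3), here is a minimal sketch of a name check plus JSON serialization of the parameters. It uses the "arrow." prefix from the vote result rather than "org.apache.arrow.". The function `describe_extension_type`, the type name "arrow.example_fixed_list" and its `list_size` parameter are hypothetical and are not proposed or registered extension types.

```python
import json

# Prefix adopted in the vote result (the proposal text originally said
# "org.apache.arrow.").
CANONICAL_PREFIX = "arrow."


def describe_extension_type(name: str, parameters: dict) -> dict:
    """Check the name rule and serialize the parameters as JSON.

    Hypothetical helper: it only illustrates the rules, it is not an Arrow API.
    """
    if not name.startswith(CANONICAL_PREFIX) or name == CANONICAL_PREFIX:
        raise ValueError(f"canonical names must start with {CANONICAL_PREFIX!r}")
    # JSON is one of the "trivial" serializations the rules call acceptable.
    serialized = json.dumps(parameters, sort_keys=True)
    return {"name": name, "serialized": serialized}


# Hypothetical parameterized type, used only as an example.
entry = describe_extension_type("arrow.example_fixed_list", {"list_size": 3})
print(entry["name"], entry["serialized"])

# The serialization round-trips without any special dependency.
assert json.loads(entry["serialized"]) == {"list_size": 3}
```

Keeping the serialized form to plain JSON means every implementation can parse it with its standard library, which is the point of requirement 3).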


The vote will be open for at least 72 hours.

[ ] +1 Accept this proposal
[ ] +0
[ ] -1 Do not accept this proposal because...


Regards

Antoine.


Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation

2022-08-30 Thread David Li
"Always linking" sounds reasonable to me. It also makes the decision easier, 
for both author and reviewer. 

I feel like there must be some way to configure Breathe or Sphinx to show the 
namespace in Antoine's example, but I don't see it…

On Tue, Aug 30, 2022, at 08:52, Antoine Pitrou wrote:
> Hello Kae,
>
> Le 29/08/2022 à 19:28, Kae Suarez a écrit :
>> 
>> I personally like the idea of using namespace directives in Sphinx to keep
>> things less cluttered and easier to write, then using the class directive
>> each time so links are always available.
>
> I would agree with this. As for the namespace, though, one question is 
> how the user might derive in which namespace a class lives?
>
> For example, 
> https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7DatasetE
>  
> doesn't mention that the class lives in the arrow::dataset namespace.
>
>> As for functions, I'd like to keep
>> them in "orange text," unless they are unconnected to a class in use in the
>> article -- if they are, I would like to use directives there, as well.
>
> I would rather see functions/methods hyperlinked as well, as much as 
> possible.
>
> Regards
>
> Antoine.


Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation

2022-08-30 Thread Kae Suarez
I do not know about the namespace issue in the API reference, but when
focusing on the User's Guide and Getting Started sections, we can announce
at the top of the page what namespace is relevant. I personally recommend
using only the arrow namespace for ease in documentation and being more
manual with the ones after that (e.g., using compute::CallFunction()
instead of CallFunction() or arrow::compute::CallFunction()). This way,
users will know that we're generally using Arrow's namespace, and any
namespace is contained therein and important to keep an eye on.
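
To make the convention concrete, here is a small sketch of the rule "assume arrow, spell out everything below it". The helper `display_name` is hypothetical and exists only to show which form would appear in the prose.

```python
def display_name(qualified: str, base: str = "arrow") -> str:
    """Drop the leading base namespace but keep any nested namespaces."""
    prefix = base + "::"
    if qualified.startswith(prefix):
        return qualified[len(prefix):]
    # Names outside the base namespace are left fully qualified.
    return qualified


# Nested namespace stays visible, the arrow:: prefix is implied.
print(display_name("arrow::compute::CallFunction"))  # compute::CallFunction
# A name outside arrow:: is not shortened.
print(display_name("std::shared_ptr"))               # std::shared_ptr
```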

As for "always linking," sure! I can do that, and we can see how it looks
in the PR I'm working on. Thanks for the feedback, I'll be back in a while
with some real-world results.

Kae Suarez

On Tue, Aug 30, 2022 at 1:57 PM David Li  wrote:

> "Always linking" sounds reasonable to me. It also makes the decision
> easier, for both author and reviewer.
>
> I feel like there must be some way to configure Breathe or Sphinx to show
> the namespace in Antoine's example, but I don't see it…
>
> On Tue, Aug 30, 2022, at 08:52, Antoine Pitrou wrote:
> > Hello Kae,
> >
> > Le 29/08/2022 à 19:28, Kae Suarez a écrit :
> >>
> >> I personally like the idea of using namespace directives in Sphinx to keep
> >> things less cluttered and easier to write, then using the class directive
> >> each time so links are always available.
> >
> > I would agree with this. As for the namespace, though, one question is
> > how the user might derive in which namespace a class lives?
> >
> > For example,
> >
> https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7DatasetE
> > doesn't mention that the class lives in the arrow::dataset namespace.
> >
> >> As for functions, I'd like to keep
> >> them in "orange text," unless they are unconnected to a class in use in the
> >> article -- if they are, I would like to use directives there, as well.
> >
> > I would rather see functions/methods hyperlinked as well, as much as
> > possible.
> >
> > Regards
> >
> > Antoine.
>


Arrow sync call August 31 at 12:00 US/Eastern, 16:00 UTC

2022-08-30 Thread Ian Cook
Hi all,

Our biweekly sync call is tomorrow at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to
join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian


Re: [RESULT][VOTE] C++: switch to C++17

2022-08-30 Thread Sasha Krassovsky
Hi,
What kind of timeline did we decide on? Is this something that can be worked 
on/merged immediately or should we wait until after 10.0 is out? 

Sasha Krassovsky

> On Aug 29, 2022, at 2:20 AM, Antoine Pitrou  wrote:
> 
> 
> Hello,
> 
> With 5 binding +1 votes, 7 non-binding +1 votes, and no -1 vote, the vote has 
> passed.
> 
> The next steps will be conceptually as follows:
> - require C++17 instead of C++11 in the build configuration(s)
> - remove pre-C++17 compatibility measures in the codebase
> - start using C++17 idioms and features where desirable to reduce clutter and 
> improve maintainability
> 
> (there might be more than 3 PRs though :-))
> 
> Regards
> 
> Antoine.
> 
> 
> 
> Le 24/08/2022 à 17:31, Antoine Pitrou a écrit :
>> Hello,
>> I would like to propose that the Arrow C++ implementation switch to
>> C++17 as its baseline supported version (currently C++11).
>> The rationale and subsequent discussion can be read in the archives here:
>> https://lists.apache.org/thread/9g14n3odhj6kzsgjxr6k6d3q73hg2njr
>> The exact steps and timeline for switching can be decided later on, but
>> this proposal implies that it could happen soon, possibly next week :-)
>> ... or, more realistically, in the next Arrow C++ release, 10.0.0.
>> The vote will be open for at least 72 hours.
>> [ ] +1 Switch to C++17 in the impending future
>> [ ] +0
>> [ ] -1 Do not switch to C++17 because...
>> Regards
>> Antoine.



Re: Usage of the name Feather?

2022-08-30 Thread Micah Kornfield
I think one source of ambiguity for Arrow files, at least for me, is
whether they are just a string of messages concatenated or they are the
files that contain the metadata footer.
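
A rough way to tell the two apart, as a sketch: the IPC file format starts and ends with the magic string "ARROW1", while the streaming format does not carry it. The function below is a heuristic on the magic bytes only and does not validate the footer. The name `classify_arrow_bytes` and the sample byte strings are invented for illustration and are not valid Arrow data.

```python
ARROW_MAGIC = b"ARROW1"


def classify_arrow_bytes(data: bytes) -> str:
    """Guess whether bytes are an IPC *file* (magic at both ends) or not.

    Heuristic only: it checks the magic string and does not parse the footer.
    """
    long_enough = len(data) >= 2 * len(ARROW_MAGIC)
    if long_enough and data.startswith(ARROW_MAGIC) and data.endswith(ARROW_MAGIC):
        return "file"
    return "stream-or-other"


# Synthetic stand-ins, not real Arrow payloads.
fake_file = ARROW_MAGIC + b"\x00\x00" + b"...messages and footer..." + ARROW_MAGIC
fake_stream = b"\xff\xff\xff\xff" + b"...messages..."

print(classify_arrow_bytes(fake_file))    # file
print(classify_arrow_bytes(fake_stream))  # stream-or-other
```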

On Tue, Aug 30, 2022 at 5:11 AM Dewey Dunnington
 wrote:

> Ian has a very good point...I would be in favour of calling them "Arrow
> files" wherever possible since there's no need to know or care what
> interprocess communication is to use them!
>
> On Mon, Aug 29, 2022 at 6:50 PM Ian Cook  wrote:
>
> > +1 We should explicitly discourage further use of “Feather” to refer to
> > Arrow IPC files.
> >
> > In this spirit of simplifying terminology: Does the “IPC” in the term
> > “Arrow IPC files” serve a truly necessary purpose? Is there another type
> of
> > “Arrow file” that the “IPC” serves to disambiguate? If not, can we simply
> > refer to these files as “Arrow files” in most places in the documentation
> > and website? (In a few important places we should clarify that when we
> say
> > “Arrow file” we are referring to a file that uses the Arrow IPC file
> > format.)
> >
> > Ian
> >
> > On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei  wrote:
> >
> > > +1 for 1.
> > >
> > > Thanks,
> > > --
> > > kou
> > >
> > > In
> > >   "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200,
> > >   Jorge Cardoso Leitão  wrote:
> > >
> > > > I agree.
> > > >
> > > > I suspect that the most widely used API with "feather" is Pandas'
> > > > read_feather.
> > > >
> > > >
> > > >
> > > > On Mon, 29 Aug 2022, 19:55 Weston Pace, 
> wrote:
> > > >
> > > >> I agree as well.  I think most lingering uses of the term "feather"
> > > >> are in pyarrow and R however, so it might be good to hear from some
> of
> > > >> those maintainers.
> > > >>
> > > >>
> > > >>
> > > >> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou 
> > > wrote:
> > > >> >
> > > >> >
> > > >> > I agree with this as well.
> > > >> >
> > > >> > Regards
> > > >> >
> > > >> > Antoine.
> > > >> >
> > > >> >
> > > >> > On Mon, 29 Aug 2022 11:29:45 -0400
> > > >> > Andrew Lamb  wrote:
> > > >> > > In the rust implementation we use the term "Arrow IPC" and I
> > support
> > > >> your
> > > >> > > option 1:
> > > >> > >
> > > >> > > > The name Feather V2 is deprecated. Only the extension ".arrow"
> > > will
> > > >> be
> > > >> > > used for IPC files.
> > > >> > >
> > > >> > > Andrew
> > > >> > >
> > > >> > > On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol
> > > >> 
> > > >> > > wrote:
> > > >> > >
> > > >> > > > When I wrote "In-Memory Analytics with Apache Arrow" I
> > definitely
> > > >> > > > treated "Feather" as deprecated and mentioned it only in
> passing
> > > >> > > > specifically indicating "Arrow IPC" as the terminology to
> use. I
> > > only
> > > >> > > > even mentioned "Feather" at all because there are still
> methods
> > in
> > > >> > > > pyarrow that reference it by name.
> > > >> > > >
> > > >> > > > That's just my opinion though...
> > > >> > > >
> > > >> > > > On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li
> > > >> > > >  wrote:
> > > >> > > > > This has come up before, e.g. see [1] [2] [3].
> > > >> > > > >
> > > >> > > > > I would say "Feather" is effectively deprecated and we are
> > using
> > > >> > > > > "Arrow IPC" now but I am not sure what others think. (From
> > that
> > > >> > > > > GitHub link, it seems to be mixed.) And ".arrow" is the
> > official
> > > >> > > > > extension now (since it is registered as part of our MIME
> > type).
> > > >> But
> > > >> > > > > there's existing documentation and not everything has been
> > > updated
> > > >> to
> > > >> > > > > be consistent (as you saw).
> > > >> > > > >
> > > >> > > > > [1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
> > > >> > > > > [2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
> > > >> > > > > [3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
> > > >> > > > >
> > > >> > > > >
> > > >> > > > > -David
> > > >> > > > >
> > > >> > > > > On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
> > > >> > > > >>  Hi all.
> > > >> > > > >>
> > > >> > > > >>  I know the documentation (mainly pyarrow documentation)
> > > sometimes
> > > >> > > > >> refers
> > > >> > > > >>  to IPC files as Feather files, but are there any
> guidelines
> > > for
> > > >> > > > >> when to
> > > >> > > > >>  refer to an IPC file as a Feather file and when to refer
> to
> > > it as
> > > >> > > > >> an IPC
> > > >> > > > >>  file?
> > > >> > > > >>  I believe that calling the same file an Arrow IPC file at
> > > times
> > > >> and
> > > >> > > > >> a
> > > >> > > > >>  Feather file at other times is confusing to those
> unfamiliar
> > > with
> > > >> > > > >> Apache
> > > >> > > > >>  Arrow (myself included).
> > > >> > > > >>  Surprisingly, these files may even have completely
> different
> > > >> > > > >> extensions,
> > > >> > > > >>  ".arrow"