Re: Usage of the name Feather?
Ian has a very good point...I would be in favour of calling them "Arrow files" wherever possible since there's no need to know or care what interprocess communication is to use them!

On Mon, Aug 29, 2022 at 6:50 PM Ian Cook wrote:

> +1 We should explicitly discourage further use of “Feather” to refer to Arrow IPC files.
>
> In this spirit of simplifying terminology: Does the “IPC” in the term “Arrow IPC files” serve a truly necessary purpose? Is there another type of “Arrow file” that the “IPC” serves to disambiguate? If not, can we simply refer to these files as “Arrow files” in most places in the documentation and website? (In a few important places we should clarify that when we say “Arrow file” we are referring to a file that uses the Arrow IPC file format.)
>
> Ian
>
> On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei wrote:
>
>> +1 for 1.
>>
>> Thanks,
>> --
>> kou
>>
>> In "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200, Jorge Cardoso Leitão wrote:
>>
>>> I agree.
>>>
>>> I suspect that the most widely used API with "feather" is Pandas' read_feather.
>>>
>>> On Mon, 29 Aug 2022, 19:55 Weston Pace wrote:
>>>
>>>> I agree as well. I think most lingering uses of the term "feather" are in pyarrow and R however, so it might be good to hear from some of those maintainers.
>>>>
>>>> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou wrote:
>>>>>
>>>>> I agree with this as well.
>>>>>
>>>>> Regards
>>>>>
>>>>> Antoine.
>>>>>
>>>>> On Mon, 29 Aug 2022 11:29:45 -0400 Andrew Lamb wrote:
>>>>>> In the rust implementation we use the term "Arrow IPC" and I support your option 1:
>>>>>>
>>>>>>> The name Feather V2 is deprecated. Only the extension ".arrow" will be used for IPC files.
>>>>>>
>>>>>> Andrew
>>>>>>
>>>>>> On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol wrote:
>>>>>>
>>>>>>> When I wrote "In-Memory Analytics with Apache Arrow" I definitely treated "Feather" as deprecated and mentioned it only in passing specifically indicating "Arrow IPC" as the terminology to use. I only even mentioned "Feather" at all because there are still methods in pyarrow that reference it by name.
>>>>>>>
>>>>>>> That's just my opinion though...
>>>>>>>
>>>>>>> On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li wrote:
>>>>>>>> This has come up before, e.g. see [1] [2] [3].
>>>>>>>>
>>>>>>>> I would say "Feather" is effectively deprecated and we are using "Arrow IPC" now but I am not sure what others think. (From that GitHub link, it seems to be mixed.) And ".arrow" is the official extension now (since it is registered as part of our MIME type). But there's existing documentation and not everything has been updated to be consistent (as you saw).
>>>>>>>>
>>>>>>>> [1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
>>>>>>>> [2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
>>>>>>>> [3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
>>>>>>>>
>>>>>>>> -David
>>>>>>>>
>>>>>>>> On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
>>>>>>>>> Hi all.
>>>>>>>>>
>>>>>>>>> I know the documentation (mainly pyarrow documentation) sometimes refers to IPC files as Feather files, but are there any guidelines for when to refer to an IPC file as a Feather file and when to refer to it as an IPC file?
>>>>>>>>> I believe that calling the same file an Arrow IPC file at times and a Feather file at other times is confusing to those unfamiliar with Apache Arrow (myself included).
>>>>>>>>> Surprisingly, these files may even have completely different extensions, ".arrow" and ".feather", which are not similar.
>>>>>>>>>
>>>>>>>>> Perhaps there are several options for future use of the name Feather, such as
>>>>>>>>>
>>>>>>>>> 1. The name Feather V2 is deprecated. Only the extension ".arrow" will be used for IPC files.
>>>>>>>>> 2. In some contexts(?), IPC files are referred to as Feather; only ".arrow" is used for the IPC file extension to clearly distinguis
Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation
Hello Kae,

On 29/08/2022 at 19:28, Kae Suarez wrote:
> I personally like the idea of using namespace directives in Sphinx to keep things less cluttered and easier to write, then using the class directive each time so links are always available.

I would agree with this. As for the namespace, though, one question is how the user might derive in which namespace a class lives?

For example, https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7DatasetE doesn't mention that the class lives in the arrow::dataset namespace.

> As for functions, I'd like to keep them in "orange text," unless they are unconnected to a class in use in the article -- if they are, I would like to use directives there, as well.

I would rather see functions/methods hyperlinked as well, as much as possible.

Regards

Antoine.
[RESULT][VOTE] Format: Rules and procedures for Canonical extension types
Hello,

With 3 binding +1 votes, 3 non-binding +1 votes, and no -1 vote, the vote has passed.

Also, the vote discussion has shown that the first rule should be updated to mandate that the name starts with "arrow." instead of "org.apache.arrow.".

The next step will be to prepare a PR adding these rules to the specs chapter of the project documentation.

Regards

Antoine.

On 24/08/2022 at 17:24, Antoine Pitrou wrote:
> Hello,
>
> I would like to propose we vote for the following set of rules for registering well-known ("canonical") extension types.
>
> * Canonical extension types are described and maintained in a separate document under the format specifications directory: https://github.com/apache/arrow/tree/master/docs/source/format (note this gets turned into HTML docs by Sphinx => https://arrow.apache.org/docs/index.html)
>
> * Each canonical extension type requires a separate discussion and vote on the mailing-list
>
> * The specification text to be added *must* follow these requirements
>
>   1) It *must* have a well-defined name starting with "org.apache.arrow."
>   2) Its parameters, if any, *must* be described in the proposal
>   3) Its serialization *must* be described in the proposal and should not require undue work or unusual software dependencies (for example, a trivial custom text format or JSON would be acceptable)
>   4) Its expected semantics *should* be described as well and any potential ambiguities or pain points addressed or at least mentioned
>
> * The extension type *should* have one implementation submitted; preferably two if non-trivial (for example if parameterized)
>
> The vote will be open for at least 72 hours.
>
> [ ] +1 Accept this proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> Regards
>
> Antoine.
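As a rough illustration of rules 1 and 3 with the updated "arrow." prefix, here is a minimal sketch in plain Python of the field metadata a canonical extension type could carry. It assumes the two custom-metadata keys that the Arrow columnar format uses for extension types, and it picks JSON for the parameters only because the proposal names JSON as an acceptable example. The helper name and the type name "arrow.example_type" are hypothetical and do not refer to any registered type.

```python
import json

# Custom-metadata keys used by the Arrow columnar format for extension types.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"

# Prefix mandated for canonical extension types after the vote discussion.
CANONICAL_PREFIX = "arrow."


def canonical_extension_metadata(name: str, parameters: dict) -> dict:
    """Build field metadata for a (hypothetical) canonical extension type.

    Raises ValueError if the name does not follow the "arrow." naming rule.
    """
    if not name.startswith(CANONICAL_PREFIX) or len(name) == len(CANONICAL_PREFIX):
        raise ValueError(
            f"canonical extension type names must start with {CANONICAL_PREFIX!r}: {name!r}"
        )
    return {
        EXTENSION_NAME_KEY: name,
        # JSON is only one possible serialization; each proposal describes its own.
        EXTENSION_METADATA_KEY: json.dumps(parameters, sort_keys=True),
    }


# Hypothetical type name and parameters, for illustration only.
metadata = canonical_extension_metadata("arrow.example_type", {"unit": "ms"})
print(metadata)
```

A name using the earlier "org.apache.arrow." prefix would be rejected by this check, which matches the update described in the vote result.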
Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation
"Always linking" sounds reasonable to me. It also makes the decision easier, for both author and reviewer. I feel like there must be some way to configure Breathe or Sphinx to show the namespace in Antoine's example, but I don't see it… On Tue, Aug 30, 2022, at 08:52, Antoine Pitrou wrote: > Hello Kae, > > Le 29/08/2022 à 19:28, Kae Suarez a écrit : >> >> I personally like the idea of using namespace directives in Sphinx to keep >> things less cluttered and easier to write, then using the class directive >> each time so links are always available. > > I would agree with this. As for the namespace, though, one question is > how the user might derive in which namespace a class lives? > > For example, > https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7DatasetE > > doesn't mention that the class lives in the arrow::dataset namespace. > > > > As for functions, I'd like to keep >> them in "orange text," unless they are unconnected to a class in use in the >> article -- if they are, I would like to use directives there, as well. > > I would rather see functions/methods hyperlinked as well, as much as > possible. > > Regards > > Antoine.
Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation
I do not know about the namespace issue in the API reference, but when focusing on the User's Guide and Getting Started sections, we can announce at the top of the page what namespace is relevant. I personally recommend using only the arrow namespace for ease in documentation and being more manual with the ones after that (e.g., using compute::CallFunction() instead of CallFunction() or arrow::compute::CallFunction()). This way, users will know that we're generally using Arrow's namespace, and that any further namespace is contained therein and important to keep an eye on.

As for "always linking," sure! I can do that, and we can see how it looks in the PR I'm working on.

Thanks for the feedback, I'll be back in a while with some real-world results.

Kae Suarez

On Tue, Aug 30, 2022 at 1:57 PM David Li wrote:

> "Always linking" sounds reasonable to me. It also makes the decision easier, for both author and reviewer.
>
> I feel like there must be some way to configure Breathe or Sphinx to show the namespace in Antoine's example, but I don't see it…
>
> On Tue, Aug 30, 2022, at 08:52, Antoine Pitrou wrote:
>> Hello Kae,
>>
>> On 29/08/2022 at 19:28, Kae Suarez wrote:
>>> I personally like the idea of using namespace directives in Sphinx to keep things less cluttered and easier to write, then using the class directive each time so links are always available.
>>
>> I would agree with this. As for the namespace, though, one question is how the user might derive in which namespace a class lives?
>>
>> For example, https://arrow.apache.org/docs/cpp/api/dataset.html#_CPPv4N5arrow7dataset7DatasetE doesn't mention that the class lives in the arrow::dataset namespace.
>>
>>> As for functions, I'd like to keep them in "orange text," unless they are unconnected to a class in use in the article -- if they are, I would like to use directives there, as well.
>>
>> I would rather see functions/methods hyperlinked as well, as much as possible.
>>
>> Regards
>>
>> Antoine.
Arrow sync call August 31 at 12:00 US/Eastern, 16:00 UTC
Hi all,

Our biweekly sync call is tomorrow at 12:00 noon Eastern time.

The Zoom meeting URL for this and other biweekly Arrow sync calls is:
https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09

Alternatively, enter this information into the Zoom website or app to join the call:
Meeting ID: 876 4903 3008
Passcode: 958092

Thanks,
Ian
Re: [RESULT][VOTE] C++: switch to C++17
Hi,

What kind of timeline did we decide on? Is this something that can be worked on/merged immediately or should we wait until after 10.0 is out?

Sasha Krassovsky

> On Aug 29, 2022, at 2:20 AM, Antoine Pitrou wrote:
>
> Hello,
>
> With 5 binding +1 votes, 7 non-binding +1 votes, and no -1 vote, the vote has passed.
>
> The next steps will be conceptually as follows:
> - require C++17 instead of C++11 in the build configuration(s)
> - remove pre-C++17 compatibility measures in the codebase
> - start using C++17 idioms and features where desirable to reduce clutter and improve maintainability
>
> (there might be more than 3 PRs though :-))
>
> Regards
>
> Antoine.
>
> On 24/08/2022 at 17:31, Antoine Pitrou wrote:
>> Hello,
>>
>> I would like to propose that the Arrow C++ implementation switch to C++17 as its baseline supported version (currently C++11).
>>
>> The rationale and subsequent discussion can be read in the archives here:
>> https://lists.apache.org/thread/9g14n3odhj6kzsgjxr6k6d3q73hg2njr
>>
>> The exact steps and timeline for switching can be decided later on, but this proposal implies that it could happen soon, possibly next week :-) ... or, more realistically, in the next Arrow C++ release, 10.0.0.
>>
>> The vote will be open for at least 72 hours.
>>
>> [ ] +1 Switch to C++17 in the impending future
>> [ ] +0
>> [ ] -1 Do not switch to C++17 because...
>>
>> Regards
>>
>> Antoine.
Re: Usage of the name Feather?
I think one source of ambiguity for Arrow files, at least for me, is whether they are just a string of messages concatenated or they are the files that contain the metadata footer.

On Tue, Aug 30, 2022 at 5:11 AM Dewey Dunnington wrote:

> Ian has a very good point...I would be in favour of calling them "Arrow files" wherever possible since there's no need to know or care what interprocess communication is to use them!
>
> On Mon, Aug 29, 2022 at 6:50 PM Ian Cook wrote:
>
>> +1 We should explicitly discourage further use of “Feather” to refer to Arrow IPC files.
>>
>> In this spirit of simplifying terminology: Does the “IPC” in the term “Arrow IPC files” serve a truly necessary purpose? Is there another type of “Arrow file” that the “IPC” serves to disambiguate? If not, can we simply refer to these files as “Arrow files” in most places in the documentation and website? (In a few important places we should clarify that when we say “Arrow file” we are referring to a file that uses the Arrow IPC file format.)
>>
>> Ian
>>
>> On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei wrote:
>>
>>> +1 for 1.
>>>
>>> Thanks,
>>> --
>>> kou
>>>
>>> In "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200, Jorge Cardoso Leitão wrote:
>>>
>>>> I agree.
>>>>
>>>> I suspect that the most widely used API with "feather" is Pandas' read_feather.
>>>>
>>>> On Mon, 29 Aug 2022, 19:55 Weston Pace wrote:
>>>>
>>>>> I agree as well. I think most lingering uses of the term "feather" are in pyarrow and R however, so it might be good to hear from some of those maintainers.
>>>>>
>>>>> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou wrote:
>>>>>>
>>>>>> I agree with this as well.
>>>>>>
>>>>>> Regards
>>>>>>
>>>>>> Antoine.
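The ambiguity raised at the top of this reply, a bare sequence of messages versus a file with a metadata footer, corresponds to the two layouts in the Arrow IPC specification: the streaming format and the file format. Below is a minimal sketch of telling them apart from the framing bytes alone, using only the Python standard library. It assumes that the file format begins and ends with the magic string ARROW1 and stores a 4-byte little-endian footer length just before the trailing magic, and that a stream written by a recent Arrow version begins with the 0xFFFFFFFF continuation marker. The function names are made up for this illustration and are not part of pyarrow, and the sample bytes are synthetic framing only, not valid Arrow data.

```python
import struct

ARROW_MAGIC = b"ARROW1"              # magic string of the Arrow IPC file format
CONTINUATION = b"\xff\xff\xff\xff"   # marker preceding each encapsulated message

# Leading magic padded to 8 bytes + int32 footer size + trailing magic.
MIN_FILE_SIZE = 8 + 4 + len(ARROW_MAGIC)


def classify_arrow_bytes(data: bytes) -> str:
    """Guess the IPC layout from framing bytes only: 'file', 'stream' or 'unknown'."""
    if (
        len(data) >= MIN_FILE_SIZE
        and data.startswith(ARROW_MAGIC)
        and data.endswith(ARROW_MAGIC)
    ):
        return "file"
    if data.startswith(CONTINUATION):
        return "stream"
    return "unknown"


def footer_length(data: bytes) -> int:
    """Return the footer size stored just before the trailing magic of a file."""
    (size,) = struct.unpack("<i", data[-10:-6])
    return size


# Synthetic framing, not valid Arrow data: it only mimics the outer layout.
fake_file = (
    ARROW_MAGIC + b"\x00\x00"        # leading magic padded to 8 bytes
    + b"<messages>"                  # placeholder for the embedded stream
    + b"<footer>"                    # placeholder for the footer (8 bytes)
    + struct.pack("<i", 8)           # footer size
    + ARROW_MAGIC                    # trailing magic
)
fake_stream = CONTINUATION + struct.pack("<i", 0)  # continuation marker + zero length

print(classify_arrow_bytes(fake_file), footer_length(fake_file))
print(classify_arrow_bytes(fake_stream))
```

This check looks at the framing only and never parses the messages themselves. A stream written without the continuation marker would come back as "unknown", so the result is a hint rather than a validation.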