I'm in favour of moving things for the R package as well and would prefer
to do it all at once and make some noise about it, so there isn't lingering
out-of-date documentation or general ambiguity to cause confusion for users.

> I'm not familiar with R but read_arrow_file() may be better
> if users use it without any prefix. If users use the
> function is used with Apache Arrow related prefix such as
> arrow::read_ipc_file(), I think that read_ipc_file() isn't
> strange.

There'll likely be a mix of both, depending on whether users load the
package in a script or just want to call the function directly without
attaching the whole namespace. I think more thought is needed around
the exact function names for the R package. There are other considerations,
such as consistency with our other file-reading function names: none of
those end in "file", which may or may not matter.

Perhaps, if we agree at a high level how we should be referring to these
files in our documentation (i.e. actual names that we use when writing full
sentences), the individual function names can fall out of that in later
discussions?
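
As a rough illustration of how old names could keep working while we move
everything at once, here is a minimal sketch in Python (several of the
examples in this thread are Python APIs). The names read_arrow_file and
deprecated_alias are placeholders I made up, not agreed or existing API, and
the reader body is only a stand-in:

```python
import functools
import warnings


def read_arrow_file(path):
    """Hypothetical new-style reader; a stand-in that only echoes its input."""
    return ("arrow-file", path)


def deprecated_alias(new_func, old_name):
    """Build a wrapper that warns about old_name, then forwards to new_func."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return new_func(*args, **kwargs)

    # functools.wraps copied the new name; restore the old one on the alias.
    wrapper.__name__ = old_name
    return wrapper


# The old name stays callable, but every call points users at the new one.
read_feather = deprecated_alias(read_arrow_file, "read_feather")
```

The same pattern would apply whichever names we end up agreeing on; the
point is only that the old spelling can stay callable while it steers users
towards the new one.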

On Wed, 19 Oct 2022 at 05:01, Sutou Kouhei <k...@clear-code.com> wrote:

> Hi,
>
> > However, I think we need to be very careful in how we brand the
> > alternative, and think proactively about what terminology we want to be
> > used (and which terms to use in APIs, ..). Because I think the "IPC"
> > aspect of the naming can also become confusing (IPC is a generic term,
> > does not clearly indicate it is a *file* format, and also not that it is
> > related to *arrow*).
>
> I like "Apache Arrow File" and "Apache Arrow Stream" (no
> IPC) for format names because we use vnd.apache.arrow.file
> and vnd.apache.arrow.stream for IANA:
>
> * https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.file
> * https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream
>
> > - In pyarrow, we have a `feather` submodule with read/write_feather
> > functions. How do we want to replace this? The current alternative is the
> > pyarrow.ipc submodule (which has functionality to open files), but so
> > this is using the "IPC" terminology. Are we OK with making this the
> > alternative, or do we want to add new APIs?
>
> How about pyarrow.ipc.{read,write}_{file,stream}()?
> I think that .ipc isn't strange under Apache Arrow namespace
> (pyarrow.) because it means Apache Arrow's IPC.
>
> > - In pyarrow.dataset, we also use IpcFileFormat for Arrow files. Should
> > we rename this to `ArrowFileFormat`? (and keep IpcFileFormat as alias)
>
> I like this idea.
>
> > - In the R arrow package, the non-feather alternative for `read_feather`
> > currently is `read_ipc_file`
>
> I'm not familiar with R but read_arrow_file() may be better
> if users use it without any prefix. If users use the
> function is used with Apache Arrow related prefix such as
> arrow::read_ipc_file(), I think that read_ipc_file() isn't
> strange.
>
> > - In pandas, there is read_feather/to_feather. What do we think that
> > pandas should use instead?
>
> read_arrow_file/to_arrow_file?
>
> > If we want to move the (mostly Python and R) ecosystem away from
> > "Feather", I think we should have a clear recommendation of what to use
> > instead.
>
> +1
>
>
> Thanks,
> --
> kou
>
> In <calqtmbbmaepukytnin5n4-jpahtwbacmeb0bsxjz8dmkkmo...@mail.gmail.com>
>   "Re: Usage of the name Feather?" on Tue, 6 Sep 2022 15:46:39 +0200,
>   Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:
>
> > Personally, I like the "Feather" name (and actually think it could help
> > disambiguate the file vs in-memory distinction), but I understand that we
> > have chosen a certain path (eg ".arrow" is the official registered
> > extension), and have to move on.
> >
> > However, I think we need to be very careful in how we brand the
> > alternative, and think proactively about what terminology we want to be
> > used (and which terms to use in APIs, ..). Because I think the "IPC"
> aspect
> > of the naming can also become confusing (IPC is a generic term, does not
> > clearly indicate it is a *file* format, and also not that it is related
> to
> > *arrow*).
> >
> > As an example, I just noticed a twitter thread (
> > https://twitter.com/braaannigan/status/1566715704937676800) that is
> > promoting the "IPC format". The specific library used here (polars) also
> > exposes this as a "read_ipc" function.
> > Other examples:
> >
> > - In pyarrow, we have a `feather` submodule with read/write_feather
> > functions. How do we want to replace this? The current alternative is the
> > pyarrow.ipc submodule (which has functionality to open files), but so
> this
> > is using the "IPC" terminology. Are we OK with making this the
> alternative,
> > or do we want to add new APIs?
> > - In pyarrow.dataset, we also use IpcFileFormat for Arrow files. Should
> we
> > rename this to `ArrowFileFormat`? (and keep IpcFileFormat as alias)
> > - In the R arrow package, the non-feather alternative for `read_feather`
> > currently is `read_ipc_file`
> > - In pandas, there is read_feather/to_feather. What do we think that
> pandas
> > should use instead?
> > - ...
> >
> > Personally, I think we should certainly avoid names that just use IPC
> (like
> > `read_ipc`). An alternative could be `read_arrow_ipc`, but if want to
> drop
> > the IPC part (as proposed earlier in this thread, although not yet agreed
> > on), that would become `read_arrow`/`to_arrow`. That might then be
> confused
> > with reading from / converting to in-memory arrow data or stream?
> > If we want to recommend using "Arrow file" terminology, so then APIs like
> > `read_arrow_file` could be used?
> >
> > If we want to move the (mostly Python and R) ecosystem away from
> "Feather",
> > I think we should have a clear recommendation of what to use instead.
> >
> > On Wed, 31 Aug 2022 at 20:33, Aldrin <akmon...@ucsc.edu.invalid> wrote:
> >
> >> similarly to Micah, I mentally think of "Arrow IPC" a format that is
> >> optimized for "IPC".
> >> Which I have assumed meant it minimizes CPU overhead when using data
> read
> >> from
> >> storage because it's already in a memory friendly format (e.g. minimal
> >> deserialization).
> >>
> >> Not sure the "IPC" is necessary, but it does push the intent into the
> name
> >> (unless it's
> >> actually a misnomer).
> >>
> >>
> >> Aldrin Montana
> >> Computer Science PhD Student
> >> UC Santa Cruz
> >>
> >>
> >> On Tue, Aug 30, 2022 at 8:29 PM Micah Kornfield <emkornfi...@gmail.com>
> >> wrote:
> >>
> >> > I think one source of ambiguity for Arrow files, at least for me, is
> >> > whether they are just a string of messages concatenated or they are
> the
> >> > files that contain the metadata footer.
> >> >
> >> > On Tue, Aug 30, 2022 at 5:11 AM Dewey Dunnington
> >> > <de...@voltrondata.com.invalid> wrote:
> >> >
> >> > > Ian has a very good point...I would be in favour of calling them
> "Arrow
> >> > > files" wherever possible since there's no need to know or care what
> >> > > interprocess communication is to use them!
> >> > >
> >> > > On Mon, Aug 29, 2022 at 6:50 PM Ian Cook <i...@ursacomputing.com>
> >> wrote:
> >> > >
> >> > > > +1 We should explicitly discourage further use of “Feather” to
> refer
> >> to
> >> > > > Arrow IPC files.
> >> > > >
> >> > > > In this spirit of simplifying terminology: Does the “IPC” in the
> term
> >> > > > “Arrow IPC files” serve a truly necessary purpose? Is there
> another
> >> > type
> >> > > of
> >> > > > “Arrow file” that the “IPC” serves to disambiguate? If not, can we
> >> > simply
> >> > > > refer to these files as “Arrow files” in most places in the
> >> > documentation
> >> > > > and website? (In a few important places we should clarify that
> when
> >> we
> >> > > say
> >> > > > “Arrow file” we are referring to a file that uses the Arrow IPC
> file
> >> > > > format.)
> >> > > >
> >> > > > Ian
> >> > > >
> >> > > > On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei <k...@clear-code.com>
> >> wrote:
> >> > > >
> >> > > > > +1 for 1.
> >> > > > >
> >> > > > > Thanks,
> >> > > > > --
> >> > > > > kou
> >> > > > >
> >> > > > > In <CAOYPqDCAib2wBKaKnRij9=__OsUJJghVq1UUTNibK2T0Np+=
> >> > r...@mail.gmail.com
> >> > > >
> >> > > > >   "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37
> >> > +0200,
> >> > > > >   Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
> >> > > > >
> >> > > > > > I agree.
> >> > > > > >
> >> > > > > > I suspect that the most widely used API with "feather" is
> Pandas'
> >> > > > > > read_feather.
> >> > > > > >
> >> > > > > >
> >> > > > > >
> >> > > > > > On Mon, 29 Aug 2022, 19:55 Weston Pace, <
> weston.p...@gmail.com>
> >> > > wrote:
> >> > > > > >
> >> > > > > >> I agree as well.  I think most lingering uses of the term
> >> > "feather"
> >> > > > > >> are in pyarrow and R however, so it might be good to hear
> from
> >> > some
> >> > > of
> >> > > > > >> those maintainers.
> >> > > > > >>
> >> > > > > >>
> >> > > > > >>
> >> > > > > >> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou <
> >> > anto...@python.org>
> >> > > > > wrote:
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > I agree with this as well.
> >> > > > > >> >
> >> > > > > >> > Regards
> >> > > > > >> >
> >> > > > > >> > Antoine.
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> > On Mon, 29 Aug 2022 11:29:45 -0400
> >> > > > > >> > Andrew Lamb <al...@influxdata.com> wrote:
> >> > > > > >> > > In the rust implementation we use the term "Arrow IPC"
> and I
> >> > > > support
> >> > > > > >> your
> >> > > > > >> > > option 1:
> >> > > > > >> > >
> >> > > > > >> > > > The name Feather V2 is deprecated. Only the extension
> >> > ".arrow"
> >> > > > > will
> >> > > > > >> be
> >> > > > > >> > > used for IPC files.
> >> > > > > >> > >
> >> > > > > >> > > Andrew
> >> > > > > >> > >
> >> > > > > >> > > On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol
> >> > > > > >> <m...@voltrondata.com.invalid>
> >> > > > > >> > > wrote:
> >> > > > > >> > >
> >> > > > > >> > > > When I wrote "In-Memory Analytics with Apache Arrow" I
> >> > > > definitely
> >> > > > > >> > > > treated "Feather" as deprecated and mentioned it only
> in
> >> > > passing
> >> > > > > >> > > > specifically indicating "Arrow IPC" as the terminology
> to
> >> > > use. I
> >> > > > > only
> >> > > > > >> > > > even mentioned "Feather" at all because there are still
> >> > > methods
> >> > > > in
> >> > > > > >> > > > pyarrow that reference it by name.
> >> > > > > >> > > >
> >> > > > > >> > > > That's just my opinion though...
> >> > > > > >> > > >
> >> > > > > >> > > > On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li
> >> > > > > >> > > > <lidav...@apache.org> wrote:
> >> > > > > >> > > > > This has come up before, e.g. see [1] [2] [3].
> >> > > > > >> > > > >
> >> > > > > >> > > > > I would say "Feather" is effectively deprecated and
> we
> >> are
> >> > > > using
> >> > > > > >> > > > > "Arrow IPC" now but I am not sure what others think.
> >> (From
> >> > > > that
> >> > > > > >> > > > > GitHub link, it seems to be mixed.) And ".arrow" is
> the
> >> > > > official
> >> > > > > >> > > > > extension now (since it is registered as part of our
> >> MIME
> >> > > > type).
> >> > > > > >> But
> >> > > > > >> > > > > there's existing documentation and not everything has
> >> been
> >> > > > > updated
> >> > > > > >> to
> >> > > > > >> > > > > be consistent (as you saw).
> >> > > > > >> > > > >
> >> > > > > >> > > > > [1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
> >> > > > > >> > > > > [2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
> >> > > > > >> > > > > [3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
> >> > > > > >> > > > >
> >> > > > > >> > > > >
> >> > > > > >> > > > > -David
> >> > > > > >> > > > >
> >> > > > > >> > > > > On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
> >> > > > > >> > > > >>  Hi all.
> >> > > > > >> > > > >>
> >> > > > > >> > > > >>  I know the documentation (mainly pyarrow
> >> documentation)
> >> > > > > sometimes
> >> > > > > >> > > > >> refers
> >> > > > > >> > > > >>  to IPC files as Feather files, but are there any
> >> > > guidelines
> >> > > > > for
> >> > > > > >> > > > >> when to
> >> > > > > >> > > > >>  refer to an IPC file as a Feather file and when to
> >> refer
> >> > > to
> >> > > > > it as
> >> > > > > >> > > > >> an IPC
> >> > > > > >> > > > >>  file?
> >> > > > > >> > > > >>  I believe that calling the same file an Arrow IPC
> file
> >> > at
> >> > > > > times
> >> > > > > >> and
> >> > > > > >> > > > >> a
> >> > > > > >> > > > >>  Feather file at other times is confusing to those
> >> > > unfamiliar
> >> > > > > with
> >> > > > > >> > > > >> Apache
> >> > > > > >> > > > >>  Arrow (myself included).
> >> > > > > >> > > > >>  Surprisingly, these files may even have completely
> >> > > different
> >> > > > > >> > > > >> extensions,
> >> > > > > >> > > > >>  ".arrow" and ".feather", which are not similar.
> >> > > > > >> > > > >>
> >> > > > > >> > > > >>  Perhaps there are several options for future use of
> >> the
> >> > > name
> >> > > > > >> > > > >> Feather,
> >> > > > > >> > > > >>  such as
> >> > > > > >> > > > >>
> >> > > > > >> > > > >>   1. The name Feather V2 is deprecated. Only the
> >> > extension
> >> > > > > >> ".arrow"
> >> > > > > >> > > > >> will
> >> > > > > >> > > > >>      be used for IPC files.
> >> > > > > >> > > > >>   2. In some contexts(?), IPC files are referred to
> as
> >> > > > Feather;
> >> > > > > >> only
> >> > > > > >> > > > >>      ".arrow" is used for the IPC file extension to
> >> > clearly
> >> > > > > >> > > > >> distinguish
> >> > > > > >> > > > >>      it from Feather V1's ".feather".
> >> > > > > >> > > > >>   3. When an IPC file is called Feather by some
> rule,
> >> > > > extension
> >> > > > > >> > > > >>      ".feather" is used, and when an IPC file is not
> >> > called
> >> > > > > >> Feather,
> >> > > > > >> > > > >>      extension ".arrow" is used.
> >> > > > > >> > > > >>
> >> > > > > >> > > > >>  I mistakenly thought the current status was 2, but
> >> > > according
> >> > > > > to
> >> > > > > >> the
> >> > > > > >> > > > >>  discussion in this PR
> >> > > > > >> > > > >> (<https://github.com/apache/arrow/pull/13677>),
> >> > > > > >> > > > >>  apparently the current status seems 3. (However,
> there
> >> > > seems
> >> > > > > to
> >> > > > > >> be
> >> > > > > >> > > > >> no
> >> > > > > >> > > > >>  rule as to when an IPC file should be called a
> >> Feather)
> >> > > > > >> > > > >>
> >> > > > > >> > > > >>  I am not very familiar with Arrow and this is my
> first
> >> > > post
> >> > > > to
> >> > > > > >> this
> >> > > > > >> > > > >>  mailing list so I apologize if I have done
> something
> >> > wrong
> >> > > > or
> >> > > > > >> > > > >> inappropriate.
> >> > > > > >> > > > >>
> >> > > > > >> > > > >>  Best,
> >> > > > > >> > > > >>  SHIMA Tatsuya
> >> > > > > >> > > >
> >> > > > > >> > > >
> >> > > > > >> > >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >> >
> >> > > > > >>
> >> > > > >
> >> > > >
> >> > >
> >> >
> >>
>
