Hi,

> However, I think we need to be very careful in how we brand the
> alternative, and think proactively about what terminology we want to be
> used (and which terms to use in APIs, ..). Because I think the "IPC" aspect
> of the naming can also become confusing (IPC is a generic term, does not
> clearly indicate it is a *file* format, and also not that it is related to
> *arrow*).

I like "Apache Arrow File" and "Apache Arrow Stream" (no
IPC) as the format names because we use vnd.apache.arrow.file
and vnd.apache.arrow.stream as our IANA media types:

* https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.file
* https://www.iana.org/assignments/media-types/application/vnd.apache.arrow.stream
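
For illustration only, here is a minimal Python sketch of how the two
media types could map back to those format names. The ".arrow" extension
is the one mentioned in this thread; the ".arrows" extension for streams
and the format_name() helper are assumptions made for the example, not an
existing API:

```python
import mimetypes

# Media types from the two IANA links above.
ARROW_FILE_TYPE = "application/vnd.apache.arrow.file"
ARROW_STREAM_TYPE = "application/vnd.apache.arrow.stream"

# ".arrow" is the extension discussed in this thread.
# ".arrows" for streams is an assumption made for this sketch.
mimetypes.add_type(ARROW_FILE_TYPE, ".arrow")
mimetypes.add_type(ARROW_STREAM_TYPE, ".arrows")

# Proposed user-facing names (no "IPC" in either of them).
FORMAT_NAMES = {
    ARROW_FILE_TYPE: "Apache Arrow File",
    ARROW_STREAM_TYPE: "Apache Arrow Stream",
}


def format_name(path):
    """Hypothetical helper: return the format name for a path, or None."""
    media_type, _encoding = mimetypes.guess_type(path)
    return FORMAT_NAMES.get(media_type)
```

With this, format_name("data.arrow") gives the "file" name and any
unrelated extension gives None.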

> - In pyarrow, we have a `feather` submodule with read/write_feather
> functions. How do we want to replace this? The current alternative is the
> pyarrow.ipc submodule (which has functionality to open files), but so this
> is using the "IPC" terminology. Are we OK with making this the alternative,
> or do we want to add new APIs?

How about pyarrow.ipc.{read,write}_{file,stream}()?
I don't think .ipc is strange under the Apache Arrow namespace
(pyarrow.) because there it means Apache Arrow's IPC.

> - In pyarrow.dataset, we also use IpcFileFormat for Arrow files. Should we
> rename this to `ArrowFileFormat`? (and keep IpcFileFormat as alias)

I like this idea.
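
A minimal sketch of what "rename and keep the old name as an alias" could
look like in Python. The class below is a stand-in written for this
example, not the real pyarrow.dataset implementation:

```python
class ArrowFileFormat:
    """Stand-in for the dataset file format class (illustration only)."""

    def __init__(self):
        # Hypothetical attribute, used only to give the stand-in some state.
        self.default_extname = "arrow"


# Keep the old name importable so existing code keeps working.
IpcFileFormat = ArrowFileFormat

# Code written against the old name gets the renamed class.
fmt = IpcFileFormat()
```

Because the alias is the same object, isinstance() checks written against
either name accept instances created through the other.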

> - In the R arrow package, the non-feather alternative for `read_feather`
> currently is `read_ipc_file`

I'm not familiar with R, but read_arrow_file() may be better
if users call it without any prefix. If users call the
function with an Apache Arrow related prefix, such as
arrow::read_ipc_file(), I don't think read_ipc_file() is
strange.

> - In pandas, there is read_feather/to_feather. What do we think that pandas
> should use instead?

read_arrow_file/to_arrow_file?
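
If the old names stay around for a while, they could forward to the new
ones with a warning. A rough sketch, assuming hypothetical stand-in
functions (this is not the pandas implementation, and deprecated_alias()
is a helper invented for the example):

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Return a wrapper that warns, then forwards to new_func."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            FutureWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    # Report the old name on the wrapper itself.
    wrapper.__name__ = old_name
    return wrapper


def read_arrow_file(path):
    # Stand-in body: a real reader would load the file at `path`.
    return f"table read from {path}"


# The old name keeps working but points users at the new one.
read_feather = deprecated_alias(read_arrow_file, "read_feather")
```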

> If we want to move the (mostly Python and R) ecosystem away from "Feather",
> I think we should have a clear recommendation of what to use instead.

+1


Thanks,
-- 
kou

In <calqtmbbmaepukytnin5n4-jpahtwbacmeb0bsxjz8dmkkmo...@mail.gmail.com>
  "Re: Usage of the name Feather?" on Tue, 6 Sep 2022 15:46:39 +0200,
  Joris Van den Bossche <jorisvandenboss...@gmail.com> wrote:

> Personally, I like the "Feather" name (and actually think it could help
> disambiguate the file vs in-memory distinction), but I understand that we
> have chosen a certain path (eg ".arrow" is the official registered
> extension), and have to move on.
> 
> However, I think we need to be very careful in how we brand the
> alternative, and think proactively about what terminology we want to be
> used (and which terms to use in APIs, ..). Because I think the "IPC" aspect
> of the naming can also become confusing (IPC is a generic term, does not
> clearly indicate it is a *file* format, and also not that it is related to
> *arrow*).
> 
> As an example, I just noticed a twitter thread (
> https://twitter.com/braaannigan/status/1566715704937676800) that is
> promoting the "IPC format". The specific library used here (polars) also
> exposes this as a "read_ipc" function.
> Other examples:
> 
> - In pyarrow, we have a `feather` submodule with read/write_feather
> functions. How do we want to replace this? The current alternative is the
> pyarrow.ipc submodule (which has functionality to open files), but so this
> is using the "IPC" terminology. Are we OK with making this the alternative,
> or do we want to add new APIs?
> - In pyarrow.dataset, we also use IpcFileFormat for Arrow files. Should we
> rename this to `ArrowFileFormat`? (and keep IpcFileFormat as alias)
> - In the R arrow package, the non-feather alternative for `read_feather`
> currently is `read_ipc_file`
> - In pandas, there is read_feather/to_feather. What do we think that pandas
> should use instead?
> - ...
> 
> Personally, I think we should certainly avoid names that just use IPC (like
> `read_ipc`). An alternative could be `read_arrow_ipc`, but if we want to drop
> the IPC part (as proposed earlier in this thread, although not yet agreed
> on), that would become `read_arrow`/`to_arrow`. That might then be confused
> with reading from / converting to in-memory arrow data or stream?
> If we want to recommend using "Arrow file" terminology, so then APIs like
> `read_arrow_file` could be used?
> 
> If we want to move the (mostly Python and R) ecosystem away from "Feather",
> I think we should have a clear recommendation of what to use instead.
> 
> On Wed, 31 Aug 2022 at 20:33, Aldrin <akmon...@ucsc.edu.invalid> wrote:
> 
>> similarly to Micah, I mentally think of "Arrow IPC" as a format that is
>> optimized for "IPC".
>> Which I have assumed meant it minimizes CPU overhead when using data read
>> from storage because it's already in a memory friendly format (e.g. minimal
>> deserialization).
>>
>> Not sure the "IPC" is necessary, but it does push the intent into the name
>> (unless it's actually a misnomer).
>>
>>
>> Aldrin Montana
>> Computer Science PhD Student
>> UC Santa Cruz
>>
>>
>> On Tue, Aug 30, 2022 at 8:29 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
>>
>> > I think one source of ambiguity for Arrow files, at least for me, is
>> > whether they are just a string of messages concatenated or they are the
>> > files that contain the metadata footer.
>> >
>> > On Tue, Aug 30, 2022 at 5:11 AM Dewey Dunnington
>> > <de...@voltrondata.com.invalid> wrote:
>> >
>> > > Ian has a very good point...I would be in favour of calling them "Arrow
>> > > files" wherever possible since there's no need to know or care what
>> > > interprocess communication is to use them!
>> > >
>> > > On Mon, Aug 29, 2022 at 6:50 PM Ian Cook <i...@ursacomputing.com> wrote:
>> > >
>> > > > +1 We should explicitly discourage further use of “Feather” to refer to
>> > > > Arrow IPC files.
>> > > >
>> > > > In this spirit of simplifying terminology: Does the “IPC” in the term
>> > > > “Arrow IPC files” serve a truly necessary purpose? Is there another type of
>> > > > “Arrow file” that the “IPC” serves to disambiguate? If not, can we simply
>> > > > refer to these files as “Arrow files” in most places in the documentation
>> > > > and website? (In a few important places we should clarify that when we say
>> > > > “Arrow file” we are referring to a file that uses the Arrow IPC file
>> > > > format.)
>> > > >
>> > > > Ian
>> > > >
>> > > > On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei <k...@clear-code.com> wrote:
>> > > >
>> > > > > +1 for 1.
>> > > > >
>> > > > > Thanks,
>> > > > > --
>> > > > > kou
>> > > > >
>> > > > > In <CAOYPqDCAib2wBKaKnRij9=__OsUJJghVq1UUTNibK2T0Np+=r...@mail.gmail.com>
>> > > > >   "Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200,
>> > > > >   Jorge Cardoso Leitão <jorgecarlei...@gmail.com> wrote:
>> > > > >
>> > > > > > I agree.
>> > > > > >
>> > > > > > I suspect that the most widely used API with "feather" is Pandas'
>> > > > > > read_feather.
>> > > > > >
>> > > > > >
>> > > > > >
>> > > > > > On Mon, 29 Aug 2022, 19:55 Weston Pace, <weston.p...@gmail.com> wrote:
>> > > > > >
>> > > > > >> I agree as well.  I think most lingering uses of the term "feather"
>> > > > > >> are in pyarrow and R however, so it might be good to hear from some of
>> > > > > >> those maintainers.
>> > > > > >>
>> > > > > >>
>> > > > > >>
>> > > > > >> On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou <anto...@python.org> wrote:
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > I agree with this as well.
>> > > > > >> >
>> > > > > >> > Regards
>> > > > > >> >
>> > > > > >> > Antoine.
>> > > > > >> >
>> > > > > >> >
>> > > > > >> > On Mon, 29 Aug 2022 11:29:45 -0400
>> > > > > >> > Andrew Lamb <al...@influxdata.com> wrote:
>> > > > > >> > > In the Rust implementation we use the term "Arrow IPC" and I support your
>> > > > > >> > > option 1:
>> > > > > >> > >
>> > > > > >> > > > The name Feather V2 is deprecated. Only the extension ".arrow" will be
>> > > > > >> > > > used for IPC files.
>> > > > > >> > >
>> > > > > >> > > Andrew
>> > > > > >> > >
>> > > > > >> > > On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol <m...@voltrondata.com.invalid>
>> > > > > >> > > wrote:
>> > > > > >> > >
>> > > > > >> > > > When I wrote "In-Memory Analytics with Apache Arrow" I definitely
>> > > > > >> > > > treated "Feather" as deprecated and mentioned it only in passing
>> > > > > >> > > > specifically indicating "Arrow IPC" as the terminology to use. I only
>> > > > > >> > > > even mentioned "Feather" at all because there are still methods in
>> > > > > >> > > > pyarrow that reference it by name.
>> > > > > >> > > >
>> > > > > >> > > > That's just my opinion though...
>> > > > > >> > > >
>> > > > > >> > > > On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li
>> > > > > >> > > > <lidav...@apache.org> wrote:
>> > > > > >> > > > > This has come up before, e.g. see [1] [2] [3].
>> > > > > >> > > > >
>> > > > > >> > > > > I would say "Feather" is effectively deprecated and we are using
>> > > > > >> > > > > "Arrow IPC" now but I am not sure what others think. (From that
>> > > > > >> > > > > GitHub link, it seems to be mixed.) And ".arrow" is the official
>> > > > > >> > > > > extension now (since it is registered as part of our MIME type). But
>> > > > > >> > > > > there's existing documentation and not everything has been updated to
>> > > > > >> > > > > be consistent (as you saw).
>> > > > > >> > > > >
>> > > > > >> > > > > [1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
>> > > > > >> > > > > [2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
>> > > > > >> > > > > [3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
>> > > > > >> > > > >
>> > > > > >> > > > >
>> > > > > >> > > > > -David
>> > > > > >> > > > >
>> > > > > >> > > > > On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
>> > > > > >> > > > >>  Hi all.
>> > > > > >> > > > >>
>> > > > > >> > > > >>  I know the documentation (mainly pyarrow documentation) sometimes refers
>> > > > > >> > > > >>  to IPC files as Feather files, but are there any guidelines for when to
>> > > > > >> > > > >>  refer to an IPC file as a Feather file and when to refer to it as an IPC
>> > > > > >> > > > >>  file?
>> > > > > >> > > > >>  I believe that calling the same file an Arrow IPC file at times and a
>> > > > > >> > > > >>  Feather file at other times is confusing to those unfamiliar with Apache
>> > > > > >> > > > >>  Arrow (myself included).
>> > > > > >> > > > >>  Surprisingly, these files may even have completely different extensions,
>> > > > > >> > > > >>  ".arrow" and ".feather", which are not similar.
>> > > > > >> > > > >>
>> > > > > >> > > > >>  Perhaps there are several options for future use of the name Feather,
>> > > > > >> > > > >>  such as
>> > > > > >> > > > >>
>> > > > > >> > > > >>   1. The name Feather V2 is deprecated. Only the extension ".arrow" will
>> > > > > >> > > > >>      be used for IPC files.
>> > > > > >> > > > >>   2. In some contexts(?), IPC files are referred to as Feather; only
>> > > > > >> > > > >>      ".arrow" is used for the IPC file extension to clearly distinguish
>> > > > > >> > > > >>      it from Feather V1's ".feather".
>> > > > > >> > > > >>   3. When an IPC file is called Feather by some rule, extension
>> > > > > >> > > > >>      ".feather" is used, and when an IPC file is not called Feather,
>> > > > > >> > > > >>      extension ".arrow" is used.
>> > > > > >> > > > >>
>> > > > > >> > > > >>  I mistakenly thought the current status was 2, but according to the
>> > > > > >> > > > >>  discussion in this PR (<https://github.com/apache/arrow/pull/13677>),
>> > > > > >> > > > >>  apparently the current status seems 3. (However, there seems to be no
>> > > > > >> > > > >>  rule as to when an IPC file should be called a Feather)
>> > > > > >> > > > >>
>> > > > > >> > > > >>  I am not very familiar with Arrow and this is my first post to this
>> > > > > >> > > > >>  mailing list so I apologize if I have done something wrong or
>> > > > > >> > > > >>  inappropriate.
>> > > > > >> > > > >>
>> > > > > >> > > > >>  Best,
>> > > > > >> > > > >>  SHIMA Tatsuya
>> > > > > >> > > >
>> > > > > >> > > >
>> > > > > >> > >
>> > > > > >> >
>> > > > > >> >
>> > > > > >> >
>> > > > > >>
>> > > > >
>> > > >
>> > >
>> >
>>
