Thank you all for sharing your opinions.
Many believe that the name Feather V2 should be deprecated.
However, as several have pointed out, simply deprecating the name
"Feather" is not enough to end the confusion; we also need to settle on
a recommended name.
Perhaps we need a vote on the deprecation of Feather V2, and further
discussion on the recommended name for functions, etc. (is it simply
"Arrow file", or should it always be called "Arrow IPC file"?).
Best,
SHIMA Tatsuya
On 2022/09/06 22:46, Joris Van den Bossche wrote:
Personally, I like the "Feather" name (and actually think it could help
disambiguate the file vs in-memory distinction), but I understand that we
have chosen a certain path (e.g. ".arrow" is the official registered
extension), and have to move on.
However, I think we need to be very careful in how we brand the
alternative, and think proactively about what terminology we want to be
used (and which terms to use in APIs, etc.), because I think the "IPC"
aspect of the naming can also become confusing (IPC is a generic term, it
does not clearly indicate that this is a *file* format, and it does not
indicate that it is related to *arrow*).
As an example, I just noticed a twitter thread (
https://twitter.com/braaannigan/status/1566715704937676800) that is
promoting the "IPC format". The specific library used here (polars) also
exposes this as a "read_ipc" function.
Other examples:
- In pyarrow, we have a `feather` submodule with read/write_feather
functions. How do we want to replace this? The current alternative is the
pyarrow.ipc submodule (which has functionality to open files), but this
uses the "IPC" terminology. Are we OK with making this the alternative,
or do we want to add new APIs?
- In pyarrow.dataset, we also use IpcFileFormat for Arrow files. Should we
rename this to `ArrowFileFormat`? (and keep IpcFileFormat as alias)
- In the R arrow package, the non-feather alternative for `read_feather`
currently is `read_ipc_file`
- In pandas, there is read_feather/to_feather. What do we think that pandas
should use instead?
- ...
Personally, I think we should certainly avoid names that just use IPC (like
`read_ipc`). An alternative could be `read_arrow_ipc`, but if we want to drop
the IPC part (as proposed earlier in this thread, although not yet agreed
on), that would become `read_arrow`/`to_arrow`. That might then be confused
with reading from / converting to in-memory Arrow data or an Arrow stream.
If we want to recommend the "Arrow file" terminology, could APIs like
`read_arrow_file` be used?
If we want to move the (mostly Python and R) ecosystem away from "Feather",
I think we should have a clear recommendation of what to use instead.
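One way to picture the "new name plus alias" idea raised above (for example `ArrowFileFormat` with `IpcFileFormat` kept as an alias) is a deprecation wrapper. The following is only a minimal sketch in plain Python: `deprecated_alias` is a hypothetical helper, `read_arrow_file` is one of the candidate names from this thread and not an existing API, and its body is a stand-in that returns raw bytes instead of parsing Arrow data.

```python
import functools
import warnings


def deprecated_alias(new_func, old_name):
    """Hypothetical helper: expose new_func under old_name with a warning."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead",
            FutureWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)

    # Report the old name in tracebacks and repr() of the alias.
    wrapper.__name__ = old_name
    return wrapper


def read_arrow_file(path):
    """Candidate new name (assumption). Stand-in body: returns raw bytes."""
    with open(path, "rb") as f:
        return f.read()


# The old name stays callable but delegates to the new implementation.
read_feather = deprecated_alias(read_arrow_file, "read_feather")
```

With this pattern, existing code that calls `read_feather` keeps working while the warning points users at whichever name ends up being recommended.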
On Wed, 31 Aug 2022 at 20:33, Aldrin<akmon...@ucsc.edu.invalid> wrote:
Similarly to Micah, I mentally think of "Arrow IPC" as a format that is
optimized for "IPC", which I have assumed means it minimizes CPU overhead
when using data read from storage because it's already in a
memory-friendly format (e.g. minimal deserialization).
Not sure the "IPC" is necessary, but it does push the intent into the name
(unless it's actually a misnomer).
Aldrin Montana
Computer Science PhD Student
UC Santa Cruz
On Tue, Aug 30, 2022 at 8:29 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
I think one source of ambiguity for Arrow files, at least for me, is
whether they are just a string of messages concatenated or they are the
files that contain the metadata footer.
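The two cases can be told apart from the bytes alone: in the Arrow columnar format specification, the IPC file format begins with the magic string `ARROW1` and ends with a footer followed by the same magic string, while a bare stream of messages carries no such wrapper. Below is a minimal sketch of that check using only the Python standard library. `looks_like_arrow_file` is a hypothetical helper name, and the check is a heuristic that does not parse the footer or validate any message.

```python
ARROW_MAGIC = b"ARROW1"  # magic string of the Arrow IPC file format


def looks_like_arrow_file(data: bytes) -> bool:
    """Return True if data starts and ends with the Arrow file magic.

    Heuristic only: the footer and the messages in between are not
    inspected, so a True result is not proof of a valid file.
    """
    return (
        len(data) >= 2 * len(ARROW_MAGIC)
        and data.startswith(ARROW_MAGIC)
        and data.endswith(ARROW_MAGIC)
    )
```

A file with the footer passes this check; a plain concatenation of stream messages does not, which is the distinction any "Arrow file" terminology would need to make clear.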
On Tue, Aug 30, 2022 at 5:11 AM Dewey Dunnington <de...@voltrondata.com.invalid> wrote:
Ian has a very good point...I would be in favour of calling them "Arrow
files" wherever possible since there's no need to know or care what
interprocess communication is to use them!
On Mon, Aug 29, 2022 at 6:50 PM Ian Cook <i...@ursacomputing.com> wrote:
+1 We should explicitly discourage further use of “Feather” to refer to
Arrow IPC files.
In this spirit of simplifying terminology: Does the “IPC” in the term
“Arrow IPC files” serve a truly necessary purpose? Is there another type
of “Arrow file” that the “IPC” serves to disambiguate? If not, can we
simply refer to these files as “Arrow files” in most places in the
documentation and website? (In a few important places we should clarify
that when we say “Arrow file” we are referring to a file that uses the
Arrow IPC file format.)
Ian
On Mon, Aug 29, 2022 at 17:33 Sutou Kouhei <k...@clear-code.com> wrote:
+1 for 1.
Thanks,
--
kou
In <CAOYPqDCAib2wBKaKnRij9=__OsUJJghVq1UUTNibK2T0Np+=
r...@mail.gmail.com
"Re: Usage of the name Feather?" on Mon, 29 Aug 2022 20:18:37 +0200,
Jorge Cardoso Leitão<jorgecarlei...@gmail.com> wrote:
I agree.
I suspect that the most widely used API with "feather" is Pandas'
read_feather.
On Mon, 29 Aug 2022, 19:55 Weston Pace <weston.p...@gmail.com> wrote:
I agree as well. I think most lingering uses of the term "feather" are
in pyarrow and R, however, so it might be good to hear from some of
those maintainers.
On Mon, Aug 29, 2022 at 9:35 AM Antoine Pitrou <anto...@python.org> wrote:
I agree with this as well.
Regards
Antoine.
On Mon, 29 Aug 2022 11:29:45 -0400
Andrew Lamb<al...@influxdata.com> wrote:
In the Rust implementation we use the term "Arrow IPC" and I support your
option 1:
The name Feather V2 is deprecated. Only the extension ".arrow" will be
used for IPC files.
Andrew
On Mon, Aug 29, 2022 at 11:21 AM Matthew Topol <m...@voltrondata.com.invalid> wrote:
When I wrote "In-Memory Analytics with Apache Arrow" I definitely
treated "Feather" as deprecated and mentioned it only in passing
specifically indicating "Arrow IPC" as the terminology to use. I only
even mentioned "Feather" at all because there are still methods in
pyarrow that reference it by name.
That's just my opinion though...
On Mon, Aug 29 2022 at 11:08:53 AM -0400, David Li <lidav...@apache.org> wrote:
This has come up before, e.g. see [1] [2] [3].
I would say "Feather" is effectively deprecated and we are using
"Arrow IPC" now but I am not sure what others think. (From that
GitHub link, it seems to be mixed.) And ".arrow" is the official
extension now (since it is registered as part of our MIME type). But
there's existing documentation and not everything has been updated to
be consistent (as you saw).
[1]: <https://lists.apache.org/thread/0s6lgvd3g56ymd60vl5lgzhf4ro6hts5>
[2]: <https://arrow.apache.org/faq/#what-about-the-feather-file-format>
[3]: <https://stackoverflow.com/questions/67910612/arrow-ipc-vs-feather/67911190#67911190>
-David
On Mon, Aug 29, 2022, at 10:50, 島 達也 wrote:
Hi all.
I know the documentation (mainly pyarrow documentation) sometimes refers
to IPC files as Feather files, but are there any guidelines for when to
refer to an IPC file as a Feather file and when to refer to it as an IPC
file?
I believe that calling the same file an Arrow IPC file at times and a
Feather file at other times is confusing to those unfamiliar with Apache
Arrow (myself included).
Surprisingly, these files may even have completely different extensions,
".arrow" and ".feather", which are not similar.
Perhaps there are several options for future use of the name Feather,
such as
1. The name Feather V2 is deprecated. Only the extension ".arrow" will
be used for IPC files.
2. In some contexts(?), IPC files are referred to as Feather; only
".arrow" is used for the IPC file extension to clearly distinguish
it from Feather V1's ".feather".
3. When an IPC file is called Feather by some rule, extension
".feather" is used, and when an IPC file is not called Feather,
extension ".arrow" is used.
I mistakenly thought the current status was 2, but according to the
discussion in this PR (<https://github.com/apache/arrow/pull/13677>),
the current status seems to be 3. (However, there seems to be no rule as
to when an IPC file should be called a Feather file.)
I am not very familiar with Arrow and this is my first post to this
mailing list so I apologize if I have done something wrong or
inappropriate.
Best,
SHIMA Tatsuya