+1 for a 0.15.0 before 1.0 if we go ahead with this.
I'm curious to hear others' thoughts about compatibility. I think we
should avoid breaking backwards compatibility if possible. It's common
for apps/libs to be pinned on specific Arrow versions, and I worry it'd
cause a lot of work for downstream devs to audit their tool suite for
full Arrow binary compatibility (and/or require their customers to do
the same).
Could we detect the 4-byte length, incur a penalty copying the memory to
an aligned buffer, then continue consuming the stream? (It's probably
fine if we only write the 8-byte length, since consumers on older
versions of Arrow could slice from the 4th byte before passing a buffer
to the reader).
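To make the detection idea concrete, here is a minimal sketch in Python. It
is not Arrow's reader and not a settled format. It assumes a hypothetical
framing in which the metadata is preceded by either a 4-byte or an 8-byte
little-endian unsigned length. It also relies on a heuristic I'm assuming
rather than asserting: the metadata Flatbuffer starts with a nonzero 32-bit
root offset, so bytes 4..7 are zero only when the prefix is 8 bytes wide and
the metadata is under 4 GiB. The function name split_metadata is made up for
illustration.

```python
import struct


def split_metadata(message: bytes):
    """Return (prefix_width, metadata_copy) for one length-prefixed message.

    Assumed framing (hypothetical): a 4-byte or 8-byte little-endian
    unsigned length, followed by that many bytes of metadata.
    """
    width = 4
    if len(message) >= 8:
        (as_u64,) = struct.unpack_from("<Q", message, 0)
        # Heuristic (assumption): with a 4-byte prefix, bytes 4..7 already
        # belong to the metadata and are nonzero, so the 64-bit read is
        # >= 2**32. With an 8-byte prefix and metadata under 4 GiB, the
        # upper four bytes are zero.
        if as_u64 < 2**32:
            width = 8

    fmt = "<Q" if width == 8 else "<I"
    (length,) = struct.unpack_from(fmt, message, 0)
    end = width + length
    if end > len(message):
        raise ValueError("truncated message")

    # Python gives no control over alignment; this copy only stands in for
    # "copy the metadata into an aligned buffer" on the 4-byte path.
    return width, bytes(message[width:end])


# Example inputs: the same 8 bytes of stand-in metadata under both framings.
metadata = b"\x10\x00\x00\x00abcd"
old_style = struct.pack("<I", len(metadata)) + metadata
new_style = struct.pack("<Q", len(metadata)) + metadata
```

Calling split_metadata on old_style and new_style should yield the same
metadata bytes with widths 4 and 8 respectively. A real reader would only
need the copy on the 4-byte path, which is where the penalty would land.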
I've always understood the metadata to be a few dozen/hundred KB, a
small percentage of the total message size. I could be underestimating
the ratios though -- is it common to have tables w/ 1000+ columns? I've
seen a few reports like that in cuDF, but I'm curious to hear
Jacques'/Dremio's experience too.
If copying is feasible, it doesn't seem so bad a trade-off to maintain
backwards-compatibility. As libraries and consumers upgrade their Arrow
dependencies, the 4-byte length will be less and less common, and
they'll be less likely to pay the cost.
On 7/23/19 2:22 AM, Uwe L. Korn wrote:
It is also a good way to test the change in public. We don't want to still be adjusting
something like this in a 1.0.0 release. Doing this in 0.15.0 already, and then maybe making
adjustments for issues that appear "in the wild", is psychologically the
easier way. Users attach a lot of expectations to the magic 1.0, so I would
plan to minimize what changes between pre-1.0 and 1.0. This should also save us
maintainers some time, as I would expect different behaviour in bug reports between 1.0
and pre-1.0 issues.
Uwe
On Tue, Jul 23, 2019, at 7:52 AM, Micah Kornfield wrote:
I think the main reason to do a release before 1.0.0 is if we want to make
the change that would give a good error message for forward incompatibility
(I think this could be done as 0.14.2 since it would just be clarifying an
error message). Otherwise, I think including it in 1.0.0 would be fine
(it's still not clear to me if there is consensus to fix the issue).
Thanks,
Micah
On Monday, July 22, 2019, Wes McKinney <wesmck...@gmail.com> wrote:
I'd be satisfied with fixing the Flatbuffer alignment issue either in
a 0.15.0 or 1.0.0. In the interest of expediency, though, making a
0.15.0 with this change sooner rather than later might be prudent.
On Mon, Jul 22, 2019 at 12:35 PM Antoine Pitrou <anto...@python.org>
wrote:
Hello,
Recently we've discussed breaking the IPC format to fix a long-standing
alignment issue. See this discussion:
https://lists.apache.org/thread.html/8cea56f2069710ac128ff9129c744f0ef96a3e33a4d79d7e820019af@%3Cdev.arrow.apache.org%3E
Should we first do a 0.15.0 in order to get those format fixes right?
Once that is fine and settled we can move to the 1.0.0 release?
Regards
Antoine.