+1 for a 0.15.0 before 1.0 if we go ahead with this.

I'm curious to hear others' thoughts about compatibility. I think we should avoid breaking backwards compatibility if possible. It's common for apps/libs to be pinned on specific Arrow versions, and I worry it'd cause a lot of work for downstream devs to audit their tool suite for full Arrow binary compatibility (and/or require their customers to do the same).

Could we detect the 4-byte length, incur a penalty copying the memory to an aligned buffer, then continue consuming the stream? (It's probably fine if we only write the 8-byte length, since consumers on older versions of Arrow could slice from the 4th byte before passing a buffer to the reader.)
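
To make the copy-on-misalignment idea concrete, here is a rough sketch in Python. It is not Arrow's reader: the function name `read_metadata`, the 8-byte alignment target, and the use of the offset within the buffer as a stand-in for the real memory address are all assumptions for illustration. It reads a 4-byte little-endian length prefix and copies the metadata only when it does not start on an aligned offset.

```python
import struct

ALIGNMENT = 8  # assumed alignment target for the Flatbuffer metadata


def read_metadata(buf, offset):
    """Hypothetical helper: read one length-prefixed metadata block.

    Returns (metadata, copied, next_offset). The offset within `buf` is
    used as a proxy for the address; a real reader would check the
    actual pointer value instead.
    """
    view = memoryview(buf)
    # 4-byte little-endian signed length prefix (the legacy layout).
    (length,) = struct.unpack_from("<i", view, offset)
    start = offset + 4
    end = start + length
    body = view[start:end]
    if start % ALIGNMENT == 0:
        # Already aligned: hand back a zero-copy view.
        return body, False, end
    # Misaligned: pay the penalty of copying into a fresh buffer.
    return memoryview(bytearray(body)), True, end
```

With a prefix at offset 0 the metadata starts at offset 4, so it is copied; with a prefix at offset 4 it starts at offset 8 and is returned as a view without copying.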

I've always understood the metadata to be a few dozen/hundred KB, a small percentage of the total message size. I could be underestimating the ratios though -- is it common to have tables w/ 1000+ columns? I've seen a few reports like that in cuDF, but I'm curious to hear Jacques'/Dremio's experience too.

If copying is feasible, it doesn't seem so bad a trade-off to maintain backwards-compatibility. As libraries and consumers upgrade their Arrow dependencies, the 4-byte length will be less and less common, and they'll be less likely to pay the cost.



On 7/23/19 2:22 AM, Uwe L. Korn wrote:
It is also a good way to test the change in public. We don't want to adjust something 
like this anymore in a 1.0.0 release. Already doing this in 0.15.0 and then maybe doing 
adjustments due to issues that appear "in the wild" is psychologically the 
easier way. There is a lot of thinking of users bound with the magic 1.0, thus I would 
plan to minimize what is changed between 1.0 and pre-1.0. This also should save us 
maintainers some time as I would expect different behaviour in bug reports between 1.0 
and pre-1.0 issues.

Uwe

On Tue, Jul 23, 2019, at 7:52 AM, Micah Kornfield wrote:
I think the main reason to do a release before 1.0.0 is if we want to make
the change that would give a good error message for forward incompatibility
(I think this could be done as 0.14.2 since it would just be clarifying an
error message).  Otherwise, I think including it in 1.0.0 would be fine
(it's still not clear to me if there is consensus to fix the issue).

Thanks,
Micah


On Monday, July 22, 2019, Wes McKinney <wesmck...@gmail.com> wrote:

I'd be satisfied with fixing the Flatbuffer alignment issue either in
a 0.15.0 or 1.0.0. In the interest of expediency, though, making a
0.15.0 with this change sooner rather than later might be prudent.

On Mon, Jul 22, 2019 at 12:35 PM Antoine Pitrou <anto...@python.org>
wrote:

Hello,

Recently we've discussed breaking the IPC format to fix a long-standing
alignment issue.  See this discussion:

https://lists.apache.org/thread.html/8cea56f2069710ac128ff9129c744f0ef96a3e33a4d79d7e820019af@%3Cdev.arrow.apache.org%3E
Should we first do a 0.15.0 in order to get those format fixes right?
Once that is fine and settled we can move to the 1.0.0 release?

Regards

Antoine.

