+1 on a 0.15.0 release. At a minimum, if we could detect the old stream format
and provide a clear error message in Python and Java, I think that would help
with the transition. If we are also able to implement readers/writers that can
fall back to the 4-byte prefix, that would be nice to have.
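For illustration, here is a rough sketch of what detection plus an optional fallback could look like on the reader side. It assumes the new message prefix is the 0xFFFFFFFF continuation token followed by a little-endian int32 metadata length, and the legacy prefix is the bare int32 length. `read_message_prefix` and its `allow_legacy` flag are hypothetical names, not an existing pyarrow or Java API.

```python
import struct

# Assumed 4-byte marker that precedes the int32 length in the new 8-byte prefix.
CONTINUATION_TOKEN = 0xFFFFFFFF


def read_message_prefix(buf: bytes, offset: int = 0, allow_legacy: bool = True):
    """Hypothetical helper: return (metadata_length, prefix_size, is_legacy).

    Accepts the new 8-byte prefix (continuation token + int32 length) and,
    if allow_legacy is True, falls back to the old 4-byte int32 prefix.
    Otherwise it raises a clear error when the old format is detected.
    """
    if len(buf) - offset < 4:
        raise ValueError("truncated stream: need at least 4 bytes for a prefix")
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == CONTINUATION_TOKEN:
        if len(buf) - offset < 8:
            raise ValueError("truncated stream: continuation token without a length")
        (length,) = struct.unpack_from("<i", buf, offset + 4)
        if length < 0:
            raise ValueError(f"invalid metadata length {length}")
        return length, 8, False
    # No continuation token: the first 4 bytes are the legacy metadata length.
    if not allow_legacy:
        raise ValueError(
            "stream uses the legacy 4-byte length prefix (pre-0.15 writer); "
            "upgrade the writer or enable legacy reading"
        )
    (length,) = struct.unpack_from("<i", buf, offset)
    if length < 0:
        raise ValueError(f"invalid metadata length {length}")
    return length, 4, True
```

A writer-side fallback would be the mirror image: an option to emit only the int32 length, so that older readers can still consume the stream during the transition.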

On Wed, Jul 24, 2019 at 1:27 PM Jacques Nadeau <jacq...@apache.org> wrote:

> I'm OK with the change and with a 0.15 release to better manage it.
>
>
> > I've always understood the metadata to be a few dozen/hundred KB, a
> > small percentage of the total message size. I could be underestimating
> > the ratios though -- is it common to have tables w/ 1000+ columns? I've
> > seen a few reports like that in cuDF, but I'm curious to hear
> > Jacques'/Dremio's experience too.
> >
>
> Metadata size has been an issue at different points for us. We do
> definitely see datasets with 1000+ columns. It is also compounded by the
> fact that as we add more columns, we typically decrease the row count so
> that the individual batches are still easily pipelined, which further
> increases the ratio of metadata to data.
>
