Michael is right, the delta byte array encoding is a Parquet v2 feature. Parquet v2 isn't finished yet, though some of its features are already in releases, and those features will continue to be supported in future releases. In other words, Parquet will maintain backward-compatibility for any released v2 features.
I don't recommend using Parquet v2 yet because Parquet doesn't guarantee forward-compatibility for those features. For v1, old readers should be able to read data written by newer versions, but we won't make that guarantee for v2 until the spec is considered finished.

rb

On Mon, May 22, 2017 at 10:16 AM, Michael Allman <mich...@videoamp.com> wrote:
> Hi AndreiL,
>
> Were these files written with the Parquet V2 writer? The Spark 2.1
> vectorized reader does not appear to support that format.
>
> Michael
>
>
> > On May 9, 2017, at 11:04 AM, andreiL <leibov...@rogers.com> wrote:
> >
> > Hi, I am getting an exception in Spark 2.1 reading parquet files where
> > some columns are DELTA_BYTE_ARRAY encoded.
> >
> > java.lang.UnsupportedOperationException: Unsupported encoding:
> > DELTA_BYTE_ARRAY
> >
> > Is this exception by design, or am I missing something?
> >
> > If I turn off the vectorized reader, reading these files works fine.
> >
> > AndreiL
> >
> >
> > --
> > View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Parquet-vectorized-reader-DELTA-BYTE-ARRAY-tp21538.html
> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
Ryan Blue
Software Engineer
Netflix
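
[Editor's note: the quoted message mentions that disabling the vectorized reader works around the exception. A minimal sketch of how that might be done from the command line, assuming a Spark 2.1 job named `my_job.py` (a hypothetical path); `spark.sql.parquet.enableVectorizedReader` is the standard Spark SQL flag, and `parquet.writer.version` is the parquet-mr property that selects the v2 writer in the first place:]

```shell
# Fall back from the vectorized Parquet reader to parquet-mr,
# which supports DELTA_BYTE_ARRAY in this Spark version.
spark-submit \
  --conf spark.sql.parquet.enableVectorizedReader=false \
  my_job.py

# Conversely, data with v2 encodings is typically produced by setting
# the parquet-mr writer version (passed through as a Hadoop property);
# per the advice above, avoid this until the v2 spec is finalized.
spark-submit \
  --conf spark.hadoop.parquet.writer.version=PARQUET_2_0 \
  my_job.py
```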