Michael is right, the delta byte array encoding is a Parquet v2 feature. Parquet v2 isn't finished yet, though some of its features are already in releases, and those features will continue to be supported in future releases. In other words, Parquet will maintain backward-compatibility for any released v2 features.
I don't recommend using Parquet v2 yet because Parquet doesn't guarantee forward-compatibility for those features. For v1, old readers should be able to read data written by newer versions, but we won't make that guarantee for v2 until the spec is considered finished.

rb

On Mon, May 22, 2017 at 10:16 AM, Michael Allman <mich...@videoamp.com> wrote:
> Hi AndreiL,
>
> Were these files written with the Parquet V2 writer? The Spark 2.1
> vectorized reader does not appear to support that format.
>
> Michael
>
>
> > On May 9, 2017, at 11:04 AM, andreiL <leibov...@rogers.com> wrote:
> >
> > Hi, I am getting an exception in Spark 2.1 reading parquet files where
> > some columns are DELTA_BYTE_ARRAY encoded.
> >
> > java.lang.UnsupportedOperationException: Unsupported encoding:
> > DELTA_BYTE_ARRAY
> >
> > Is this exception by design, or am I missing something?
> >
> > If I turn off the vectorized reader, reading these files works fine.
> >
> > AndreiL
> >
> >
> > --
> > View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Parquet-vectorized-reader-DELTA-BYTE-ARRAY-tp21538.html
> > Sent from the Apache Spark Developers List mailing list archive at Nabble.com.
> >
> > ---------------------------------------------------------------------
> > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

--
Ryan Blue
Software Engineer
Netflix
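
[Editor's note: the quoted message mentions that disabling the vectorized reader works around the exception. A minimal sketch of how that might be done from the command line, assuming a Spark 2.1 job named `my_job.py` (a hypothetical path); `spark.sql.parquet.enableVectorizedReader` is the standard Spark SQL flag, and `parquet.writer.version` is the parquet-mr property that selects the v2 writer in the first place:]

```shell
# Fall back from the vectorized Parquet reader to parquet-mr,
# which supports DELTA_BYTE_ARRAY in this Spark version.
spark-submit \
  --conf spark.sql.parquet.enableVectorizedReader=false \
  my_job.py

# Conversely, data with v2 encodings is typically produced by setting
# the parquet-mr writer version (passed through as a Hadoop property);
# per the advice above, avoid this until the v2 spec is finalized.
spark-submit \
  --conf spark.hadoop.parquet.writer.version=PARQUET_2_0 \
  my_job.py
```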