Hi Ryan et al,

The issue we’ve seen, using a build of the Spark 2.2.0 branch from a downstream project, is that parquet-avro calls one of the new Avro 1.8.0 methods, and you get a NoSuchMethodError because Spark pins Avro 1.7.7 as a dependency. My colleague Michael (who posted earlier on this thread) documented this in SPARK-19697 <https://issues.apache.org/jira/browse/SPARK-19697>. I know that Spark has unit tests that check this compatibility issue, but a recent change set a test-scope dependency on Avro 1.8.0 <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>, which masks this issue in the unit tests. With this error, you can’t use ParquetAvroOutputFormat from an application running on Spark 2.2.0.
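One possible workaround for a downstream project (a sketch, assuming a Maven build; it pins the Avro version that parquet-avro 1.8.2 was compiled against, and whether that in turn breaks anything else on the classpath is exactly the trade-off debated in this thread):

```xml
<!-- Hypothetical fragment for a downstream project's pom.xml: force Avro
     1.8.0 so that parquet-avro 1.8.2 finds the methods it was compiled
     against, overriding the 1.7.7 that Spark's transitive dependencies
     would otherwise select. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```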
Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>
> I agree with Sean. Spark only pulls in parquet-avro for tests. For execution,
> it implements the record materialization APIs in Parquet to go directly to
> Spark SQL rows. This doesn't actually leak an Avro 1.8 dependency into Spark
> as far as I can tell.
>
> rb
>
> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
> See discussion at https://github.com/apache/spark/pull/17163 -- I think the
> issue is that fixing this trades one problem for a slightly bigger one.
>
> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com> wrote:
> Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does not
> bump the dependency version for avro (currently at 1.7.7). Though perhaps
> not clear from the issue I reported [0], this means that Spark is internally
> inconsistent, in that a call through parquet (which depends on avro 1.8.0
> [1]) may throw errors at runtime when it hits avro 1.7.7 on the classpath.
> Avro 1.8.0 is not binary compatible with 1.7.7.
>
> [0] - https://issues.apache.org/jira/browse/SPARK-19697
> [1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>
> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com> wrote:
> I have one more issue that, if it needs to be fixed, needs to be fixed for
> 2.2.0.
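The classpath mismatch described above can be detected at runtime with a small probe; a minimal sketch (it assumes, per the Avro changelog, that `org.apache.avro.LogicalTypes` is a class that first shipped in Avro 1.8.0, so its absence indicates an older Avro, or none at all, on the classpath):

```java
/**
 * Hedged sketch: probe the classpath for the Avro generation by checking
 * for a class introduced in Avro 1.8.0. The class name AvroProbe and the
 * return strings are illustrative, not from any Spark or Avro API.
 */
public class AvroProbe {
    /** Returns which Avro generation the classpath resolves to. */
    public static String avroGeneration() {
        try {
            // org.apache.avro.LogicalTypes first appeared in Avro 1.8.0;
            // if Spark's 1.7.7 wins dependency resolution, this throws.
            Class.forName("org.apache.avro.LogicalTypes");
            return "1.8+";
        } catch (ClassNotFoundException e) {
            return "pre-1.8 (or no Avro at all)";
        }
    }

    public static void main(String[] args) {
        System.out.println("Avro on classpath: " + avroGeneration());
    }
}
```

Running this inside a Spark 2.2.0 executor would show which version parquet-avro is actually going to hit at record-write time.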
>
> I'm fixing build warnings for the release and noticed that checkstyle
> actually complains there are some Java methods named in TitleCase, like
> `ProcessingTimeTimeout`:
>
> https://github.com/apache/spark/pull/17803/files#r113934080
>
> Easy enough to fix and it's right, that's not conventional. However I wonder
> if it was done on purpose to match a class name?
>
> I think this is one for @tdas
>
> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com> wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v2.2.0-rc1
> <https://github.com/apache/spark/tree/v2.2.0-rc1>
> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>
> List of JIRA tickets resolved can be found with this filter
> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>
> The release files, including signatures, digests, etc.
> can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1235/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>
>
> FAQ
>
> How can I help test this release?
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> What should happen to JIRA tickets still targeting 2.2.0?
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else, please retarget to 2.3.0 or 2.2.1.
>
> But my bug isn't fixed!??!
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.1.1.
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
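As an aside on the checkstyle complaint Sean raised earlier in the thread: the check involved is checkstyle's standard MethodName rule. A sketch of the kind of configuration fragment that flags `ProcessingTimeTimeout` (Spark's actual checkstyle.xml may configure it differently; the regex shown is the documented default, which requires a lowercase first letter):

```xml
<!-- Sketch: flags Java methods named in TitleCase, such as
     ProcessingTimeTimeout, since the pattern requires a lowercase
     first character. -->
<module name="MethodName">
  <property name="format" value="^[a-z][a-zA-Z0-9]*$"/>
</module>
```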