Michael, I think the problem is with your classpath. Spark has a dependency
on Avro 1.7.7, which can't be changed. Your project is what pulls in
parquet-avro and, transitively, Avro 1.8. Spark has no runtime dependency
on Avro 1.8. It is understandably annoying that using the same Parquet
version for your parquet-avro dependency is what causes your project to
depend on Avro 1.8, but Spark's dependencies aren't the problem here,
because its Parquet dependency doesn't bring in Avro.

There are a few ways around this:

1. Make sure Avro 1.8 is found on the classpath first.
2. Shade Avro 1.8 in your project (assuming Avro classes aren't shared
   with Spark); see the build sketch just below this list.
3. Use parquet-avro 1.8.1 in your project, which I think should work with
   Parquet 1.8.2 and avoids the Avro change.
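
For option 2, a minimal sketch of what the relocation could look like in
an sbt-assembly build (the `avroshaded` prefix is a placeholder and the
exact versions are illustrative; a Maven build would use the
maven-shade-plugin's relocation support instead):

    // build.sbt (sketch): bundle parquet-avro and Avro 1.8 in the
    // überjar, with Spark itself marked as provided
    libraryDependencies ++= Seq(
      "org.apache.parquet" % "parquet-avro" % "1.8.2",
      "org.apache.avro"    % "avro"         % "1.8.0"
    )

    // Rewrite org.apache.avro.* and all references to it in the
    // assembled classes to a private package, so the Avro 1.7.7 that
    // Spark puts on the runtime classpath can no longer shadow the
    // bundled copy
    assemblyShadeRules in assembly := Seq(
      ShadeRule.rename("org.apache.avro.**" -> "avroshaded.@1").inAll
    )

As with any relocation, this only helps if Avro classes never need to
cross the boundary into Spark's own API.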
The work-around in Spark is for tests, which do use parquet-avro. We can
look at a Parquet 1.8.3 that avoids this issue, but I think this is
reasonable for the 2.2.0 release.

rb

On Mon, May 1, 2017 at 12:08 PM, Michael Heuer <heue...@gmail.com> wrote:

> Please excuse me if I'm misunderstanding -- the problem is not with our
> library or our classpath.
>
> There is a conflict within Spark itself, in that Parquet 1.8.2 expects
> to find Avro 1.8.0 on the runtime classpath and sees 1.7.7 instead.
> Spark already has to work around this for its unit tests to pass.
>
> On Mon, May 1, 2017 at 2:00 PM, Ryan Blue <rb...@netflix.com> wrote:
>
>> Thanks for the extra context, Frank. I agree that it sounds like your
>> problem comes from the conflict between your jars and what comes with
>> Spark. It's the same concern that makes everyone shudder when anything
>> has a public dependency on Jackson. :)
>>
>> What we usually do to get around situations like this is to relocate
>> the problem library inside the shaded jar. That way, Spark uses its
>> version of Avro and your classes use a different version of Avro. This
>> works if you don't need to share classes between the two. Would that
>> work for your situation?
>>
>> rb
>>
>> On Mon, May 1, 2017 at 11:55 AM, Koert Kuipers <ko...@tresata.com> wrote:
>>
>>> Sounds like you are running into the fact that you cannot really put
>>> your classes before Spark's on the classpath? Spark's switches to
>>> support this (spark.driver.userClassPathFirst and
>>> spark.executor.userClassPathFirst) never really worked for me either.
>>>
>>> Inability to control the classpath + inconsistent jars => trouble?
>>>
>>> On Mon, May 1, 2017 at 2:36 PM, Frank Austin Nothaft
>>> <fnoth...@berkeley.edu> wrote:
>>>
>>>> Hi Ryan,
>>>>
>>>> We do set Avro to 1.8 in our downstream project. We also set Spark
>>>> as a provided dependency, and build an überjar. We run via
>>>> spark-submit, which builds the classpath with our überjar and all of
>>>> the Spark deps. This leads to Avro 1.7.7 getting picked off of the
>>>> classpath at runtime, which causes the NoSuchMethodError to occur.
>>>>
>>>> Regards,
>>>>
>>>> Frank Austin Nothaft
>>>> fnoth...@berkeley.edu
>>>> fnoth...@eecs.berkeley.edu
>>>> 202-340-0466
>>>>
>>>> On May 1, 2017, at 11:31 AM, Ryan Blue <rb...@netflix.com> wrote:
>>>>
>>>> Frank,
>>>>
>>>> The issue you're running into is caused by using parquet-avro with
>>>> Avro 1.7. Can't your downstream project set the Avro dependency to
>>>> 1.8? Spark can't update Avro because it is a breaking change that
>>>> would force users to rebuild specific Avro classes in some cases.
>>>> But you should be free to use Avro 1.8 to avoid the problem.
>>>>
>>>> On Mon, May 1, 2017 at 11:08 AM, Frank Austin Nothaft
>>>> <fnoth...@berkeley.edu> wrote:
>>>>
>>>>> Hi Ryan et al,
>>>>>
>>>>> The issue we've seen using a build of the Spark 2.2.0 branch from a
>>>>> downstream project is that parquet-avro uses one of the new Avro
>>>>> 1.8.0 methods, and you get a NoSuchMethodError since Spark puts
>>>>> Avro 1.7.7 on the classpath as a dependency. My colleague Michael
>>>>> (who posted earlier on this thread) documented this in SPARK-19697
>>>>> <https://issues.apache.org/jira/browse/SPARK-19697>.
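>>>>>
>>>>> As a quick check (a minimal Scala sketch, not specific to our
>>>>> code), printing where the Schema class was loaded from shows which
>>>>> Avro jar actually won on the classpath:
>>>>>
>>>>>   // Prints the jar that provided org.apache.avro.Schema; under
>>>>>   // spark-submit this points at Spark's avro-1.7.7 jar rather
>>>>>   // than the 1.8.0 copy bundled in the application überjar.
>>>>>   println(classOf[org.apache.avro.Schema]
>>>>>     .getProtectionDomain.getCodeSource.getLocation)
>>>>>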
>>>>> I know that Spark has unit tests that check this compatibility
>>>>> issue, but it looks like there was a recent change that sets a
>>>>> test-scope dependency on Avro 1.8.0
>>>>> <https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
>>>>> which masks this issue in the unit tests. With this error, you
>>>>> can't use the AvroParquetOutputFormat from an application running
>>>>> on Spark 2.2.0.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Frank Austin Nothaft
>>>>> fnoth...@berkeley.edu
>>>>> fnoth...@eecs.berkeley.edu
>>>>> 202-340-0466
>>>>>
>>>>> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
>>>>>
>>>>> I agree with Sean. Spark only pulls in parquet-avro for tests. For
>>>>> execution, it implements the record materialization APIs in Parquet
>>>>> to go directly to Spark SQL rows. This doesn't actually leak an
>>>>> Avro 1.8 dependency into Spark, as far as I can tell.
>>>>>
>>>>> rb
>>>>>
>>>>> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>
>>>>>> See discussion at https://github.com/apache/spark/pull/17163 -- I
>>>>>> think the issue is that fixing this trades one problem for a
>>>>>> slightly bigger one.
>>>>>>
>>>>>> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com> wrote:
>>>>>>
>>>>>>> Version 2.2.0 bumps the dependency version for Parquet to 1.8.2
>>>>>>> but does not bump the dependency version for Avro (currently at
>>>>>>> 1.7.7). Though perhaps not clear from the issue I reported [0],
>>>>>>> this means that Spark is internally inconsistent, in that a call
>>>>>>> through Parquet (which depends on Avro 1.8.0 [1]) may throw
>>>>>>> errors at runtime when it hits Avro 1.7.7 on the classpath. Avro
>>>>>>> 1.8.0 is not binary compatible with 1.7.7.
>>>>>>>
>>>>>>> [0] - https://issues.apache.org/jira/browse/SPARK-19697
>>>>>>> [1] - https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96
>>>>>>>
>>>>>>> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com> wrote:
>>>>>>>
>>>>>>>> I have one more issue that, if it needs to be fixed, needs to be
>>>>>>>> fixed for 2.2.0.
>>>>>>>>
>>>>>>>> I'm fixing build warnings for the release and noticed that
>>>>>>>> checkstyle actually complains that there are some Java methods
>>>>>>>> named in TitleCase, like `ProcessingTimeTimeout`:
>>>>>>>>
>>>>>>>> https://github.com/apache/spark/pull/17803/files#r113934080
>>>>>>>>
>>>>>>>> Easy enough to fix, and it's right, that's not conventional.
>>>>>>>> However, I wonder if it was done on purpose to match a class
>>>>>>>> name?
>>>>>>>>
>>>>>>>> I think this is one for @tdas
>>>>>>>>
>>>>>>>> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust
>>>>>>>> <mich...@databricks.com> wrote:
>>>>>>>>
>>>>>>>>> Please vote on releasing the following candidate as Apache
>>>>>>>>> Spark version 2.2.0. The vote is open until Tues, May 2nd, 2017
>>>>>>>>> at 12:00 PST and passes if a majority of at least 3 +1 PMC
>>>>>>>>> votes are cast.
>>>>>>>>>
>>>>>>>>> [ ] +1 Release this package as Apache Spark 2.2.0
>>>>>>>>> [ ] -1 Do not release this package because ...
>>>>>>>>>
>>>>>>>>> To learn more about Apache Spark, please see
>>>>>>>>> http://spark.apache.org/
>>>>>>>>>
>>>>>>>>> The tag to be voted on is v2.2.0-rc1
>>>>>>>>> <https://github.com/apache/spark/tree/v2.2.0-rc1>
>>>>>>>>> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
>>>>>>>>>
>>>>>>>>> The list of JIRA tickets resolved can be found with this filter
>>>>>>>>> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
>>>>>>>>>
>>>>>>>>> The release files, including signatures, digests, etc. can be
>>>>>>>>> found at:
>>>>>>>>> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/
>>>>>>>>>
>>>>>>>>> Release artifacts are signed with the following key:
>>>>>>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>>>>>>
>>>>>>>>> The staging repository for this release can be found at:
>>>>>>>>> https://repository.apache.org/content/repositories/orgapachespark-1235/
>>>>>>>>>
>>>>>>>>> The documentation corresponding to this release can be found at:
>>>>>>>>> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/
>>>>>>>>>
>>>>>>>>> *FAQ*
>>>>>>>>>
>>>>>>>>> *How can I help test this release?*
>>>>>>>>>
>>>>>>>>> If you are a Spark user, you can help us test this release by
>>>>>>>>> taking an existing Spark workload, running it on this release
>>>>>>>>> candidate, and then reporting any regressions.
>>>>>>>>>
>>>>>>>>> *What should happen to JIRA tickets still targeting 2.2.0?*
>>>>>>>>>
>>>>>>>>> Committers should look at those and triage. Extremely important
>>>>>>>>> bug fixes, documentation, and API tweaks that impact
>>>>>>>>> compatibility should be worked on immediately. Everything else,
>>>>>>>>> please retarget to 2.3.0 or 2.2.1.
>>>>>>>>>
>>>>>>>>> *But my bug isn't fixed!??!*
>>>>>>>>>
>>>>>>>>> In order to make timely releases, we will typically not hold
>>>>>>>>> the release unless the bug in question is a regression from
>>>>>>>>> 2.1.1.
>>>>>
>>>>> --
>>>>> Ryan Blue
>>>>> Software Engineer
>>>>> Netflix
>>>>
>>>> --
>>>> Ryan Blue
>>>> Software Engineer
>>>> Netflix
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix

--
Ryan Blue
Software Engineer
Netflix