Hi Ryan et al,

The issue we’ve seen, using a build of the Spark 2.2.0 branch from a downstream 
project, is that parquet-avro calls one of the new Avro 1.8.0 methods, and you 
get a NoSuchMethodError since Spark declares a dependency on Avro 1.7.7. My 
colleague Michael (who posted earlier on this thread) documented this in 
SPARK-19697 <https://issues.apache.org/jira/browse/SPARK-19697>. I know that 
Spark has unit tests that check this compatibility issue, but it looks like 
there was a recent change that sets a test-scope dependency on Avro 1.8.0 
<https://github.com/apache/spark/commit/0077bfcb93832d93009f73f4b80f2e3d98fd2fa4>,
 which masks this issue in the unit tests. With this error, you can’t use the 
ParquetAvroOutputFormat from an application running on Spark 2.2.0.
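The exact missing method is recorded in SPARK-19697. As an illustration only, a tiny standalone probe (the class name `AvroApiProbe` is hypothetical, and `Schema#getLogicalType` is used just as one example of an API added in Avro 1.8.0) can show which Avro generation a JVM classpath actually carries:

```java
// AvroApiProbe.java -- hypothetical helper, not part of Spark or parquet-avro.
// Reflectively checks whether the Avro on the classpath exposes an API that
// was added in Avro 1.8.0, without needing Avro at compile time.
public class AvroApiProbe {
    public static String probe() {
        try {
            // Schema.getLogicalType() was added in Avro 1.8.0; probing for it
            // distinguishes an Avro 1.8+ classpath from a 1.7.x one.
            Class<?> schema = Class.forName("org.apache.avro.Schema");
            schema.getMethod("getLogicalType");
            return "avro-1.8-api-present";
        } catch (ClassNotFoundException e) {
            return "avro-not-on-classpath";
        } catch (NoSuchMethodException e) {
            return "avro-pre-1.8";
        }
    }

    public static void main(String[] args) {
        System.out.println(probe());
    }
}
```

Running this on a Spark 2.2.0 classpath would report the pre-1.8 case, which is the condition under which parquet-avro’s calls fail at runtime.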

Regards,

Frank Austin Nothaft
fnoth...@berkeley.edu
fnoth...@eecs.berkeley.edu
202-340-0466

> On May 1, 2017, at 10:02 AM, Ryan Blue <rb...@netflix.com.INVALID> wrote:
> 
> I agree with Sean. Spark only pulls in parquet-avro for tests. For execution, 
> it implements the record materialization APIs in Parquet to go directly to 
> Spark SQL rows. This doesn't actually leak an Avro 1.8 dependency into Spark 
> as far as I can tell.
> 
> rb
> 
> On Mon, May 1, 2017 at 8:34 AM, Sean Owen <so...@cloudera.com 
> <mailto:so...@cloudera.com>> wrote:
> See discussion at https://github.com/apache/spark/pull/17163 
> <https://github.com/apache/spark/pull/17163> -- I think the issue is that 
> fixing this trades one problem for a slightly bigger one.
> 
> 
> On Mon, May 1, 2017 at 4:13 PM Michael Heuer <heue...@gmail.com 
> <mailto:heue...@gmail.com>> wrote:
> Version 2.2.0 bumps the dependency version for parquet to 1.8.2 but does not 
> bump the dependency version for avro (currently at 1.7.7).  Though perhaps 
> not clear from the issue I reported [0], this means that Spark is internally 
> inconsistent, in that a call through parquet (which depends on avro 1.8.0 
> [1]) may throw errors at runtime when it hits avro 1.7.7 on the classpath.  
> Avro 1.8.0 is not binary compatible with 1.7.7.
> 
> [0] - https://issues.apache.org/jira/browse/SPARK-19697 
> <https://issues.apache.org/jira/browse/SPARK-19697>
> [1] - 
> https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96 
> <https://github.com/apache/parquet-mr/blob/apache-parquet-1.8.2/pom.xml#L96>
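One downstream workaround (a sketch of what an application build might do, not something the Spark build itself does) is to pin Avro 1.8.0 in the application’s own pom so Maven’s nearest-wins resolution doesn’t leave 1.7.7 on the classpath; whether Spark itself tolerates Avro 1.8.0 is exactly the trade-off discussed in the PR Sean linked:

```xml
<!-- In the downstream application's pom.xml: force Avro 1.8.0 so that
     parquet-avro 1.8.2's calls resolve against the API it was compiled for.
     Versions match those discussed in this thread; adjust to your build. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.avro</groupId>
      <artifactId>avro</artifactId>
      <version>1.8.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```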
> 
> On Sun, Apr 30, 2017 at 3:28 AM, Sean Owen <so...@cloudera.com 
> <mailto:so...@cloudera.com>> wrote:
> I have one more issue that, if it needs to be fixed, needs to be fixed for 
> 2.2.0.
> 
> I'm fixing build warnings for the release and noticed that checkstyle 
> actually complains there are some Java methods named in TitleCase, like 
> `ProcessingTimeTimeout`:
> 
> https://github.com/apache/spark/pull/17803/files#r113934080 
> <https://github.com/apache/spark/pull/17803/files#r113934080>
> 
> Easy enough to fix, and the complaint is right: that's not conventional. 
> However, I wonder if it was done on purpose to match a class name?
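For reference, the complaint comes from Checkstyle's standard `MethodName` check, whose default pattern requires lowerCamelCase. A sketch of the relevant rule (Spark's actual checkstyle.xml may configure it differently):

```xml
<!-- Checkstyle's MethodName check: flags methods such as
     ProcessingTimeTimeout() that start with an upper-case letter.
     The format shown is the check's default lowerCamelCase pattern. -->
<module name="TreeWalker">
  <module name="MethodName">
    <property name="format" value="^[a-z][a-zA-Z0-9]*$"/>
  </module>
</module>
```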
> 
> I think this is one for @tdas
> 
> On Thu, Apr 27, 2017 at 7:31 PM Michael Armbrust <mich...@databricks.com 
> <mailto:mich...@databricks.com>> wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.2.0. The vote is open until Tues, May 2nd, 2017 at 12:00 PST and passes if 
> a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.2.0
> [ ] -1 Do not release this package because ...
> 
> 
> To learn more about Apache Spark, please see http://spark.apache.org/ 
> <http://spark.apache.org/>
> 
> The tag to be voted on is v2.2.0-rc1 
> <https://github.com/apache/spark/tree/v2.2.0-rc1> 
> (8ccb4a57c82146c1a8f8966c7e64010cf5632cb6)
> 
> List of JIRA tickets resolved can be found with this filter 
> <https://issues.apache.org/jira/browse/SPARK-20134?jql=project%20%3D%20SPARK%20AND%20fixVersion%20%3D%202.1.1>.
> 
> The release files, including signatures, digests, etc. can be found at:
> http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/ 
> <http://home.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-bin/>
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> <https://people.apache.org/keys/committer/pwendell.asc>
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1235/ 
> <https://repository.apache.org/content/repositories/orgapachespark-1235/>
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/ 
> <http://people.apache.org/~pwendell/spark-releases/spark-2.2.0-rc1-docs/>
> 
> 
> FAQ
> 
> How can I help test this release?
> 
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload, running it on this release candidate, and reporting 
> any regressions.
> 
> What should happen to JIRA tickets still targeting 2.2.0?
> 
> Committers should look at those and triage. Extremely important bug fixes, 
> documentation, and API tweaks that impact compatibility should be worked on 
> immediately. Everything else please retarget to 2.3.0 or 2.2.1.
> 
> But my bug isn't fixed!??!
> 
> In order to make timely releases, we will typically not hold the release 
> unless the bug in question is a regression from 2.1.1.
> 
> 
> 
> 
> -- 
> Ryan Blue
> Software Engineer
> Netflix
