Hello, I currently run a Spark project dealing with cities, local authorities, enterprises, local communities, etc. Ten datasets, written in Java, perform operations ranging from simple joins to elaborate ones. Twenty integration tests against the whole data set (20 GB) take seven hours.
*All of this works perfectly under Spark 2.4.6 - Scala 2.12 - Java 11 or 8*. I remember it working well on Spark 2.4.5 too, but I had many troubles in the past with Spark 2.4.3 (often from L4Z algorithms, if I remember well). I then attempted to run my integration tests on Spark 3.0.1. Many of them failed, with strange messages: something about lambdas, or about Map fields no longer being taken into account in a Java Dataset, object or schema. I then went back, but to Spark 2.4.7, to give it a try. And Spark 2.4.7 also encounters troubles that 2.4.6 didn't have.

My question: may I create a JIRA issue based on a comparison of runs of my project under different Spark versions, reporting the error messages received, the call stacks, and the lines around the one that encountered a problem (when available), even if I can't provide a test case for each trouble? Would that be enough to give you hints about what is going wrong? I could then try a development build (when asked) to see whether my project returns to stability.

--
Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
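For what it's worth, the failing pattern is roughly of this shape. This is only a minimal, hypothetical sketch (the `City` bean, the `cities.parquet` path and the column names are invented for illustration, not taken from the actual project): a Java bean carrying a `java.util.Map` field, read as a typed `Dataset`, then transformed with a lambda.

```java
import java.io.Serializable;
import java.util.Map;

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;

// Hypothetical bean: a city with per-year population figures held in a Map field.
public class City implements Serializable {
    private String name;
    private Map<Integer, Long> populationByYear;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Map<Integer, Long> getPopulationByYear() { return populationByYear; }
    public void setPopulationByYear(Map<Integer, Long> m) { this.populationByYear = m; }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("encoder-check")
                .getOrCreate();

        // Read a typed Dataset through the bean encoder (Map field included).
        Dataset<City> cities = spark.read().parquet("cities.parquet")
                .as(Encoders.bean(City.class));

        // A lambda over the typed Dataset: runs fine on 2.4.6, but this is
        // the kind of place where 3.0.1 reportedly fails during
        // lambda/encoder resolution.
        Dataset<String> names = cities.map(
                (MapFunction<City, String>) c -> c.getName(),
                Encoders.STRING());

        names.show();
        spark.stop();
    }
}
```

A stripped-down reproducer along these lines (bean with a `Map` field plus a `MapFunction` lambda), run against 2.4.6 and 3.0.1, would make the eventual JIRA issue much easier to act on than call stacks alone.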