Hi all,
We deploy Spark SQL in standalone mode, without HDFS, on a single machine with 256 GB of RAM and 64 cores. The SparkSession is created like this:

    SparkSession ss = SparkSession.builder().appName("MYAPP")
        .config("spark.sql.crossJoin.enabled", "true")
        .config("spark.executor.memory", this.memory_limit)
        .config("spark.executor.cores", 2)
        .config("spark.driver.memory", "2g")
        .config("spark.storage.memoryFraction", 0.3)
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.executor.extraJavaOptions",
            "-XX:+UseG1GC -XX:+PrintFlagsFinal -XX:+PrintReferenceGC " +
            "-verbose:gc -XX:+PrintGCDetails " +
            "-XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy")
        .master(this.spark_master)
        .getOrCreate();

The MySQL JDBC connection properties look like this:

    Properties connProp = new Properties();
    connProp.put("driver", "com.mysql.jdbc.Driver");
    connProp.put("useSSL", "false");
    connProp.put("user", this.user);
    connProp.put("password", this.password);

Then we register the MySQL tables as Datasets and run the join:

    Dataset<Row> jdbcDF1 = ss.read().jdbc(this.url, "(select * from bigtable) t1", connProp);
    jdbcDF1.createOrReplaceTempView("t1");

    Dataset<Row> jdbcDF2 = ss.read().jdbc(this.url, "(select * from smalltable) t2", connProp);
    jdbcDF2.createOrReplaceTempView("t2");

    Dataset<Row> result = ss.sql("select * from t1, t2 where xxxx");

When we run the job, we get an OOM error in our Java program:

    Lost task 6.0 in stage 1156.0 (TID 16686, 172.16.50.103, executor 5): java.lang.OutOfMemoryError: Java heap space
        at com.mysql.jdbc.MysqlIO.nextRowFast(MysqlIO.java:2213)
        at com.mysql.jdbc.MysqlIO.nextRow(MysqlIO.java:1992)
        at com.mysql.jdbc.MysqlIO.readSingleRowSet(MysqlIO.java:3413)
        at com.mysql.jdbc.MysqlIO.getResultSet(MysqlIO.java:471)
        at com.mysql.jdbc.MysqlIO.readResultsForQueryOrUpdate(MysqlIO.java:3115)
        at com.mysql.jdbc.MysqlIO.readAllResults(MysqlIO.java:2344)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2739)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2486)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1858)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1966)
        at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD.compute(JDBCRDD.scala:301)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
        at org.apache.spark.scheduler.Task.run(Task.scala:109)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Is there something wrong in our configuration, and how can we fix it? Will Spark SQL spill to disk when memory is not enough?
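In case it helps to explain what we are thinking: the trace looks like the MySQL driver is reading the whole result set of "select * from bigtable" into memory inside a single task, so one idea we are considering is the partitioned jdbc() overload, so the big table is split into several smaller queries across tasks. Below is only a rough sketch that reuses ss, this.url and connProp from above; the partition column "id" and the bound/partition values are made-up placeholders, not our real schema:

    // Sketch only: "id" stands for some numeric, roughly evenly distributed
    // column of bigtable; lowerBound/upperBound/numPartitions are placeholders.
    Dataset<Row> jdbcDF1 = ss.read().jdbc(
        this.url,
        "(select * from bigtable) t1",
        "id",           // partitionColumn
        1L,             // lowerBound
        100000000L,     // upperBound
        64,             // numPartitions: 64 smaller queries instead of one big one
        connProp);
    jdbcDF1.createOrReplaceTempView("t1");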
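The other idea is to stop Connector/J from buffering the whole result set per query by turning on cursor-based fetching in the connection properties, together with Spark's "fetchsize" option (which, as far as we understand, is forwarded to Statement.setFetchSize()). Again just an untested sketch, and the 10000 batch size is an arbitrary guess:

    Properties connProp = new Properties();
    connProp.put("driver", "com.mysql.jdbc.Driver");
    connProp.put("useSSL", "false");
    connProp.put("user", this.user);
    connProp.put("password", this.password);
    // Untested guess: ask Connector/J for a server-side cursor instead of
    // materializing the whole result set in readAllResults().
    connProp.put("useCursorFetch", "true");
    connProp.put("defaultFetchSize", "10000");
    // Spark's JDBC reader should pick up "fetchsize" from the connection
    // properties and pass it to Statement.setFetchSize().
    connProp.put("fetchsize", "10000");

Would either of these be the right direction, or is the real problem in our executor/driver memory settings?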