My executor runs out of memory (OOM) when I use spark-sql to read data from MySQL.
In sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JDBCOptions.scala,
I see the following lines. I'm wondering why JDBC_BATCH_FETCH_SIZE is required
to be non-negative:
val fetchSize = {
  val size = parameters.getOrElse(JDBC_BATCH_FETCH_SIZE, "0").toInt
  require(size >= 0,
    s"Invalid value `${size.toString}` for parameter " +
      s"`$JDBC_BATCH_FETCH_SIZE`. The minimum value is 0. When the value is 0, " +
      "the JDBC driver ignores the value and does the estimates.")
  size
}
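
For reference, this option is what gets passed as "fetchsize" when reading
over JDBC. A minimal sketch of how I set it (the connection URL, table, and
credentials below are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("jdbc-fetch-example").getOrCreate()

val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://db-host:3306/mydb")  // placeholder URL
  .option("dbtable", "big_table")                   // placeholder table
  .option("user", "user")                           // placeholder credentials
  .option("password", "password")
  // this string is what JDBCOptions parses as JDBC_BATCH_FETCH_SIZE;
  // the require(size >= 0) above means Integer.MIN_VALUE is rejected here
  .option("fetchsize", "10000")
  .load()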
As I understand it, the MySQL driver requires fetchSize to be
Integer.MIN_VALUE to stream result sets row by row. And in
core/src/main/scala/org/apache/spark/rdd/JdbcRDD.scala (an older-style API,
it seems), fetchSize is indeed set to Integer.MIN_VALUE:
override def compute(thePart: Partition, context: TaskContext): Iterator[T] =
  new NextIterator[T] {
    context.addTaskCompletionListener { context => closeIfNeeded() }
    val part = thePart.asInstanceOf[JdbcPartition]
    val conn = getConnection()
    val stmt = conn.prepareStatement(sql, ResultSet.TYPE_FORWARD_ONLY,
      ResultSet.CONCUR_READ_ONLY)
    val url = conn.getMetaData.getURL
    if (url.startsWith("jdbc:mysql:")) {
      // setFetchSize(Integer.MIN_VALUE) is a mysql driver specific way to force
      // streaming results, rather than pulling entire resultset into memory.
      // See the below URL
      // dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
      stmt.setFetchSize(Integer.MIN_VALUE)
    } else {
      stmt.setFetchSize(100)
    }
    // ... (rest of compute omitted)
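
Outside Spark, the streaming behavior that comment describes looks like this
with plain JDBC. A minimal standalone sketch, assuming MySQL Connector/J is on
the classpath (the URL, credentials, and query are placeholders):

import java.sql.{DriverManager, ResultSet}

val conn = DriverManager.getConnection(
  "jdbc:mysql://db-host:3306/mydb", "user", "password")  // placeholders
try {
  val stmt = conn.prepareStatement(
    "SELECT id, payload FROM big_table",  // placeholder query
    ResultSet.TYPE_FORWARD_ONLY,
    ResultSet.CONCUR_READ_ONLY)
  // MySQL-specific: Integer.MIN_VALUE tells the driver to stream rows
  // one at a time instead of buffering the whole result set in memory
  stmt.setFetchSize(Integer.MIN_VALUE)
  val rs = stmt.executeQuery()
  while (rs.next()) {
    println(rs.getLong(1))  // process one row at a time
  }
} finally {
  conn.close()
}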
Thanks