[
https://issues.apache.org/jira/browse/SPARK-18413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
lichenglin closed SPARK-18413.
------------------------------
Resolution: Invalid
> Add a property to control the number of partitions when save a jdbc rdd
> -----------------------------------------------------------------------
>
> Key: SPARK-18413
> URL: https://issues.apache.org/jira/browse/SPARK-18413
> Project: Spark
> Issue Type: Wish
> Components: SQL
> Affects Versions: 2.0.1
> Reporter: lichenglin
>
> {code}
> CREATE or replace TEMPORARY VIEW resultview
> USING org.apache.spark.sql.jdbc
> OPTIONS (
> url "jdbc:oracle:thin:@10.129.10.111:1521:BKDB",
> dbtable "result",
> user "HIVE",
> password "HIVE"
> );
> --set spark.sql.shuffle.partitions=200
> insert overwrite table resultview
> select g, count(1) as count from tnet.DT_LIVE_INFO group by g
> {code}
> I'm trying to save a Spark SQL result to Oracle, and I found that Spark
> creates one JDBC connection per partition.
> If the query produces too many partitions, the database can't hold that
> many connections and throws an exception.
> In the situation above it is 200 partitions, because of the "group by" and
> "spark.sql.shuffle.partitions".
> The relevant Spark source, saveTable in JdbcUtils, is:
> {code}
> def saveTable(
>     df: DataFrame,
>     url: String,
>     table: String,
>     properties: Properties) {
>   val dialect = JdbcDialects.get(url)
>   val nullTypes: Array[Int] = df.schema.fields.map { field =>
>     getJdbcType(field.dataType, dialect).jdbcNullType
>   }
>   val rddSchema = df.schema
>   val getConnection: () => Connection = createConnectionFactory(url, properties)
>   val batchSize = properties.getProperty(JDBC_BATCH_INSERT_SIZE, "1000").toInt
>   df.foreachPartition { iterator =>
>     savePartition(getConnection, table, iterator, rddSchema, nullTypes,
>       batchSize, dialect)
>   }
> }
> {code}
> Maybe we could add a property so that saveTable calls
> df.repartition(num).foreachPartition instead?
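> As a workaround until such a property exists, the same effect can be had
> on the user side by reducing the partition count before the write. A
> minimal, untested sketch, assuming the Spark 2.x DataFrameWriter API
> (resultDF, maxConnections, and connProps are placeholder names):
> {code}
> // Cap the number of concurrent JDBC connections by bounding the
> // number of partitions before writing. coalesce avoids a full shuffle.
> val maxConnections = 10
> resultDF
>   .coalesce(maxConnections) // at most 10 partitions => at most 10 connections
>   .write
>   .mode("overwrite")
>   .jdbc("jdbc:oracle:thin:@10.129.10.111:1521:BKDB", "result", connProps)
> {code}
> This only bounds the writer's connections; it does not change
> spark.sql.shuffle.partitions for the rest of the job.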
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]