[ 
https://issues.apache.org/jira/browse/SPARK-18687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15714630#comment-15714630
 ] 

Vinayak Joshi commented on SPARK-18687:
---------------------------------------

[~srowen] Understood. However, it appears we have users who have code written 
that way; it used to work with 1.6 but breaks with 2.0. Since SQLContext has 
been preserved for backward compatibility, we're looking to see if this 
problem can be plugged. 

A similar sequence of calls using Scala in spark-shell remains fine and, 
looking into the code, I figure it's because of the way the Scala 
implementation of SQLContext reuses the existing SparkSession internally. I am 
going to submit a PR along the same lines for the Python implementation of 
SQLContext, which appears to fix the problem. However, I am not an expert in 
this particular part of the code, so hopefully my change can be reviewed and 
considered for this issue.
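
As a rough illustration of the reuse idea, here is a toy model in plain Python. 
It is not the actual PySpark code and not the PR itself: ToySession, 
NaiveContext, ReusingContext and MetastoreLockedError are hypothetical names, 
and the "lock" only imitates the way an embedded Derby metastore can be booted 
by one owner at a time.

```python
class MetastoreLockedError(RuntimeError):
    """Raised when a second session tries to boot the same embedded store."""


class ToySession:
    """Hypothetical stand-in for a session that owns an embedded metastore."""

    _lock_holder = None  # at most one session may hold the store
    _active = None       # the session created first, if any

    def __init__(self):
        # Imitates the XSDB6 failure: a second boot of the same store fails.
        if ToySession._lock_holder is not None:
            raise MetastoreLockedError(
                "another instance may have already booted the database"
            )
        ToySession._lock_holder = self
        ToySession._active = self

    @classmethod
    def get_or_create(cls):
        """Return the existing session if there is one, else create it."""
        return cls._active if cls._active is not None else cls()


class NaiveContext:
    """Always builds a fresh session, so a second instance hits the lock."""

    def __init__(self):
        self.session = ToySession()


class ReusingContext:
    """Reuses the existing session unless one is passed in explicitly."""

    def __init__(self, session=None):
        self.session = session if session is not None else ToySession.get_or_create()
```

In this model, two ReusingContext objects end up sharing one session, while 
constructing a NaiveContext after any session exists raises 
MetastoreLockedError, which is roughly the behaviour difference described 
above between the Scala and Python paths.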

> Backward compatibility - creating a Dataframe on a new SQLContext object 
> fails with a Derby error
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-18687
>                 URL: https://issues.apache.org/jira/browse/SPARK-18687
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark, SQL
>    Affects Versions: 2.0.0, 2.0.1, 2.0.2
>         Environment: Spark built with hive support
>            Reporter: Vinayak Joshi
>
> With a local Spark instance built with Hive support (-Pyarn -Phadoop-2.6 
> -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver), the following 
> script/sequence works in PySpark without any error in 1.6.x, but fails in 
> 2.x.
> {code}
> people = sc.parallelize(["Michael,30", "Andy,12", "Justin,19"])
> peoplePartsRDD = people.map(lambda p: p.split(","))
> peopleRDD = peoplePartsRDD.map(lambda p: pyspark.sql.Row(name=p[0], age=int(p[1])))
> peopleDF = sqlContext.createDataFrame(peopleRDD)
> peopleDF.first()
> sqlContext2 = SQLContext(sc)
> people2 = sc.parallelize(["Abcd,40", "Efgh,14", "Ijkl,16"])
> peoplePartsRDD2 = people2.map(lambda l: l.split(","))
> peopleRDD2 = peoplePartsRDD2.map(lambda p: pyspark.sql.Row(fname=p[0], age=int(p[1])))
> peopleDF2 = sqlContext2.createDataFrame(peopleRDD2) # <==== error here
> {code}
> The error produced is:
> {noformat}
> 16/12/01 22:35:36 ERROR Schema: Failed initialising database.
> Unable to open a test connection to the given database. JDBC url = 
> jdbc:derby:;databaseName=metastore_db;create=true, username = APP. 
> Terminating connection pool (set lazyInit to true if you expect to start your 
> database after your app). Original Exception: ------
> java.sql.SQLException: Failed to start database 'metastore_db' with class 
> loader org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@4494053, 
> see the next exception for details.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
> .
> .
> ------
> org.datanucleus.exceptions.NucleusDataStoreException: Unable to open a test 
> connection to the given database. JDBC url = 
> jdbc:derby:;databaseName=metastore_db;create=true, username = APP. 
> Terminating connection pool (set lazyInit to true if you expect to start your 
> database after your app). Original Exception: ------
> java.sql.SQLException: Failed to start database 'metastore_db' with class 
> loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see 
> the next exception for details.
>         at org.apache.derby.impl.jdb
> .
> .
> .
> NestedThrowables:
> java.sql.SQLException: Unable to open a test connection to the given 
> database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, 
> username = APP. Terminating connection pool (set lazyInit to true if you 
> expect to start your database after your app). Original Exception: ------
> java.sql.SQLException: Failed to start database 'metastore_db' with class 
> loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see 
> the next exception for details.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
> .
> .
> .
> Caused by: java.sql.SQLException: Unable to open a test connection to the 
> given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, 
> username = APP. Terminating connection pool (set lazyInit to true if you 
> expect to start your database after your app). Original Exception: ------
> java.sql.SQLException: Failed to start database 'metastore_db' with class 
> loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see 
> the next exception for details.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
> Source)
>         at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> .
> .
> .
> 16/12/01 22:48:09 ERROR Schema: Failed initialising database.
> Unable to open a test connection to the given database. JDBC url = 
> jdbc:derby:;databaseName=metastore_db;create=true, username = APP. 
> Terminating connection pool (set lazyInit to true if you expect to start your 
> database after your app). Original Exception: ------
> java.sql.SQLException: Failed to start database 'metastore_db' with class 
> loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see 
> the next exception for details.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
> Source)
>         at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> .
> .
> .
> Caused by: java.sql.SQLException: Failed to start database 'metastore_db' 
> with class loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see 
> the next exception for details.
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
>         at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
>         at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown 
> Source)
>         at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
> .
> .
> .
> Caused by: ERROR XJ040: Failed to start database 'metastore_db' with class 
> loader 
> org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@519dabfd, see 
> the next exception for details.
>         at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>         at 
> org.apache.derby.impl.jdbc.SQLExceptionFactory.wrapArgsForTransportAcrossDRDA(Unknown
>  Source)
>         ... 111 more
> Caused by: ERROR XSDB6: Another instance of Derby may have already booted the 
> database 
> /Users/vinayak/devel/spark-stc/git_repo/spark-master-x/spark/metastore_db.
>         at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>         at org.apache.derby.iapi.error.StandardException.newException(Unknown 
> Source)
>         at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.privGetJBMSLockOnDB(Unknown
>  Source)
>         at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.run(Unknown Source)
>         at java.security.AccessController.doPrivileged(Native Method)
>         at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.getJBMSLockOnDB(Unknown
>  Source)
>         at 
> org.apache.derby.impl.store.raw.data.BaseDataFileFactory.boot(Unknown Source)
>         at org.apache.derby.impl.services.monitor.BaseMonitor.boot(Unknown 
> Source)
> {noformat}
> The error goes away if sqlContext2 is replaced with sqlContext in the last 
> (error) line. Since the SQLContext class is preserved for backward 
> compatibility, the changes in 2.x break scripts/notebooks that follow the 
> above pattern of calls and used to run fine with 1.6.x.



