>> df = spark.sqlContext.read.csv('out/df_in.csv')

Shouldn't this just be

    df = spark.read.csv('out/df_in.csv')

The SparkSession itself is the entry point to DataFrames and SQL
functionality.
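A minimal sketch of that entry point (assuming Spark 2.x and a local
out/df_in.csv with a header row; the app name is illustrative):

    from pyspark.sql import SparkSession

    # In Spark 2.x the SparkSession subsumes the old SQLContext and
    # HiveContext; reads go through spark.read directly.
    spark = SparkSession.builder.appName('csv-read-example').getOrCreate()
    df = spark.read.csv('out/df_in.csv', header=True, inferSchema=True)
    df.printSchema()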
Thank you,
*Pushkar Gujar*

On Tue, May 9, 2017 at 6:09 PM, Mark Hamstra <m...@clearstorydata.com> wrote:

> Looks to me like it is a conflict between a Databricks library and Spark
> 2.1. That's an issue for Databricks to resolve or provide guidance on.
>
> On Tue, May 9, 2017 at 2:36 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>
>> I'm a bit confused by that answer; I'm assuming it's Spark deciding
>> which lib to use.
>>
>> On 9 May 2017 at 14:30, Mark Hamstra <m...@clearstorydata.com> wrote:
>>
>>> This looks more like a matter for Databricks support than spark-user.
>>>
>>> On Tue, May 9, 2017 at 2:02 PM, lucas.g...@gmail.com <lucas.g...@gmail.com> wrote:
>>>
>>>>     df = spark.sqlContext.read.csv('out/df_in.csv')
>>>>
>>>>     17/05/09 15:51:29 WARN ObjectStore: Version information not found in
>>>>     metastore. hive.metastore.schema.verification is not enabled so
>>>>     recording the schema version 1.2.0
>>>>     17/05/09 15:51:29 WARN ObjectStore: Failed to get database default,
>>>>     returning NoSuchObjectException
>>>>     17/05/09 15:51:30 WARN ObjectStore: Failed to get database global_temp,
>>>>     returning NoSuchObjectException
>>>>
>>>>     Py4JJavaError: An error occurred while calling o72.csv.
>>>>     : java.lang.RuntimeException: Multiple sources found for csv
>>>>     (*com.databricks.spark.csv.DefaultSource15,
>>>>     org.apache.spark.sql.execution.datasources.csv.CSVFileFormat*),
>>>>     please specify the fully qualified class name.
>>>>     at scala.sys.package$.error(package.scala:27)
>>>>     at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:591)
>>>>     at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
>>>>     at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
>>>>     at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:325)
>>>>     at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
>>>>     at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:415)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>>>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>>>>     at java.lang.reflect.Method.invoke(Method.java:606)
>>>>     at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
>>>>     at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
>>>>     at py4j.Gateway.invoke(Gateway.java:280)
>>>>     at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
>>>>     at py4j.commands.CallCommand.execute(CallCommand.java:79)
>>>>     at py4j.GatewayConnection.run(GatewayConnection.java:214)
>>>>     at java.lang.Thread.run(Thread.java:745)
>>>>
>>>> When I change our call to:
>>>>
>>>>     df = spark.hiveContext.read \
>>>>         .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat') \
>>>>         .load('df_in.csv')
>>>>
>>>> there is no such issue. I was under the impression (obviously wrongly)
>>>> that Spark would automatically pick the local lib. We have the
>>>> Databricks library because other jobs still explicitly call it.
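That "Multiple sources found" error message spells out the fix you found:
when two data sources register the same short name, the reader has to be
pinned to one of them by its fully qualified class name. A sketch with the
plain session reader (assuming Spark 2.x; the path and header option are
illustrative):

    # Pin the reader to the built-in Spark 2.x CSV source, so the
    # Databricks package on the classpath can no longer make the short
    # name 'csv' ambiguous.
    df = (spark.read
          .format('org.apache.spark.sql.execution.datasources.csv.CSVFileFormat')
          .option('header', 'true')
          .load('out/df_in.csv'))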
>>>> Is the 'correct answer' to go through and modify our jobs so as to
>>>> remove the Databricks lib from our deploy? Or should this just work?
>>>>
>>>> One of the things I find less helpful in the Spark docs is when there
>>>> are multiple ways to do something but no clear guidance on what those
>>>> methods are intended to accomplish.
>>>>
>>>> Thanks!
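On the deploy question: once everything runs on Spark 2.x, the built-in
source makes the external package redundant for CSV reads, so dropping it
from the deploy is cleaner than pinning class names job by job. A sketch of
what that looks like, assuming the package is pulled in through the session
config (the app name and package version are illustrative):

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .appName('etl-job')  # illustrative name
             # Removing this line keeps com.databricks:spark-csv off the
             # classpath, so the short name 'csv' resolves unambiguously
             # to the built-in source:
             # .config('spark.jars.packages',
             #         'com.databricks:spark-csv_2.11:1.5.0')
             .getOrCreate())

    df = spark.read.format('csv').option('header', 'true').load('out/df_in.csv')

The jobs that still call the Databricks source explicitly would need to be
moved to the built-in reader first.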