CSV spark package not working in v0.6.1, spark 2.0, scala 2.11

Abul Basar Wed, 17 Aug 2016 02:43:54 -0700

Hello,

It is exciting to see the new 0.6.1 release so soon after the 0.6 release.


I am test driving 0.6.1 with Spark 2.0 (Scala 2.11). RDD and DataFrame
operations are working fine, but I am facing a problem while using the CSV
package (https://github.com/databricks/spark-csv).

I added "com.databricks:spark-csv_2.11:1.4.0" to the interpreter
dependencies using the UI, restarted Zeppelin, and then tried the following
code:


val df = spark.sqlContext.read.
format("com.databricks.spark.csv").
options(Map("header" -> "true", "inferSchema" -> "true")).
load("hdfs:// ... /S&P")

df.printSchema


The above statement fails with the following error:

java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvParserSettings.setUnescapedQuoteHandling(Lcom/univocity/parsers/csv/UnescapedQuoteHandling;)V
at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser$lzycompute(CSVParser.scala:50)
at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser(CSVParser.scala:35)
at org.apache.spark.sql.execution.datasources.csv.LineCsvReader.parseLine(CSVParser.scala:117)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:59)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
at scala.Option.orElse(Option.scala:289)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:391)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
... 46 elided


I successfully tested the same code in the Spark REPL, and it also works
fine in Zeppelin 0.6.0, so the error above seems to be a bug introduced in
0.6.1.
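
Since the NoSuchMethodError is raised on a univocity parser class, it may
help to compare which jar supplies that class in the REPL and in the Zeppelin
notebook. The snippet below is only a minimal diagnostic sketch: jarOf is a
hypothetical helper name (not part of Spark or Zeppelin), and it assumes the
class is visible to the classloader of the paragraph that runs it.

```scala
import scala.util.Try

// Hypothetical helper: returns the location a class was loaded from,
// or None if the class cannot be found or reports no code source.
def jarOf(className: String): Option[String] =
  Try(Class.forName(className)).toOption
    .flatMap(c => Option(c.getProtectionDomain.getCodeSource))
    .flatMap(cs => Option(cs.getLocation))
    .map(_.toString)

// The class named in the NoSuchMethodError above.
println(jarOf("com.univocity.parsers.csv.CsvParserSettings"))
```

If the two environments print different locations, that would point to a
dependency conflict rather than a problem in the CSV code itself.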

Any ideas about how to resolve the issue?

Thanks!
- AB
