Hello Vinay, Mina,

Sorry about the late response. I got a chance today to verify the csv reader feature in Spark 2.0. It worked for me. Thanks for the direction.
- AB

On Wed, Aug 17, 2016 at 6:24 PM, Vinay Shukla <vinayshu...@gmail.com> wrote:
> Abul,
>
> Mina is right: until Spark 1.6, csv parsing was available as a separate
> spark package. With Spark 2.0, csv parsing is built in. Zeppelin 0.6.1
> ships with Spark 2.0.
>
> Thanks,
> Vinay
>
> On Wednesday, August 17, 2016, Mina Lee <mina...@apache.org> wrote:
>> Hi Abul,
>>
>> spark-csv is integrated into Spark itself, so you don't need to load
>> the spark-csv dependency anymore.
>>
>> Could you try the below instead?
>>
>> val df = sqlContext.read.
>>   options(Map("header" -> "true", "inferSchema" -> "true")).
>>   csv("hdfs:// ... /S&P")
>>
>> df.printSchema
>>
>> Hope this solves your issue!
>>
>> Mina
>>
>> On Wed, Aug 17, 2016 at 11:43 AM Abul Basar <aba...@einext.com> wrote:
>>> Hello,
>>>
>>> It is exciting to see the new release 0.6.1 in such a short span
>>> after the 0.6 release.
>>>
>>> I am test driving 0.6.1 with Spark 2.0 (Scala 2.11). RDD and DataFrame
>>> operations are working fine, but I am facing a problem while using the
>>> csv package (https://github.com/databricks/spark-csv).
>>>
>>> I added "com.databricks:spark-csv_2.11:1.4.0" to the interpreter
>>> dependencies using the UI and am trying the following code. I
>>> restarted Zeppelin.
>>>
>>> val df = spark.sqlContext.read.
>>>   format("com.databricks.spark.csv").
>>>   options(Map("header" -> "true", "inferSchema" -> "true")).
>>>   load("hdfs:// ... /S&P")
>>>
>>> df.printSchema
>>>
>>> The statement above errors out with the following message:
>>>
>>> java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvParserSettings.setUnescapedQuoteHandling(Lcom/univocity/parsers/csv/UnescapedQuoteHandling;)V
>>>   at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser$lzycompute(CSVParser.scala:50)
>>>   at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser(CSVParser.scala:35)
>>>   at org.apache.spark.sql.execution.datasources.csv.LineCsvReader.parseLine(CSVParser.scala:117)
>>>   at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:59)
>>>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
>>>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
>>>   at scala.Option.orElse(Option.scala:289)
>>>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:391)
>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
>>>   ... 46 elided
>>>
>>> I successfully tested the same code using the REPL. The error above
>>> seems to be a bug introduced in 0.6.1; it works fine in 0.6.0.
>>>
>>> Any ideas about how to resolve the issue?
>>>
>>> Thanks!
>>> - AB
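For readers unfamiliar with the two options in play, here is a rough stand-alone Scala analogue of what `header` and `inferSchema` mean to the csv reader. This is not Spark's implementation: the simple type-widening rule below (Int, else Double, else String) and the object name are assumptions made purely for illustration.

```scala
// Toy analogue of the csv reader's "header" and "inferSchema" options.
// NOT Spark's implementation -- the Int/Double/String widening rule is
// a simplified assumption, purely to illustrate what the options mean.
import scala.util.Try

object InferSchemaSketch {
  // Pick the narrowest type that every value in the column fits.
  def inferType(values: Seq[String]): String =
    if (values.forall(v => Try(v.toInt).isSuccess)) "IntegerType"
    else if (values.forall(v => Try(v.toDouble).isSuccess)) "DoubleType"
    else "StringType"

  // header = "true": the first line supplies the column names;
  // inferSchema = "true": scan the remaining rows to pick column types.
  def inferSchema(lines: Seq[String]): Seq[(String, String)] = {
    val rows   = lines.map(_.split(",").toSeq)
    val header = rows.head
    val data   = rows.tail
    header.indices.map(i => header(i) -> inferType(data.map(_(i))))
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("symbol,open,volume", "AAPL,109.1,25000", "GOOG,772.5,13000")
    inferSchema(sample).foreach { case (name, tpe) => println(s"$name: $tpe") }
    // prints:
    // symbol: StringType
    // open: DoubleType
    // volume: IntegerType
  }
}
```

As for the original NoSuchMethodError: it is raised by Spark 2.0's own built-in csv source while calling into univocity-parsers, which suggests the external spark-csv 1.4.0 dependency pulled an older univocity jar onto the classpath. Dropping the external package, as Mina suggests, avoids the conflict entirely.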