Hello Vinay, Mina,

Sorry about the late response. I got a chance today to verify the csv reader feature in Spark 2.0. It worked for me. Thanks for the direction.
- AB

On Wed, Aug 17, 2016 at 6:24 PM, Vinay Shukla <vinayshu...@gmail.com> wrote:
> Abul,
>
> Mina is right: until Spark 1.6, csv parsing was available as a separate
> spark package. With Spark 2.0, csv parsing is built in. Zeppelin 0.6.1
> ships with Spark 2.0.
>
> Thanks,
> Vinay
>
> On Wednesday, August 17, 2016, Mina Lee <mina...@apache.org> wrote:
>> Hi Abul,
>>
>> spark-csv is integrated into Spark itself, so you don't need to load
>> the spark-csv dependency anymore.
>>
>> Could you try the below instead?
>>
>> val df = sqlContext.read.
>>   options(Map("header" -> "true", "inferSchema" -> "true")).
>>   csv("hdfs:// ... /S&P")
>>
>> df.printSchema
>>
>> Hope this solves your issue!
>>
>> Mina
>>
>> On Wed, Aug 17, 2016 at 11:43 AM Abul Basar <aba...@einext.com> wrote:
>>> Hello,
>>>
>>> It is exciting to see the new release 0.6.1 in such a short span
>>> after the 0.6 release.
>>>
>>> I am test driving 0.6.1 with Spark 2.0 (Scala 2.11). RDD and DataFrame
>>> operations are working fine, but I am facing a problem while using the
>>> csv package (https://github.com/databricks/spark-csv).
>>>
>>> I added "com.databricks:spark-csv_2.11:1.4.0" to the interpreter
>>> dependencies using the UI and am trying the following code. I
>>> restarted Zeppelin.
>>>
>>> val df = spark.sqlContext.read.
>>>   format("com.databricks.spark.csv").
>>>   options(Map("header" -> "true", "inferSchema" -> "true")).
>>>   load("hdfs:// ... /S&P")
>>>
>>> df.printSchema
>>>
>>> The statement above errors out with the following message:
>>>
>>> java.lang.NoSuchMethodError: com.univocity.parsers.csv.CsvParserSettings.setUnescapedQuoteHandling(Lcom/univocity/parsers/csv/UnescapedQuoteHandling;)V
>>>   at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser$lzycompute(CSVParser.scala:50)
>>>   at org.apache.spark.sql.execution.datasources.csv.CsvReader.parser(CSVParser.scala:35)
>>>   at org.apache.spark.sql.execution.datasources.csv.LineCsvReader.parseLine(CSVParser.scala:117)
>>>   at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.inferSchema(CSVFileFormat.scala:59)
>>>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
>>>   at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$15.apply(DataSource.scala:392)
>>>   at scala.Option.orElse(Option.scala:289)
>>>   at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:391)
>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:149)
>>>   at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:132)
>>>   ... 46 elided
>>>
>>> I successfully tested the same code using the REPL. The error above
>>> seems to be a bug introduced in 0.6.1; it works fine in 0.6.0.
>>>
>>> Any ideas about how to resolve the issue?
>>>
>>> Thanks!
>>> - AB
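For readers unfamiliar with the two options in play, here is a rough stand-alone Scala analogue of what `header` and `inferSchema` mean to the csv reader. This is not Spark's implementation: the simple type-widening rule below (Int, else Double, else String) and the object name are assumptions made purely for illustration.

```scala
// Toy analogue of the csv reader's "header" and "inferSchema" options.
// NOT Spark's implementation -- the Int/Double/String widening rule is
// a simplified assumption, purely to illustrate what the options mean.
import scala.util.Try

object InferSchemaSketch {
  // Pick the narrowest type that every value in the column fits.
  def inferType(values: Seq[String]): String =
    if (values.forall(v => Try(v.toInt).isSuccess)) "IntegerType"
    else if (values.forall(v => Try(v.toDouble).isSuccess)) "DoubleType"
    else "StringType"

  // header = "true": the first line supplies the column names;
  // inferSchema = "true": scan the remaining rows to pick column types.
  def inferSchema(lines: Seq[String]): Seq[(String, String)] = {
    val rows   = lines.map(_.split(",").toSeq)
    val header = rows.head
    val data   = rows.tail
    header.indices.map(i => header(i) -> inferType(data.map(_(i))))
  }

  def main(args: Array[String]): Unit = {
    val sample = Seq("symbol,open,volume", "AAPL,109.1,25000", "GOOG,772.5,13000")
    inferSchema(sample).foreach { case (name, tpe) => println(s"$name: $tpe") }
    // prints:
    // symbol: StringType
    // open: DoubleType
    // volume: IntegerType
  }
}
```

As for the original NoSuchMethodError: it is raised by Spark 2.0's own built-in csv source while calling into univocity-parsers, which suggests the external spark-csv 1.4.0 dependency pulled an older univocity jar onto the classpath. Dropping the external package, as Mina suggests, avoids the conflict entirely.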