Hi Jey,

This solves the class-not-found problem. Thanks.
But the input format issue is still not resolved. It looks like Spark is still trying to create a HadoopRDD, and I don't know why. The error message is:

java.lang.RuntimeException: Error in configuring object
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
        at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
        at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
        at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
        at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
        at scala.Option.getOrElse(Option.scala:120)
        at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
        at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
        at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
        at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
        at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
        at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:129)
        at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:127)
        at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:109)
        at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:62)
        at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:115)
        at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:40)
        at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:28)
        at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
        at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
        at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
        at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
        at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
        at $iwC$$iwC$$iwC.<init>(<console>:32)
        at $iwC$$iwC.<init>(<console>:34)
        at $iwC.<init>(<console>:36)
        at <init>(<console>:38)
        at .<init>(<console>:42)
        at .<clinit>(<console>)
        at java.lang.J9VMInternals.initializeImpl(Native Method)
        at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
        at .<init>(<console>:7)
        at .<clinit>(<console>)
        at java.lang.J9VMInternals.initializeImpl(Native Method)
        at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
        at $print(<console>)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
        at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
        at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
        at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
        at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
        at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
        at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
        at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
        at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
        at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
        at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
        at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
        at org.apache.spark.repl.Main$.main(Main.scala:31)
        at org.apache.spark.repl.Main.main(Main.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
        at java.lang.reflect.Method.invoke(Method.java:611)
        at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
        ... 83 more
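In case it is useful, here is exactly what I'm running now, combining your two suggestions (the Scala 2.11 package coordinate on the command line, and the file:/// URI with three slashes):

bin/spark-shell --packages com.databricks:spark-csv_2.11:1.1.0

and then, in the shell:

val df = sqlContext.read.format("com.databricks.spark.csv").load("file:///home/biadmin/DataScience/PlutoMN.csv")

The load call itself triggers the failure: per the trace above, spark-csv calls first() on the underlying RDD to infer the schema (CsvRelation.inferSchema -> firstLine -> RDD.first), and that is what ends up constructing the HadoopRDD.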
Regards,
Sourav

On Mon, Jun 29, 2015 at 6:53 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:

> The format is still "com.databricks.spark.csv", but the parameter passed
> to spark-shell is "--packages com.databricks:spark-csv_2.11:1.1.0".
>
> On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>
>> Hi Jey,
>>
>> Not much luck.
>>
>> If I use the class com.databricks:spark-csv_2.11:1.1.0 or
>> com.databricks.spark.csv_2.11.1.1.0, I get a class-not-found error. With
>> com.databricks.spark.csv I don't get the class-not-found error, but I
>> still get the previous error even after using file:/// in the URI.
>>
>> Regards,
>> Sourav
>>
>> On Mon, Jun 29, 2015 at 1:13 PM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>
>>> Hi Sourav,
>>>
>>> The error seems to be caused by the fact that your URL starts with
>>> "file://" instead of "file:///".
>>>
>>> Also, I believe the current version of the package for Spark 1.4 with
>>> Scala 2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
>>>
>>> -Jey
>>>
>>> On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>
>>>> Hi Jey,
>>>>
>>>> Thanks for your inputs.
>>>>
>>>> I'm probably getting this error because I'm trying to read a CSV file
>>>> from the local filesystem using the com.databricks.spark.csv package.
>>>> Perhaps this package has a hard-coded dependency on Hadoop, since it is
>>>> trying to obtain the input format via HadoopRDD.
>>>>
>>>> Can you please confirm?
>>>>
>>>> Here is what I did. I ran spark-shell as:
>>>>
>>>> bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
>>>>
>>>> Then in the shell I ran:
>>>>
>>>> val df = sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
>>>>
>>>> Regards,
>>>> Sourav
>>>>
>>>> 15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from
>>>> textFile at CsvRelation.scala:114
>>>> java.lang.RuntimeException: Error in configuring object
>>>> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
>>>> at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
>>>> at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
>>>> at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
>>>> at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>> at scala.Option.getOrElse(Option.scala:120)
>>>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>> at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
>>>> at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
>>>> at scala.Option.getOrElse(Option.scala:120)
>>>> at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
>>>> at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>> at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
>>>> at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
>>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
>>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
>>>> at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
>>>> at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
>>>> at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
>>>> at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
>>>> at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
>>>> at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
>>>> at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
>>>> at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
>>>> at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
>>>> at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
>>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
>>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
>>>> at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
>>>> at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
>>>> at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
>>>> at $iwC$$iwC$$iwC.<init>(<console>:32)
>>>> at $iwC$$iwC.<init>(<console>:34)
>>>> at $iwC.<init>(<console>:36)
>>>> at <init>(<console>:38)
>>>> at .<init>(<console>:42)
>>>> at .<clinit>(<console>)
>>>> at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>> at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>> at .<init>(<console>:7)
>>>> at .<clinit>(<console>)
>>>> at java.lang.J9VMInternals.initializeImpl(Native Method)
>>>> at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
>>>> at $print(<console>)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>> at java.lang.reflect.Method.invoke(Method.java:611)
>>>> at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
>>>> at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
>>>> at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
>>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
>>>> at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
>>>> at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
>>>> at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
>>>> at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
>>>> at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
>>>> at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
>>>> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
>>>> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
>>>> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>> at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
>>>> at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>>>> at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
>>>> at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
>>>> at org.apache.spark.repl.Main$.main(Main.scala:31)
>>>> at org.apache.spark.repl.Main.main(Main.scala)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>> at java.lang.reflect.Method.invoke(Method.java:611)
>>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
>>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
>>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
>>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
>>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>> Caused by: java.lang.reflect.InvocationTargetException
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
>>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
>>>> at java.lang.reflect.Method.invoke(Method.java:611)
>>>> at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
>>>> ... 83 more
>>>>
>>>> On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <j...@cs.berkeley.edu> wrote:
>>>>
>>>>> Actually, Hadoop InputFormats can still be used to read and write from
>>>>> "file://", "s3n://", and similar schemes. You just won't be able to
>>>>> read/write to HDFS without installing Hadoop and setting up an HDFS
>>>>> cluster.
>>>>>
>>>>> To summarize: Sourav, you can use any of the prebuilt packages (i.e.,
>>>>> anything other than "source code").
>>>>>
>>>>> Hope that helps,
>>>>> -Jey
>>>>>
>>>>> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <guha.a...@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> You really do not need a Hadoop installation. You can download a
>>>>>> prebuilt version (built against any Hadoop version), unzip it, and you
>>>>>> are good to go. It may complain while launching the master and workers;
>>>>>> you can safely ignore that. The only problem is when writing to a
>>>>>> directory. Of course, you will not be able to use any Hadoop
>>>>>> InputFormat, etc., out of the box.
>>>>>>
>>>>>> ** I am assuming it's a learning question :) For production, I would
>>>>>> suggest building it from source.
>>>>>>
>>>>>> If you are using Python and need some help, please drop me a note
>>>>>> offline.
>>>>>>
>>>>>> Best,
>>>>>> Ayan
>>>>>>
>>>>>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder <sourav.mazumde...@gmail.com> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm trying to run Spark without Hadoop, where the data would be read
>>>>>>> from and written to the local disk.
>>>>>>>
>>>>>>> For this I have a few questions:
>>>>>>>
>>>>>>> 1. Which download do I need to use? Among the download options I
>>>>>>> don't see any binary download that does not need Hadoop. Is the only
>>>>>>> way to do this to download the source code and compile it myself?
>>>>>>>
>>>>>>> 2. Which installation/quick-start guide should I use? So far I
>>>>>>> haven't seen any documentation that specifically addresses
>>>>>>> installing/setting up Spark without Hadoop, unless I'm missing
>>>>>>> something.
>>>>>>>
>>>>>>> Regards,
>>>>>>> Sourav
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Ayan Guha
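PS: As a sanity check on Jey's earlier point that Hadoop InputFormats should work against "file://" paths even without an HDFS cluster, I will also try reading the file directly with sc.textFile (spark-csv itself goes through textFile, per the "Created broadcast 0 from textFile at CsvRelation.scala:114" line above), e.g.:

val lines = sc.textFile("file:///home/biadmin/DataScience/PlutoMN.csv")
lines.first()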