Hi Jey,
Thanks for your input.
I think I'm getting the error because I'm trying to read a CSV file from the
local file system using the com.databricks.spark.csv package. This package
probably has a hard-coded dependency on Hadoop, since it resolves its input
format through HadoopRDD (see the trace below).
Can you please confirm?
Here is what I did.
I started the spark-shell with:
bin/spark-shell --packages com.databricks:spark-csv_2.10:1.0.3
Then, in the shell, I ran:
val df =
sqlContext.read.format("com.databricks.spark.csv").load("file://home/biadmin/DataScience/PlutoMN.csv")
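One thing I have not ruled out is the path itself. As a minimal sketch (plain java.net.URI, nothing Spark-specific; the helper name `parts` is made up for illustration), this shows how the two-slash form above is parsed compared with a three-slash form. I have not confirmed that this is related to the exception below.

```scala
import java.net.URI

// Hypothetical helper: returns (authority, path) exactly as java.net.URI parses them.
// getAuthority returns null when the URI has no authority, hence the Option wrapper.
def parts(s: String): (Option[String], String) = {
  val u = new URI(s)
  (Option(u.getAuthority), u.getPath)
}

// Two slashes: "home" is taken as the authority (host), not as a directory.
val twoSlashes = parts("file://home/biadmin/DataScience/PlutoMN.csv")

// Three slashes: empty authority, and the path starts at /home.
val threeSlashes = parts("file:///home/biadmin/DataScience/PlutoMN.csv")

println(twoSlashes)
println(threeSlashes)
```

If the three-slash form is what was intended, the load path would be "file:///home/biadmin/DataScience/PlutoMN.csv".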
Regards,
Sourav
15/06/29 15:14:59 INFO spark.SparkContext: Created broadcast 0 from textFile at CsvRelation.scala:114
java.lang.RuntimeException: Error in configuring object
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:109)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:75)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:190)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1251)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.take(RDD.scala:1246)
at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1286)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:148)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:109)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
at org.apache.spark.rdd.RDD.first(RDD.scala:1285)
at com.databricks.spark.csv.CsvRelation.firstLine$lzycompute(CsvRelation.scala:114)
at com.databricks.spark.csv.CsvRelation.firstLine(CsvRelation.scala:112)
at com.databricks.spark.csv.CsvRelation.inferSchema(CsvRelation.scala:95)
at com.databricks.spark.csv.CsvRelation.<init>(CsvRelation.scala:53)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:89)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:39)
at com.databricks.spark.csv.DefaultSource.createRelation(DefaultSource.scala:27)
at org.apache.spark.sql.sources.ResolvedDataSource$.apply(ddl.scala:265)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:114)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:104)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:24)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:26)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:28)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC.<init>(<console>:32)
at $iwC$$iwC.<init>(<console>:34)
at $iwC.<init>(<console>:36)
at <init>(<console>:38)
at .<init>(<console>:42)
at .<clinit>(<console>)
at java.lang.J9VMInternals.initializeImpl(Native Method)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
at .<init>(<console>:7)
at .<clinit>(<console>)
at java.lang.J9VMInternals.initializeImpl(Native Method)
at java.lang.J9VMInternals.initialize(J9VMInternals.java:200)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:664)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:169)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:192)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:111)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:60)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:37)
at java.lang.reflect.Method.invoke(Method.java:611)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:106)
... 83 more
On Mon, Jun 29, 2015 at 10:02 AM, Jey Kottalam <[email protected]> wrote:
> Actually, Hadoop InputFormats can still be used to read and write from
> "file://", "s3n://", and similar schemes. You just won't be able to
> read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
>
> To summarize: Sourav, you can use any of the prebuilt packages (i.e.
> anything other than "source code").
>
> Hope that helps,
> -Jey
>
> On Mon, Jun 29, 2015 at 7:33 AM, ayan guha <[email protected]> wrote:
>
>> Hi
>>
>> You really do not need a Hadoop installation. You can download a pre-built
>> version for any Hadoop version, unzip it, and you are good to go. Yes, it
>> may complain while launching the master and workers; you can safely ignore
>> that. The only problem is while writing to a directory. Of course, you will
>> not be able to use any Hadoop InputFormat etc. out of the box.
>>
>> ** I am assuming it's a learning question :) For production, I would
>> suggest building it from source.
>>
>> If you are using Python and need some help, please drop me a note
>> offline.
>>
>> Best
>> Ayan
>>
>> On Tue, Jun 30, 2015 at 12:24 AM, Sourav Mazumder <
>> [email protected]> wrote:
>>
>>> Hi,
>>>
>>> I'm trying to run Spark without Hadoop, with the data read from and
>>> written to local disk.
>>>
>>> I have a few questions about this:
>>>
>>> 1. Which download do I need to use? In the download options I don't see
>>> any binary download that does not need Hadoop. Is the only way to do this
>>> to download the source code version and compile it?
>>>
>>> 2. Which installation/quick start guide should I use for this?
>>> So far I haven't seen any documentation that specifically addresses
>>> installing and setting up Spark without Hadoop, unless I'm missing
>>> something.
>>>
>>> Regards,
>>> Sourav
>>>
>>
>>
>>
>> --
>> Best Regards,
>> Ayan Guha
>>
>
>