Hello folks, I am a newbie and am running Spark on a small Cloudera CDH 5.5.1 cluster at our lab. I am trying to use the PySpark shell for the first time, and am attempting to duplicate the documentation example of creating an RDD, which I called "lines", from a text file.
I placed a text file called Warehouse.java in this HDFS location:

[rtaylor@bigdatann ~]$ hadoop fs -ls /user/rtaylor/Spark
-rw-r--r--   3 rtaylor supergroup    1155355 2016-02-28 18:09 /user/rtaylor/Spark/Warehouse.java
[rtaylor@bigdatann ~]$

I then invoked sc.textFile() in the PySpark shell. That did not work; see below. Apparently a class is not found? I don't know why that would be the case. Any guidance would be very much appreciated.

The Cloudera Manager for the cluster says that Spark is operating in the "green", for whatever that is worth.

- Ron Taylor

>>> lines = sc.textFile("file:///user/taylor/Spark/Warehouse.java")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/context.py", line 451, in textFile
    return RDD(self._jsc.textFile(name, minPartitions), self,
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 538, in __call__
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/pyspark/sql/utils.py", line 36, in deco
    return f(*a, **kw)
  File "/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/spark/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 300, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o9.textFile.
: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.rdd.RDDOperationScope$
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:709)
    at org.apache.spark.SparkContext.textFile(SparkContext.scala:825)
    at org.apache.spark.api.java.JavaSparkContext.textFile(JavaSparkContext.scala:191)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
    at py4j.Gateway.invoke(Gateway.java:259)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:207)
    at java.lang.Thread.run(Thread.java:745)
>>>
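
A side note on the path passed to sc.textFile(): the file:// scheme addresses the local filesystem of a node rather than HDFS, and the directory in the call (/user/taylor) does not match the one in the hadoop fs -ls listing (/user/rtaylor). Whether this has anything to do with the NoClassDefFoundError is not clear from the trace, so the following is only a minimal sketch for checking which filesystem a path string refers to. It uses Python's standard urlparse; describe_path and the "default-fs" label are hypothetical names made up for illustration, and an empty scheme is assumed to resolve to the cluster's default filesystem (normally HDFS on a CDH cluster).

```python
# Sketch: report which filesystem a Spark/Hadoop path string points at.
try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse      # Python 2

def describe_path(path):
    """Return (scheme, path). An empty scheme is reported as "default-fs",
    a made-up label meaning "whatever the cluster's default filesystem is"."""
    parsed = urlparse(path)
    return parsed.scheme or "default-fs", parsed.path

# Path used in the sc.textFile() call versus the path shown by hadoop fs -ls.
used = "file:///user/taylor/Spark/Warehouse.java"
listed = "/user/rtaylor/Spark/Warehouse.java"

print(describe_path(used))
print(describe_path(listed))
```

If the intent is to read the file from HDFS, a bare path or an hdfs:// URI pointing at /user/rtaylor/Spark/Warehouse.java would be the form to try.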