Never mind, I was doing something dumb.

On Sun, Aug 20, 2023 at 9:53 PM Kal Stevens <kalgstev...@gmail.com> wrote:
> Are there installation instructions for Spark 3.4.1?
>
> I defined SPARK_HOME as it describes here
> https://spark.apache.org/docs/latest/api/python/getting_started/install.html
>
> ls $SPARK_HOME/python/lib
> py4j-0.10.9.7-src.zip  PY4J_LICENSE.txt  pyspark.zip
>
> I am getting a class not found error:
> import org.apache.spark.SparkContext
>
> I also unzipped those files just in case, but that gives the same error.
>
> It sounds like this is because pyspark is not installed, but as far as I
> can tell it is. PySpark is installed in the correct Python version:
>
> root@namenode:/home/spark/# pip3.10 install pyspark
> Requirement already satisfied: pyspark in /usr/local/lib/python3.10/dist-packages (3.4.1)
> Requirement already satisfied: py4j==0.10.9.7 in /usr/local/lib/python3.10/dist-packages (from pyspark) (0.10.9.7)
>
>     ____              __
>    / __/__  ___ _____/ /__
>   _\ \/ _ \/ _ `/ __/ '_/
>  /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
>     /_/
>
> Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
> Spark context Web UI available at http://namenode:4040
> Spark context available as 'sc' (master = yarn, app id = application_1692452853354_0008).
> SparkSession available as 'spark'.
>
> Traceback (most recent call last):
>   File "/home/spark/real-estate/pullhttp/pull_apartments.py", line 11, in <module>
>     import org.apache.spark.SparkContext
> ModuleNotFoundError: No module named 'org.apache.spark.SparkContext'
>
> 2023-08-20T19:45:19,242 INFO [Thread-5] spark.SparkContext: SparkContext is stopping with exitCode 0.
> 2023-08-20T19:45:19,246 INFO [Thread-5] server.AbstractConnector: Stopped Spark@467be156{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
> 2023-08-20T19:45:19,247 INFO [Thread-5] ui.SparkUI: Stopped Spark web UI at http://namenode:4040
> 2023-08-20T19:45:19,251 INFO [YARN application state monitor] cluster.YarnClientSchedulerBackend: Interrupting monitor thread
> 2023-08-20T19:45:19,260 INFO [Thread-5] cluster.YarnClientSchedulerBackend: Shutting down all executors
> 2023-08-20T19:45:19,260 INFO [dispatcher-CoarseGrainedScheduler] cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
> 2023-08-20T19:45:19,263 INFO [Thread-5] cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
> 2023-08-20T19:45:19,267 INFO [dispatcher-event-loop-29] spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
> 2023-08-20T19:45:19,271 INFO [Thread-5] memory.MemoryStore: MemoryStore cleared
> 2023-08-20T19:45:19,271 INFO [Thread-5] storage.BlockManager: BlockManager stopped
> 2023-08-20T19:45:19,275 INFO [Thread-5] storage.BlockManagerMaster: BlockManagerMaster stopped
> 2023-08-20T19:45:19,276 INFO [dispatcher-event-loop-8] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
> 2023-08-20T19:45:19,279 INFO [Thread-5] spark.SparkContext: Successfully stopped SparkContext
> 2023-08-20T19:45:19,687 INFO [shutdown-hook-0] util.ShutdownHookManager: Shutdown hook called
> 2023-08-20T19:45:19,688 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034/pyspark-2fcfbc8e-fd40-41f5-bf8d-e4c460332895
> 2023-08-20T19:45:19,689 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-bf6cbc46-ad8b-429a-9d7a-7d98b7d7912e
> 2023-08-20T19:45:19,690 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034
> 2023-08-20T19:45:19,691 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/localPyFiles-6c113b2b-9ac3-45e3-9032-d1c83419aa64