Are there installation instructions for Spark 3.4.1? I defined SPARK_HOME as described here: https://spark.apache.org/docs/latest/api/python/getting_started/install.html
The contents of $SPARK_HOME/python/lib are:

ls $SPARK_HOME/python/lib
py4j-0.10.9.7-src.zip  PY4J_LICENSE.txt  pyspark.zip

I am getting a class-not-found error on this line:

import org.apache.spark.SparkContext

I also unzipped those files just in case, but that gives the same error. It sounds like this is because pyspark is not installed, but as far as I can tell it is. Pyspark is installed for the correct Python version:

root@namenode:/home/spark/# pip3.10 install pyspark
Requirement already satisfied: pyspark in /usr/local/lib/python3.10/dist-packages (3.4.1)
Requirement already satisfied: py4j==0.10.9.7 in /usr/local/lib/python3.10/dist-packages (from pyspark) (0.10.9.7)

Here is the output:

      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.4.1
      /_/

Using Python version 3.10.12 (main, Jun 11 2023 05:26:28)
Spark context Web UI available at http://namenode:4040
Spark context available as 'sc' (master = yarn, app id = application_1692452853354_0008).
SparkSession available as 'spark'.
Traceback (most recent call last):
  File "/home/spark/real-estate/pullhttp/pull_apartments.py", line 11, in <module>
    import org.apache.spark.SparkContext
ModuleNotFoundError: No module named 'org.apache.spark.SparkContext'
2023-08-20T19:45:19,242 INFO [Thread-5] spark.SparkContext: SparkContext is stopping with exitCode 0.
2023-08-20T19:45:19,246 INFO [Thread-5] server.AbstractConnector: Stopped Spark@467be156{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
2023-08-20T19:45:19,247 INFO [Thread-5] ui.SparkUI: Stopped Spark web UI at http://namenode:4040
2023-08-20T19:45:19,251 INFO [YARN application state monitor] cluster.YarnClientSchedulerBackend: Interrupting monitor thread
2023-08-20T19:45:19,260 INFO [Thread-5] cluster.YarnClientSchedulerBackend: Shutting down all executors
2023-08-20T19:45:19,260 INFO [dispatcher-CoarseGrainedScheduler] cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
2023-08-20T19:45:19,263 INFO [Thread-5] cluster.YarnClientSchedulerBackend: YARN client scheduler backend Stopped
2023-08-20T19:45:19,267 INFO [dispatcher-event-loop-29] spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
2023-08-20T19:45:19,271 INFO [Thread-5] memory.MemoryStore: MemoryStore cleared
2023-08-20T19:45:19,271 INFO [Thread-5] storage.BlockManager: BlockManager stopped
2023-08-20T19:45:19,275 INFO [Thread-5] storage.BlockManagerMaster: BlockManagerMaster stopped
2023-08-20T19:45:19,276 INFO [dispatcher-event-loop-8] scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
2023-08-20T19:45:19,279 INFO [Thread-5] spark.SparkContext: Successfully stopped SparkContext
2023-08-20T19:45:19,687 INFO [shutdown-hook-0] util.ShutdownHookManager: Shutdown hook called
2023-08-20T19:45:19,688 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034/pyspark-2fcfbc8e-fd40-41f5-bf8d-e4c460332895
2023-08-20T19:45:19,689 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-bf6cbc46-ad8b-429a-9d7a-7d98b7d7912e
2023-08-20T19:45:19,690 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/spark-9375452d-1989-4df5-9d85-950f751ce034
2023-08-20T19:45:19,691 INFO [shutdown-hook-0] util.ShutdownHookManager: Deleting directory /tmp/localPyFiles-6c113b2b-9ac3-45e3-9032-d1c83419aa64
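
One check that may help narrow this down: the traceback is raised by the Python interpreter, and the dotted name in it follows the JVM (Scala/Java) package layout rather than the name of the `pyspark` Python package that pip installed. The sketch below uses only the standard library to ask Python which module names it can resolve. `is_importable` is a hypothetical helper written for this post, not part of Spark, and the result for `pyspark` depends on the interpreter it is run with.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """Return True if the current Python interpreter can locate module_name."""
    try:
        # find_spec imports parent packages first, so a missing top-level
        # package (for example "org") raises ModuleNotFoundError here.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False


if __name__ == "__main__":
    # The first name is the one from the traceback; "pyspark" is the
    # package name reported by pip in the output above.
    for name in ("org.apache.spark.SparkContext", "pyspark", "py4j"):
        print(f"{name}: importable={is_importable(name)}")
```

If `pyspark` is reported as importable while the `org.apache.spark` name is not, the installation is probably fine and the import statement is the likely culprit. In PySpark the class is normally imported with `from pyspark import SparkContext`, and the shell output above shows a context already exposed as `sc`.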