The trigger for this was the process of developing code to access BigQuery data from PyCharm on premises, so that advanced analytics and graphics can be done locally.
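For reference, the read path from PyCharm ends up looking roughly like this once the setup described below is in place. This is a minimal sketch rather than my exact code: the project, dataset and table names are placeholders, and it assumes GCS credentials are already configured (e.g. via GOOGLE_APPLICATION_CREDENTIALS).

    from pyspark.sql import SparkSession

    # Local SparkSession; assumes the two connector jars described
    # below are already in $SPARK_HOME/jars
    spark = SparkSession.builder \
        .appName("BigQueryRead") \
        .getOrCreate()

    # Read a BigQuery table into a DataFrame (placeholder names)
    df = spark.read \
        .format("bigquery") \
        .option("table", "my_project.my_dataset.my_table") \
        .load()

    df.show(10)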
Writes are an issue, as BigQuery buffers the data in temporary storage on a GCS bucket before pushing it into the BigQuery database. One option is to use Dataproc clusters for the write-intensive activities ($$$) and thereafter do the reads on-premises (Linux) or locally (assuming you have a powerful enough Windows box). The issue was more with writes.

To make this work, believe it or not, is a bit of an art, as you need to find the correct version of Spark plus the correct versions of the BigQuery JAR files that work in tandem. Anyhow, reads and writes to BigQuery work with spark-3.0.1-bin-hadoop3.2 and the following two JAR files:

-rwxr--r-- 1 hduser hadoop 33943429 Jan 12 23:30 spark-bigquery-latest_2.12.jar
-rwxr--r-- 1 hduser hadoop 17663298 Jan 13 19:20 gcs-connector-hadoop3-2.2.0-shaded.jar
lrwxrwxrwx 1 hduser hadoop       38 Jan 13 19:22 gcs-connector.jar -> gcs-connector-hadoop3-2.2.0-shaded.jar

For me the option that worked *was to put these two jar files in the directory $SPARK_HOME/jars*. Adding them to spark.driver.extraClassPath in $SPARK_HOME/conf/spark-defaults.conf did not work, and using spark-submit from the PyCharm terminal with --jars added other issues. So in short, I put these two files in $SPARK_HOME/jars and it worked.

I am not sure this is ideal, but one advantage of this layout is that you can create a single container jar file, spark-libs.jar:

    jar cv0f spark-libs.jar -C $SPARK_HOME/jars/ .

and put it under an HDFS directory so all nodes of the cluster can access it. You then need to point to it in $SPARK_HOME/conf/spark-defaults.conf:

    spark.yarn.archive=hdfs://rhes75:9000/jars/spark-libs.jar

If anyone has any suggestions please let me know. Thanks
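PS. For anyone trying the same thing, here is roughly what the write path looks like with this setup, continuing from the read sketch above. Again a minimal sketch, not my exact code: the bucket, project, dataset and table names are placeholders. The temporaryGcsBucket option names the bucket where the connector stages the data before loading it into BigQuery, which is the buffering I mentioned at the start.

    # Write the DataFrame back to BigQuery; the connector stages the
    # rows in the named GCS bucket first, then loads them into the
    # target table (all names below are placeholders)
    df.write \
        .format("bigquery") \
        .option("table", "my_project.my_dataset.my_output_table") \
        .option("temporaryGcsBucket", "my-staging-bucket") \
        .mode("append") \
        .save()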