I use JupyterLab on Kubernetes with MinIO as S3 storage:
https://github.com/bjornjorgensen/jlpyk8s
I start it all with this code :)

import re

import numpy as np
import pandas as pd
from pyspark import pandas as ps
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = (
    SparkSession.builder
    # Kubernetes API server (in-cluster address) and the executor image
    .master("k8s://https://kubernetes.default.svc.cluster.local:443")
    .config("spark.kubernetes.container.image",
            "bjornjorgensen/spark-py:master-041021")
    # Authenticate with the pod's mounted service account credentials
    .config("spark.kubernetes.authenticate.caCertFile",
            "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt")
    .config("spark.kubernetes.authenticate.oauthTokenFile",
            "/var/run/secrets/kubernetes.io/serviceaccount/token")
    .config("spark.kubernetes.authenticate.driver.serviceAccountName",
            "my-pyspark-notebook")
    .config("spark.executor.instances", "9")
    # Executors connect back to the driver (the notebook pod) at this host and port
    .config("spark.driver.host",
            "my-pyspark-notebook-spark-driver.default.svc.cluster.local")
    .config("spark.driver.port", "29413")
    # Mount the nfs100 PVC read-write at /home/test_files on driver and executors
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.nfs100.options.claimName",
            "nfs100")
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.nfs100.mount.path",
            "/home/test_files")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.nfs100.options.claimName",
            "nfs100")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.nfs100.mount.path",
            "/home/test_files")
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.nfs100.mount.readOnly",
            "False")
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.nfs100.mount.readOnly",
            "False")
    # MinIO as S3 storage through the s3a connector
    .config("fs.s3a.access.key", "minio")
    .config("fs.s3a.secret.key", "minio123")
    .config("fs.s3a.endpoint", "http://192.168.1.129:9000")
    .config("spark.hadoop.fs.s3a.impl",
            "org.apache.hadoop.fs.s3a.S3AFileSystem")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .appName("Falk")
    # Evaluate and display DataFrames eagerly in the notebook
    .config("spark.sql.repl.eagerEval.enabled", "True")
    # Driver and executor resources
    .config("spark.driver.memory", "4g")
    .config("spark.executor.memory", "5g")
    .config("spark.executor.cores", "2")
    .config("spark.kubernetes.executor.request.cores", "1800m")
    # Write Parquet timestamps with millisecond precision
    .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS")
    .getOrCreate()
)
sc = spark.sparkContext
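
The six persistentVolumeClaim settings in the builder call all follow one key pattern, so they could be generated instead of typed out. This is only a rough sketch: `pvc_volume_conf` is a hypothetical helper name I made up (it is not part of PySpark), and it simply reproduces the keys used above for the driver and the executors.

```python
def pvc_volume_conf(volume_name, claim_name, mount_path, read_only=False):
    """Build the driver and executor PVC settings for one volume.

    Hypothetical helper, not part of PySpark. The key pattern is the one
    used in the builder call above.
    """
    conf = {}
    for role in ("driver", "executor"):
        prefix = (
            f"spark.kubernetes.{role}.volumes."
            f"persistentVolumeClaim.{volume_name}"
        )
        conf[f"{prefix}.options.claimName"] = claim_name
        conf[f"{prefix}.mount.path"] = mount_path
        # Spark reads this value as a boolean string
        conf[f"{prefix}.mount.readOnly"] = str(read_only).lower()
    return conf


# Same volume as above: claim nfs100 mounted at /home/test_files
nfs_conf = pvc_volume_conf("nfs100", "nfs100", "/home/test_files")
for key, value in sorted(nfs_conf.items()):
    print(key, "=", value)
```

Each pair can then be handed to the builder in a loop with `.config(key, value)` before calling `.getOrCreate()`.
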
On 2021/09/30 10:59:54, Mich Talebzadeh wrote:
> Hi,
>
> This may look like a redundant question, but it comes about because of the
> advent of cloud workstations like Amazon WorkSpaces and others.
>
> With IntelliJ you are OK with Spark & Scala. With PyCharm you are fine with
> PySpark and the virtual environment. Mind you, as far as I know PyCharm only
> executes spark-submit in local mode. For YARN, one needs to open a
> terminal and submit from there.
>
> However, on an Amazon workstation you get Visual Studio Code
> <https://code.visualstudio.com/> (VSC, an MS product) and OpenOffice
> installed. VSC comes with support for working with JSON files, but I am not
> sure whether, with a plugin for Python etc., it will be as good as PyCharm.
> Has anyone used VSC in anger for Spark and, if so, what was the experience?
>
> Thanks