Re: Choice of IDE for Spark

Bjørn Jørgensen Wed, 06 Oct 2021 09:55:22 -0700

I use JupyterLab on k8s with MinIO as S3 storage.

https://github.com/bjornjorgensen/jlpyk8s


With this code to start it all :) 

from pyspark import pandas as ps
import re
import numpy as np
import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, concat_ws, lit, col, trim, expr
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

spark = SparkSession.builder \
    .master("k8s://https://kubernetes.default.svc.cluster.local:443") \
    .config("spark.kubernetes.container.image", "bjornjorgensen/spark-py:master-041021") \
    .config("spark.kubernetes.authenticate.caCertFile", "/var/run/secrets/kubernetes.io/serviceaccount/ca.crt") \
    .config("spark.kubernetes.authenticate.oauthTokenFile", "/var/run/secrets/kubernetes.io/serviceaccount/token") \
    .config("spark.kubernetes.authenticate.driver.serviceAccountName", "my-pyspark-notebook") \
    .config("spark.executor.instances", "9") \
    .config("spark.driver.host", "my-pyspark-notebook-spark-driver.default.svc.cluster.local") \
    .config("spark.driver.port", "29413") \
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.nfs100.options.claimName", "nfs100") \
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.nfs100.mount.path", "/home/test_files") \
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.nfs100.options.claimName", "nfs100") \
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.nfs100.mount.path", "/home/test_files") \
    .config("spark.kubernetes.driver.volumes.persistentVolumeClaim.nfs100.mount.readOnly", "False") \
    .config("spark.kubernetes.executor.volumes.persistentVolumeClaim.nfs100.mount.readOnly", "False") \
    .config("fs.s3a.access.key", "minio") \
    .config("fs.s3a.secret.key", "minio123") \
    .config("fs.s3a.endpoint", "http://192.168.1.129:9000") \
    .config("spark.hadoop.fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
    .config("spark.hadoop.fs.s3a.path.style.access", "true") \
    .appName("Falk") \
    .config("spark.sql.repl.eagerEval.enabled", "True") \
    .config("spark.driver.memory", "4g") \
    .config("spark.executor.memory", "5g") \
    .config("spark.executor.cores", "2") \
    .config("spark.kubernetes.executor.request.cores", "1800m") \
    .config("spark.sql.parquet.outputTimestampType", "TIMESTAMP_MILLIS") \
    .getOrCreate()
sc = spark.sparkContext
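
The six persistentVolumeClaim settings above follow one pattern, repeated for the driver and the executors. As a rough sketch only (the helper name pvc_volume_conf and its parameters are made up for illustration, they are not part of Spark), the same keys could be generated like this:

```python
def pvc_volume_conf(volume_name, claim_name, mount_path, read_only=False):
    """Build the Spark conf entries that mount one PVC on driver and executors.

    Hypothetical helper: it only assembles the key/value strings used in the
    builder chain above; it does not talk to Spark or Kubernetes.
    """
    conf = {}
    for role in ("driver", "executor"):
        prefix = f"spark.kubernetes.{role}.volumes.persistentVolumeClaim.{volume_name}"
        conf[f"{prefix}.options.claimName"] = claim_name
        conf[f"{prefix}.mount.path"] = mount_path
        # str(False) gives "False", the same string used in the config above
        conf[f"{prefix}.mount.readOnly"] = str(read_only)
    return conf


nfs_conf = pvc_volume_conf("nfs100", "nfs100", "/home/test_files")
for key, value in sorted(nfs_conf.items()):
    print(key, "=", value)

# To apply the entries, loop over them before calling getOrCreate(), e.g.:
#   builder = SparkSession.builder
#   for key, value in nfs_conf.items():
#       builder = builder.config(key, value)
```

This keeps the claim name and mount path in one place, so the driver and executor settings cannot drift apart.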



On 2021/09/30 10:59:54, Mich Talebzadeh <[email protected]> wrote: 
> Hi,
> 
> This may look like a redundant question, but it comes about because of the
> advent of cloud workstations such as Amazon WorkSpaces and others.
> 
> With IntelliJ you are OK with Spark & Scala. With PyCharm you are fine with
> PySpark and the virtual environment. Mind you as far as I know PyCharm only
> executes spark-submit in local mode. For yarn, one needs to open a
> terminal and submit from there.
> 
> However, on an Amazon workstation, you get Visual Studio Code
> <https://code.visualstudio.com/> (VSC, an MS product) and OpenOffice
> installed. VSC comes with support for working with JSON files, but I am not
> sure whether, with a plugin for Python etc., it will be as good as PyCharm. Has
> anyone used VSC in anger for Spark, and if so, what is the experience?
> 
> Thanks
> 
> 
> 
>    view my Linkedin profile
> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
> 
> 
> 
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
> 
