Hi everyone, I need help with my Airflow DAG that does a Spark submit. I now have a Kubernetes cluster instead of the Hortonworks (Linux) distributed Spark cluster. My existing spark-submit goes through a BashOperator, as below:
```python
calculation1 = '/usr/hdp/2.6.5.0-292/spark2/bin/spark-submit --conf spark.yarn.maxAppAttempts=1 --conf spark.dynamicAllocation.executorAllocationRatio=1 --conf spark.executor.heartbeatInterval=30s --conf spark.dynamicAllocation.executorIdleTimeout=60s --conf spark.dynamicAllocation.sustainedSchedulerBacklogTimeout=15s --conf spark.network.timeout=800s --conf spark.dynamicAllocation.schedulerBacklogTimeout=15s --conf spark.shuffle.service.enabled=true --conf spark.dynamicAllocation.enabled=true --conf spark.dynamicAllocation.minExecutors=4 --conf spark.dynamicAllocation.initialExecutors=4 --conf spark.dynamicAllocation.maxExecutors=8 --conf "spark.driver.extraJavaOptions=-Djava.util.logging.config.file=/opt/airflow/dags/logging.properties" --executor-cores 4 --executor-memory 8g --driver-memory 12g --master yarn --class com.wkelms.phoenix.incremental.invoice.Calculations /opt/airflow/dags/nextgen-phoenix-incremental-assembly-0.1.jar 1 "Incremental" "/opt/airflow/dags/load_batch_configuration.json"'

tCalculateBatch1 = BashOperator(
    task_id="calculate_batch_1",
    dag=dag,
    trigger_rule="all_success",
    bash_command=calculation1,
)
```

Now SparkMaster, SparkWorker, and Airflow all run as pods in a Kubernetes cluster, so how should this be written/designed? How can I submit the Spark job to the Spark workers from the airflow-scheduler? (I have put a rough sketch of what I was considering after the pod listing below.)

*The Kubernetes pods are as below:*

```
[root@spark-phoenix ~]# kubectl get pods -A
NAMESPACE     NAME                                      READY   STATUS      RESTARTS   AGE
kube-system   helm-install-traefik-crd-dn82j            0/1     Completed   0          37d
kube-system   helm-install-traefik-vrcz8                0/1     Completed   1          37d
kube-system   local-path-provisioner-5ff76fc89d-mrgzd   1/1     Running     16         37d
kube-system   coredns-7448499f4d-92xhx                  1/1     Running     11         37d
airflow       airflow-statsd-7586f9998-j29h7            1/1     Running     1          2d10h
kube-system   metrics-server-86cbb8457f-q9tt2           1/1     Running     11         37d
kube-system   svclb-traefik-vt9xw                       2/2     Running     22         37d
airflow       airflow-postgresql-0                      1/1     Running     1          2d10h
kube-system   traefik-6b84f7cbc-csffr                   1/1     Running     11         37d
spark         spark-worker-0                            1/1     Running     11         37d
spark         spark-master-0                            1/1     Running     11         37d
spark         spark-worker-1                            1/1     Running     11         37d
airflow       airflow-triggerer-6cc8c54495-w4jzz        1/1     Running     1          2d10h
airflow       airflow-scheduler-7694ccf55-5r9kw         2/2     Running     2          2d10h
airflow       airflow-webserver-68655785c7-lmgzg        1/1     Running     0          21h
```
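For reference, this is the rough direction I have been considering (untested): keep the BashOperator and point spark-submit at the standalone master running in the `spark-master-0` pod instead of YARN. I am assuming here that there is a Service named `spark-master` in the `spark` namespace exposing the default standalone port 7077, and that my Airflow image has a Spark distribution at `$SPARK_HOME`; I dropped the YARN-only and dynamic-allocation confs since those would need an external shuffle service on the workers. Please correct me if this is the wrong approach:

```python
# Rough, untested sketch -- assumes a Service named "spark-master" in the
# "spark" namespace on port 7077 and a Spark install at $SPARK_HOME inside
# the Airflow image. `dag` is the DAG object defined earlier in my file.
from airflow.operators.bash import BashOperator

calculation1 = (
    "$SPARK_HOME/bin/spark-submit "
    "--master spark://spark-master.spark.svc.cluster.local:7077 "
    "--deploy-mode client "  # driver would run inside the Airflow pod
    "--conf spark.executor.heartbeatInterval=30s "
    "--conf spark.network.timeout=800s "
    '--conf "spark.driver.extraJavaOptions=-Djava.util.logging.config.file=/opt/airflow/dags/logging.properties" '
    "--executor-cores 4 --executor-memory 8g --driver-memory 12g "
    "--class com.wkelms.phoenix.incremental.invoice.Calculations "
    "/opt/airflow/dags/nextgen-phoenix-incremental-assembly-0.1.jar "
    '1 "Incremental" "/opt/airflow/dags/load_batch_configuration.json"'
)

tCalculateBatch1 = BashOperator(
    task_id="calculate_batch_1",
    dag=dag,
    trigger_rule="all_success",
    bash_command=calculation1,
)
```

One thing I am unsure about is whether the executors on the spark-worker pods can connect back to a driver running in the scheduler pod in client mode. Alternatively, would it be better to install `apache-airflow-providers-apache-spark` and use `SparkSubmitOperator` with a connection pointing at the same `spark://` master URL, or to bypass the standalone cluster entirely and submit with `--master k8s://` so executors are launched as pods? I have not tried either yet.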