alanwake created ZEPPELIN-4946:
----------------------------------

Summary: zeppelin server failed to connect spark interpreter on k8s
Key: ZEPPELIN-4946
URL: https://issues.apache.org/jira/browse/ZEPPELIN-4946
Project: Zeppelin
Issue Type: Bug
Components: zeppelin-server
Affects Versions: 0.9.0
Environment: zeppelin:0.9.0
k8s: 1.16.2
spark: spark-py:3.0-2.7

Here is my development environment:
1. A standalone Spark cluster on k8s; the master service is exposed at spark://master-0.spark-master.spark.svc.cluster.local:7077
2. Zeppelin is deployed in the zeppelin namespace.

{code:java}
// deployment.yml
apiVersion: v1
kind: ConfigMap
metadata:
  name: zeppelin-server-conf-map
  namespace: zeppelin
data:
  # 'serviceDomain' is a Domain name to use for accessing Zeppelin UI.
  # Should point IP address of 'zeppelin-server' service.
  #
  # Wildcard subdomain need to be point the same IP address to access service inside of Pod (such as SparkUI).
  # i.e. if service domain is 'local.zeppelin-project.org', DNS configuration should make 'local.zeppelin-project.org' and '*.local.zeppelin-project.org' point the same address.
  #
  # Default value is 'local.zeppelin-project.org' while it points 127.0.0.1 and `kubectl port-forward zeppelin-server` will give localhost to connects.
  # If you have your ingress controller configured to connect to `zeppelin-server` service and have a domain name for it (with wildcard subdomain point the same address), you can replace serviceDomain field with your own domain.
  #SERVICE_DOMAIN: zeppelin-server.zeppelin.svc.cluster.local:8080
  SERVICE_DOMAIN: local.zeppelin-project.org:8080
  ZEPPELIN_K8S_SPARK_CONTAINER_IMAGE: spark-py:3.0-2.7
  ZEPPELIN_K8S_CONTAINER_IMAGE: apache/zeppelin:0.9.0
  ZEPPELIN_HOME: /zeppelin
  ZEPPELIN_SERVER_RPC_PORTRANGE: 12320:12320
  # default value of 'master' property for spark interpreter.
  #SPARK_MASTER: k8s://https://kubernetes.default.svc
  SPARK_MASTER: spark://master-0.spark-master.spark.svc.cluster.local:7077
  # default value of 'SPARK_HOME' property for spark interpreter.
  SPARK_HOME: /spark
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zeppelin
  namespace: zeppelin
  labels:
    app: zeppelin
spec:
  replicas: 1
  selector:
    matchLabels:
      app: zeppelin
  template:
    metadata:
      labels:
        app: zeppelin
    spec:
      nodeSelector:
        role: worker
      containers:
      - name: zeppelin
        image: apache/zeppelin:0.9.0
        securityContext:
          runAsUser: 0
        envFrom:
        - configMapRef:
            name: zeppelin-server-conf-map
        ports:
        - containerPort: 8080
          name: web
        - containerPort: 12320
          name: rpc
        resources:
          requests:
            cpu: 0.2
            memory: 200m
        volumeMounts:
        - name: podyaml
          mountPath: /zeppelin/k8s/interpreter
      volumes:
      - name: podyaml
        hostPath:
          path: /datadisk/nfs/zeppelin/k8s/interpreter/
{code}

{code:java}
//100-interpreter-spec.yaml
There may be a bug here:
-c {{zeppelin.k8s.server.rpc.service}}
does not work; the value is empty.
So I replaced it with a hard-coded value:
-c zeppelin-server.zeppelin.svc.cluster.local
{code}

{code:java}
kind: Service
apiVersion: v1
metadata:
  name: zeppelin-server
  namespace: zeppelin
spec:
  type: NodePort
  ports:
  - port: 8080
    targetPort: 8080
    nodePort: 30080
    name: web
  - port: 12320
    name: rpc # port name is referenced in the code. So it shouldn't be changed.
  selector:
    app: zeppelin
{code}

Reporter: alanwake
Attachments: 1.txt, 2.txt

Help, please!

I am new here and unfamiliar with Java projects. The logs show nothing about the remote address.
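
Since the logs do not say which remote address is involved, a plain TCP reachability check run from inside the interpreter pod might help narrow things down. The following is only a minimal sketch in Python, assuming Python is available in the pod; the helper name `can_connect` is hypothetical and not part of Zeppelin. It only tests whether a TCP connection to a given host and port can be opened.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical diagnostic helper; it is not part of Zeppelin.
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        # OSError covers DNS failures, "Connection refused", and timeouts.
        print(f"{host}:{port} unreachable: {exc}")
        return False
```

For example, calling `can_connect("zeppelin-server.zeppelin.svc.cluster.local", 12320)` from the interpreter pod would show whether the hard-coded service name and the port from ZEPPELIN_SERVER_RPC_PORTRANGE are reachable from that side. Whether this is the connection that actually fails is an assumption, not something the logs confirm.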

{code:java}
[root@master zeppelin]# kubectl get pods -n=zeppelin -o=wide
NAME                       READY   STATUS    RESTARTS   AGE     IP            NODE                   NOMINATED NODE   READINESS GATES
spark-hpvbft               1/1     Running   0          13s     10.244.1.23   node01.51vrk8s.local   <none>           <none>
zeppelin-df54795fb-wddqs   1/1     Running   0          5m50s   10.244.1.22   node01.51vrk8s.local   <none>           <none>
{code}

{code:java}
[root@master ~]# kubectl logs spark-hpvbft -n=zeppelin
INFO [2020-07-10 04:19:37,576] ({FIFOScheduler-interpreter_1257482730-Worker-1} Logging.scala[logInfo]:57) - Initialized BlockManager: BlockManagerId(driver, spark-hpvbft, 36344, None)
INFO [2020-07-10 04:19:37,681] ({FIFOScheduler-interpreter_1257482730-Worker-1} ContextHandler.java[doStart]:855) - Started o.s.j.s.ServletContextHandler@69b63f8c{/metrics/json,null,AVAILABLE,@Spark}
INFO [2020-07-10 04:19:37,754] ({FIFOScheduler-interpreter_1257482730-Worker-1} BaseSparkScalaInterpreter.scala[spark2CreateContext]:293) - Created Spark session (without Hive support)
INFO [2020-07-10 04:19:41,316] ({FIFOScheduler-interpreter_1257482730-Worker-1} SparkShims.java[loadShims]:61) - Initializing shims for Spark 3.x
INFO [2020-07-10 04:19:42,727] ({FIFOScheduler-interpreter_1257482730-Worker-1} AbstractScheduler.java[runJob]:152) - Job 20150210-015259_1403135953 finished by scheduler interpreter_1257482730
{code}

See file 1 for details.

{code:java}
[root@master ~]# kubectl logs zeppelin-df54795fb-wddqs -n=zeppelin
INFO [2020-07-10 04:19:34,427] ({SchedulerFactory2} RemoteInterpreter.java[call]:141) - Open RemoteInterpreter org.apache.zeppelin.spark.SparkInterpreter
INFO [2020-07-10 04:19:34,427] ({SchedulerFactory2} RemoteInterpreter.java[pushAngularObjectRegistryToRemote]:431) - Push local angular object registry from ZeppelinServer to remote interpreter group spark-shared_process
WARN [2020-07-10 04:19:42,736] ({SchedulerFactory2} NotebookServer.java[onStatusChange]:1901) - Job 20150210-015259_1403135953 is finished, status: ERROR, exception: null, result: %text warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
java.net.ConnectException: Connection refused (Connection refused)
	at java.net.PlainSocketImpl.socketConnect(Native Method)
{code}

See file 2 for details.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)