andygrove opened a new issue, #44: URL: https://github.com/apache/datafusion-ray/issues/44
I cannot get benchmarks running in k8s. I suspect that too many tasks are being scheduled in parallel.

I added resource constraints in the code:

```python
@ray.remote(num_cpus=1)
def execute_query_stage(
...
@ray.remote(num_cpus=1)
def execute_query_partition(
```

I am running the benchmark with:

```shell
RAY_ADDRESS='http://localhost:8265' ray job submit --working-dir `pwd` -- python3 tpcbench.py --benchmark tpch --queries /home/ray/datafusion-benchmarks/tpch/queries/ --data /mnt/bigdata/tpch/sf100 --concurrency 4
```

My cluster definition is:

```yaml
apiVersion: ray.io/v1alpha1
kind: RayCluster
metadata:
  name: datafusion-ray-cluster
spec:
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"
    template:
      spec:
        containers:
          - name: ray-head
            image: andygrove/datafusion-ray-tpch:latest
            imagePullPolicy: Always
            resources:
              limits:
                cpu: 2
                memory: 8Gi
              requests:
                cpu: 2
                memory: 8Gi
            volumeMounts:
              - mountPath: /mnt/bigdata  # Mount path inside the container
                name: ray-storage
        volumes:
          - name: ray-storage
            persistentVolumeClaim:
              claimName: ray-pvc  # Reference the PVC name here
  workerGroupSpecs:
    - replicas: 2
      groupName: "datafusion-ray"
      rayStartParams:
        num-cpus: "4"
      template:
        spec:
          containers:
            - name: ray-worker
              image: andygrove/datafusion-ray-tpch:latest
              imagePullPolicy: Always
              resources:
                limits:
                  cpu: 5
                  memory: 64Gi
                requests:
                  cpu: 5
                  memory: 64Gi
              volumeMounts:
                - mountPath: /mnt/bigdata
                  name: ray-storage
          volumes:
            - name: ray-storage
              persistentVolumeClaim:
                claimName: ray-pvc
```
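For reference, a rough sketch of the scheduling ceiling this configuration implies. It assumes Ray admits tasks based only on the logical CPUs declared through `num-cpus` in `rayStartParams` (the container `cpu` limits are not part of that accounting) and that every task reserves the `num_cpus=1` shown above. The helper name `max_concurrent_tasks` is hypothetical and only illustrates the arithmetic.

```python
# Logical CPUs advertised to Ray via rayStartParams in the cluster definition.
HEAD_NUM_CPUS = 0      # headGroupSpec: num-cpus: "0"
WORKER_REPLICAS = 2    # workerGroupSpecs: replicas: 2
WORKER_NUM_CPUS = 4    # workerGroupSpecs: num-cpus: "4"
TASK_NUM_CPUS = 1      # @ray.remote(num_cpus=1)


def max_concurrent_tasks(head_cpus, replicas, cpus_per_worker, cpus_per_task):
    """Upper bound on tasks that can hold a CPU reservation at the same time.

    Hypothetical helper: it only sums the declared logical CPUs and divides
    by the per-task reservation; it does not query a running cluster.
    """
    total_logical_cpus = head_cpus + replicas * cpus_per_worker
    return total_logical_cpus // cpus_per_task


slots = max_concurrent_tasks(
    HEAD_NUM_CPUS, WORKER_REPLICAS, WORKER_NUM_CPUS, TASK_NUM_CPUS
)
print(slots)  # 8
```

This bound covers CPU reservations only. It says nothing about memory use per task, and as far as I understand a task that blocks in `ray.get` gives up its CPU reservation while it waits, so more than 8 tasks can exist at once.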
