[I] Cannot run benchmarks in k8s due to excessive spilling & OOM [datafusion-ray]

via GitHub Wed, 13 Nov 2024 19:31:28 -0800


andygrove opened a new issue, #44:
URL: https://github.com/apache/datafusion-ray/issues/44


   I cannot get the benchmarks running in k8s: the job spills excessively and
   runs out of memory. I suspect that too many tasks are being scheduled in
   parallel.
   
   I added resource constraints in the code:
   
   ```python
   @ray.remote(num_cpus=1)
   def execute_query_stage(
   
   ...
   
   @ray.remote(num_cpus=1)
   def execute_query_partition(
   ```
   
   I am running the benchmark with
   
   ```shell
   RAY_ADDRESS='http://localhost:8265' ray job submit --working-dir `pwd` -- \
     python3 tpcbench.py --benchmark tpch \
     --queries /home/ray/datafusion-benchmarks/tpch/queries/ \
     --data /mnt/bigdata/tpch/sf100 \
     --concurrency 4
   ```
   
   My cluster definition is:
   
   ```yaml
   apiVersion: ray.io/v1alpha1
   kind: RayCluster
   metadata:
     name: datafusion-ray-cluster
   spec:
     headGroupSpec:
       rayStartParams:
         num-cpus: "0"
       template:
         spec:
           containers:
             - name: ray-head
               image: andygrove/datafusion-ray-tpch:latest
               imagePullPolicy: Always
               resources:
                 limits:
                   cpu: 2
                   memory: 8Gi
                 requests:
                   cpu: 2
                   memory: 8Gi
               volumeMounts:
                 - mountPath: /mnt/bigdata  # Mount path inside the container
                   name: ray-storage
           volumes:
             - name: ray-storage
               persistentVolumeClaim:
                 claimName: ray-pvc  # Reference the PVC name here
     workerGroupSpecs:
       - replicas: 2
         groupName: "datafusion-ray"
         rayStartParams:
           num-cpus: "4"
         template:
           spec:
             containers:
               - name: ray-worker
                 image: andygrove/datafusion-ray-tpch:latest
                 imagePullPolicy: Always
                 resources:
                   limits:
                     cpu: 5
                     memory: 64Gi
                   requests:
                     cpu: 5
                     memory: 64Gi
                 volumeMounts:
                   - mountPath: /mnt/bigdata
                     name: ray-storage
             volumes:
               - name: ray-storage
                 persistentVolumeClaim:
                   claimName: ray-pvc
   ```
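
To make the suspicion about parallelism concrete, here is a rough back-of-the-envelope sketch using only the numbers from the cluster definition and the `@ray.remote(num_cpus=1)` decorators above. The helper names are hypothetical and not part of datafusion-ray. The per-task memory figure assumes each worker's memory is shared evenly between its concurrent tasks and ignores the Ray object store and other overhead, so it is an optimistic upper bound, not a measurement.

```python
# Rough estimate of task parallelism and per-task memory for the cluster
# described in this issue. All helper names here are hypothetical.

# Values copied from the RayCluster definition.
WORKER_REPLICAS = 2        # workerGroupSpecs[0].replicas
RAY_CPUS_PER_WORKER = 4    # worker rayStartParams num-cpus: "4"
WORKER_MEMORY_GIB = 64     # worker container memory limit
HEAD_RAY_CPUS = 0          # head rayStartParams num-cpus: "0" (no tasks on head)

# Value copied from the @ray.remote(num_cpus=1) decorators.
CPUS_PER_TASK = 1


def tasks_per_node(cpus_per_node: int, cpus_per_task: int) -> int:
    """Maximum number of tasks Ray will co-schedule on one node by CPU count."""
    return cpus_per_node // cpus_per_task


def concurrent_task_slots(replicas: int, cpus_per_node: int, cpus_per_task: int) -> int:
    """Maximum number of tasks running at once across all worker nodes."""
    return replicas * tasks_per_node(cpus_per_node, cpus_per_task)


def memory_per_task_gib(node_memory_gib: float, cpus_per_node: int, cpus_per_task: int) -> float:
    """Even split of node memory between concurrent tasks (assumption: no overhead)."""
    return node_memory_gib / tasks_per_node(cpus_per_node, cpus_per_task)


slots = concurrent_task_slots(WORKER_REPLICAS, RAY_CPUS_PER_WORKER, CPUS_PER_TASK)
per_task = memory_per_task_gib(WORKER_MEMORY_GIB, RAY_CPUS_PER_WORKER, CPUS_PER_TASK)

print(f"concurrent task slots across workers: {slots}")
print(f"memory per task if split evenly: {per_task:.1f} GiB")
```

Under these assumptions, `num_cpus=1` only caps how many tasks run at once by CPU count; it does not reserve memory for a task. `@ray.remote` also accepts a `memory` argument (in bytes) that Ray uses for scheduling, which might be one way to limit how many memory-hungry tasks land on a node, though whether that helps here is untested.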


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

