[ https://issues.apache.org/jira/browse/YUNIKORN-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wilfred Spiegelenburg resolved YUNIKORN-3038.
---------------------------------------------
    Fix Version/s: 1.7.0
       Resolution: Duplicate

> Nil pointer dereference on GetQueuePath
> ---------------------------------------
>
>                 Key: YUNIKORN-3038
>                 URL: https://issues.apache.org/jira/browse/YUNIKORN-3038
>             Project: Apache YuniKorn
>          Issue Type: Bug
>            Reporter: Thomas Cassaert
>            Priority: Major
>             Fix For: 1.7.0
>
>
> We're observing quite a few occurrences of the following panic:
> {code:java}
> panic: runtime error: invalid memory address or nil pointer dereference
> [signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1b037c9]
>
> goroutine 81 [running]:
> github.com/apache/yunikorn-core/pkg/scheduler/objects.(*Queue).GetQueuePath(0x0)
>     github.com/apache/yunikorn-core@v1.6.1-2/pkg/scheduler/objects/queue.go:548 +0x29
> github.com/apache/yunikorn-core/pkg/scheduler.(*PartitionContext).removeAllocation(0xc00b4d0b60, 0xc0135d9a40)
>     github.com/apache/yunikorn-core@v1.6.1-2/pkg/scheduler/partition.go:1501 +0xf1c
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).processAllocationReleases(0xc0000fa080, {0xc0146546a0, 0x1, 0xc0000a82a0?}, {0xc00645a340, 0x9})
>     github.com/apache/yunikorn-core@v1.6.1-2/pkg/scheduler/context.go:780 +0xa9
> github.com/apache/yunikorn-core/pkg/scheduler.(*ClusterContext).handleRMUpdateAllocationEvent(0xc00b9ebf88?, 0xc015e3ff08?)
>     github.com/apache/yunikorn-core@v1.6.1-2/pkg/scheduler/context.go:716 +0x5d
> github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).handleRMEvent(0xc0004002d0)
>     github.com/apache/yunikorn-core@v1.6.1-2/pkg/scheduler/scheduler.go:133 +0x18e
> created by github.com/apache/yunikorn-core/pkg/scheduler.(*Scheduler).StartService in goroutine 1
>     github.com/apache/yunikorn-core@v1.6.1-2/pkg/scheduler/scheduler.go:60 +0x9c
> {code}
> The panic is always preceded by this warning:
> {code:java}
> 2025-03-03T14:35:07.138Z WARN core.scheduler.partition scheduler/partition.go:1483 failed to release resources from queue {"appID": "spark-045e89205af548a6b2661e82fd3a0704", "allocationKey": "bb7808f0-4a77-4469-a394-86a9de766609", "error": "queue is nil"}
> {code}
> On inspection of the mentioned appID, we do see a queue defined:
> {code:java}
> annotations:
>   yunikorn.apache.org/task-groups: '[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"1","memory":"5120Mi"},"labels":{"deploy_env":"prod","driver-type":"batch","job_id":"j-250303112642423baa5a25458a7b2037","name":"2503031126-driver","openeo-role":"batch-driver","openeo_component":"batchjobs","queue":"root.default","role":"driver","user_id":"9e001a2a-1186-4b46-8f90-0f44cbcb13a9","version":"3.2.0"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"500.0m","memory":"6920Mi"},"labels":{"deploy_env":"prod","job_id":"j-250303112642423baa5a25458a7b2037","openeo-role":"executor","openeo_component":"batchjobs","queue":"root.default","user_id":"9e001a2a-1186-4b46-8f90-0f44cbcb13a9","version":"3.2.0"}}]'
> {code}
> Also, the label `queue: root.default` exists.
>
> {*}Yunikorn version{*}: 1.6.1
>
> {*}Yunikorn config{*}:
> {code:java}
> queues.yaml: |
>   partitions:
>     - name: default
>       placementrules:
>         - name: provided
>           create: true
>       queues:
>         - name: root
>           parent: true
>           submitacl: "*"
>           queues:
>             - name: default
>               parent: false
>               properties:
>                 preemption.policy: disabled
>             - name: cdse-prod
>               parent: true
>               queues:
>                 - name: batch
>                   childtemplate:
>                     maxapplications: 2
>                     properties:
>                       preemption.policy: disabled
> service.clusterId: cdse-prod
> {code}
>
> {*}Kubernetes version{*}: 1.25.7
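
For readers unfamiliar with this class of failure: the `GetQueuePath(0x0)` frame in the trace shows the method being called on a nil `*Queue` receiver. The sketch below reproduces that pattern in isolation and shows one possible guard. It is only an illustration under stated assumptions: the `Queue` type, its `QueuePath` field, `SafeQueuePath`, and the `panics` helper are all hypothetical and are not taken from yunikorn-core, and this is not the fix that resolved the issue.

```go
package main

import "fmt"

// Queue is a hypothetical stand-in for a scheduler queue object.
// It is NOT the yunikorn-core type; it only mirrors the shape of the problem.
type Queue struct {
	QueuePath string
}

// GetQueuePath reads a field without checking the receiver, so calling it
// on a nil *Queue panics with a nil pointer dereference.
func (q *Queue) GetQueuePath() string {
	return q.QueuePath
}

// SafeQueuePath is a hypothetical nil-safe variant: Go allows methods to be
// called on nil pointer receivers, so the method can check for nil itself.
func (q *Queue) SafeQueuePath() string {
	if q == nil {
		return ""
	}
	return q.QueuePath
}

// panics reports whether calling f results in a panic.
func panics(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	var q *Queue // nil receiver, as in the GetQueuePath(0x0) frame

	fmt.Println("unguarded call panics:", panics(func() { _ = q.GetQueuePath() }))
	fmt.Printf("guarded call returns: %q\n", q.SafeQueuePath())
}
```

An alternative to guarding inside the method is for the caller to return early once it has established that the queue is nil, instead of logging the condition and continuing. Which approach suits the scheduler depends on code that is not shown in this report.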